---

### 🎓 **Professor**: Apostolos Filippas

### 📘 **Class**: E-Commerce

### 📋 **Topic**: Reputation Inflation Analysis with Python

🚫 **Note**: You are not allowed to share the contents of this notebook with anyone outside this class without written permission by the professor.

---


## Overview

Let's use our Python knowledge to see what happened to feedback scores over time on an online marketplace for jobs. This analysis will help us understand reputation inflation, the tendency of average feedback scores to drift upward over time, in digital marketplaces.

**What we'll learn:**
- How to analyze time series data with pandas
- How to identify trends over time
- How to visualize temporal patterns
- Understanding reputation inflation in online markets


In [None]:
# Let's import the libraries we will use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Load the ratings dataset
df_ratings = pd.read_csv("../data/ratings.csv")

# Convert date column to datetime if it's not already
df_ratings["date"] = pd.to_datetime(df_ratings["date"])

print(df_ratings.head())
print("Dataset loaded successfully!")
print(f"Dataset shape: {df_ratings.shape}")
print(f"Columns: {df_ratings.columns.tolist()}")

# This is ratings data simulated from a real marketplace
# We will now see what happened to feedback scores over time!
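
As a side note on the loading step above, `pd.read_csv` can parse the date column while reading, which makes the separate `pd.to_datetime` call unnecessary. The sketch below is only an illustration: the CSV text is a hypothetical stand-in for `../data/ratings.csv`, and it assumes the file has `date` and `score` columns.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for ../data/ratings.csv
# (assumption: the real file has "date" and "score" columns)
csv_text = "date,score\n2007-01-15,4.5\n2007-02-03,5.0\n"

# parse_dates converts the listed columns to datetime while reading
toy = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(toy.dtypes)
```

Either approach should give a `datetime64` column; converting explicitly, as the notebook does, keeps the step visible.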


In [None]:
# Let's use pandas datetime functionality to add some useful info
df_ratings["date_month"] = df_ratings["date"].dt.month
df_ratings["date_year"] = df_ratings["date"].dt.year

print("Sample of processed data:")
print(df_ratings.head())
print(f"Date range: {df_ratings['date'].min()} to {df_ratings['date'].max()}")
print(f"Score range: {df_ratings['score'].min()} to {df_ratings['score'].max()}")
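
The separate year and month columns above could also be replaced by a single monthly key. The following is a minimal sketch on toy data, not the notebook's method: the `toy` frame and the `date_period` column name are hypothetical, and `dt.to_period("M")` is one of several ways to build such a key.

```python
import pandas as pd

# Hypothetical toy data standing in for df_ratings
toy = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2007-01-15", "2007-01-20", "2007-02-03", "2008-01-10"]
        ),
        "score": [4.0, 5.0, 4.5, 5.0],
    }
)

# One monthly key (e.g. 2007-01) instead of separate year and month columns
toy["date_period"] = toy["date"].dt.to_period("M")

# Average score per calendar month
monthly_mean = toy.groupby("date_period")["score"].mean()
print(monthly_mean)
```

A single period key sorts chronologically on its own, so January 2007 and January 2008 stay distinct without grouping on two columns.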


In [None]:
# Let's build a monthly time series to see whether ratings increased over time

# Compute summary statistics for the ratings grouped by year and month.
# Named aggregation gives flat, descriptive column names directly.
df_ratings_evolution = (
    df_ratings.groupby(["date_year", "date_month"])
    .agg(
        num_obs=("score", "count"),
        score_mean=("score", "mean"),
        score_var=("score", "var"),
    )
    .reset_index()
)

print("Sample of evolution data:")
print(df_ratings_evolution.head())
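
The count and variance computed above can be combined to judge how noisy each monthly mean is. The sketch below is an illustration under assumptions: the `evolution` frame is hypothetical toy data with the same column names as `df_ratings_evolution`, and the `score_se` column and the `min_obs` threshold are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly summary with the same columns as df_ratings_evolution
evolution = pd.DataFrame(
    {
        "date_year": [2007, 2007, 2007],
        "date_month": [1, 2, 3],
        "num_obs": [4, 100, 1],
        "score_mean": [4.0, 4.2, 5.0],
        # The sample variance is NaN when a month has a single rating
        "score_var": [1.0, 0.25, np.nan],
    }
)

# Standard error of the monthly mean: sqrt(sample variance / number of ratings)
evolution["score_se"] = np.sqrt(evolution["score_var"] / evolution["num_obs"])

# Keep only months with enough ratings (the threshold is an arbitrary choice)
min_obs = 2
reliable = evolution[evolution["num_obs"] >= min_obs]
print(reliable[["date_year", "date_month", "score_mean", "score_se"]])
```

Months with few ratings have large standard errors, so an apparent jump in their mean may just be noise.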


In [None]:
# Plot the evolution of ratings over time
plt.figure(figsize=(12, 6))

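# Create a simple time variable for plotting: months elapsed since the
# start of 2007, so January 2007 is t = 1 (assumes the data begins in 2007)
df_ratings_evolution["t"] = (
    12 * (df_ratings_evolution["date_year"] - 2007)
    + df_ratings_evolution["date_month"]
)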

# Plot the mean rating over time
plt.plot(
    df_ratings_evolution["t"],
    df_ratings_evolution["score_mean"],
    color="steelblue",
    linewidth=2,
    label="Mean Rating",
)

# Formatting
plt.xlabel("Year")
plt.ylabel("Average Feedback Score")
plt.title("Evolution of Feedback Scores Over Time")

# Create custom x-axis labels: one tick per January, t = 1, 13, ..., 121
year_positions = list(range(1, 127, 12))  # Every 12 months
year_labels = list(range(2007, 2018, 1))  # 2007 through 2017
plt.xticks(year_positions, year_labels, rotation=45)

plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.savefig("../temp/ratings_evolution.pdf", dpi=1000, bbox_inches="tight")
plt.close()

print("Plot saved to ../temp/ratings_evolution.pdf")
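
The plot above relies on a hand-built month counter and hard-coded year ticks. One possible alternative, sketched here on hypothetical toy data, is to build a real timestamp per month and let matplotlib place the date ticks. The `evolution` frame and the `month_start` column are assumptions for illustration, and the `Agg` backend is chosen only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly summary with the same key columns as df_ratings_evolution
evolution = pd.DataFrame(
    {
        "date_year": [2007, 2007, 2008],
        "date_month": [11, 12, 1],
        "score_mean": [4.1, 4.2, 4.3],
    }
)

# pd.to_datetime can assemble dates from columns named year, month, and day
parts = evolution[["date_year", "date_month"]].rename(
    columns={"date_year": "year", "date_month": "month"}
)
evolution["month_start"] = pd.to_datetime(parts.assign(day=1))

# With a datetime x-axis, matplotlib chooses the tick positions and labels
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(evolution["month_start"], evolution["score_mean"], color="steelblue", linewidth=2)
ax.set_xlabel("Year")
ax.set_ylabel("Average Feedback Score")
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
plt.close(fig)
```

This avoids having to keep the tick positions and year labels in sync with the date range of the data.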


---

## 🎉 Summary

We analyzed reputation inflation in an online marketplace using:
- **Time series data preprocessing** with pandas
- **Summary statistics** (count, mean, variance) of scores by month
- **Trend visualization** over time

In this simulated dataset, average feedback scores increase over time, a pattern consistent with reputation inflation in digital marketplaces.
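
One simple way to put a number on such a trend is to fit a straight line to the monthly means. The sketch below uses hypothetical toy values rather than the notebook's results, and a linear fit with `np.polyfit` is only one possible choice, not necessarily the method used in class.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly means; real values would come from df_ratings_evolution
evolution = pd.DataFrame(
    {
        "t": [1, 2, 3, 4, 5],
        "score_mean": [4.0, 4.1, 4.2, 4.3, 4.4],
    }
)

# Fit score_mean = slope * t + intercept by least squares
slope, intercept = np.polyfit(
    evolution["t"].to_numpy(), evolution["score_mean"].to_numpy(), deg=1
)

print(f"Estimated change per month: {slope:.3f}")
print(f"Estimated change per year: {12 * slope:.3f}")
```

A positive slope would indicate that average scores rise over the period, although a straight line can hide changes in the pace of inflation.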

### Next:
We'll analyze pricing patterns and behavior.

---
