---

### 🎓 **Professor**: Apostolos Filippas

### 📘 **Class**: E-Commerce

### 📋 **Topic**: Pricing Behavior Analysis with Python

🚫 **Note**: You are not allowed to share the contents of this notebook with anyone outside this class without written permission from the professor.

---


## Overview

Let's use our Python knowledge to see how often providers in a two-sided market change their prices. This analysis will help us understand pricing dynamics and seller behavior in digital marketplaces.

**What we'll learn:**
- How to analyze pricing behavior over time
- How to calculate price change frequencies
- How to visualize distribution patterns
- Understanding seller behavior in digital markets


In [None]:
# Let's import the libraries we will use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Load the prices dataset
df_prices = pd.read_csv("../data/prices.csv")

print("Dataset loaded successfully!")
print(f"Dataset shape: {df_prices.shape}")
print(f"Columns: {df_prices.columns.tolist()}")

# This is price data simulated from a real marketplace
# We want to figure out how often providers change their prices!

print("Sample of price data:")
print(df_prices.head(10))

print(f"Time range: {df_prices['time_m'].min()} to {df_prices['time_m'].max()}")
print(f"Number of unique cars: {df_prices['car'].nunique()}")
print(f"Total observations: {len(df_prices)}")
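
Before computing anything per car, it can help to check the raw data for missing values and for cars that appear more than once in the same period. The sketch below runs on a small made-up table rather than `prices.csv`. The `car` and `time_m` columns come from the cell above; the `price` column name and all values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data. "car" and "time_m" match the notebook's columns;
# the "price" column name and every value here are assumptions.
toy = pd.DataFrame(
    {
        "car": ["a", "a", "a", "b", "b"],
        "time_m": [1, 2, 2, 1, 4],
        "price": [50.0, 55.0, 52.0, 80.0, 80.0],
    }
)

# Missing values per column
missing = toy.isna().sum()

# Rows that repeat an earlier (car, time_m) pair
n_dupes = int(toy.duplicated(subset=["car", "time_m"]).sum())

# Largest number of observations any car has within a single period
max_per_period = int(toy.groupby(["car", "time_m"]).size().max())

print(missing)
print(f"Repeated (car, time_m) rows: {n_dupes}")
print(f"Max observations per car-period: {max_per_period}")
```

If a car can have several rows in one period, a rate of observations per period can exceed 1, which matters when reading the bins later on.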


In [None]:
# Calculate price changes per car
# For each car, we want to know how many times its price changed
# relative to how long the car was in the dataset

df_changes = (
    df_prices.groupby("car").agg({"time_m": ["count", "min", "max"]}).reset_index()
)

# Flatten column names
df_changes.columns = ["car", "n", "start_t", "end_t"]

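# Calculate the price change rate: observations per period between a car's
# first and last appearance (this treats each row as one price change)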
df_changes["price_changes"] = df_changes["n"] / (df_changes["end_t"] - df_changes["start_t"] + 1)

print("Sample of price changes data:")
print(df_changes.head())

print(f"Average observations per car: {df_changes['n'].mean():.2f}")
print(f"Average price changes per period: {df_changes['price_changes'].mean():.3f}")
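
To make the rate concrete, here is a minimal worked example on made-up data. It repeats the observations-per-period calculation using named aggregation, then shows one possible alternative that counts only rows where the price actually differs from the car's previous row. The `price` column and the `actual_changes` name are assumptions, not part of the dataset description above.

```python
import pandas as pd

# Hypothetical toy data; the "price" column name is an assumption
toy = pd.DataFrame(
    {
        "car": ["a", "a", "a", "a", "b", "b"],
        "time_m": [1, 2, 3, 4, 2, 5],
        "price": [50, 50, 55, 55, 80, 75],
    }
)

# Observations per period, written with named aggregation
rate = (
    toy.groupby("car")
    .agg(n=("time_m", "count"), start_t=("time_m", "min"), end_t=("time_m", "max"))
    .reset_index()
)
rate["price_changes"] = rate["n"] / (rate["end_t"] - rate["start_t"] + 1)

# Alternative: count rows whose price differs from the car's previous row
toy = toy.sort_values(["car", "time_m"])
prev_price = toy.groupby("car")["price"].shift()
toy["changed"] = prev_price.notna() & (toy["price"] != prev_price)
rate["actual_changes"] = toy.groupby("car")["changed"].sum().to_numpy()

print(rate)
```

Car `a` has 4 observations over 4 periods (rate 1.0) but only one actual price change, so the two measures can differ when rows do not always record a new price.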


In [None]:
# Create bins for price change frequency
# Define bin edges for categorizing cars by their price change frequency
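# The last edge must exceed both 15 and the maximum rate: with right=False the
# bins are left-closed, so a value equal to the last edge would be left unbinned
upper_edge = max(np.floor(df_changes["price_changes"].max()) + 1, 16)
bin_edges = [0, 1, 2, 3, 4, 5, 6, 10, 15, upper_edge]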

# Create bins
df_changes["score_bins"] = pd.cut(
    df_changes["price_changes"], bins=bin_edges, include_lowest=True, right=False
)

print("Price change bins:")
print(df_changes["score_bins"].value_counts().sort_index())

# Calculate summary statistics for visualization
df_bins = (
    df_changes.groupby("score_bins", observed=True)
    .agg({"car": "count"})
    .rename(columns={"car": "num_obs"})
    .reset_index()
)

# Calculate percentages
total_obs = df_bins["num_obs"].sum()
df_bins["total"] = total_obs
df_bins["pct"] = df_bins["num_obs"] / total_obs
df_bins["ecdf"] = df_bins["pct"].cumsum()

print("Binned data for visualization:")
print(df_bins)
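
With `right=False`, `pd.cut` builds left-closed bins such as `[0, 1)`, so a value exactly equal to the last edge does not fall into any bin. The sketch below illustrates this on a few made-up rates; the values and variable names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

values = pd.Series([0.0, 0.5, 1.0, 2.0, 3.0])  # hypothetical rates

# Left-closed bins [0, 1), [1, 2), [2, 3): the value 3.0 equals the last
# edge, falls outside every bin, and becomes NaN
strict = pd.cut(values, bins=[0, 1, 2, 3], right=False)

# Pushing the last edge past the maximum keeps every value in a bin
upper = np.floor(values.max()) + 1
safe = pd.cut(values, bins=[0, 1, 2, upper], right=False)

print(f"Unbinned values (strict): {strict.isna().sum()}")
print(f"Unbinned values (safe): {safe.isna().sum()}")
print(safe.value_counts().sort_index())
```

Checking `isna().sum()` on the binned column is a quick way to confirm that no car was silently dropped from the distribution.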


In [None]:
# Create the main visualization
plt.figure(figsize=(12, 8))

# Create bar chart
bars = plt.bar(
    range(len(df_bins)), df_bins["pct"], color="lightgray", edgecolor="black", alpha=0.8
)

# Add percentage labels on top of bars
for bar, pct in zip(bars, df_bins["pct"]):
    plt.text(
        bar.get_x() + bar.get_width() / 2,
        bar.get_height() + 0.01,
        f"{pct:.1%}",
        ha="center",
        va="bottom",
        fontsize=10,
        fontweight="bold",
    )

# Add cumulative distribution line
plt.plot(
    range(len(df_bins)),
    df_bins["ecdf"],
    color="red",
    alpha=0.7,
    linestyle="--",
    linewidth=2,
    marker="o",
    label="Cumulative Distribution",
)

# Formatting
plt.xlabel("Price Changes per Time Period", fontsize=14)
plt.ylabel("Percentage", fontsize=14)
plt.title("Distribution of Price Change Frequency", fontsize=16, fontweight="bold")

# Format y-axis as percentages
from matplotlib.ticker import PercentFormatter
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
plt.ylim(0, 1.05)

# Set x-axis labels
bin_labels = [str(interval) for interval in df_bins["score_bins"]]
plt.xticks(range(len(bin_labels)), bin_labels, rotation=45, ha="right")

plt.grid(True, alpha=0.3, axis="y")
plt.legend()
plt.tight_layout()
plt.savefig("../temp/price_changes_distribution.pdf", dpi=1000, bbox_inches="tight")
plt.close()

print("Main distribution plot saved to temp/price_changes_distribution.pdf")
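
Two practical notes on saving figures: `savefig` raises an error if the target folder does not exist, and a figure that is closed before the cell ends is not displayed inline. A minimal sketch of creating the folder first is below. It writes to a temporary directory so it can run anywhere; the folder and file names are assumptions, not the notebook's actual paths.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend for scripts; not needed in Jupyter
import matplotlib.pyplot as plt

# Hypothetical output folder inside a temporary directory
out_dir = Path(tempfile.mkdtemp()) / "temp"
out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(["[0, 1)", "[1, 2)"], [0.7, 0.3], color="lightgray", edgecolor="black")
ax.set_ylabel("Percentage")

out_path = out_dir / "example.pdf"
fig.savefig(out_path, bbox_inches="tight")
plt.close(fig)  # in a notebook, call plt.show() before closing to see the plot

print(f"Saved to {out_path}")
```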


In [None]:
# Additional analysis: Price change patterns

# Histogram of price changes
plt.figure(figsize=(10, 6))
plt.hist(
    df_changes["price_changes"], bins=30, alpha=0.7, color="skyblue", edgecolor="black"
)
plt.xlabel("Price Changes per Time Period")
plt.ylabel("Number of Cars")
plt.title("Histogram of Price Change Frequency")
plt.grid(True, alpha=0.3, axis="y")

# Add vertical line for mean
mean_changes = df_changes["price_changes"].mean()
plt.axvline(
    mean_changes,
    color="red",
    linestyle="--",
    linewidth=2,
    label=f"Mean: {mean_changes:.3f}",
)
plt.legend()
plt.tight_layout()
plt.savefig("../temp/price_changes_histogram.pdf", dpi=1000, bbox_inches="tight")
plt.close()

print("Histogram saved to temp/price_changes_histogram.pdf")


In [None]:
# Summary statistics
print("\n" + "=" * 50)
print("SUMMARY STATISTICS")
print("=" * 50)

print(f"Total number of cars analyzed: {len(df_changes):,}")
print(f"Average price changes per period: {df_changes['price_changes'].mean():.3f}")
print(f"Median price changes per period: {df_changes['price_changes'].median():.3f}")
print(f"Standard deviation: {df_changes['price_changes'].std():.3f}")
print(f"Minimum price changes per period: {df_changes['price_changes'].min():.3f}")
print(f"Maximum price changes per period: {df_changes['price_changes'].max():.3f}")

# Percentiles
percentiles = [25, 50, 75, 90, 95, 99]
print("\nPercentiles of price changes per period:")
for p in percentiles:
    value = np.percentile(df_changes["price_changes"], p)
    print(f"{p}th percentile: {value:.3f}")

# Categories analysis
print("\nPrice change frequency categories:")
total_cars = len(df_changes)
for category, count in df_changes["score_bins"].value_counts().sort_index().items():
    percentage = (count / total_cars) * 100
    print(f"{category}: {count:,} cars ({percentage:.1f}%)")
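
The same summary can be produced more compactly with `describe`, and `skew` gives a quick numeric check on whether the distribution is right-skewed. The sketch below uses made-up rates, so the numbers say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical rates with one large value to mimic a long right tail
rates = pd.Series([0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 3.0])

# Count, mean, std, min, max, and the requested percentiles in one call
summary = rates.describe(percentiles=[0.25, 0.5, 0.75, 0.9, 0.95, 0.99])

# Positive skewness and a mean above the median both suggest a right skew
skewness = rates.skew()

print(summary)
print(f"Skewness: {skewness:.3f}")
print(f"Mean above median: {rates.mean() > rates.median()}")
```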


---

## 🎉 Summary

We analyzed pricing behavior in an online marketplace. The analysis covered:
- **Price change frequency** by seller
- **Distribution patterns** of pricing behavior
- **Seller behavior insights** in digital markets
- **Statistical analysis** of pricing patterns and market dynamics

**Key findings:**
- Most sellers change prices infrequently
- Distribution is heavily right-skewed
- Market activity varies over time
- Clear relationship between time in market and pricing behavior

This type of analysis helps us understand:
- Seller pricing strategies
- Market efficiency
- Competitive dynamics

### Next:
We'll learn about randomization and experimental design.

---
