---

### 🎓 **Professor**: Apostolos Filippas

### 📘 **Class**: E-Commerce

### 📋 **Topic**: Analyzing and Visualizing Experiment Results

🚫 **Note**: You are not allowed to share the contents of this notebook with anyone outside this class without written permission from the professor.

---


## Overview

Let's use our Python knowledge to analyze and visualize the results of an experiment.

**Experiment Setup:**
- Control group: "status quo" pricing algorithm
- Treatment group: "new" pricing algorithm that we want to evaluate  
- Outcome: Earnings at the end of a three-month period

**What we'll learn:**
- How to analyze experimental results
- Statistical comparison between groups
- Visualization of treatment effects
- Interpreting experimental outcomes


In [None]:
# Let's import the libraries we will use
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Make sure the output folder for the saved figures exists
os.makedirs("../temp", exist_ok=True)

# Load experiment data: each row has a "treatment" label and an "earnings" value
df_users = pd.read_csv("../data/earnings.csv")

print("Dataset loaded successfully!")
print(f"Dataset shape: {df_users.shape}")
print(f"Columns: {df_users.columns.tolist()}")

print("\nSample of experiment data:")
print(df_users.head())

print("\nTreatment group distribution:")
print(df_users["treatment"].value_counts())

print("\nBasic statistics:")
print(df_users.describe())
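If `../data/earnings.csv` is not at hand, a synthetic stand-in with the two columns the rest of the notebook relies on (`treatment`, with labels `"Control"` and `"Treatment"`, and `earnings`) lets you run the remaining cells. This is a minimal sketch: the function name `make_synthetic_earnings`, the group size, and the normal-distribution parameters are hypothetical choices for illustration, not properties of the real experiment data.

```python
import numpy as np
import pandas as pd


def make_synthetic_earnings(n_per_group=500, seed=0):
    """Hypothetical stand-in for earnings.csv with a treatment label and earnings."""
    rng = np.random.default_rng(seed)
    # Made-up parameters (assumption): control mean 100, treatment mean 105, SD 20
    control = rng.normal(loc=100.0, scale=20.0, size=n_per_group)
    treatment = rng.normal(loc=105.0, scale=20.0, size=n_per_group)
    return pd.DataFrame(
        {
            "treatment": ["Control"] * n_per_group + ["Treatment"] * n_per_group,
            "earnings": np.concatenate([control, treatment]),
        }
    )


df_demo = make_synthetic_earnings()
print(df_demo["treatment"].value_counts())
```

To use it, assign `df_users = df_demo` before running the cells below; any numbers you then get describe the simulated data only.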


In [None]:
# Distribution visualization
# Plot distributions of earnings for control and treatment groups
plt.figure(figsize=(10, 6))
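ax = sns.histplot(data=df_users, x="earnings", hue="treatment", alpha=0.6, bins=30, kde=True)
plt.xlabel("Earnings")
plt.ylabel("Count")
plt.title("Distribution of Earnings by Treatment Group")
# seaborn already draws a legend for `hue`; calling plt.legend() would replace it with an
# empty one, so retitle the existing legend instead
ax.get_legend().set_title("Treatment Group")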
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig("../temp/earnings_distribution.pdf", dpi=1000, bbox_inches="tight")
plt.close()

print("Distribution plot saved to temp/earnings_distribution.pdf")
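Overlapping histograms can be hard to compare by eye. As a numeric companion to the plot, a few quantiles per group show where the two distributions shift. A minimal pandas sketch; the helper name `quantiles_by_group` and the toy `demo` data are hypothetical, and the default column names assume the `treatment`/`earnings` schema used in this notebook.

```python
import pandas as pd


def quantiles_by_group(df, group_col="treatment", value_col="earnings", qs=(0.25, 0.5, 0.75)):
    """Table of selected quantiles of value_col: one row per group, one column per quantile."""
    return df.groupby(group_col)[value_col].quantile(list(qs)).unstack()


# Toy data (hypothetical) just to show the output shape
demo = pd.DataFrame(
    {
        "treatment": ["Control"] * 5 + ["Treatment"] * 5,
        "earnings": [1.0, 2.0, 3.0, 4.0, 5.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    }
)
table = quantiles_by_group(demo)
print(table)
```

On the notebook's data, `quantiles_by_group(df_users)` gives the same kind of table for the real groups.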


In [None]:
# Calculate treatment effect and create point estimates
# Compare earnings between treatment and control groups

# Per-group summary statistics, kept at full precision (rounding is for display only)
df_stats = df_users.groupby("treatment").agg({"earnings": ["mean", "var", "count"]})

# Flatten column names
df_stats.columns = ["sample_mean", "sample_var", "sample_size"]
df_stats = df_stats.reset_index()

# Standard error of each group mean: sample standard deviation / sqrt(sample size)
df_stats["sample_se"] = np.sqrt(df_stats["sample_var"]) / np.sqrt(df_stats["sample_size"])

print("\nDetailed statistics by treatment group:")
print(df_stats.round(2))

# Calculate treatment effect
treatment_mean = df_stats[df_stats["treatment"] == "Treatment"]["sample_mean"].iloc[0]
control_mean = df_stats[df_stats["treatment"] == "Control"]["sample_mean"].iloc[0]
treatment_effect = treatment_mean - control_mean

print(f"\nTreatment Effect:")
print(f"Treatment group average: ${treatment_mean:.2f}")
print(f"Control group average: ${control_mean:.2f}")
print(f"Treatment effect: ${treatment_effect:.2f}")

# Create point estimate plot with confidence intervals
plt.figure(figsize=(8, 6))
plt.errorbar(
    x=range(len(df_stats)),
    y=df_stats["sample_mean"],
    yerr=1.96 * df_stats["sample_se"],  # 95% confidence intervals
    fmt="o",
    capsize=5,
    capthick=2,
    elinewidth=2,
    markersize=8,
    color="red",
    alpha=0.7,
)

plt.xticks(range(len(df_stats)), df_stats["treatment"])
plt.xlabel("Experimental Group")
plt.ylabel("Mean Earnings")
plt.title("Mean Earnings by Group with 95% Confidence Intervals")
plt.grid(True, alpha=0.3, axis="y")
plt.tight_layout()
plt.savefig("../temp/earnings_point_estimates.pdf", dpi=1000, bbox_inches="tight")
plt.close()

print("Point estimates plot saved to temp/earnings_point_estimates.pdf")
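The treatment effect above is a plain difference in group means. The same number falls out of an ordinary least-squares regression of earnings on an intercept and a 0/1 treatment indicator, which is a common starting point for extending the analysis later (for example, by adding covariates). A minimal NumPy sketch, assuming the group labels are exactly `"Control"` and `"Treatment"`; the helper name `ols_treatment_effect` and the toy data are hypothetical.

```python
import numpy as np


def ols_treatment_effect(labels, earnings):
    """Return (intercept, slope) from regressing earnings on a treatment dummy."""
    # 1.0 for treated users, 0.0 for control users
    d = (np.asarray(labels) == "Treatment").astype(float)
    y = np.asarray(earnings, dtype=float)
    # Design matrix: a column of ones (intercept) and the treatment indicator
    X = np.column_stack([np.ones_like(d), d])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]


toy_labels = ["Control", "Control", "Treatment", "Treatment", "Treatment"]
toy_earnings = [10.0, 14.0, 15.0, 17.0, 19.0]
intercept, slope = ols_treatment_effect(toy_labels, toy_earnings)
# intercept equals the control mean (12.0); slope equals 17.0 - 12.0 = 5.0
print(intercept, slope)
```

On the notebook's data, `ols_treatment_effect(df_users["treatment"], df_users["earnings"])` should return a slope equal to `treatment_effect`, up to rounding.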


In [None]:
# Statistical significance testing
from scipy import stats

treatment_earnings = df_users[df_users["treatment"] == "Treatment"]["earnings"]
control_earnings = df_users[df_users["treatment"] == "Control"]["earnings"]

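# Welch's two-sample t-test: it does not assume equal group variances, which matches
# the unequal-variance standard error used for the confidence interval below
ttest_result = stats.ttest_ind(treatment_earnings, control_earnings, equal_var=False)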

print(f"\nStatistical Test Results:")
print(f"T-statistic: {ttest_result.statistic:.4f}")
print(f"P-value: {ttest_result.pvalue:.6f}")

# Calculate effect size (Cohen's d)
pooled_std = np.sqrt(
    (
        (len(treatment_earnings) - 1) * treatment_earnings.var()
        + (len(control_earnings) - 1) * control_earnings.var()
    )
    / (len(treatment_earnings) + len(control_earnings) - 2)
)

cohens_d = (treatment_earnings.mean() - control_earnings.mean()) / pooled_std

print(f"Cohen's d (effect size): {cohens_d:.4f}")

# Calculate confidence interval for the difference in means
diff_means = treatment_earnings.mean() - control_earnings.mean()
se_diff = np.sqrt(
    treatment_earnings.var() / len(treatment_earnings)
    + control_earnings.var() / len(control_earnings)
)

ci_lower = diff_means - 1.96 * se_diff
ci_upper = diff_means + 1.96 * se_diff

print(f"\nTreatment Effect Analysis:")
print(f"Difference in means: {diff_means:.2f}")
print(f"95% Confidence Interval: [{ci_lower:.2f}, {ci_upper:.2f}]")

# Percentage change
pct_change = (diff_means / control_earnings.mean()) * 100
print(f"Percentage change: {pct_change:.1f}%")
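The t-test relies on the sample means being approximately normally distributed. A permutation test is a different, assumption-light technique for the same question: if the treatment had no effect, the group labels would be exchangeable, so we shuffle them many times and count how often a shuffled difference in means is at least as extreme as the observed one. A minimal sketch; the function name `permutation_pvalue` and the default of 5,000 shuffles are illustrative choices.

```python
import numpy as np


def permutation_pvalue(treated, control, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = np.random.default_rng(seed)
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    observed = treated.mean() - control.mean()

    pooled = np.concatenate([treated, control])
    n_t = len(treated)
    extreme = 0
    for _ in range(n_perm):
        # Reassign labels at random while keeping the original group sizes
        shuffled = rng.permutation(pooled)
        diff = shuffled[:n_t].mean() - shuffled[n_t:].mean()
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly zero
    return observed, (extreme + 1) / (n_perm + 1)


obs, p = permutation_pvalue(np.arange(10) + 100.0, np.arange(10, dtype=float))
print(obs, p)
```

On the notebook's data, `permutation_pvalue(treatment_earnings, control_earnings)` should usually land close to the t-test p-value when samples are large; its resolution is limited to about `1 / (n_perm + 1)`.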


In [None]:
# Additional visualizations

# Box plot comparison
plt.figure(figsize=(8, 6))
sns.boxplot(data=df_users, x="treatment", y="earnings", palette="Set2")
plt.xlabel("Treatment Group")
plt.ylabel("Earnings")
plt.title("Earnings Distribution by Treatment Group")
plt.grid(True, alpha=0.3, axis="y")
plt.tight_layout()
plt.savefig("../temp/earnings_boxplot.pdf", dpi=1000, bbox_inches="tight")
plt.close()

print("Box plot saved to temp/earnings_boxplot.pdf")

# Results interpretation
print(f"\n" + "=" * 60)
print("EXPERIMENT RESULTS INTERPRETATION")
print("=" * 60)

significance_level = 0.05
is_significant = ttest_result.pvalue < significance_level

print(f"\n1. Statistical Significance:")
print(f"   P-value: {ttest_result.pvalue:.6f}")
print(f"   Significant at α = {significance_level}: {'Yes' if is_significant else 'No'}")

print(f"\n2. Effect Size:")
# Cohen's conventional benchmarks: 0.2 = small, 0.5 = medium, 0.8 = large
if abs(cohens_d) < 0.2:
    effect_interpretation = "Negligible"
elif abs(cohens_d) < 0.5:
    effect_interpretation = "Small"
elif abs(cohens_d) < 0.8:
    effect_interpretation = "Medium"
else:
    effect_interpretation = "Large"

print(f"   Cohen's d: {cohens_d:.4f} ({effect_interpretation} effect)")

print(f"\n3. Practical Significance:")
print(f"   Treatment {'increases' if diff_means > 0 else 'decreases'} earnings by ${abs(diff_means):.2f}")
print(f"   This represents a {abs(pct_change):.1f}% {'increase' if diff_means > 0 else 'decrease'}")

print(f"\n4. Confidence in Results:")
print(f"   95% CI: [${ci_lower:.2f}, ${ci_upper:.2f}]")

if ci_lower > 0:
    print("   The 95% CI lies entirely above zero: the data point to a positive effect")
elif ci_upper < 0:
    print("   The 95% CI lies entirely below zero: the data point to a negative effect")
else:
    print("   The 95% CI includes zero: the direction of the effect is uncertain")
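The confidence interval above uses the normal approximation (difference ± 1.96 × standard error). A percentile bootstrap is a common cross-check that does not lean on that approximation: resample each group with replacement, recompute the difference in means, and read off the 2.5th and 97.5th percentiles. A minimal sketch; `bootstrap_diff_ci`, its defaults, and the toy arrays are hypothetical choices for illustration.

```python
import numpy as np


def bootstrap_diff_ci(treated, control, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(treated) - mean(control)."""
    rng = np.random.default_rng(seed)
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        # Resample each group separately, keeping the original group sizes
        t = rng.choice(treated, size=len(treated), replace=True)
        c = rng.choice(control, size=len(control), replace=True)
        diffs[b] = t.mean() - c.mean()
    lower, upper = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper


toy_control = np.arange(50, dtype=float)
toy_treated = toy_control + 10.0
lo, hi = bootstrap_diff_ci(toy_treated, toy_control)
print(f"95% bootstrap CI: [{lo:.2f}, {hi:.2f}]")
```

On the notebook's data, `bootstrap_diff_ci(treatment_earnings, control_earnings)` should typically land close to `[ci_lower, ci_upper]` when both groups are large; a noticeable gap suggests the normal approximation is doing a lot of work.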


---

## 🎉 Summary

In this notebook, we analyzed the experiment's results through:
- **Distribution visualization** and comparison
- **Point estimates** with confidence intervals
- **Statistical significance testing** (t-tests)
- **Effect size calculation** (Cohen's d)
- **Treatment effect interpretation** (statistical vs practical significance)

**Key concepts learned:**
- Statistical vs practical significance
- Confidence intervals for treatment effects
- Effect size interpretation
- Robust experimental analysis

When users are randomly assigned to groups, the difference in means estimates the causal effect of the new pricing algorithm, which lets us make data-driven decisions.

### Next:
We'll explore advanced statistical concepts such as the Law of Large Numbers and the Central Limit Theorem.

---
