---

### 🎓 **Professor**: Apostolos Filippas

### 📘 **Class**: E-Commerce

### 📋 **Topic**: Advanced Statistical Concepts - LLN and CLT

🚫 **Note**: You are not allowed to share the contents of this notebook with anyone outside this class without written permission from the professor.

---


## Overview

Let's use our Python knowledge to run simulations that will help us understand statistical concepts better. We'll explore the Law of Large Numbers (LLN), the Central Limit Theorem (CLT), and confidence intervals.

**What we'll learn:**
- Law of Large Numbers (LLN) through simulation
- Central Limit Theorem (CLT) demonstration
- Confidence intervals and their meaning
- Statistical foundations of experimental analysis


In [None]:
# Let's import the libraries we will use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from scipy import stats


## Law of Large Numbers - Multiple Distributions

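The LLN states that the sample average converges to the true mean as the number of observations grows. Let's see how it works, starting with a uniform distribution: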


In [None]:
# Uniform distribution example
np.random.seed(42)
N = 10000

# Draw N values uniformly from the interval [0, 6); the true mean is 3
my_sample = np.random.uniform(0, 6, N)
sample_number = np.arange(1, N + 1)
cumulative_mean = np.cumsum(my_sample) / sample_number

plt.figure(figsize=(10, 6))
plt.plot(sample_number, cumulative_mean, linewidth=1.5, color="darkgreen")
plt.xlabel("Number of samples")
plt.ylabel("Sample mean")
plt.ylim(0, 6)
plt.axhline(y=3, color="red", alpha=0.7, linestyle="--", label="True mean = 3")
plt.title("Law of Large Numbers: Uniform Distribution")
plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.savefig("../temp/LLN_uniform_variable.pdf", dpi=1000, bbox_inches="tight")
plt.close()

print(f"Final sample mean after {N} samples: {cumulative_mean[-1]:.4f}")
print("Uniform distribution plot saved to temp/LLN_uniform_variable.pdf")
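
The uniform example is only one case. As a minimal sketch (not part of the original notebook), the same running-mean calculation can be applied to a skewed distribution. The exponential distribution with `scale=2.0`, the helper name `running_mean`, and the local `np.random.default_rng` generator are all assumptions chosen for illustration.

```python
import numpy as np


# Hypothetical helper, introduced only for this sketch
def running_mean(values):
    """Return the cumulative (running) mean of a 1-D array."""
    values = np.asarray(values, dtype=float)
    return np.cumsum(values) / np.arange(1, len(values) + 1)


# Local generator: leaves the global np.random state untouched
rng = np.random.default_rng(42)
N = 10_000

# Exponential distribution with scale=2.0 is right-skewed and has true mean 2
exp_sample = rng.exponential(scale=2.0, size=N)
exp_running_mean = running_mean(exp_sample)

# Inspect the running mean at a few checkpoints
for n in [10, 100, 1_000, 10_000]:
    print(f"n={n:>6}: running mean = {exp_running_mean[n - 1]:.4f} (true mean = 2)")
```

The running mean should drift toward 2 as `n` grows, just as the uniform running mean drifts toward 3. A local generator is used here so the sketch does not alter the seeded global state that the other cells rely on.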


## Central Limit Theorem (CLT)

The CLT states that, for a large enough sample size N, the sample average approximately follows a normal distribution with mean equal to the "true" mean and standard deviation = "true" std / sqrt(N), regardless of the underlying distribution (as long as its variance is finite):


In [None]:
# CLT with small sample size (n=5)
N = 100000
sample_size = 5

sample_means = []
for i in range(N):
    sample = np.random.uniform(0, 1, sample_size)
    sample_means.append(np.mean(sample))

plt.figure(figsize=(10, 6))
sns.histplot(sample_means, bins=50, kde=True, alpha=0.7, color="skyblue")
plt.xlabel("Sample mean")
plt.ylabel("Frequency")
plt.xlim(0, 1)
plt.axvline(x=0.5, color="red", alpha=0.7, linestyle="--", label="True mean = 0.5")
plt.title(f"Central Limit Theorem: Uniform Distribution (n={sample_size})")
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig("../temp/CLT_uniform_variable_5.pdf", dpi=1000, bbox_inches="tight")
plt.close()

print(f"CLT demonstration (n={sample_size}) saved to temp/CLT_uniform_variable_5.pdf")


In [None]:
# CLT with larger sample size (n=50)
N = 100000
sample_size = 50

sample_means = []
for i in range(N):
    sample = np.random.uniform(0, 1, sample_size)
    sample_means.append(np.mean(sample))

plt.figure(figsize=(10, 6))
sns.histplot(sample_means, bins=50, kde=True, alpha=0.7, color="lightcoral")
plt.xlabel("Sample mean")
plt.ylabel("Frequency")
plt.xlim(0, 1)
plt.axvline(x=0.5, color="red", alpha=0.7, linestyle="--", label="True mean = 0.5")
plt.title(f"Central Limit Theorem: Uniform Distribution (n={sample_size})")
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig("../temp/CLT_uniform_variable_50.pdf", dpi=1000, bbox_inches="tight")
plt.close()

print(f"CLT demonstration (n={sample_size}) saved to temp/CLT_uniform_variable_50.pdf")
print("Notice how the distribution gets tighter (smaller standard deviation) with larger sample size!")
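
The histograms above start from a symmetric distribution. As a rough sketch of the "regardless of the underlying distribution" part, the block below repeats the idea with a skewed exponential distribution and tracks two shape statistics of the sample means: skewness and excess kurtosis, both of which are 0 for a normal distribution. The exponential choice, the helper name `sample_mean_shape`, and the number of experiments are assumptions made for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # local generator, independent of np.random.seed
n_experiments = 10_000


# Hypothetical helper, introduced only for this sketch
def sample_mean_shape(sample_size):
    """Skewness and excess kurtosis of the means of exponential samples."""
    # Each row is one experiment with `sample_size` draws
    draws = rng.exponential(scale=1.0, size=(n_experiments, sample_size))
    means = draws.mean(axis=1)
    # stats.kurtosis returns excess kurtosis by default (normal -> 0)
    return stats.skew(means), stats.kurtosis(means)


for sample_size in [1, 5, 50, 500]:
    skewness, excess_kurtosis = sample_mean_shape(sample_size)
    print(
        f"n={sample_size:>4}: skewness = {skewness:.3f}, "
        f"excess kurtosis = {excess_kurtosis:.3f}"
    )
```

Both statistics should move toward 0 as the sample size grows, which is what the CLT leads us to expect even though a single exponential draw is far from normal.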


## Confidence Intervals

Let's create 95% confidence intervals and see how many contain the true mean:


In [None]:
# Build 95% confidence intervals from 100 independent samples of size 50
sample_size = 50
num_experiments = 100

confidence_intervals = []

for i in range(num_experiments):
    sample = np.random.uniform(0, 1, sample_size)
    sample_mean = np.mean(sample)
    sample_var = np.var(sample, ddof=1)  # Sample variance with Bessel's correction

    confidence_intervals.append(
        {"experiment_id": i + 1, "sample_mean": sample_mean, "sample_var": sample_var}
    )

df_confidence_intervals = pd.DataFrame(confidence_intervals)

# Compute standard errors
df_confidence_intervals["sample_se"] = np.sqrt(df_confidence_intervals["sample_var"]) / np.sqrt(sample_size)

# Create confidence interval plot
plt.figure(figsize=(12, 8))

x_pos = df_confidence_intervals["experiment_id"]
y_means = df_confidence_intervals["sample_mean"]
errors = 2 * df_confidence_intervals["sample_se"]  # 95% CI (≈ 2 standard errors)

# Plot points and error bars
plt.errorbar(
    x_pos,
    y_means,
    yerr=errors,
    fmt="o",
    alpha=0.6,
    capsize=3,
    capthick=1,
    elinewidth=1,
    markersize=4,
)

plt.axhline(y=0.5, color="red", alpha=0.7, linestyle="--", linewidth=2, label="True mean = 0.5")
plt.xlabel("Experiment Number")
plt.ylabel("Sample Mean")
plt.title(f"95% Confidence Intervals (Sample Size = {sample_size})")
plt.ylim(0, 1)
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig("../temp/confidence_intervals_50.pdf", dpi=1000, bbox_inches="tight")
plt.close()

# Calculate how many CIs contain the true mean
true_mean = 0.5
contains_true_mean = (
    df_confidence_intervals["sample_mean"] - 2 * df_confidence_intervals["sample_se"] <= true_mean
) & (
    df_confidence_intervals["sample_mean"] + 2 * df_confidence_intervals["sample_se"] >= true_mean
)

coverage_rate = contains_true_mean.mean()
print(f"Coverage rate for 95% CIs: {coverage_rate:.1%} (Expected: ~95%)")
print("Confidence intervals plot saved to temp/confidence_intervals_50.pdf")
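
The intervals above use the rule of thumb of 2 standard errors. As a small sketch, the block below compares that multiplier with the critical values from the normal and Student t distributions for a single sample of size 50. The seed, the variable names, and the choice to show the t-based interval are assumptions for illustration, not part of the original analysis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(123)  # local generator, independent of np.random.seed
sample_size = 50
sample = rng.uniform(0, 1, sample_size)

sample_mean = sample.mean()
sample_se = sample.std(ddof=1) / np.sqrt(sample_size)  # standard error of the mean

# Two-sided 95% critical values
z_crit = stats.norm.ppf(0.975)                   # normal approximation
t_crit = stats.t.ppf(0.975, df=sample_size - 1)  # Student t with n - 1 degrees of freedom

for label, crit in [("rule of thumb", 2.0), ("normal", z_crit), ("Student t", t_crit)]:
    lower = sample_mean - crit * sample_se
    upper = sample_mean + crit * sample_se
    print(f"{label:<14} multiplier = {crit:.3f}, CI = [{lower:.3f}, {upper:.3f}]")
```

For a sample of this size the three multipliers are very close, so the rule of thumb of 2 is a reasonable shortcut; the gap between the normal and t values grows for small samples.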


In [None]:
# Statistical concepts verification
print("\n" + "=" * 70)
print("STATISTICAL CONCEPTS VERIFICATION")
print("=" * 70)

# For uniform distribution [0,1]: theoretical mean = 0.5, theoretical variance = 1/12
theoretical_mean = 0.5
theoretical_var = 1 / 12
theoretical_std = np.sqrt(theoretical_var)

print(f"\nUniform Distribution [0,1] - Theoretical Properties:")
print(f"Mean: {theoretical_mean}")
print(f"Variance: {theoretical_var:.4f}")
print(f"Standard Deviation: {theoretical_std:.4f}")

# CLT predictions for different sample sizes
print(f"\nCentral Limit Theorem Predictions:")
for n in [5, 50, 100, 500]:
    clt_std = theoretical_std / np.sqrt(n)
    print(f"n={n}: Sample mean distribution should have std = {clt_std:.4f}")

# Monte Carlo verification
def monte_carlo_verification(n_samples, n_experiments=10000):
    """Verify CLT predictions with Monte Carlo simulation"""
    sample_means = []

    for _ in range(n_experiments):
        sample = np.random.uniform(0, 1, n_samples)
        sample_means.append(np.mean(sample))

    empirical_mean = np.mean(sample_means)
    empirical_std = np.std(sample_means, ddof=1)
    theoretical_std = np.sqrt(1 / 12) / np.sqrt(n_samples)

    return empirical_mean, empirical_std, theoretical_std

print(f"\nMonte Carlo Verification (10,000 experiments each):")
print(f"{'n':<10}{'Emp. Mean':<12}{'Emp. Std':<12}{'Theory Std':<12}{'Difference':<12}")
print("-" * 60)

for n in [5, 10, 25, 50, 100]:
    emp_mean, emp_std, theory_std = monte_carlo_verification(n)
    diff = abs(emp_std - theory_std)
    print(f"{n:<10}{emp_mean:<12.4f}{emp_std:<12.4f}{theory_std:<12.4f}{diff:<12.4f}")


In [None]:
# Law of Large Numbers (LLN) Demonstration
# The law of large numbers states that the sample average converges
# to the true average as the number of samples/observations grows.

# Set random seed for reproducibility
np.random.seed(42)
N = 10000

# Simulate coin tosses (0 or 1, each with 50% probability)
my_sample = np.random.choice([0, 1], N, replace=True)

# Calculate cumulative mean (running average)
cumulative_mean = np.cumsum(my_sample) / np.arange(1, N + 1)

print(f"True probability: 0.5")
print(f"Sample average after {N} tosses: {cumulative_mean[-1]:.4f}")
print(f"Difference from true value: {abs(cumulative_mean[-1] - 0.5):.4f}")

# Plot the convergence
plt.figure(figsize=(10, 6))
plt.plot(range(1, N + 1), cumulative_mean, linewidth=1, alpha=0.8)
plt.axhline(y=0.5, color='red', linestyle='--', linewidth=2, label='True probability (0.5)')
plt.xlabel('Number of tosses')
plt.ylabel('Cumulative average')
plt.title('Law of Large Numbers: Coin Toss Example')
plt.legend()
plt.grid(True, alpha=0.3)
plt.savefig("../temp/lln_demonstration.pdf", dpi=1000, bbox_inches="tight")
plt.close()

print("LLN plot saved to temp/lln_demonstration.pdf")


---

## 🎉 Summary

We explored fundamental statistical concepts through simulations:

**Law of Large Numbers:**
- Sample means converge to the true population mean as the sample size grows
- Holds for any underlying distribution with a finite mean
- The typical error of the sample mean shrinks like 1 / sqrt(n) (when the variance is finite)

**Central Limit Theorem:**
- Sample means are approximately normally distributed when n is large
- Mean of sample means = population mean
- Standard deviation of sample means = population std / sqrt(n)
- Works for any underlying distribution (with finite variance)

**Confidence Intervals:**
- Provide a range of plausible values for a population parameter
- About 95% of the 95% CIs built from repeated samples should contain the true parameter
- Width shrinks in proportion to 1 / sqrt(n) as the sample size grows
- Foundation for statistical inference
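
The point about interval width can be sketched numerically. Assuming the Uniform(0, 1) standard deviation of sqrt(1/12) used earlier and the normal 95% multiplier, the hypothetical helper `ci_half_width` below shows that quadrupling the sample size halves the half-width of the interval.

```python
import numpy as np
from scipy import stats

sigma = np.sqrt(1 / 12)          # standard deviation of Uniform(0, 1)
z_crit = stats.norm.ppf(0.975)   # two-sided 95% multiplier (about 1.96)


# Hypothetical helper, introduced only for this sketch
def ci_half_width(n):
    """Half-width of a 95% CI for the mean when the population std is known."""
    return z_crit * sigma / np.sqrt(n)


# Each step multiplies n by 4, so each half-width is half the previous one
for n in [50, 200, 800]:
    print(f"n={n:>4}: 95% CI half-width = {ci_half_width(n):.4f}")
```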

These concepts are fundamental to:
- Experimental design and analysis
- A/B testing in e-commerce
- Survey sampling and estimation
- Statistical quality control

---
