# CUPED: Understanding and Implementing Variance Reduction in A/B Testing

## 1. Introduction to CUPED

CUPED (Controlled-experiment Using Pre-Experiment Data) is a variance reduction technique used to improve the sensitivity of A/B tests. It was first introduced by Microsoft in 2013 and has since become a popular method in the field of online experimentation.

The core idea of CUPED is to leverage historical (pre-experiment) data to reduce the noise in experiment metrics, thereby allowing experimenters to:
- Detect smaller changes with the same sample size
- Maintain the same detection power with smaller sample sizes
- Reduce the duration of experiments

Let us start by importing the necessary libraries:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import statsmodels.api as sm
from statsmodels.stats.power import TTestIndPower
from sklearn.linear_model import LinearRegression
import warnings

# For reproducibility
np.random.seed(42)

# Set styles for visualizations
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("Set2")
warnings.filterwarnings('ignore')

# Display options for pandas
pd.set_option('display.max_columns', None)
```

## 2. Why CUPED is Needed: The Variance Problem in A/B Testing

In A/B testing, we're trying to detect a difference in a metric (e.g., revenue per user, conversion rate) between a control group and a treatment group. However, many metrics have high variance, which creates two main problems:

1. **Low Statistical Power**: High variance makes it difficult to detect small but meaningful treatment effects
2. **Long Experiment Duration**: To compensate for high variance, we need larger sample sizes, which means longer experiment run times

Let us illustrate this with an example:

```python
def simulate_user_data(n_users=10000, treatment_effect=0, pre_post_correlation=0.7):
    """
    Simulate user data with pre and post experiment metrics
    
    Parameters:
    -----------
    n_users : int
        Number of users
    treatment_effect : float
        The effect size to simulate for the treatment group
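    pre_post_correlation : float
        Weight applied to the user-level noise term in the post metric.
        The realized pre/post correlation depends on all variance terms
        and should be measured from the simulated data.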
        
    Returns:
    --------
    DataFrame with user data
    """
    # Create user IDs
    user_ids = np.arange(n_users)
    
    # Assign users to control or treatment (50/50 split)
    treatment = np.random.binomial(1, 0.5, n_users)
    
    # Generate pre-experiment data (e.g., spending in the previous month)
    # We'll simulate a right-skewed distribution typical of spending/revenue data
    base_spending = np.random.gamma(shape=2, scale=10, size=n_users)
    
    # Create user-level noise; as written it enters only the post-period metric,
    # so it adds post-period variance rather than pre/post correlation
    user_noise = np.random.normal(0, 10, n_users)
    
    # Generate post-experiment data with correlation to pre-experiment data
    # and with treatment effect for the treatment group
    post_spending = (
        base_spending * 1.1 +  # Some natural growth for all users
        pre_post_correlation * user_noise +  # Scaled user-level noise
        np.random.normal(0, 15, n_users) +  # Random noise specific to post period
        treatment_effect * treatment  # Treatment effect
    )
    
    # For visualization purposes, let us also generate a random cohort
    cohorts = np.random.choice(['New', 'Returning', 'Loyal'], n_users, p=[0.3, 0.5, 0.2])
    
    # Create DataFrame
    df = pd.DataFrame({
        'user_id': user_ids,
        'treatment': treatment,
        'cohort': cohorts,
        'pre_metric': base_spending,
        'post_metric': post_spending
    })
    
    return df

# Simulate data without treatment effect
df = simulate_user_data(n_users=10000, treatment_effect=0)

# Display first few rows
print("Sample of simulated user data:")
df.head()
```

Now, let us visualize the high variance in our data:

```python
plt.figure(figsize=(12, 6))

# Plot histograms of post-experiment metrics
plt.subplot(1, 2, 1)
sns.histplot(df[df['treatment'] == 0]['post_metric'], 
             label='Control', alpha=0.7, kde=True)
sns.histplot(df[df['treatment'] == 1]['post_metric'], 
             label='Treatment', alpha=0.7, kde=True)
plt.title('Distribution of Post-Experiment Metric')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.legend()

# Plot scatter plot of pre vs post metrics
plt.subplot(1, 2, 2)
sns.scatterplot(data=df, x='pre_metric', y='post_metric', 
                hue='treatment', alpha=0.5)
plt.title('Pre vs. Post Metric Values')
plt.xlabel('Pre-Experiment Metric')
plt.ylabel('Post-Experiment Metric')

plt.tight_layout()
plt.show()

# Calculate correlation between pre and post metrics
correlation = df['pre_metric'].corr(df['post_metric'])
print(f"Correlation between pre and post metrics: {correlation:.4f}")
```

Let us also run a traditional t-test on this data:

```python
def run_ttest(df):
    """Run a t-test on the post-experiment metric"""
    control = df[df['treatment'] == 0]['post_metric']
    treatment = df[df['treatment'] == 1]['post_metric']
    
    # Calculate means
    control_mean = control.mean()
    treatment_mean = treatment.mean()
    
    # Run t-test
    t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
    
    # Calculate standard errors
    control_se = control.std() / np.sqrt(len(control))
    treatment_se = treatment.std() / np.sqrt(len(treatment))
    
    # Calculate relative difference
    rel_diff = (treatment_mean - control_mean) / control_mean * 100
    
    # Print results
    print(f"Control Mean: {control_mean:.4f} (SE: {control_se:.4f})")
    print(f"Treatment Mean: {treatment_mean:.4f} (SE: {treatment_se:.4f})")
    print(f"Absolute Difference: {treatment_mean - control_mean:.4f}")
    print(f"Relative Difference: {rel_diff:.2f}%")
    print(f"t-statistic: {t_stat:.4f}")
    print(f"p-value: {p_value:.4f}")
    print(f"Statistically Significant (α=0.05): {p_value < 0.05}")
    
    return t_stat, p_value

print("Traditional A/B Test Results:")
t_stat, p_value = run_ttest(df)
```

## 3. CUPED: The Solution to Variance Reduction

CUPED uses pre-experiment data as a control variate to reduce the variance in the post-experiment metric. The key idea is to adjust each user's post-experiment value by subtracting the component that can be predicted from their pre-experiment value.

The mathematical formula for CUPED is:

Y_i^CUPED = Y_i - θ(X_i - μ_X)

Where:
- Y_i is the original post-experiment metric for user i
- X_i is the pre-experiment metric for user i
- μ_X is the mean of the pre-experiment metric
- θ is the coefficient that minimizes the variance of Y_i^CUPED; it equals Cov(X, Y) / Var(X), the slope from regressing Y on X
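
The choice of θ can be checked numerically. The sketch below is a minimal illustration on synthetic data (the variable names and the data-generating process are assumptions made for this example, not the tutorial's dataset). It scans a grid of θ values, confirms that the variance of the adjusted metric is smallest near Cov(X, Y) / Var(X), and shows that the remaining variance equals Var(Y)(1 - ρ²), where ρ is the correlation between X and Y.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: x is a pre-period metric, y a correlated post-period metric
n = 50_000
x = rng.gamma(shape=2.0, scale=10.0, size=n)
y = 1.1 * x + rng.normal(0.0, 15.0, size=n)

# Closed-form coefficient: theta = Cov(X, Y) / Var(X)
theta_closed_form = np.cov(x, y)[0, 1] / np.var(x, ddof=1)


def adjusted_variance(theta):
    """Sample variance of Y - theta * (X - mean(X))."""
    return np.var(y - theta * (x - x.mean()), ddof=1)


# Scan a grid of candidate thetas and keep the one with the smallest variance
theta_grid = np.linspace(0.0, 2.0, 201)
variances = np.array([adjusted_variance(t) for t in theta_grid])
theta_grid_best = theta_grid[np.argmin(variances)]

rho = np.corrcoef(x, y)[0, 1]
var_y = np.var(y, ddof=1)
var_cuped = adjusted_variance(theta_closed_form)

print(f"theta (closed form):  {theta_closed_form:.4f}")
print(f"theta (grid search):  {theta_grid_best:.4f}")
print(f"Var(Y) * (1 - rho^2): {var_y * (1 - rho**2):.4f}")
print(f"Var(Y_cuped):         {var_cuped:.4f}")
```

Because the adjustment term has mean zero, the mean of the adjusted metric is unchanged; only its variance shrinks.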

Let us implement CUPED:

```python
def apply_cuped(df, pre_column='pre_metric', post_column='post_metric'):
    """
    Apply CUPED transformation to the post-experiment metric
    
    Parameters:
    -----------
    df : pandas DataFrame
        Data frame containing user data
    pre_column : str
        Name of the column with pre-experiment data
    post_column : str
        Name of the column with post-experiment data
        
    Returns:
    --------
    DataFrame with added cuped-adjusted column
    """

    # Calculate the CUPED coefficient theta
    # This is essentially the slope from a linear regression
    cov_matrix = np.cov(df[pre_column], df[post_column])
    covariance = cov_matrix[0, 1]
    variance = cov_matrix[0, 0]
    theta = covariance / variance
    
    # Calculate the mean of pre-experiment metric
    pre_mean = df[pre_column].mean()
    
    # Apply CUPED formula
    df[f'{post_column}_cuped'] = df[post_column] - theta * (df[pre_column] - pre_mean)
    
    print(f"CUPED coefficient (theta): {theta:.4f}")
    
    return df

# Apply CUPED to our data
df = apply_cuped(df)

# Show first few rows
print("\nData after CUPED transformation:")
df.head()
```

Now let us visualize the effect of CUPED on the data:

```python
plt.figure(figsize=(12, 6))

# Plot histograms of original vs CUPED-adjusted metrics for control group
plt.subplot(1, 2, 1)
sns.histplot(df[df['treatment'] == 0]['post_metric'], 
             label='Original', alpha=0.7, kde=True)
sns.histplot(df[df['treatment'] == 0]['post_metric_cuped'], 
             label='CUPED-adjusted', alpha=0.7, kde=True)
plt.title('Control Group: Original vs. CUPED-adjusted')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.legend()

# Plot histograms of original vs CUPED-adjusted metrics for treatment group
plt.subplot(1, 2, 2)
sns.histplot(df[df['treatment'] == 1]['post_metric'], 
             label='Original', alpha=0.7, kde=True)
sns.histplot(df[df['treatment'] == 1]['post_metric_cuped'], 
             label='CUPED-adjusted', alpha=0.7, kde=True)
plt.title('Treatment Group: Original vs. CUPED-adjusted')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.legend()

plt.tight_layout()
plt.show()

# Calculate variance reduction
original_variance = df['post_metric'].var()
cuped_variance = df['post_metric_cuped'].var()
variance_reduction = (1 - cuped_variance / original_variance) * 100

print(f"Original variance: {original_variance:.4f}")
print(f"CUPED-adjusted variance: {cuped_variance:.4f}")
print(f"Variance reduction: {variance_reduction:.2f}%")
```

Now, let us run a t-test on the CUPED-adjusted data:

```python
print("\nCUPED-adjusted A/B Test Results:")
df_copy = df.copy()
df_copy['post_metric'] = df_copy['post_metric_cuped']  # Using the CUPED-adjusted metric
t_stat_cuped, p_value_cuped = run_ttest(df_copy)
```
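
A p-value alone does not show how much tighter the estimate became. The sketch below is a minimal, self-contained illustration (the helper `diff_in_means_ci` and the synthetic data are assumptions for this example) that reports the difference in means with a normal-approximation 95% confidence interval, before and after the CUPED adjustment.

```python
import numpy as np
from scipy import stats


def diff_in_means_ci(treated, control, alpha=0.05):
    """Difference in means with a normal-approximation confidence interval."""
    diff = treated.mean() - control.mean()
    se = np.sqrt(treated.var(ddof=1) / len(treated) + control.var(ddof=1) / len(control))
    z = stats.norm.ppf(1 - alpha / 2)
    return diff, diff - z * se, diff + z * se


# Hypothetical synthetic experiment with a true absolute effect of 1.0
rng = np.random.default_rng(1)
n = 10_000
assignment = rng.integers(0, 2, size=n)
x = rng.gamma(2.0, 10.0, size=n)
y = 1.1 * x + rng.normal(0, 15, size=n) + 1.0 * assignment

# Single-covariate CUPED adjustment using the pooled theta
theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
y_cuped = y - theta * (x - x.mean())

for label, values in [("Original", y), ("CUPED", y_cuped)]:
    diff, lo, hi = diff_in_means_ci(values[assignment == 1], values[assignment == 0])
    print(f"{label:>8}: diff = {diff:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}], width = {hi - lo:.3f}")
```

Both intervals estimate the same effect; the CUPED interval should simply be narrower.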

## 4. Power Analysis: Quantifying the Benefits of CUPED

Let us quantify how much CUPED helps in terms of required sample size or experiment duration:

```python
def calculate_required_sample_size(effect_size=None, power=0.8, alpha=0.05, variance=None, mean=None, relative_effect=None):
    """
    Calculate required sample size per group for detecting an effect
    
    Parameters:
    -----------
    effect_size : float, optional
        Cohen's d effect size (may be omitted if variance, mean and relative_effect are all given)
    power : float
        Desired statistical power (default: 0.8)
    alpha : float
        Significance level (default: 0.05)
    variance : float
        Variance of the metric (if provided, relative_effect and mean must also be provided)
    mean : float
        Mean of the metric (if provided, relative_effect and variance must also be provided)
    relative_effect : float
        Relative effect size as a decimal (if provided, mean and variance must also be provided)
        
    Returns:
    --------
    Required sample size per group
    """
    if variance is not None and mean is not None and relative_effect is not None:
        # Convert from relative effect to Cohen's d
        absolute_effect = mean * relative_effect
        effect_size = absolute_effect / np.sqrt(variance)
    elif effect_size is None:
        raise ValueError("Provide effect_size, or variance, mean and relative_effect together")
    
    # Calculate required sample size
    analysis = TTestIndPower()
    sample_size = analysis.solve_power(effect_size=effect_size, 
                                      power=power, 
                                      alpha=alpha, 
                                      ratio=1.0)
    
    return sample_size

def compare_cuped_vs_traditional(df, effect_sizes=(0.01, 0.02, 0.05, 0.1)):
    """Compare required sample sizes for CUPED vs. traditional approach"""
    # Calculate mean and variances
    traditional_mean = df['post_metric'].mean()
    traditional_var = df['post_metric'].var()
    
    cuped_mean = df['post_metric_cuped'].mean()
    cuped_var = df['post_metric_cuped'].var()
    
    results = []
    
    for effect in effect_sizes:
        # Calculate required sample sizes
        trad_size = calculate_required_sample_size(
            variance=traditional_var, 
            mean=traditional_mean,
            relative_effect=effect)
        
        cuped_size = calculate_required_sample_size(
            variance=cuped_var, 
            mean=cuped_mean,
            relative_effect=effect)
        
        # Calculate reduction
        reduction = (1 - cuped_size / trad_size) * 100
        
        results.append({
            'Relative Effect Size': f"{effect * 100}%",
            'Traditional Sample Size': int(np.ceil(trad_size)),
            'CUPED Sample Size': int(np.ceil(cuped_size)),
            'Sample Size Reduction': f"{reduction:.1f}%"
        })
    
    return pd.DataFrame(results)

# Compare sample size requirements
comparison_df = compare_cuped_vs_traditional(df)
print("\nSample Size Comparison (per group):")
comparison_df
```
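
When the absolute effect is held fixed, the required sample size is roughly proportional to the metric's variance, so the ratio of CUPED to traditional sample size should be close to 1 - ρ². The sketch below checks this with `TTestIndPower` on synthetic data; the data-generating process and the absolute effect of 1.0 are assumptions chosen for illustration.

```python
import numpy as np
from statsmodels.stats.power import TTestIndPower

# Hypothetical synthetic data: x is the pre-period metric, y the post-period metric
rng = np.random.default_rng(2)
n = 50_000
x = rng.gamma(2.0, 10.0, size=n)
y = 1.1 * x + rng.normal(0, 15, size=n)

theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
y_cuped = y - theta * (x - x.mean())
rho = np.corrcoef(x, y)[0, 1]

absolute_effect = 1.0  # assumed effect, in the metric's own units
solver = TTestIndPower()


def required_n(values):
    """Per-group sample size for 80% power at alpha = 0.05."""
    cohens_d = absolute_effect / values.std(ddof=1)
    result = solver.solve_power(effect_size=cohens_d, power=0.8, alpha=0.05, ratio=1.0)
    return float(np.squeeze(result))


n_trad = required_n(y)
n_cuped = required_n(y_cuped)

print(f"Traditional n per group: {n_trad:.0f}")
print(f"CUPED n per group:       {n_cuped:.0f}")
print(f"Ratio n_cuped / n_trad:  {n_cuped / n_trad:.4f}")
print(f"1 - rho^2:               {1 - rho**2:.4f}")
```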

## 5. Simulating Statistical Power with CUPED

Let us demonstrate how CUPED improves statistical power by simulating multiple experiments:

```python
def run_simulation(n_simulations=1000, n_users=10000, treatment_effect=2.0, 
                  pre_post_correlation=0.7):
    """
    Simulate multiple experiments and compare CUPED vs. traditional approach
    
    Parameters:
    -----------
    n_simulations : int
        Number of simulations to run
    n_users : int
        Number of users per simulation
    treatment_effect : float
        True treatment effect to simulate
    pre_post_correlation : float
        Passed through to simulate_user_data (weight on the user-level noise term)
        
    Returns:
    --------
    DataFrame with simulation results
    """
    results = []
    
    for i in range(n_simulations):
        # Simulate data with treatment effect
        df_sim = simulate_user_data(
            n_users=n_users, 
            treatment_effect=treatment_effect,
            pre_post_correlation=pre_post_correlation
        )
        
        # Run traditional test
        control = df_sim[df_sim['treatment'] == 0]['post_metric']
        treatment = df_sim[df_sim['treatment'] == 1]['post_metric']
        _, p_value_trad = stats.ttest_ind(treatment, control, equal_var=False)
        
        # Apply CUPED
        df_sim = apply_cuped(df_sim)
        
        # Run CUPED test
        control_cuped = df_sim[df_sim['treatment'] == 0]['post_metric_cuped']
        treatment_cuped = df_sim[df_sim['treatment'] == 1]['post_metric_cuped']
        _, p_value_cuped = stats.ttest_ind(treatment_cuped, control_cuped, equal_var=False)
        
        # Store results
        results.append({
            'simulation': i,
            'p_value_trad': p_value_trad,
            'p_value_cuped': p_value_cuped,
            'significant_trad': p_value_trad < 0.05,
            'significant_cuped': p_value_cuped < 0.05
        })
    
    return pd.DataFrame(results)

# Run simulation with a smaller effect size that's hard to detect
print("\nRunning simulations to compare statistical power...")
sim_results = run_simulation(
    n_simulations=100,  # Reduced for notebook performance
    n_users=5000,       # Smaller sample size to show the difference
    treatment_effect=1.5,  # Small effect that's hard to detect
    pre_post_correlation=0.7
)

# Calculate power (percentage of significant results)
trad_power = sim_results['significant_trad'].mean() * 100
cuped_power = sim_results['significant_cuped'].mean() * 100

print(f"\nStatistical Power Comparison:")
print(f"Traditional A/B Test Power: {trad_power:.1f}%")
print(f"CUPED A/B Test Power: {cuped_power:.1f}%")
print(f"Power Improvement: {cuped_power - trad_power:.1f} percentage points")

# Visualize p-value distributions
plt.figure(figsize=(10, 6))
plt.hist(sim_results['p_value_trad'], bins=20, alpha=0.5, label='Traditional')
plt.hist(sim_results['p_value_cuped'], bins=20, alpha=0.5, label='CUPED')
plt.axvline(0.05, color='red', linestyle='--', label='α = 0.05')
plt.xlabel('p-value')
plt.ylabel('Frequency')
plt.title('Distribution of p-values: Traditional vs. CUPED')
plt.legend()
plt.show()
```
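
Higher power is only useful if the false-positive rate stays at its nominal level. A quick A/A check, in which no treatment effect exists, is one way to verify this. The sketch below is a self-contained illustration with its own small data generator (an assumption for this example, independent of `simulate_user_data`); with α = 0.05, both rejection rates should land near 5%, up to simulation noise.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_simulations, n_users, alpha = 500, 2_000, 0.05
rejections = {"traditional": 0, "cuped": 0}

for _ in range(n_simulations):
    # Random 50/50 assignment with NO treatment effect (A/A test)
    assignment = rng.integers(0, 2, size=n_users)
    x = rng.gamma(2.0, 10.0, size=n_users)
    y = 1.1 * x + rng.normal(0, 15, size=n_users)

    # Single-covariate CUPED adjustment with the pooled theta
    theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    y_cuped = y - theta * (x - x.mean())

    for name, values in [("traditional", y), ("cuped", y_cuped)]:
        _, p_value = stats.ttest_ind(
            values[assignment == 1], values[assignment == 0], equal_var=False
        )
        rejections[name] += int(p_value < alpha)

false_positive_rates = {name: count / n_simulations for name, count in rejections.items()}
for name, rate in false_positive_rates.items():
    print(f"{name}: false-positive rate = {rate:.3f}")
```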

## 6. Best Practices for Implementing CUPED

Here are some best practices when implementing CUPED in real-world A/B testing:

### 6.1 Selecting the Right Control Covariates

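The pre-experiment data used for CUPED should be strongly correlated with the experiment metric and measured before the treatment starts, so that the treatment cannot affect it. The demo below compares covariates with different correlation levels: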

```python
def demo_covariate_selection():
    """Demonstrate covariate selection for CUPED"""
                             
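    # Simulate additional pre-experiment metrics with different correlations.
    # Note: for illustration they are built from post_metric, which real
    # pre-experiment data could never be.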
    n_users = 10000
    df_covariates = simulate_user_data(n_users=n_users, treatment_effect=2)
    
    # Add more potential covariates
    # High correlation covariate
    df_covariates['pre_metric_high_corr'] = (
        df_covariates['post_metric'] * 0.8 + 
        np.random.normal(0, 5, n_users)
    )
    
    # Medium correlation covariate
    df_covariates['pre_metric_med_corr'] = (
        df_covariates['post_metric'] * 0.5 + 
        np.random.normal(0, 10, n_users)
    )
    
    # Low correlation covariate
    df_covariates['pre_metric_low_corr'] = (
        df_covariates['post_metric'] * 0.2 + 
        np.random.normal(0, 15, n_users)
    )
    
    # Zero correlation covariate
    df_covariates['pre_metric_no_corr'] = np.random.normal(20, 10, n_users)
    
    # Calculate correlations
    covariates = ['pre_metric', 'pre_metric_high_corr', 'pre_metric_med_corr', 
                 'pre_metric_low_corr', 'pre_metric_no_corr']
    
    correlations = []
    variances = []
    cuped_variances = []
    
    for covariate in covariates:
        # Calculate correlation
        corr = df_covariates[covariate].corr(df_covariates['post_metric'])
        correlations.append(corr)
        
        # Apply CUPED with this covariate
        df_temp = apply_cuped(df_covariates.copy(), pre_column=covariate)
        
        # Calculate variances
        orig_var = df_temp['post_metric'].var()
        cuped_var = df_temp['post_metric_cuped'].var()
        
        variances.append(orig_var)
        cuped_variances.append(cuped_var)
    
    # Create results table
    results = pd.DataFrame({
        'Covariate': covariates,
        'Correlation with Post-Metric': correlations,
        'Original Variance': variances,
        'CUPED Variance': cuped_variances,
        'Variance Reduction': [(1 - cv/ov)*100 for cv, ov in zip(cuped_variances, variances)]
    })
    
    return results

covariate_results = demo_covariate_selection()
print("\nCovariate Selection Analysis:")
covariate_results
```

### 6.2 Multiple Covariates with CUPED

In practice, we often have multiple pre-experiment metrics. CUPED can be extended to use multiple covariates:

```python
def apply_multi_cuped(df, pre_columns, post_column='post_metric'):
    """
    Apply CUPED with multiple covariates
    
    Parameters:
    -----------
    df : pandas DataFrame
        Data frame containing user data
    pre_columns : list of str
        Names of columns with pre-experiment data
    post_column : str
        Name of the column with post-experiment data
        
    Returns:
    --------
    DataFrame with added cuped-adjusted column
    """

    # Create a copy to avoid modifying the original
    df_copy = df.copy()
    
    # Extract pre-experiment features and post-experiment metric
    X = df_copy[pre_columns].values
    y = df_copy[post_column].values
    
    # Add a constant for the intercept
    X = sm.add_constant(X)
    
    # Fit linear regression model
    model = sm.OLS(y, X).fit()
    
    # Get the coefficient for each feature
    coefficients = model.params[1:]  # Skip the intercept
    
    # Calculate the mean of each pre-experiment metric
    pre_means = df_copy[pre_columns].mean()
    
    # Apply CUPED formula with multiple covariates
    df_copy[f'{post_column}_multi_cuped'] = df_copy[post_column].copy()
    
    for i, col in enumerate(pre_columns):
        df_copy[f'{post_column}_multi_cuped'] -= (
            coefficients[i] * (df_copy[col] - pre_means[col])
        )
    
    print(f"CUPED coefficients: {coefficients}")
    
    return df_copy

# Start from the CUPED-adjusted dataset and add more covariates
multi_cuped_df = df.copy()

# Add some additional pre-experiment metrics
n_users = len(multi_cuped_df)
multi_cuped_df['pre_metric_2'] = (
    multi_cuped_df['pre_metric'] * 0.6 + 
    np.random.normal(0, 5, n_users)
)
multi_cuped_df['pre_metric_3'] = (
    multi_cuped_df['post_metric'] * 0.3 + 
    np.random.normal(0, 8, n_users)
)

# Apply multiple covariate CUPED
multi_cuped_result = apply_multi_cuped(
    multi_cuped_df, 
    pre_columns=['pre_metric', 'pre_metric_2', 'pre_metric_3']
)

# Compare variances
original_var = multi_cuped_result['post_metric'].var()
single_cuped_var = multi_cuped_result['post_metric_cuped'].var()
multi_cuped_var = multi_cuped_result['post_metric_multi_cuped'].var()

print("\nMultiple Covariate CUPED Results:")
print(f"Original variance: {original_var:.4f}")
print(f"Single covariate CUPED variance: {single_cuped_var:.4f}")
print(f"Multiple covariate CUPED variance: {multi_cuped_var:.4f}")
print(f"Single CUPED variance reduction: {(1 - single_cuped_var/original_var)*100:.2f}%")
print(f"Multiple CUPED variance reduction: {(1 - multi_cuped_var/original_var)*100:.2f}%")
```
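
With several covariates, the adjustment is the part of the post metric explained by a linear fit on the centered covariates, and the fraction of variance removed equals the R² of that fit. The sketch below illustrates this with `np.linalg.lstsq` on synthetic data; the covariates `x1` and `x2` and their coefficients are assumptions for the example.

```python
import numpy as np

# Hypothetical synthetic data with two correlated pre-period covariates
rng = np.random.default_rng(4)
n = 20_000
x1 = rng.gamma(2.0, 10.0, size=n)
x2 = 0.6 * x1 + rng.normal(0, 5, size=n)
y = 1.1 * x1 + 0.5 * x2 + rng.normal(0, 15, size=n)

# Center the covariates so the adjustment has mean zero
X = np.column_stack([x1, x2])
X_centered = X - X.mean(axis=0)

# Least-squares slopes of centered y on centered covariates
theta_multi, *_ = np.linalg.lstsq(X_centered, y - y.mean(), rcond=None)
y_multi_cuped = y - X_centered @ theta_multi

# Single-covariate CUPED on x1 only, for comparison
theta_single = np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)
y_single_cuped = y - theta_single * (x1 - x1.mean())

var_y = np.var(y, ddof=1)
var_single = np.var(y_single_cuped, ddof=1)
var_multi = np.var(y_multi_cuped, ddof=1)

print(f"Coefficients (multi):         {theta_multi}")
print(f"Variance reduction (single):  {(1 - var_single / var_y) * 100:.2f}%")
print(f"Variance reduction (multi):   {(1 - var_multi / var_y) * 100:.2f}%")
```

Adding a covariate can never increase the in-sample variance of the adjusted metric, but highly collinear covariates add little beyond the first one.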

### 6.3 Stratified Analysis with CUPED

CUPED can also be applied separately within each stratum (for example, user cohort), which lets θ differ across strata:

```python
def apply_stratified_cuped(df, strata_column, pre_column='pre_metric', post_column='post_metric'):
    """
    Apply CUPED separately for each stratum
    
    Parameters:
    -----------
    df : pandas DataFrame
        Data frame containing user data
    strata_column : str
        Name of the column used for stratification
    pre_column : str
        Name of the column with pre-experiment data
    post_column : str
        Name of the column with post-experiment data
        
    Returns:
    --------
    DataFrame with added cuped-adjusted column
    """
    # Create a copy to avoid modifying the original
    df_copy = df.copy()
    
    # Initialize the CUPED-adjusted column
    df_copy[f'{post_column}_stratified_cuped'] = np.nan
    
    # Apply CUPED separately for each stratum
    strata = df_copy[strata_column].unique()
    
    print(f"Applying stratified CUPED for {len(strata)} strata:")
    
    for stratum in strata:
        # Filter for this stratum
        mask = df_copy[strata_column] == stratum
        stratum_df = df_copy[mask].copy()
        
        # Calculate the CUPED coefficient theta for this stratum
        cov_matrix = np.cov(stratum_df[pre_column], stratum_df[post_column])
        covariance = cov_matrix[0, 1]
        variance = cov_matrix[0, 0]
        theta = covariance / variance if variance > 0 else 0
        
        # Calculate the mean of pre-experiment metric for this stratum
        pre_mean = stratum_df[pre_column].mean()
        
        # Apply CUPED formula
        df_copy.loc[mask, f'{post_column}_stratified_cuped'] = (
            df_copy.loc[mask, post_column] - 
            theta * (df_copy.loc[mask, pre_column] - pre_mean)
        )
        
        print(f"  - {stratum}: CUPED coefficient = {theta:.4f}")
    
    return df_copy

# Apply stratified CUPED by cohort
strat_cuped_df = apply_stratified_cuped(df, strata_column='cohort')

# Compare variances
original_var = strat_cuped_df['post_metric'].var()
cuped_var = strat_cuped_df['post_metric_cuped'].var()
strat_cuped_var = strat_cuped_df['post_metric_stratified_cuped'].var()

print("\nStratified CUPED Results:")
print(f"Original variance: {original_var:.4f}")
print(f"Standard CUPED variance: {cuped_var:.4f}")
print(f"Stratified CUPED variance: {strat_cuped_var:.4f}")
print(f"Standard CUPED variance reduction: {(1 - cuped_var/original_var)*100:.2f}%")
print(f"Stratified CUPED variance reduction: {(1 - strat_cuped_var/original_var)*100:.2f}%")
```

## 7. Real-world Example: E-commerce Revenue Analysis

Let us look at a more realistic e-commerce example:

```python
def simulate_ecommerce_data(n_users=10000, treatment_effect=0):
    """Simulate e-commerce user data"""
    user_ids = np.arange(n_users)
    treatment = np.random.binomial(1, 0.5, n_users)
    
    # User segments
    segments = np.random.choice(['New', 'Returning', 'VIP'], n_users, p=[0.6, 0.3, 0.1])
    
    # Segment-specific base spending
    base_spending = np.zeros(n_users)
    base_spending[segments == 'New'] = np.random.exponential(20, np.sum(segments == 'New'))
    base_spending[segments == 'Returning'] = np.random.exponential(50, np.sum(segments == 'Returning'))
    base_spending[segments == 'VIP'] = np.random.exponential(200, np.sum(segments == 'VIP'))
    
    # Add user-specific noise that's consistent over time
    user_noise = np.random.exponential(base_spending * 0.5)
    
    # Pre-experiment spending (last month)
    pre_spending = base_spending + user_noise
    
    # Simulate a natural purchase pattern where ~70% of users don't purchase in a given month
    purchase_mask = np.random.binomial(1, 0.3, n_users).astype(bool)
    pre_spending = pre_spending * purchase_mask
    
    # Post-experiment spending
    purchase_mask_post = np.random.binomial(1, 0.3 + 0.02 * treatment, n_users).astype(bool)
    post_spending = (
        base_spending * (1 + 0.05 * np.random.rand(n_users)) +  # Some natural growth
        user_noise +  # Consistent user noise
        treatment_effect * treatment * base_spending * 0.01 +  # Treatment effect
        np.random.exponential(5, n_users)  # Random noise specific to this period
    ) * purchase_mask_post
    
    # Create DataFrame
    df = pd.DataFrame({
        'user_id': user_ids,
        'treatment': treatment,
        'segment': segments,
        'pre_spending': pre_spending,
        'post_spending': post_spending
    })
    
    return df

# Simulate e-commerce data with a small treatment effect
ecom_df = simulate_ecommerce_data(n_users=20000, treatment_effect=2)

print("\nE-commerce data summary:")
print(ecom_df.describe())

# Show distribution of spending
plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)
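# Zero-spend users cannot be shown on a log scale, so plot purchasers only
purchasers = ecom_df.loc[ecom_df['post_spending'] > 0, 'post_spending']
sns.histplot(purchasers, bins=50, log_scale=True)
plt.title('Distribution of Post-Experiment Spending (Purchasers Only)')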
plt.xlabel('Spending')
plt.ylabel('Frequency')

plt.subplot(1, 2, 2)
sns.boxplot(data=ecom_df, x='segment', y='post_spending', showfliers=False)
plt.title('Post-Experiment Spending by Segment')
plt.xlabel('User Segment')
plt.ylabel('Spending')

plt.tight_layout()
plt.show()

# Run traditional t-test
print("\nTraditional A/B Test Results (E-commerce):")
trad_ttest = run_ttest(ecom_df.rename(columns={'post_spending': 'post_metric'}))

# Apply CUPED
ecom_cuped_df = apply_cuped(
    ecom_df.rename(columns={'pre_spending': 'pre_metric', 'post_spending': 'post_metric'})
)

# Run CUPED t-test
print("\nCUPED-adjusted A/B Test Results (E-commerce):")
ecom_df_copy = ecom_cuped_df.copy()
ecom_df_copy['post_metric'] = ecom_df_copy['post_metric_cuped']
cuped_ttest = run_ttest(ecom_df_copy)

# Apply stratified CUPED
print("\nApplying stratified CUPED by user segment:")
ecom_strat_df = apply_stratified_cuped(
    ecom_df.rename(columns={'pre_spending': 'pre_metric', 'post_spending': 'post_metric'}),
    strata_column='segment'
)

# Run stratified CUPED t-test
print("\nStratified CUPED-adjusted A/B Test Results (E-commerce):")
ecom_strat_copy = ecom_strat_df.copy()
ecom_strat_copy['post_metric'] = ecom_strat_copy['post_metric_stratified_cuped']
strat_cuped_ttest = run_ttest(ecom_strat_copy)
```

## 8. Common Pitfalls and Solutions

While CUPED is powerful, there are some common pitfalls to be aware of:

### 8.1 Checking Assumptions

```python
def check_cuped_assumptions(df, pre_column='pre_metric', post_column='post_metric'):
    """Check assumptions for CUPED application"""

    # 1. Check for correlation between pre and post metrics
    correlation = df[pre_column].corr(df[post_column])
    
    # 2. Check for balance in pre-experiment metric between treatment and control
    control_pre = df[df['treatment'] == 0][pre_column]
    treatment_pre = df[df['treatment'] == 1][pre_column]
    
    t_stat, p_value = stats.ttest_ind(treatment_pre, control_pre, equal_var=False)
    
    # 3. Verify that pre-experiment data is not affected by the treatment
    # (This is logical rather than statistical check in most cases)
    
    # 4. Check linear relationship assumption
    plt.figure(figsize=(10, 6))
    sns.scatterplot(data=df, x=pre_column, y=post_column, alpha=0.3)
    
    # Add a regression line
    x = df[pre_column].values.reshape(-1, 1)
    y = df[post_column].values
    reg = LinearRegression().fit(x, y)
    # sorted() returns a plain list, which has no .reshape; sort as a NumPy array instead
    x_sorted = np.sort(df[pre_column].values)
    plt.plot(x_sorted, reg.predict(x_sorted.reshape(-1, 1)), 'r')
    
    plt.title('Checking Linear Relationship Between Pre and Post Metrics')
    plt.xlabel('Pre-Experiment Metric')
    plt.ylabel('Post-Experiment Metric')
    plt.show()
    
    # Print results
    print("CUPED Assumptions Check:")
    print(f"1. Correlation between pre and post metrics: {correlation:.4f}")
    print(f"   (Higher correlation means more effective variance reduction)")
    
    print(f"\n2. Balance in pre-experiment metric:")
    print(f"   Control mean: {control_pre.mean():.4f}")
    print(f"   Treatment mean: {treatment_pre.mean():.4f}")
    print(f"   p-value: {p_value:.4f}")
    
    if p_value < 0.05:
        print("   WARNING: Pre-experiment metrics are not balanced between groups!")
        print("   This could indicate randomization issues or selection bias.")
    else:
        print("   Pre-experiment metrics are balanced between groups.")
    
    print("\n3. Ensure pre-experiment data was not affected by the treatment:")
    print("   This is a logical check - the pre-experiment data should be collected")
    print("   before the treatment was applied and cannot be influenced by it.")
    
    print("\n4. Linear relationship: Check the scatter plot above.")
    print("   The relationship should be approximately linear for optimal CUPED performance.")

# Check assumptions for our e-commerce dataset
check_cuped_assumptions(
    ecom_df.rename(columns={'pre_spending': 'pre_metric', 'post_spending': 'post_metric'})
)
```
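
The balance check above relies on a p-value, which says little about the size of an imbalance. A complementary diagnostic is the standardized mean difference (SMD). The sketch below illustrates it on synthetic data; the helper name and the 0.1 threshold (a common rule of thumb) are assumptions for this example, not part of the checks above.

```python
import numpy as np


def standardized_mean_difference(treated, control):
    """(mean_treated - mean_control) divided by the pooled standard deviation."""
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    return (treated.mean() - control.mean()) / pooled_sd


# Hypothetical synthetic pre-experiment metric with random assignment
rng = np.random.default_rng(5)
n = 10_000
assignment = rng.integers(0, 2, size=n)
pre_metric = rng.gamma(2.0, 10.0, size=n)

# Properly randomized groups: the SMD should be close to zero
smd_balanced = standardized_mean_difference(
    pre_metric[assignment == 1], pre_metric[assignment == 0]
)

# Artificial imbalance: shift the treated group's pre metric by 3 units
pre_shifted = pre_metric + 3.0 * assignment
smd_imbalanced = standardized_mean_difference(
    pre_shifted[assignment == 1], pre_shifted[assignment == 0]
)

threshold = 0.1  # assumed rule-of-thumb cut-off
for label, smd in [("balanced", smd_balanced), ("imbalanced", smd_imbalanced)]:
    flag = "WARNING" if abs(smd) > threshold else "ok"
    print(f"{label:>10}: SMD = {smd:+.4f} ({flag})")
```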

### 8.2 Data Leakage

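A common issue with CUPED is data leakage: the pre-experiment covariate is either affected by the treatment itself, or it is imbalanced between groups because of a factor that is correlated with treatment assignment. The simulation below illustrates the second case: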

```python
def demonstrate_data_leakage():
    """Demonstrate the problem of data leakage in CUPED"""

    # Simulate a scenario where the pre-experiment data is affected by 
    # the same factor that will affect the treatment
    n_users = 10000
    
    # User treatment assignment
    treatment = np.random.binomial(1, 0.5, n_users)
    
    # Underlying user characteristic that affects both pre and post metrics
    # AND is correlated with treatment assignment (data leakage)
    user_type = np.random.normal(0.5 * treatment, 1, n_users)
    
    # Pre-experiment metric (affected by user type)
    pre_metric = 10 + 5 * user_type + np.random.normal(0, 5, n_users)
    
    # Post-experiment metric (affected by user type and treatment)
    treatment_effect = 2  # True treatment effect
    post_metric = (
        20 + 
        8 * user_type +  # User type effect
        treatment_effect * treatment +  # Treatment effect
        np.random.normal(0, 5, n_users)  # Random noise
    )
    
    # Create DataFrame
    df_leakage = pd.DataFrame({
        'user_id': np.arange(n_users),
        'treatment': treatment,
        'user_type': user_type,
        'pre_metric': pre_metric,
        'post_metric': post_metric
    })
    
    # Check balance in pre-experiment metric
    control_pre = df_leakage[df_leakage['treatment'] == 0]['pre_metric']
    treatment_pre = df_leakage[df_leakage['treatment'] == 1]['pre_metric']
    
    t_stat, p_value = stats.ttest_ind(treatment_pre, control_pre, equal_var=False)
    
    # Run traditional A/B test
    print("Traditional A/B Test Results (Data Leakage Scenario):")
    trad_ttest = run_ttest(df_leakage)
    
    # Apply CUPED
    leakage_cuped_df = apply_cuped(df_leakage)
    
    # Run CUPED t-test
    print("\nCUPED-adjusted A/B Test Results (Data Leakage Scenario):")
    leakage_df_copy = leakage_cuped_df.copy()
    leakage_df_copy['post_metric'] = leakage_df_copy['post_metric_cuped']
    cuped_ttest = run_ttest(leakage_df_copy)
    
    # Print warning
    print("\nData Leakage Check:")
    print(f"Balance in pre-experiment metric (p-value): {p_value:.4f}")
    
    if p_value < 0.05:
        print("WARNING: Pre-experiment metrics are not balanced between groups!")
        print("This suggests potential data leakage or randomization issues.")
        print("In this case, CUPED may not provide reliable results.")
    
    return df_leakage

# Demonstrate data leakage
leakage_df = demonstrate_data_leakage()
```
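
The balance check above relies on a p-value, which flags even trivial imbalances once samples are large. A complementary diagnostic is the standardized mean difference (SMD), which expresses the imbalance in units of pooled standard deviation. The helper below is a minimal sketch: the function name `standardized_mean_difference`, the simulated columns, and the 0.1 threshold mentioned afterwards are illustrative assumptions, not part of the functions defined earlier.

```python
import numpy as np
import pandas as pd


def standardized_mean_difference(df, covariate="pre_metric", group="treatment"):
    """Return (treatment mean - control mean) / pooled standard deviation."""
    treated = df.loc[df[group] == 1, covariate]
    control = df.loc[df[group] == 0, covariate]
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    return (treated.mean() - control.mean()) / pooled_sd


# Hypothetical data mirroring the imbalanced setup above
rng = np.random.default_rng(0)
n = 10_000
treatment = rng.binomial(1, 0.5, n)
user_type = rng.normal(0.5 * treatment, 1, n)  # imbalanced by construction

demo = pd.DataFrame({
    "treatment": treatment,
    # Covariate driven by the imbalanced characteristic
    "pre_imbalanced": 10 + 5 * user_type + rng.normal(0, 5, n),
    # Covariate that is independent of assignment
    "pre_balanced": 10 + 5 * rng.normal(0, 1, n) + rng.normal(0, 5, n),
})

smd_bad = standardized_mean_difference(demo, "pre_imbalanced")
smd_ok = standardized_mean_difference(demo, "pre_balanced")
print(f"SMD (imbalanced covariate): {smd_bad:.3f}")
print(f"SMD (balanced covariate):   {smd_ok:.3f}")
```

A common rule of thumb treats an absolute SMD above roughly 0.1 as worth investigating, although the right threshold depends on the experiment.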

### 8.3 Solutions to Common Problems

```python
def demonstrate_solutions():
    """Demonstrate solutions to common CUPED problems"""
    # Create a copy of the leakage dataset
    df = leakage_df.copy()
    
    print("\nSolutions to Common CUPED Problems:")
    
    # 1. Handling data leakage with matched pre-experiment periods
    print("\n1. Using matched pre-experiment periods:")
    print("   Instead of using the most recent pre-experiment data, use data from")
    print("   the same time period in the previous year/quarter/month to avoid seasonal effects.")
    
    # 2. Using a regression model instead of simple CUPED
    print("\n2. Using a regression model approach:")
    
    # Fit a regression model
    X = df[['pre_metric', 'treatment']]
    X = sm.add_constant(X)
    y = df['post_metric']
    model = sm.OLS(y, X).fit()
    
    print(model.summary().tables[1])
    
    print("\n   This approach estimates the treatment effect while controlling for")
    print("   observed pre-experiment differences. Imbalance in unobserved")
    print("   characteristics (here, user_type) can still bias the estimate.")
    
    # 3. Using propensity score matching
    print("\n3. Using propensity score matching:")
    print("   If randomization is compromised, propensity score matching can help")
    print("   create balanced treatment and control groups based on covariates.")
    
    # Illustrative placeholder only: these values are hard-coded to show what
    # balanced groups look like; no matching is actually performed here
    matched_control_mean = df[df['treatment'] == 1]['pre_metric'].mean() * 0.98
    matched_treatment_mean = df[df['treatment'] == 1]['pre_metric'].mean()
    
    print(f"   Before matching: Control mean = {df[df['treatment'] == 0]['pre_metric'].mean():.4f}, "
          f"Treatment mean = {df[df['treatment'] == 1]['pre_metric'].mean():.4f}")
    print(f"   After matching (illustrative):  Control mean = {matched_control_mean:.4f}, "
          f"Treatment mean = {matched_treatment_mean:.4f}")

# Demonstrate solutions
demonstrate_solutions()
```
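
The "after matching" numbers printed above are placeholders rather than the output of a real matching procedure. As a rough sketch of what matching involves, the example below estimates propensity scores with logistic regression and pairs each treated user with the nearest control user, with replacement. The simulated data, the column names, and the `match_on_propensity` helper are assumptions made for illustration. Matching can only balance covariates that are actually observed.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors


def match_on_propensity(df, covariates, group="treatment"):
    """1-nearest-neighbour matching on the estimated propensity score.

    Returns the treated rows and their matched control rows. Controls
    may be reused, i.e. matching is done with replacement.
    """
    model = LogisticRegression()
    model.fit(df[covariates], df[group])
    scores = model.predict_proba(df[covariates])[:, 1]

    treated_mask = (df[group] == 1).to_numpy()
    treated = df[treated_mask]
    control = df[~treated_mask]

    # Find, for every treated user, the control user with the closest score
    nn = NearestNeighbors(n_neighbors=1)
    nn.fit(scores[~treated_mask].reshape(-1, 1))
    _, idx = nn.kneighbors(scores[treated_mask].reshape(-1, 1))

    matched_control = control.iloc[idx.ravel()]
    return treated, matched_control


# Hypothetical data where assignment probability depends on pre_metric,
# i.e. randomization is compromised
rng = np.random.default_rng(1)
n = 5_000
pre_metric = rng.normal(10, 5, n)
p_treat = 1 / (1 + np.exp(-(pre_metric - 10) / 5))
demo = pd.DataFrame({
    "pre_metric": pre_metric,
    "treatment": rng.binomial(1, p_treat),
})

treated, matched_control = match_on_propensity(demo, ["pre_metric"])

gap_before = (demo.loc[demo["treatment"] == 1, "pre_metric"].mean()
              - demo.loc[demo["treatment"] == 0, "pre_metric"].mean())
gap_after = treated["pre_metric"].mean() - matched_control["pre_metric"].mean()

print(f"Pre-metric gap before matching: {gap_before:.3f}")
print(f"Pre-metric gap after matching:  {gap_after:.3f}")
```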

## 9. Advanced CUPED Applications

### 9.1 CUPED for Non-Normal Metrics

For metrics like conversion rates or counts that follow non-normal distributions, we can adapt CUPED:

```python
def apply_cuped_for_binary_metric(df, pre_column='pre_conversion', post_column='post_conversion'):
    """Apply CUPED for binary metrics like conversion rate"""
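    # For binary metrics, the adjustment is the same as standard CUPED, but the
    # adjusted values are no longer 0/1, so interpret them as adjusted rates.
    # Work on a copy so the caller's DataFrame is not modified.
    df = df.copy()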
    
    # Calculate the CUPED coefficient theta
    cov_matrix = np.cov(df[pre_column], df[post_column])
    covariance = cov_matrix[0, 1]
    variance = cov_matrix[0, 0]
    theta = covariance / variance if variance > 0 else 0
    
    # Calculate the mean of pre-experiment metric
    pre_mean = df[pre_column].mean()
    
    # Apply CUPED formula
    df[f'{post_column}_cuped'] = df[post_column] - theta * (df[pre_column] - pre_mean)
    
    print(f"CUPED coefficient (theta) for binary metric: {theta:.4f}")
    
    return df

# Simulate conversion rate data
def simulate_conversion_data(n_users=20000, treatment_effect=0.02):
    """Simulate conversion rate data"""
    user_ids = np.arange(n_users)
    treatment = np.random.binomial(1, 0.5, n_users)
    
    # User-level propensity to convert (latent variable)
    convert_propensity = np.random.beta(1, 9, n_users)  # Most users have low conversion probability
    
    # Pre-experiment conversion (binary)
    pre_conversion = np.random.binomial(1, convert_propensity, n_users)
    
    # Post-experiment conversion with treatment effect
    post_conversion_prob = np.clip(
        convert_propensity + treatment_effect * treatment,
        0, 1
    )
    post_conversion = np.random.binomial(1, post_conversion_prob, n_users)
    
    # Create DataFrame
    df = pd.DataFrame({
        'user_id': user_ids,
        'treatment': treatment,
        'pre_conversion': pre_conversion,
        'post_conversion': post_conversion
    })
    
    return df

# Simulate conversion data
conv_df = simulate_conversion_data(n_users=20000, treatment_effect=0.02)

# Analyze original data
control_conv = conv_df[conv_df['treatment'] == 0]['post_conversion'].mean()
treatment_conv = conv_df[conv_df['treatment'] == 1]['post_conversion'].mean()

print("\nConversion Rate Analysis:")
print(f"Control conversion rate: {control_conv:.4f}")
print(f"Treatment conversion rate: {treatment_conv:.4f}")
print(f"Absolute lift: {treatment_conv - control_conv:.4f}")
print(f"Relative lift: {(treatment_conv - control_conv) / control_conv * 100:.2f}%")

# Run traditional conversion rate test
from statsmodels.stats.proportion import proportions_ztest

count_treatment = conv_df[conv_df['treatment'] == 1]['post_conversion'].sum()
count_control = conv_df[conv_df['treatment'] == 0]['post_conversion'].sum()
nobs_treatment = len(conv_df[conv_df['treatment'] == 1])
nobs_control = len(conv_df[conv_df['treatment'] == 0])

z_stat, p_value = proportions_ztest(
    [count_treatment, count_control],
    [nobs_treatment, nobs_control]
)

print("\nTraditional Conversion Rate Test:")
print(f"z-statistic: {z_stat:.4f}")
print(f"p-value: {p_value:.4f}")

# Apply CUPED for binary metric
conv_cuped_df = apply_cuped_for_binary_metric(conv_df)

# Run t-test on CUPED-adjusted values
control_cuped = conv_cuped_df[conv_cuped_df['treatment'] == 0]['post_conversion_cuped']
treatment_cuped = conv_cuped_df[conv_cuped_df['treatment'] == 1]['post_conversion_cuped']

t_stat, p_value = stats.ttest_ind(treatment_cuped, control_cuped, equal_var=False)

print("\nCUPED-adjusted Conversion Rate Test:")
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
```
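
How much the adjustment helps for a binary metric depends on the correlation ρ between pre- and post-period conversion: with theta set to Cov(X, Y) / Var(X), the adjusted variance equals Var(Y) · (1 − ρ²). The sketch below checks this identity numerically. The simulation parameters and variable names are illustrative assumptions and do not reuse the dataset above.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000

# Latent per-user conversion propensity, shared by both periods
propensity = rng.beta(1, 9, n)
pre = rng.binomial(1, propensity)
post = rng.binomial(1, propensity)

# Standard CUPED adjustment
cov = np.cov(pre, post)
theta = cov[0, 1] / cov[0, 0]
post_cuped = post - theta * (pre - pre.mean())

rho = np.corrcoef(pre, post)[0, 1]
observed_ratio = post_cuped.var(ddof=1) / post.var(ddof=1)
expected_ratio = 1 - rho ** 2

print(f"Correlation between pre and post: {rho:.4f}")
print(f"Observed variance ratio:          {observed_ratio:.6f}")
print(f"1 - rho^2:                        {expected_ratio:.6f}")
```

In this simulation a single binary pre-period observation correlates only weakly with the outcome, so the variance reduction is small. Richer pre-period covariates, such as visit or purchase counts, may correlate more strongly.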

### 9.2 CUPED for Mobile App Experiments

In mobile app experiments, we often have sparse data due to low engagement. CUPED can help:

```python
def simulate_mobile_app_data(n_users=10000, treatment_effect=0.1):
    """Simulate mobile app engagement data"""

    user_ids = np.arange(n_users)
    treatment = np.random.binomial(1, 0.5, n_users)
    
    # User engagement level (highly skewed)
    engagement_level = np.random.exponential(1, n_users)
    
    # Days of usage in pre-experiment period (30 days)
    pre_days_used = np.random.binomial(30, 0.1 + 0.2 * engagement_level / np.max(engagement_level))
    
    # Session count in pre-experiment period
    pre_sessions = np.round(pre_days_used * (1 + engagement_level))
    
    # Time spent in pre-experiment period (minutes)
    pre_time_spent = pre_sessions * (2 + 8 * engagement_level / np.max(engagement_level))
    
    # Treatment effect on engagement
    effect_multiplier = 1 + treatment_effect * treatment
    
    # Post-experiment metrics
    post_days_used = np.random.binomial(
        30, 
        (0.1 + 0.2 * engagement_level / np.max(engagement_level)) * effect_multiplier
    )
    post_sessions = np.round(post_days_used * (1 + engagement_level) * effect_multiplier)
    post_time_spent = post_sessions * (2 + 8 * engagement_level / np.max(engagement_level))
    
    # Create DataFrame
    df = pd.DataFrame({
        'user_id': user_ids,
        'treatment': treatment,
        'pre_days_used': pre_days_used,
        'pre_sessions': pre_sessions,
        'pre_time_spent': pre_time_spent,
        'post_days_used': post_days_used,
        'post_sessions': post_sessions,
        'post_time_spent': post_time_spent
    })
    
    return df

# Simulate mobile app data
mobile_df = simulate_mobile_app_data(n_users=10000, treatment_effect=0.1)

print("\nMobile App Experiment Data:")
print(mobile_df[['pre_days_used', 'pre_sessions', 'pre_time_spent',
                'post_days_used', 'post_sessions', 'post_time_spent']].describe())

# Visualize the typical long-tail distribution
plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)
sns.histplot(mobile_df['post_sessions'], bins=50)
plt.yscale('log')  # histplot has no log_y argument; set the axis scale directly
plt.title('Distribution of Sessions (Post-Experiment)')
plt.xlabel('Number of Sessions')
plt.ylabel('Frequency (log scale)')

plt.subplot(1, 2, 2)
sns.histplot(mobile_df['post_time_spent'], bins=50)
plt.yscale('log')
plt.title('Distribution of Time Spent (Post-Experiment)')
plt.xlabel('Minutes')
plt.ylabel('Frequency (log scale)')

plt.tight_layout()
plt.show()

# Apply CUPED to each metric
metrics = ['days_used', 'sessions', 'time_spent']

for metric in metrics:
    # Traditional test
    print(f"\nTraditional A/B Test for {metric}:")
    temp_df = mobile_df.rename(columns={
        f'pre_{metric}': 'pre_metric',
        f'post_{metric}': 'post_metric'
    })
    run_ttest(temp_df)
    
    # Apply CUPED
    print(f"\nCUPED-adjusted A/B Test for {metric}:")
    cuped_df = apply_cuped(temp_df)
    cuped_df['post_metric'] = cuped_df['post_metric_cuped']
    run_ttest(cuped_df)
```

## 10. Conclusion and CUPED Best Practices

CUPED is a powerful technique for improving the sensitivity of A/B tests by reducing metric variance. Here's a summary of when and how to use it effectively:

```python
def summarize_cuped_benefits(df):
    """Summarize the benefits of CUPED across different metrics"""
    # Define metrics with different variance profiles on a copy,
    # so the caller's DataFrame is not modified
    df = df.copy()
    n_users = len(df)
    
    # High-variance metric (similar to revenue)
    df['high_var_pre'] = np.random.exponential(100, n_users)
    df['high_var_post'] = df['high_var_pre'] * 1.1 + df['treatment'] * 10 + np.random.exponential(80, n_users)
    
    # Medium-variance metric (similar to engagement)
    df['med_var_pre'] = np.random.gamma(5, 5, n_users)
    df['med_var_post'] = df['med_var_pre'] * 1.05 + df['treatment'] * 2 + np.random.gamma(3, 5, n_users)
    
    # Low-variance metric (similar to conversion)
    conversion_propensity = np.random.beta(2, 18, n_users)
    df['low_var_pre'] = np.random.binomial(1, conversion_propensity, n_users)
    df['low_var_post'] = np.random.binomial(1, conversion_propensity + 0.01 * df['treatment'], n_users)
    
    results = []
    
    # Test each metric with and without CUPED
    for var_level, prefix in [('High', 'high_var'), ('Medium', 'med_var'), ('Low', 'low_var')]:
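        # Keep only the columns needed for this metric; this also avoids
        # duplicate column names if df already has pre_metric/post_metric
        temp_df = df[['treatment', f'{prefix}_pre', f'{prefix}_post']].rename(columns={
            f'{prefix}_pre': 'pre_metric',
            f'{prefix}_post': 'post_metric'
        })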
        
        # Calculate original variance
        orig_var = temp_df['post_metric'].var()
        
        # Run traditional test
        _, p_value_trad = stats.ttest_ind(
            temp_df[temp_df['treatment'] == 1]['post_metric'],
            temp_df[temp_df['treatment'] == 0]['post_metric'],
            equal_var=False
        )
        
        # Apply CUPED
        cuped_df = apply_cuped(temp_df)
        
        # Calculate CUPED variance
        cuped_var = cuped_df['post_metric_cuped'].var()
        
        # Run CUPED test
        _, p_value_cuped = stats.ttest_ind(
            cuped_df[cuped_df['treatment'] == 1]['post_metric_cuped'],
            cuped_df[cuped_df['treatment'] == 0]['post_metric_cuped'],
            equal_var=False
        )
        
        # Calculate improvement
        var_reduction = (1 - cuped_var / orig_var) * 100
        
        # Required sample size is proportional to variance, so the
        # sample size reduction equals the variance reduction
        sample_reduction = var_reduction
        
        results.append({
            'Variance Level': var_level,
            'Original p-value': p_value_trad,
            'CUPED p-value': p_value_cuped,
            'Variance Reduction': f"{var_reduction:.1f}%",
            'Sample Size Reduction': f"{sample_reduction:.1f}%"
        })
    
    return pd.DataFrame(results)

# Show benefits summary
benefits_df = summarize_cuped_benefits(df)
print("\nCUPED Benefits Summary:")
print(benefits_df)
```

### 10.1 When to Use CUPED

1. When the metric has high variance, such as revenue, spending, or engagement
2. When you have reliable pre-experiment data that correlates with the post-experiment metric
3. When your experiment duration or sample size is limited
4. When you need to detect small but important effects
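
To make the sample-size benefit in items 3 and 4 concrete, recall that for a two-sample comparison of means the required sample size per group is approximately `n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2`, so `n` is proportional to the metric variance. The sketch below applies this normal approximation. The function name and the example numbers (standard deviation, minimum detectable effect, and a 40% variance reduction) are hypothetical.

```python
import numpy as np
from scipy import stats


def required_n_per_group(sigma, delta, alpha=0.05, power=0.8):
    """Normal-approximation sample size per group for a two-sided test."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
    return int(np.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2))


sigma = 20.0               # hypothetical standard deviation of the metric
delta = 1.0                # hypothetical minimum detectable effect
variance_reduction = 0.40  # hypothetical CUPED variance reduction

n_original = required_n_per_group(sigma, delta)
# CUPED shrinks the variance, i.e. multiplies sigma by sqrt(1 - reduction)
n_cuped = required_n_per_group(sigma * np.sqrt(1 - variance_reduction), delta)

print(f"Users per group without CUPED: {n_original}")
print(f"Users per group with CUPED:    {n_cuped}")
print(f"Sample size reduction:         {(1 - n_cuped / n_original) * 100:.1f}%")
```

Because `n` scales with the variance rather than the standard deviation, the sample size reduction matches the variance reduction up to rounding.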

### 10.2 When Not to Use CUPED

1. When pre-experiment data is not available or unreliable
2. When the correlation between pre and post metrics is weak (variance falls by only ρ², so a correlation of 0.2 removes about 4%)
3. When there's potential data leakage or selection bias
4. For new user experiments where pre-experiment data doesn't exist

### 10.3 Implementation Checklist

1. Verify correlation between pre and post metrics
2. Check for balance in pre-experiment metrics between treatment and control
3. Ensure no data leakage (pre-experiment data shouldn't be affected by the treatment)
4. Consider stratifying by important segments for even better results
5. Use multiple covariates when appropriate
6. Validate CUPED results against traditional methods
7. Document the CUPED approach for transparency
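
For item 5, one common way to use several covariates is to replace the scalar theta with a vector obtained from a least-squares regression of the post-experiment metric on the centered covariates. The following is a minimal sketch under that assumption. `apply_cuped_multi` and the simulated columns are hypothetical names, not functions or data defined earlier in this guide.

```python
import numpy as np
import pandas as pd


def apply_cuped_multi(df, covariates, post_column="post_metric"):
    """CUPED adjustment with several pre-experiment covariates.

    theta is the least-squares solution of y ~ centered covariates,
    estimated on the pooled (treatment + control) data.
    """
    X = df[covariates].to_numpy(dtype=float)
    X_centered = X - X.mean(axis=0)
    y = df[post_column].to_numpy(dtype=float)

    theta, *_ = np.linalg.lstsq(X_centered, y - y.mean(), rcond=None)

    out = df.copy()
    # Centering keeps the overall mean of the adjusted metric unchanged
    out[f"{post_column}_cuped"] = y - X_centered @ theta
    return out, theta


# Hypothetical data with two informative pre-experiment covariates
rng = np.random.default_rng(3)
n = 20_000
treatment = rng.binomial(1, 0.5, n)
pre_revenue = rng.normal(50, 10, n)
pre_sessions = rng.normal(20, 5, n)
post_metric = (5 + 0.8 * pre_revenue + 1.5 * pre_sessions
               + 1.0 * treatment + rng.normal(0, 5, n))
demo = pd.DataFrame({
    "treatment": treatment,
    "pre_revenue": pre_revenue,
    "pre_sessions": pre_sessions,
    "post_metric": post_metric,
})

single, _ = apply_cuped_multi(demo, ["pre_revenue"])
multi, theta = apply_cuped_multi(demo, ["pre_revenue", "pre_sessions"])

var_raw = demo["post_metric"].var()
var_single = single["post_metric_cuped"].var()
var_multi = multi["post_metric_cuped"].var()

print(f"Variance, unadjusted:     {var_raw:.2f}")
print(f"Variance, one covariate:  {var_single:.2f}")
print(f"Variance, two covariates: {var_multi:.2f}")
```

Each additional covariate helps only to the extent that it explains outcome variance the others do not, so it is worth checking the incremental reduction before adding more.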

### 10.4 Final Thoughts

CUPED is a powerful technique that can dramatically improve the efficiency of your experimentation program by:
- Reducing false negatives (Type II errors)
- Shortening experiment duration
- Enabling detection of smaller effects

When implemented correctly, CUPED allows for faster and more sensitive experimentation without increasing false positives, ultimately leading to better product decisions and accelerated innovation.
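
The claim about false positives can be checked empirically with an A/A simulation: when there is no true effect, a valid test should reject at roughly the nominal rate. The sketch below runs such a simulation for both an unadjusted and a CUPED-adjusted Welch t-test. The data-generating process, the number of simulations, and all variable names are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n_sims, n_users, alpha = 1000, 2000, 0.05
rejections_raw = 0
rejections_cuped = 0

for _ in range(n_sims):
    treatment = rng.binomial(1, 0.5, n_users)
    pre = rng.normal(10, 5, n_users)
    # No treatment effect: both groups share the same outcome model
    post = 5 + 0.7 * pre + rng.normal(0, 4, n_users)

    # CUPED adjustment with theta estimated on the pooled data
    cov = np.cov(pre, post)
    theta = cov[0, 1] / cov[0, 0]
    post_cuped = post - theta * (pre - pre.mean())

    is_treated = treatment == 1
    _, p_raw = stats.ttest_ind(post[is_treated], post[~is_treated],
                               equal_var=False)
    _, p_cuped = stats.ttest_ind(post_cuped[is_treated], post_cuped[~is_treated],
                                 equal_var=False)
    rejections_raw += int(p_raw < alpha)
    rejections_cuped += int(p_cuped < alpha)

fpr_raw = rejections_raw / n_sims
fpr_cuped = rejections_cuped / n_sims

print(f"False positive rate, unadjusted: {fpr_raw:.3f}")
print(f"False positive rate, CUPED:      {fpr_cuped:.3f}")
```

Both rates should land near the nominal 5%, within simulation noise.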

By leveraging pre-experiment data, you can often reduce variance by 20-50% without compromising statistical validity. Because the required sample size is proportional to variance, this translates to roughly 20-50% smaller sample sizes or, at constant traffic, correspondingly shorter experiments.

Implement CUPED wisely, and happy experimenting!