# CUPED - Reducing Variance in A/B Testing


## Introduction to CUPED
- CUPED (Controlled-experiment Using Pre-Experiment Data) is a variance reduction technique that improves the efficiency of experiments.
- It uses pre-experiment data (for example, each user's historical value of the metric) to remove predictable noise from experimental results.
- CUPED does not change how users are randomized. It adjusts the outcome metric after the data is collected, so an experiment gains statistical power without a larger sample.
- CUPED is widely used in A/B testing platforms (e.g., at Microsoft, Airbnb, LinkedIn, and Google).

## Why is Variance Reduction Important in Experimentation?
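- Improves statistical power: for a fixed effect size, the required sample size is proportional to the metric's variance, so halving the variance roughly halves the number of users needed.
- Reduces noise, giving narrower confidence intervals around the estimated effect.
- Speeds up experimentation, since experiments reach a conclusive result sooner.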
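
The link between variance and sample size can be sketched with the standard two-sample formula. This is a minimal illustration, assuming a two-sided z-test with equal group sizes; the function name `required_n_per_group`, the defaults `alpha=0.05` and `power=0.8`, and the example variances are hypothetical choices, not values from a real experiment.

```python
from math import ceil
from statistics import NormalDist

def required_n_per_group(variance, effect, alpha=0.05, power=0.8):
    """Approximate users needed per group to detect `effect` (hypothetical helper)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    return 2 * (z_alpha + z_beta) ** 2 * variance / effect ** 2

# Assumed example: cutting the variance from 100 to 25 for the same effect size
n_raw = required_n_per_group(variance=100.0, effect=2.0)
n_reduced = required_n_per_group(variance=25.0, effect=2.0)
print(f"Per-group sample size: {ceil(n_raw)} (raw) vs {ceil(n_reduced)} (reduced variance)")
```

Because the formula is linear in the variance, a 75% variance reduction cuts the required sample size by the same 75%.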

## Math Behind CUPED
- CUPED adjusts the observed outcome Y using a pre-experiment covariate X to reduce variance.
- CUPED-adjusted metric:
$$Y^* = Y - \theta (X - \bar{X})$$
where:
    - Y* = Variance-reduced metric
    - Y = Observed post-experiment outcome
    - X = Pre-experiment metric (measured before treatment assignment)
    - X̄ = Mean of X across all users in the experiment
    - θ = Adjustment coefficient (the slope from regressing Y on X):
$$
\theta = \frac{\text{Cov}(Y, X)}{\text{Var}(X)}
$$

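- The goal is to subtract the portion of Y that the pre-experiment data X predicts, so the remaining variance is lower. Because X is measured before assignment, the subtracted term has the same expected value in both groups and does not shift the expected difference between them.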
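
A short derivation may help show why the ratio of covariance to variance is the natural choice for θ. It is a sketch that assumes a single covariate X, treats the sample mean of X as a constant, and introduces ρ for the correlation between Y and X (a symbol not used elsewhere in this document). The variance of the adjusted metric is:

$$
\text{Var}(Y^*) = \text{Var}(Y) - 2\theta\,\text{Cov}(Y, X) + \theta^2\,\text{Var}(X)
$$

Setting the derivative with respect to θ to zero gives the minimizer:

$$
-2\,\text{Cov}(Y, X) + 2\theta\,\text{Var}(X) = 0
\quad\Rightarrow\quad
\theta = \frac{\text{Cov}(Y, X)}{\text{Var}(X)}
$$

Substituting this θ back in:

$$
\text{Var}(Y^*) = \text{Var}(Y) - \frac{\text{Cov}(Y, X)^2}{\text{Var}(X)} = \text{Var}(Y)\,(1 - \rho^2)
$$

So the fraction of variance removed equals ρ², the squared correlation between the pre-experiment metric and the outcome.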


## Implementing CUPED in Python

### Step 1: Simulate A/B Experiment Data

In [8]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(42)
n = 1000  # Sample size

# Pre-experiment metric (e.g., past revenue)
X = np.random.normal(loc=50, scale=10, size=n)

# Treatment effect (+2 increase in revenue for treatment group)
treatment = np.random.choice([0, 1], size=n)  # 0 = Control, 1 = Treatment
Y = X + (2 * treatment) + np.random.normal(0, 5, size=n)  # Post-experiment revenue

df = pd.DataFrame({'Pre_Exp_Revenue': X, 'Post_Exp_Revenue': Y, 'Treatment': treatment})
df.head()


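|   | Pre_Exp_Revenue | Post_Exp_Revenue | Treatment |
|---|-----------------|------------------|-----------|
| 0 | 54.967142 | 53.140534 | 0 |
| 1 | 48.617357 | 51.540759 | 1 |
| 2 | 56.476885 | 49.741254 | 0 |
| 3 | 65.230299 | 62.372228 | 1 |
| 4 | 47.658466 | 53.660536 | 0 |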


### Step 2: Compute CUPED Adjustment

In [9]:
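# Compute theta = Cov(Y, X) / Var(X).
# np.cov defaults to ddof=1 while np.var defaults to ddof=0, so set ddof=1 to keep both consistent.
theta = np.cov(df['Post_Exp_Revenue'], df['Pre_Exp_Revenue'])[0, 1] / np.var(df['Pre_Exp_Revenue'], ddof=1)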

# Apply CUPED adjustment
df['CUPED_Adjusted_Revenue'] = df['Post_Exp_Revenue'] - theta * (df['Pre_Exp_Revenue'] - df['Pre_Exp_Revenue'].mean())

# Compare variance reduction
original_var = np.var(df['Post_Exp_Revenue'])
adjusted_var = np.var(df['CUPED_Adjusted_Revenue'])

print(f"Original Variance: {original_var:.2f}")
print(f"CUPED Adjusted Variance: {adjusted_var:.2f} ({(1 - adjusted_var/original_var) * 100:.2f}% reduction)")


Original Variance: 118.91
CUPED Adjusted Variance: 25.30 (78.72% reduction)
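
The size of the reduction is tied to how strongly the pre-experiment metric correlates with the outcome: with θ computed as above, the fraction of variance removed equals the squared correlation. The sketch below checks this numerically. It assumes the same simulated data-generating process as Step 1 but uses its own random generator and an arbitrary seed, so its numbers will not match the output above.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for this illustration
n = 1000

# Simulated data with the same structure as Step 1 (assumed, not real data)
x = rng.normal(50, 10, size=n)                    # pre-experiment metric
treatment = rng.integers(0, 2, size=n)            # 0 = control, 1 = treatment
y = x + 2 * treatment + rng.normal(0, 5, size=n)  # post-experiment metric

# Use ddof=1 everywhere so covariance and variances share one convention
theta = np.cov(y, x)[0, 1] / np.var(x, ddof=1)
y_cuped = y - theta * (x - x.mean())

observed_reduction = 1 - np.var(y_cuped, ddof=1) / np.var(y, ddof=1)
rho = np.corrcoef(y, x)[0, 1]

print(f"Observed variance reduction: {observed_reduction:.4f}")
print(f"Squared correlation rho^2:   {rho ** 2:.4f}")
```

The two printed values should agree up to floating-point error, which makes the pre-period correlation a quick way to estimate how much CUPED is likely to help before running an experiment.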


### Step 3: Compare Standard A/B Test vs CUPED-Adjusted A/B Test

In [10]:
# Standard A/B test
# (distinct names avoid overwriting the `treatment` assignment array from Step 1)
control_raw = df[df['Treatment'] == 0]['Post_Exp_Revenue']
treatment_raw = df[df['Treatment'] == 1]['Post_Exp_Revenue']
t_stat, p_val = stats.ttest_ind(treatment_raw, control_raw)

# CUPED-adjusted A/B test
control_cuped = df[df['Treatment'] == 0]['CUPED_Adjusted_Revenue']
treatment_cuped = df[df['Treatment'] == 1]['CUPED_Adjusted_Revenue']
t_stat_cuped, p_val_cuped = stats.ttest_ind(treatment_cuped, control_cuped)

print(f"Standard A/B Test p-value: {p_val:.5f}")
print(f"CUPED-Adjusted A/B Test p-value: {p_val_cuped:.5f}")


Standard A/B Test p-value: 0.22720
CUPED-Adjusted A/B Test p-value: 0.00000


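#### Observation:
- CUPED reduces the variance of the metric, which shrinks the standard error of the estimated treatment effect and increases the power of the test.
- In this simulation the p-value drops from 0.22720 to below 0.00001 (printed as 0.00000), so the effect that the standard test failed to detect is found with the same sample.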
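
P-values alone hide what changed, so it can help to look at the effect estimate and its standard error directly. The sketch below is an illustration under assumptions: it re-simulates data with the same structure as Step 1 using an arbitrary seed, and `diff_in_means` is a hypothetical helper, not part of any library. The 95% interval uses the normal approximation (1.96 standard errors).

```python
import numpy as np

def diff_in_means(values, assignment):
    """Return (treatment minus control difference, its standard error)."""
    t = values[assignment == 1]
    c = values[assignment == 0]
    diff = t.mean() - c.mean()
    se = np.sqrt(t.var(ddof=1) / t.size + c.var(ddof=1) / c.size)
    return diff, se

rng = np.random.default_rng(1)  # arbitrary seed for this illustration
n = 1000
x = rng.normal(50, 10, size=n)                     # pre-experiment metric
assignment = rng.integers(0, 2, size=n)            # 0 = control, 1 = treatment
y = x + 2 * assignment + rng.normal(0, 5, size=n)  # simulated true effect is +2

theta = np.cov(y, x)[0, 1] / np.var(x, ddof=1)
y_cuped = y - theta * (x - x.mean())

for label, values in [("Raw", y), ("CUPED", y_cuped)]:
    diff, se = diff_in_means(values, assignment)
    low, high = diff - 1.96 * se, diff + 1.96 * se
    print(f"{label:>5}: effect = {diff:.2f}, SE = {se:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

Both estimates target the same treatment effect; the CUPED interval should simply be narrower, which is what drives the smaller p-value.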

### When to Use CUPED? 
- When pre-experiment data is highly correlated with post-experiment outcomes.
- When sample size is limited, and variance needs to be reduced.
- When running online experiments where users have historical engagement data.

### When NOT to use CUPED?
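- If pre-experiment data is weakly correlated with the outcome, since the variance reduction is then negligible.
- If randomization is not properly balanced, so that the pre-experiment metric differs systematically between groups. The adjustment can then bias the estimated effect.
- If the treatment changes user behavior so much that historical data no longer predicts the outcome. The correlation, and with it the variance reduction, shrinks.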
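
One way to check the balance condition before applying CUPED is to compare the pre-experiment metric across groups. The sketch below uses a standardized mean difference (SMD) for that purpose. It is an illustration under assumptions: `standardized_mean_difference` is a hypothetical helper, the data is simulated, and the 0.1 threshold is a commonly cited rule of thumb rather than a fixed standard.

```python
import numpy as np

def standardized_mean_difference(x, assignment):
    """Absolute difference in group means of x, in pooled standard deviation units."""
    t = x[assignment == 1]
    c = x[assignment == 0]
    pooled_sd = np.sqrt((t.var(ddof=1) + c.var(ddof=1)) / 2)
    return abs(t.mean() - c.mean()) / pooled_sd

rng = np.random.default_rng(2)  # arbitrary seed for this illustration
n = 10_000
x = rng.normal(50, 10, size=n)  # simulated pre-experiment metric

random_assignment = rng.integers(0, 2, size=n)      # proper randomization
biased_assignment = (x > np.median(x)).astype(int)  # deliberately broken: depends on x

smd_random = standardized_mean_difference(x, random_assignment)
smd_biased = standardized_mean_difference(x, biased_assignment)

THRESHOLD = 0.1  # assumed rule-of-thumb cutoff
for label, smd in [("random", smd_random), ("biased", smd_biased)]:
    status = "balanced" if smd < THRESHOLD else "imbalanced, investigate before using CUPED"
    print(f"{label:>6} assignment: SMD = {smd:.3f} ({status})")
```

A large SMD on the pre-experiment metric suggests a problem with assignment itself, which CUPED cannot repair.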