# Synthetic A/B Experiment Data Generation

## Purpose
This notebook generates a synthetic but realistic A/B experiment dataset used to demonstrate:
- Experiment readouts
- Minimum Detectable Effect (MDE) & power analysis
- CUPED variance reduction

## Experimental Design
- Experimental unit: individual user
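- Randomized treatment assignment: each user is independently assigned to treatment with probability 0.5, so the arms are approximately (not exactly) 50/50
- Small positive treatment effect on the primary outcome: an absolute lift of +10 (`treatment_effect = 0.05` times the baseline of 200), about 3.8% of the expected control mean of 260
- Pre-period metric that enters the outcome with coefficient 0.6, giving a within-arm correlation of about 0.77 with the outcome (the covariate used for CUPED)
- Guardrail metric generated independently of treatment (true effect of zero)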
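
Because the data-generating process is fully specified, the quantities a readout should recover can be derived before simulating anything. The sketch below copies the constants from the generation cells and computes the true lift, the within-arm correlation between `pre_metric` and `outcome`, the variance CUPED should remove, and an analytic MDE. The helper `mde` is hypothetical, and it assumes a two-sided test on a difference in means with known variance, equal arm sizes, alpha = 0.05 and 80% power (conventional choices, not stated in this notebook).

```python
import math
from statistics import NormalDist

# Design constants copied from the generation cells
N = 10_000
treatment_effect = 0.05
baseline, pre_coef = 200, 0.6
pre_mean, pre_sd = 100, 20
noise_sd = 10

# True absolute lift and expected control-arm mean implied by the outcome formula
true_lift = treatment_effect * baseline              # +10
control_mean = baseline + pre_coef * pre_mean        # 260
relative_lift = true_lift / control_mean

# Within-arm outcome variance: pre-period contribution plus noise
var_outcome = pre_coef**2 * pre_sd**2 + noise_sd**2  # 144 + 100 = 244
rho = pre_coef * pre_sd / math.sqrt(var_outcome)     # corr(pre_metric, outcome) within an arm
var_cuped = var_outcome * (1 - rho**2)               # residual variance after CUPED (the noise variance)

def mde(variance, n_per_arm, alpha=0.05, power=0.8):
    """Hypothetical helper: two-sided minimum detectable absolute effect for a difference in means."""
    z = NormalDist().inv_cdf
    se = math.sqrt(variance * (2 / n_per_arm))
    return (z(1 - alpha / 2) + z(power)) * se

n_per_arm = N // 2
print(f"true lift = {true_lift:.1f} ({relative_lift:.2%} of control mean)")
print(f"rho = {rho:.3f}, CUPED variance reduction = {1 - var_cuped / var_outcome:.1%}")
print(f"MDE raw = {mde(var_outcome, n_per_arm):.3f}, MDE with CUPED = {mde(var_cuped, n_per_arm):.3f}")
```

Under these assumptions the MDE is roughly 0.88 without CUPED and 0.56 with it, both far below the true +10 lift, so the design is heavily powered for the primary outcome.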

## Output
The generated dataset is saved to:
`data/ab_synthetic.csv`


```python
import numpy as np
import pandas as pd

# Fix the global NumPy seed so every run reproduces the same dataset
np.random.seed(42)
```
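
The notebook relies on NumPy's legacy global seed. The sketch below shows what that buys (re-seeding replays the same draws) and contrasts it with the newer `np.random.default_rng` Generator API. Both are reproducible for a fixed seed, but they use different bit generators, so switching this notebook to `default_rng` would change every generated value, including the `df.head()` output shown further down.

```python
import numpy as np

# Legacy global seeding (as used in this notebook): re-seeding replays the same stream
np.random.seed(42)
first = np.random.binomial(1, 0.5, size=5)
np.random.seed(42)
second = np.random.binomial(1, 0.5, size=5)

# Newer Generator API: reproducible for a fixed seed, but a different stream
# (PCG64 instead of the legacy MT19937), so its draws need not match the ones above
rng_draws = np.random.default_rng(42).binomial(1, 0.5, size=5)
rng_again = np.random.default_rng(42).binomial(1, 0.5, size=5)

print("legacy:", first, "generator:", rng_draws)
```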

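```python
N = 10_000               # number of users (experimental units)
treatment_effect = 0.05  # lift as a fraction of the 200 baseline, i.e. +10 on the outcome

user_id = np.arange(1, N + 1)
# Independent Bernoulli(0.5) assignment per user: arms are ~50/50, not exactly balanced
treatment = np.random.binomial(1, 0.5, size=N)
```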
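
Since assignment is Bernoulli rather than a fixed 5,000/5,000 split, arm sizes vary slightly. A standard readout check is a sample ratio mismatch (SRM) test: a chi-square goodness-of-fit test of the observed counts against the intended 50/50 split. This is a sketch that regenerates the assignment with the same seed and computes the 1-degree-of-freedom p-value with the standard library (`erfc`) instead of SciPy; the threshold for flagging SRM (a strict value such as p < 0.001 is common) is an assumption, not something this notebook specifies.

```python
import math
import numpy as np

# Re-create the assignment exactly as in the cell above
np.random.seed(42)
N = 10_000
treatment = np.random.binomial(1, 0.5, size=N)

n_treat = int(treatment.sum())
n_ctrl = N - n_treat

# Chi-square goodness-of-fit statistic against the intended 50/50 split
expected = N / 2
chi2 = ((n_treat - expected) ** 2 + (n_ctrl - expected) ** 2) / expected
# With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"treatment={n_treat}, control={n_ctrl}, chi2={chi2:.3f}, p={p_value:.3f}")
```

Because the simulated assignment really is 50/50, this p-value should usually be unremarkable; a very small value on real data would point to a bug in assignment or logging rather than a treatment effect.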


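```python
# Pre-period metric (e.g. the user's value before the experiment), used later as the CUPED covariate
pre_metric = np.random.normal(loc=100, scale=20, size=N)

# Outcome = baseline + treatment lift (+10) + dependence on the pre-period metric + noise
noise = np.random.normal(0, 10, size=N)
outcome = (
    200
    + treatment * treatment_effect * 200
    + 0.6 * pre_metric
    + noise
)

# Guardrail metric: independent of treatment by construction (true effect = 0)
guardrail = np.random.normal(loc=50, scale=5, size=N)
```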
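
Before saving, it is worth confirming that the simulated columns reflect the design: the naive difference in means should be close to +10, the guardrail difference close to 0, and the within-arm correlation between `pre_metric` and `outcome` close to 0.6 × 20 / √244 ≈ 0.77. The sketch below regenerates the same arrays (same seed, same draw order) so it runs on its own; the tolerances you would accept are a judgment call.

```python
import numpy as np

# Regenerate the columns exactly as in the cells above (same seed and draw order)
np.random.seed(42)
N = 10_000
treatment_effect = 0.05
treatment = np.random.binomial(1, 0.5, size=N)
pre_metric = np.random.normal(loc=100, scale=20, size=N)
noise = np.random.normal(0, 10, size=N)
outcome = 200 + treatment * treatment_effect * 200 + 0.6 * pre_metric + noise
guardrail = np.random.normal(loc=50, scale=5, size=N)

is_t = treatment == 1
# Naive difference in means: should sit near the true +10 lift
lift_hat = outcome[is_t].mean() - outcome[~is_t].mean()
# Guardrail difference: should sit near 0
guard_diff = guardrail[is_t].mean() - guardrail[~is_t].mean()
# Correlation within the control arm: should sit near 0.77
rho_hat = np.corrcoef(pre_metric[~is_t], outcome[~is_t])[0, 1]

print(f"lift={lift_hat:.2f}, guardrail diff={guard_diff:.3f}, rho={rho_hat:.3f}")
```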


```python
# Assemble one row per user
df = pd.DataFrame({
    "user_id": user_id,
    "treatment": treatment,
    "pre_metric": pre_metric,
    "outcome": outcome,
    "guardrail": guardrail
})

df.head()
```


|   | user_id | treatment | pre_metric | outcome | guardrail |
|---|---:|---:|---:|---:|---:|
| 0 | 1 | 0 | 70.242766 | 235.984178 | 53.91841 |
| 1 | 2 | 1 | 77.496278 | 252.57102 | 50.037693 |
| 2 | 3 | 1 | 107.776378 | 264.347561 | 53.274066 |
| 3 | 4 | 1 | 76.522533 | 247.020998 | 44.928415 |
| 4 | 5 | 0 | 122.252686 | 277.695381 | 50.716931 |
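
A basic experiment readout on this DataFrame reports, per metric, the treatment minus control difference in means with a confidence interval. The sketch below uses a Welch (unequal-variance) standard error and a normal-approximation 95% interval, which is reasonable at these sample sizes; the helper `readout` is hypothetical, and the DataFrame is rebuilt with the same seed so the block is self-contained.

```python
import math
from statistics import NormalDist
import numpy as np
import pandas as pd

# Rebuild the DataFrame exactly as in the cells above (same seed and draw order)
np.random.seed(42)
N = 10_000
treatment = np.random.binomial(1, 0.5, size=N)
pre_metric = np.random.normal(loc=100, scale=20, size=N)
noise = np.random.normal(0, 10, size=N)
df = pd.DataFrame({
    "treatment": treatment,
    "pre_metric": pre_metric,
    "outcome": 200 + treatment * 0.05 * 200 + 0.6 * pre_metric + noise,
    "guardrail": np.random.normal(loc=50, scale=5, size=N),
})

def readout(data, metric, alpha=0.05):
    """Hypothetical helper: treatment - control difference with Welch SE and normal-approximation CI."""
    arm = data.groupby("treatment")[metric].agg(["mean", "var", "count"])
    diff = arm.loc[1, "mean"] - arm.loc[0, "mean"]
    se = math.sqrt((arm["var"] / arm["count"]).sum())
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = 2 * (1 - NormalDist().cdf(abs(diff) / se))
    return {"metric": metric, "diff": diff, "ci_low": diff - z * se,
            "ci_high": diff + z * se, "p_value": p}

print(pd.DataFrame([readout(df, "outcome"), readout(df, "guardrail")]))
```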


```python
import os

# Create the output directory if needed, then write the dataset without the pandas index
os.makedirs("data", exist_ok=True)
df.to_csv("data/ab_synthetic.csv", index=False)
print("Saved: data/ab_synthetic.csv")
```


Saved: data/ab_synthetic.csv
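
The `pre_metric` column exists so CUPED can be demonstrated. CUPED replaces the outcome Y with Y − θ(X − mean(X)), where X is the pre-period covariate and θ = Cov(Y, X) / Var(X); because X is measured before assignment, the adjustment leaves the expected treatment difference unchanged while removing the variance X explains. The sketch below estimates θ pooled over both arms and compares standard errors; it regenerates the data in memory (rather than reading `data/ab_synthetic.csv`) so it runs on its own, and the helper `diff_and_se` is hypothetical.

```python
import numpy as np

# Regenerate the relevant columns exactly as in the cells above
np.random.seed(42)
N = 10_000
treatment = np.random.binomial(1, 0.5, size=N)
pre_metric = np.random.normal(loc=100, scale=20, size=N)
noise = np.random.normal(0, 10, size=N)
outcome = 200 + treatment * 0.05 * 200 + 0.6 * pre_metric + noise

# CUPED coefficient: theta = Cov(Y, X) / Var(X), pooled across both arms
theta = np.cov(outcome, pre_metric)[0, 1] / np.var(pre_metric, ddof=1)
outcome_cuped = outcome - theta * (pre_metric - pre_metric.mean())

is_t = treatment == 1

def diff_and_se(y):
    """Hypothetical helper: difference in means and its Welch standard error."""
    diff = y[is_t].mean() - y[~is_t].mean()
    se = np.sqrt(y[is_t].var(ddof=1) / is_t.sum() + y[~is_t].var(ddof=1) / (~is_t).sum())
    return diff, se

raw_diff, raw_se = diff_and_se(outcome)
cuped_diff, cuped_se = diff_and_se(outcome_cuped)
reduction = 1 - (cuped_se / raw_se) ** 2

print(f"theta={theta:.3f}")
print(f"raw:   diff={raw_diff:.2f}, se={raw_se:.3f}")
print(f"CUPED: diff={cuped_diff:.2f}, se={cuped_se:.3f}, variance reduction={reduction:.1%}")
```

With the design constants used here, θ should land near 0.6 and the variance reduction near ρ² ≈ 0.59, matching what the generating model implies.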
