# CUPED and CUPAC Tutorial

This tutorial demonstrates variance reduction techniques for A/B testing using covariate adjustment methods available in HypEx.

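**CUPED** (Controlled-experiment Using Pre-Experiment Data) adjusts the target metric with a pre-experiment covariate, typically the same metric measured before the experiment, through a linear regression coefficient. The adjustment removes the variance explained by the covariate while leaving the expected treatment effect unchanged.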

**CUPAC** (Control Using Predictions As Covariates) extends CUPED by replacing the single covariate with a model prediction: several pre-experiment covariates are used to predict pre-experiment target values, and these predictions are then subtracted from the experiment-period targets. The approach supports different regression models (linear, ridge, lasso, catboost) and avoids data leakage because experiment data is never used to predict experiment outcomes.

Both methods help you:
- Detect smaller effects with the same sample size
- Reduce sample size needed to detect the same effect
- Increase statistical power of your experiments
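
The link between variance and these three benefits can be made concrete with the standard normal-approximation formulas for a two-sample test. The sketch below is only an illustration: it assumes equal group sizes, a two-sided test, and that a variance reduction of `r` means the adjusted variance is `(1 - r)` times the original. The function names are hypothetical and not part of HypEx.

```python
from statistics import NormalDist


def required_n_per_group(sigma2, delta, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided, two-sample z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 * sigma2 / delta ** 2


def min_detectable_effect(sigma2, n_per_group, alpha=0.05, power=0.8):
    """Smallest effect detectable at the given power with n units per group."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) * (2 * sigma2 / n_per_group) ** 0.5


sigma2 = 25.0                          # outcome variance before adjustment (std = 5)
reduction = 0.5                        # hypothetical 50% variance reduction
adjusted = sigma2 * (1 - reduction)    # variance after covariate adjustment

n_before = required_n_per_group(sigma2, delta=0.5)
n_after = required_n_per_group(adjusted, delta=0.5)
mde_before = min_detectable_effect(sigma2, n_per_group=1000)
mde_after = min_detectable_effect(adjusted, n_per_group=1000)

print(f"n per group: {n_before:.0f} -> {n_after:.0f}")
print(f"MDE at n=1000: {mde_before:.3f} -> {mde_after:.3f}")
```

Under these assumptions the required sample size scales linearly with the remaining variance, while the minimal detectable effect scales with its square root.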

## Table of Contents
<ul>
  <li><a href="#data-preparation">Data Preparation</a></li>
  <li><a href="#baseline-ab-test">Baseline AB Test</a></li>
  <li><a href="#cuped-implementation">CUPED Implementation</a></li>
  <li><a href="#cupac-implementation">CUPAC Implementation</a></li>
  <li><a href="#best-practices">Best Practices</a></li>
</ul>

## Data Preparation

For CUPAC to work correctly with the new `features_mapping` format, the dataset needs:
1. **Target metrics**: The metrics you want to analyze (e.g., spends, revenue)
2. **Historical target features**: Lagged versions of your targets from different time periods
3. **Pre-experiment covariates**: Features measured before the experiment that correlate with outcomes

The new CUPAC implementation supports **multilevel models**: it can automatically create a model for each available time-period transition. This tutorial uses **2 time periods**:
- Period 2 → Period 1: `y0_lag_2 ~ X1_lag2 + X2_lag2`  
- Period 1 → Current: `y0_lag_1 ~ X1_lag1 + X2_lag1`

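Each period uses its own set of covariates, which makes the temporal structure explicit. The generator names the lagged targets `y0_lag_1` and `y0_lag_2`; the code below renames them to `y_lag1` and `y_lag2`.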

Let's generate synthetic data using the built-in DataGenerator:

In [1]:
from hypex import ABTest
from hypex.dataset import (
    Dataset,
    FeatureRole,
    InfoRole,
    PreTargetRole,
    TargetRole,
    TreatmentRole,
)
from hypex.utils.tutorial_data_creation import DataGenerator

In [2]:
# Generate synthetic data with 2 historical periods using built-in DataGenerator
gen = DataGenerator(
    n_samples=1_000,
    distributions={
        "X1": {"type": "normal", "mean": 1, "std": 1},
        "X2": {"type": "bernoulli", "p": 0.5},
        "y0": {"type": "normal", "mean": 1, "std": 5},
    },
    time_correlations={"X1": 0.2, "X2": 0.1, "y0": 0.8},
    effect_size=0.1,
    seed=42
)

df = gen.generate()
# Keep only the columns we need for 2-period CUPAC
df = df.drop(columns=['y0', 'z', 'U', 'D', 'y1'])
df = df.rename(columns={'y0_lag_1': 'y_lag1', 'y0_lag_2': 'y_lag2'})

In [3]:
data = Dataset(
    roles={
        "d": TreatmentRole(),
        "y": TargetRole(cofounders=["X1", "X2"]),
        # Period 1: most recent history
        "y_lag1": PreTargetRole(parent="y", lag=1),
        "X1_lag1": FeatureRole(parent="X1", lag=1),
        "X2_lag1": FeatureRole(parent="X2", lag=1),
        # Period 2: older history
        "y_lag2": PreTargetRole(parent="y", lag=2),
        "X1_lag2": FeatureRole(parent="X1", lag=2),
        "X2_lag2": FeatureRole(parent="X2", lag=2),
    },
    data=df,
    default_role=InfoRole(),
)

## Baseline AB Test

First, let's run a standard AB test without any variance reduction to establish our baseline:

In [4]:
# Standard AB test without covariate adjustment
test_baseline = ABTest()
result_baseline = test_baseline.execute(data)

result_baseline.resume

|   | feature | group | control mean | test mean | difference | difference % | TTest pass | TTest p-value |
|---|---|---|---|---|---|---|---|---|
| 0 | y | 1 | 0.948518 | 1.590543 | 0.642025 | 67.687127 | NOT OK | 0.064382 |
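
The `difference %` column appears to be the absolute difference expressed relative to the control mean. A quick check with the rounded values from the baseline output reproduces it up to rounding; this is an inference from the numbers, not a statement about how HypEx computes the column.

```python
control_mean = 0.948518
test_mean = 1.590543

# Absolute difference between group means
difference = test_mean - control_mean
# Relative difference, in percent of the control mean
difference_pct = difference / control_mean * 100

print(f"difference   = {difference:.6f}")
print(f"difference % = {difference_pct:.4f}")
```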


In [5]:
result_baseline.sizes

|   | control size | test size | control size % | test size % | group |
|---|---|---|---|---|---|
| 1 | 653 | 347 | 65.3 | 34.7 | 1 |


## CUPED Implementation

CUPED uses a single historical feature to adjust the target variable. In HypEx, pass a mapping from each target to its covariate through the `cuped_features` parameter.

**Note**: This tutorial uses the period 1 lagged target (`y_lag1`) as the CUPED covariate, since it is the closest in time to the current target.
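
To show what the adjustment does numerically, here is a minimal stand-alone sketch of the textbook CUPED formula, `y_cuped = y - theta * (x - mean(x))` with `theta = cov(y, x) / var(x)`. It uses synthetic data and plain Python, estimates `theta` on the pooled sample, and does not reproduce HypEx internals; all names are illustrative.

```python
import random
from statistics import fmean


def variance(values):
    m = fmean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def covariance(xs, ys):
    mx, my = fmean(xs), fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def cuped_adjust(y, x):
    """Return y - theta * (x - mean(x)) with theta = cov(y, x) / var(x)."""
    theta = covariance(x, y) / variance(x)
    x_mean = fmean(x)
    return [yi - theta * (xi - x_mean) for yi, xi in zip(y, x)]


rng = random.Random(0)
n = 2000
pre = [rng.gauss(1, 5) for _ in range(n)]          # pre-experiment metric
post = [0.8 * p + rng.gauss(0, 3) for p in pre]    # correlated experiment metric

adjusted = cuped_adjust(post, pre)
reduction = 1 - variance(adjusted) / variance(post)
corr = covariance(pre, post) / (variance(pre) * variance(post)) ** 0.5

print(f"variance reduction: {reduction:.3f}, squared correlation: {corr ** 2:.3f}")
```

With `theta` chosen this way, the variance reduction equals the squared sample correlation between the covariate and the target, and the mean of the adjusted metric is unchanged.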

In [6]:
# CUPED with single covariate (using closest lagged feature)
test_cuped = ABTest(cuped_features={'y': 'y_lag1'})
result_cuped = test_cuped.execute(data)

result_cuped.resume

|   | feature | group | control mean | test mean | difference | difference % | TTest pass | TTest p-value |
|---|---|---|---|---|---|---|---|---|
| 0 | y | 1 | 0.948518 | 1.590543 | 0.642025 | 67.687127 | NOT OK | 0.064382 |
| 1 | y_cuped | 1 | 0.981674 | 1.52815 | 0.546476 | 55.667761 | OK | 0.009859 |


In [7]:
# Check variance reduction achieved by CUPED
result_cuped.variance_reduction_report

|   | Transformed Metric Name | Variance Reduction (%) |
|---|---|---|
| 0 | y_cuped | 62.728655 |
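
For a single-covariate CUPED adjustment, the theoretical variance reduction equals the squared correlation between the covariate and the target. Assuming the report defines variance reduction as `1 - Var(adjusted) / Var(original)`, the reported value can be translated into an implied correlation and into the shrinkage of the metric's standard deviation:

```python
variance_reduction = 62.728655 / 100   # value reported for y_cuped

# |corr(y, y_lag1)| implied by reduction = corr ** 2
implied_corr = variance_reduction ** 0.5
# Ratio of adjusted to original standard deviation
std_ratio = (1 - variance_reduction) ** 0.5

print(f"implied |correlation|: {implied_corr:.3f}")
print(f"std after / std before: {std_ratio:.3f}")
```

The implied correlation of roughly 0.79 is in line with the `"y0": 0.8` entry in `time_correlations`, if that setting controls the lag-to-lag correlation of the target.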


## CUPAC Implementation

The new CUPAC implementation uses the `features_mapping` format and automatically creates multilevel models. The mapping is already configured in the `Dataset` defined above.

Key advantages of the new multilevel approach:
- **Sequential modeling**: Each time period predicts the next period
- **Better temporal relationships**: Captures changing correlations over time  
- **Multiple targets**: Different targets can have different numbers of periods
- **Automatic model selection**: Chooses best performing models via cross-validation
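
Model selection by cross-validation can be sketched as follows. This is a toy illustration, not HypEx's selection code: the two candidates (a mean-only baseline and a one-feature least-squares fit) stand in for the real model families, and the score is an out-of-fold variance reduction defined here for the example.

```python
import random
from statistics import fmean


def variance(values):
    m = fmean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def fit_mean(x, y):
    """Baseline candidate: always predict the training mean."""
    m = fmean(y)
    return lambda xi: m


def fit_linear(x, y):
    """One-feature ordinary least squares."""
    mx, my = fmean(x), fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return lambda xi: my + slope * (xi - mx)


def cv_variance_reduction(fit, x, y, k=5):
    """Out-of-fold variance reduction: 1 - Var(y - prediction) / Var(y)."""
    residuals = []
    for fold in range(k):
        train = [i for i in range(len(y)) if i % k != fold]
        test = [i for i in range(len(y)) if i % k == fold]
        model = fit([x[i] for i in train], [y[i] for i in train])
        residuals += [y[i] - model(x[i]) for i in test]
    return 1 - variance(residuals) / variance(y)


rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [0.8 * xi + rng.gauss(0, 0.6) for xi in x]

candidates = {"mean": fit_mean, "linear": fit_linear}
scores = {name: cv_variance_reduction(fit, x, y) for name, fit in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

Scoring on held-out folds penalizes candidates that only fit the training data well, which is why a flexible model is not automatically preferred.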

**Example with 3 periods**: For more complex scenarios, you can use 3 or more periods:
- Period 3 → Period 2: `target_lag_3 ~ covariates_lag3`
- Period 2 → Period 1: `target_lag_2 ~ covariates_lag2`  
- Period 1 → Current: `target_lag_1 ~ covariates_lag1`
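
One plausible reading of this period structure is: fit a model on a purely historical transition, then apply it one period later to predict the current target and subtract the prediction. The sketch below illustrates that idea with synthetic data and a small hand-written least-squares solver. It is an assumption-laden illustration, not HypEx's implementation; the helper names and the choice to add back the mean prediction are made up for the example.

```python
import random
from statistics import fmean


def variance(values):
    m = fmean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def solve(a, b):
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Least-squares weights; an intercept column is prepended to each row."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, rows):
    return [weights[0] + sum(w * v for w, v in zip(weights[1:], r)) for r in rows]


rng = random.Random(2)
n = 1000
y_lag2 = [rng.gauss(0, 1) for _ in range(n)]
x_lag2 = [rng.gauss(0, 1) for _ in range(n)]
y_lag1 = [0.8 * a + 0.3 * b + rng.gauss(0, 0.5) for a, b in zip(y_lag2, x_lag2)]
x_lag1 = [rng.gauss(0, 1) for _ in range(n)]
y = [0.8 * a + 0.3 * b + rng.gauss(0, 0.5) for a, b in zip(y_lag1, x_lag1)]

# Fit on a purely historical transition: lag-2 features -> lag-1 target.
weights = fit_ols(list(zip(y_lag2, x_lag2)), y_lag1)

# Apply the same model one period later: lag-1 features -> prediction for current y.
prediction = predict(weights, list(zip(y_lag1, x_lag1)))
y_cupac = [yi - pi + fmean(prediction) for yi, pi in zip(y, prediction)]

reduction = 1 - variance(y_cupac) / variance(y)
print(f"variance reduction: {reduction:.3f}")
```

Because the model is trained only on pre-experiment periods, no experiment outcome enters the fit, which is the leakage-avoidance property described in the introduction.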

In [8]:
# Multilevel CUPAC with ridge regression
test_cupac_ridge = ABTest(
    enable_cupac=True,
    cupac_models='ridge'
)
result_cupac_ridge = test_cupac_ridge.execute(data)

result_cupac_ridge.resume

|   | feature | group | control mean | test mean | difference | difference % | TTest pass | TTest p-value |
|---|---|---|---|---|---|---|---|---|
| 0 | y | 1 | 0.948518 | 1.590543 | 0.642025 | 67.687127 | NOT OK | 0.064382 |
| 1 | y_cupac | 1 | 1.007879 | 1.478836 | 0.470957 | 46.727496 | OK | 0.026723 |


In [9]:
# Multilevel CUPAC with automatic model selection
test_cupac_auto = ABTest(
    enable_cupac=True,
    cupac_models=['linear', 'ridge', 'lasso', 'catboost']  # Will select best performing model for each transition
)
result_cupac_auto = test_cupac_auto.execute(data)

result_cupac_auto.resume

|   | feature | group | control mean | test mean | difference | difference % | TTest pass | TTest p-value |
|---|---|---|---|---|---|---|---|---|
| 0 | y | 1 | 0.948518 | 1.590543 | 0.642025 | 67.687127 | NOT OK | 0.064382 |
| 1 | y_cupac | 1 | 1.007892 | 1.478812 | 0.47092 | 46.72325 | OK | 0.026736 |


In [10]:
# Check variance reduction for CUPAC methods
result_cupac_auto.cupac.variance_reductions

|   | target | best_model | variance_reduction_cv | variance_reduction_real | control_mean_bias | test_mean_bias |
|---|---|---|---|---|---|---|
| 0 | y | linear | 66.035506 | 62.472566 | -0.059373 | 0.111732 |


### Feature Importances

CUPAC models learn which historical covariates best predict target values. Feature importances help you understand:
- **Which features matter most** for variance reduction
- **Linear models** (linear, ridge, lasso): Show regression coefficients; a positive coefficient means the predicted target increases with the feature
- **CatBoost**: Shows feature importance scores - higher values indicate stronger predictive power

**Important:** Feature importances are computed as **averages across cross-validation folds**, which provides:
- More stable and reliable estimates than a single-model fit
- Less sensitivity to any one train/test split
- Computational efficiency (the fold models are reused, so no extra model training is needed)

The importances are shown per target and include both the lagged target features and covariate features used in the model.
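
Averaging across folds can be illustrated with a one-feature least-squares model: each training fold yields its own coefficient, and the reported importance is their mean. The sketch below uses synthetic data and illustrative names, and makes no claim about the exact fold scheme HypEx uses.

```python
import random
from statistics import fmean, stdev


def ols_slope(x, y):
    """Slope of a one-feature least-squares fit."""
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


rng = random.Random(3)
n, k = 1000, 5
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.8 * xi + rng.gauss(0, 0.6) for xi in x]

# One slope per training fold, as if reusing the models fitted during cross-validation.
fold_slopes = []
for fold in range(k):
    train = [i for i in range(n) if i % k != fold]
    fold_slopes.append(ols_slope([x[i] for i in train], [y[i] for i in train]))

importance = fmean(fold_slopes)
print(f"per-fold slopes: {[round(s, 3) for s in fold_slopes]}")
print(f"averaged importance: {importance:.3f} (spread {stdev(fold_slopes):.3f})")
```

The spread across folds gives a rough sense of how stable each importance is; a coefficient that changes sign between folds deserves less trust than one that barely moves.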

In [11]:
# Check feature importances - which covariates contributed most to variance reduction
result_cupac_auto.cupac.feature_importances

|   | target | feature | importance | model |
|---|---|---|---|---|
| 0 | y | X2_lag2 | 0.113172 | linear |
| 1 | y | X1_lag2 | 0.139829 | linear |
| 2 | y | y_lag2 | 0.829126 | linear |


### Virtual Target

Virtual targets let you evaluate CUPAC in scenarios where the current-period target does not exist yet (e.g., when planning an experiment or forecasting future outcomes). In this case:
- No current target column exists (only historical lags)
- CUPAC trains models on historical transitions
- Only CV variance reduction is available (no real variance reduction)
- Feature importances still show which historical features are most predictive

In [12]:
gen = DataGenerator(
    n_samples=2000,
    distributions={
        "X1": {"type": "normal", "mean": 0, "std": 1},
        "X2": {"type": "bernoulli", "p": 0.5},
        "y0": {"type": "normal", "mean": 5, "std": 1},
    },
    time_correlations={"X1": 0.2, "X2": 0.1, "y0": 0.6},
    effect_size=2.0,
    seed=42
)

df = gen.generate()
# Keep only the lagged columns; dropping 'y' as well leaves no current-period target
df = df.drop(columns=['y0', 'z', 'U', 'D', 'y1', 'y'])
df = df.rename(columns={'y0_lag_1': 'y_lag1', 'y0_lag_2': 'y_lag2'})

In [13]:
data = Dataset(
    roles={
        "d": TreatmentRole(),
        # Period 1: most recent history (no current-period target column)
        "y_lag1": PreTargetRole(parent="y", cofounders=["X1", "X2"], lag=1),
        "X1_lag1": FeatureRole(parent="X1", lag=1),
        "X2_lag1": FeatureRole(parent="X2", lag=1),
        # Period 2: older history
        "y_lag2": PreTargetRole(parent="y", lag=2),
        "X1_lag2": FeatureRole(parent="X1", lag=2),
        "X2_lag2": FeatureRole(parent="X2", lag=2),
    },
    data=df,
    default_role=InfoRole(),
)

In [14]:
# CUPAC on a virtual target: only historical transitions are available
test_cupac_linear = ABTest(
    enable_cupac=True,
    cupac_models='linear'
)
result_cupac_linear = test_cupac_linear.execute(data)

result_cupac_linear.cupac.variance_reductions

|   | target | best_model | variance_reduction_cv | variance_reduction_real | control_mean_bias | test_mean_bias |
|---|---|---|---|---|---|---|
| 0 | y | linear | 35.445326 |  |  |  |


In [15]:
# Feature importances for virtual target
result_cupac_linear.cupac.feature_importances

|   | target | feature | importance | model |
|---|---|---|---|---|
| 0 | y | X2_lag2 | 0.030907 | linear |
| 1 | y | X1_lag2 | -0.008524 | linear |
| 2 | y | y_lag2 | 0.603387 | linear |


## Best Practices

When using CUPAC for variance reduction in your experiments:

1. **CUPAC Results Access**: All CUPAC-specific outputs are organized under `result.cupac`:
   - `result.cupac.feature_importances` - Feature importance scores
   - `result.cupac.variance_reductions` - Variance reduction metrics
   
2. **Feature Importances**: Use `result.cupac.feature_importances` to understand which historical covariates drive variance reduction
   - High importance features are the most valuable for reducing variance
   - Can guide feature selection for future experiments
   
3. **Model Selection**: 
   - Start with `'linear'` for interpretability and speed
   - Use multiple models `['linear', 'ridge', 'lasso', 'catboost']` when you have complex, non-linear relationships
   - Check `result.cupac.variance_reductions` to see which model was selected
   
4. **Temporal Structure**:
   - Include multiple lags when available (lag 2 → lag 1 → current)
   - Each lag period can use different sets of covariates
   - Virtual targets work for forecasting scenarios
   
5. **Covariate Selection** (the `cofounders` argument):
   - Include features that correlate with your target
   - Use historical versions of the same features (lagged covariates)
   - Feature importances help identify which covariates matter most
   
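6. **Variance Reduction**:
   - CV variance reduction (`variance_reduction_cv`): How well the model generalizes, estimated by cross-validation
   - Real variance reduction (`variance_reduction_real`): Actual improvement on experiment data; not available for virtual targets
   - Aim for more than 40% variance reduction for meaningful power gains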