This notebook shows how the library supports absolute and relative effects in both MDE analysis and experiment analysis with OLS.

The steps are:
1. Create a customer-level dataframe
2. Run MDE analysis with absolute and relative effects, and compare the two
3. Run experiment analysis with absolute and relative effects, and compare the two

### Dataframe creation

In [1]:
import numpy as np
import pandas as pd

from copy import deepcopy

from cluster_experiments import AnalysisPlan, NormalPowerAnalysis

def get_user_df(n_users=10_000):
    df = pd.DataFrame(
        {
            "customer_id": np.arange(n_users),
            "orders_pre": np.random.poisson(10, n_users),
            "_treatment": np.random.rand(n_users) > 0.5,
            "X1": np.random.poisson(1, n_users),
            "X2": np.random.poisson(2, n_users),
        }
    )
    df = df.assign(
        **{
            "_treatment": df["_treatment"].astype(int),
            # Outcome: pre-period orders plus covariate terms and a true treatment effect of 0.1
            "orders": lambda x: x["orders_pre"]
            + 2 * x["X1"]
            + x["X2"]
            + 0.1 * x["_treatment"],
        }
    )
    df = df.assign(
        **{
            "center_X1": lambda x: x["X1"] - x["X1"].mean(),
            "center_X2": lambda x: x["X2"] - x["X2"].mean(),
        }
    )
    df["_treatment"] = df["_treatment"].map({0: "A", 1: "B"})
    return df


user_df = get_user_df()
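
As a reference point for the analyses that follow, the baseline and true effect implied by get_user_df can be worked out by hand. The sketch below relies only on the Poisson rates and the 0.1 treatment coefficient in the function above; the variable names are illustrative and not part of the library.

```python
# Expected values implied by the simulated data (a Poisson mean equals its rate).
mean_orders_pre = 10  # orders_pre ~ Poisson(10)
mean_x1 = 1           # X1 ~ Poisson(1)
mean_x2 = 2           # X2 ~ Poisson(2)

# orders = orders_pre + 2 * X1 + X2 + 0.1 * treatment
expected_control_mean = mean_orders_pre + 2 * mean_x1 + mean_x2
true_absolute_effect = 0.1
true_relative_effect = true_absolute_effect / expected_control_mean

print(expected_control_mean)           # 14
print(round(true_relative_effect, 5))  # 0.00714
```

So the simulated treatment adds 0.1 orders in absolute terms, which is roughly 0.7% of the expected baseline.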

### MDE analysis with absolute and relative effects

First we create NormalPowerAnalysis objects for absolute and relative effects

In [2]:
config_relative = {
    "analysis": "ols",
    "perturbator": "constant",
    "splitter": "non_clustered",
    "relative_effect": True,
    "target_col": "orders",
    "covariates": ["X1"]
}

config_vainilla = deepcopy(config_relative)
config_vainilla["relative_effect"] = False

pw_relative = NormalPowerAnalysis.from_dict(config_relative)
pw_vainilla = NormalPowerAnalysis.from_dict(config_vainilla)



Calculate relative MDE

In [3]:
pw_relative.mde(
    user_df,
    n_simulations=1
)

0.013677535401090029

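Calculate the absolute MDE. It is expressed in orders rather than as a fraction of the baseline, so the value is on a different scale from the relative MDE above.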

In [4]:
pw_vainilla.mde(
    user_df,
    n_simulations=1
)

0.19339686081968005

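Dividing the absolute MDE by the baseline mean gives an approximate relative MDE that is close to, but not the same as, the one computed directly, because the division treats the baseline as a fixed number and ignores its variance. In the run below the divided value (0.01375) is slightly higher than the relative MDE above (0.01368); with n_simulations=1 each call estimates the standard error from a single simulated split, so part of this gap is likely simulation noise.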

In [5]:
float(
    pw_vainilla.mde(
        user_df,
        n_simulations=1
    ) / user_df["orders"].mean()
)

0.013747224124560945
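
For intuition about where MDE values like these come from, the textbook normal-approximation formula for a two-sided test multiplies the standard error of the effect estimate by the sum of two normal quantiles. The sketch below assumes alpha = 0.05 and power = 0.8, and normal_mde is a hypothetical helper written for this illustration; it is not taken from the library, whose defaults and internals may differ.

```python
from statistics import NormalDist


def normal_mde(std_error, alpha=0.05, power=0.8):
    """MDE of a two-sided test under the normal approximation (illustrative helper)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return (z_alpha + z_power) * std_error


# With a standard error of 1, the MDE is about 2.8 standard errors.
print(round(normal_mde(1.0), 2))  # 2.8
```

Under these assumptions the MDE scales linearly with the standard error, so whichever scale the standard error is estimated on (absolute or relative) carries over directly to the MDE.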

### Experiment analysis with absolute and relative effects

First we create AnalysisPlan objects for absolute and relative effects

In [6]:
relative_plan_config = {
    "metrics": [
        {"alias": "Orders", "name": "orders"},
    ],
    "variants": [
        {"name": "A", "is_control": True},
        {"name": "B", "is_control": False},
    ],
    "analysis_type": "ols",
    "variant_col": "_treatment",
    "analysis_config": {"relative_effect": True, "covariates": ["X1"]},
}
vainilla_plan_config = deepcopy(relative_plan_config)
vainilla_plan_config["analysis_config"] = {"covariates": ["X1"]}

relative_plan = AnalysisPlan.from_metrics_dict(relative_plan_config)
vainilla_plan = AnalysisPlan.from_metrics_dict(vainilla_plan_config)

Now we run the analysis for both plans

In [7]:
results_rel = relative_plan.analyze(user_df)
results_vainilla = vainilla_plan.analyze(user_df)


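The results differ in scale: the relative plan reports the effect, confidence interval, and standard error as a fraction of the control mean, while the other plan reports them in absolute units (orders).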

In [8]:
results_rel.to_dataframe()

| | metric_alias | control_variant_name | treatment_variant_name | control_variant_mean | treatment_variant_mean | analysis_type | ate | ate_ci_lower | ate_ci_upper | p_value | std_error | dimension_name | dimension_value | alpha |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Orders | A | B | 13.970359 | 14.177092 | ols | 0.012935 | 0.003189 | 0.022681 | 0.009289 | 0.004973 | __total_dimension | total | 0.05 |


In [9]:
results_vainilla.to_dataframe()

| | metric_alias | control_variant_name | treatment_variant_name | control_variant_mean | treatment_variant_mean | analysis_type | ate | ate_ci_lower | ate_ci_upper | p_value | std_error | dimension_name | dimension_value | alpha |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Orders | A | B | 13.970359 | 14.177092 | ols | 0.180705 | 0.045427 | 0.315983 | 0.008841 | 0.069021 | __total_dimension | total | 0.05 |


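Dividing the absolute estimate, confidence interval, and standard error by the control mean reproduces the relative point estimate (0.012935), but it treats the control mean as a known constant. Ignoring its variance gives a slightly narrower interval (0.003252 to 0.022618) than the one from the relative plan (0.003189 to 0.022681). The p-value is left at its absolute-analysis value.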

In [10]:
control_mean = user_df.query("_treatment == 'A'").orders.mean()
results_df = results_vainilla.to_dataframe()
results_df[["ate", "ate_ci_lower", "ate_ci_upper", "std_error"]] /= control_mean
results_df

| | metric_alias | control_variant_name | treatment_variant_name | control_variant_mean | treatment_variant_mean | analysis_type | ate | ate_ci_lower | ate_ci_upper | p_value | std_error | dimension_name | dimension_value | alpha |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Orders | A | B | 13.970359 | 14.177092 | ols | 0.012935 | 0.003252 | 0.022618 | 0.008841 | 0.004941 | __total_dimension | total | 0.05 |
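
To see why ignoring the baseline variance narrows the interval, the two standard errors can be compared on a simple difference in means. This is a minimal sketch using the delta method for a ratio of independent sample means, without covariates; whether the library computes its relative standard error this way is an assumption, and the simulated data and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
control = rng.poisson(10, n).astype(float)
treatment = rng.poisson(10, n) + 0.5  # deliberately large effect so the example is stable

mean_c, mean_t = control.mean(), treatment.mean()
var_mean_c = control.var(ddof=1) / n    # variance of the control mean
var_mean_t = treatment.var(ddof=1) / n  # variance of the treatment mean

# Naive approach: divide the absolute standard error by the control mean,
# treating the control mean as a fixed constant.
se_abs = np.sqrt(var_mean_c + var_mean_t)
se_naive = se_abs / mean_c

# Delta method for mean_t / mean_c - 1 with independent groups:
# Var ~ Var(mean_t) / mean_c^2 + mean_t^2 * Var(mean_c) / mean_c^4
se_delta = np.sqrt(var_mean_t / mean_c**2 + mean_t**2 * var_mean_c / mean_c**4)

# The delta-method value is larger whenever mean_t > mean_c, as it is here.
print(se_naive, se_delta)
```

In this setup the two expressions differ only in the weight on the control variance, which is why the gap is small and grows with the size of the effect. With a negative effect the ordering can flip, so the naive interval is not always the narrower one.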
