# A/B Test (Simulated) for Segment-Aware Experimentation

This notebook demonstrates how to design and analyze an A/B test within a customer-segmentation context.

Because the dataset is offline and does not contain real experiment logs, the A/B test here is **simulated** to illustrate:
- how to define a measurable outcome,
- how to assign treatment/control fairly (randomized),
- how to compare groups statistically,
- and how segmentation can support targeted decision-making.

The goal is methodological: showcasing experimental thinking rather than claiming real business impact.

```python
import pandas as pd

# Load customers together with their segment ("Cluster") labels
df = pd.read_csv("data/customers_with_clusters.csv")
```

```python
# Run the experiment inside a single customer segment
target_cluster = 1
df_t = df[df["Cluster"] == target_cluster].copy()
```

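```python
import numpy as np

np.random.seed(42)  # make the simulation reproducible

# Randomized assignment: each customer independently gets "control" or "treatment"
# with probability 0.5, so the two groups are equal in size only on average
df_t["group"] = np.random.choice(["control", "treatment"], size=len(df_t))

# Simulated outcome: baseline spending score + treatment effect + Gaussian noise
baseline = df_t["Spending Score (1-100)"].values
noise = np.random.normal(loc=0, scale=5, size=len(df_t))

treatment_effect = 3  # true lift (in score points) built into the simulation
df_t["outcome"] = baseline + noise + np.where(df_t["group"] == "treatment", treatment_effect, 0)
```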
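The `np.random.choice` assignment above draws each customer's arm independently, so nothing forces the arms to be the same size. If equal arms are preferred, one common alternative is complete randomization: build a label vector with (as nearly as possible) half of each label and shuffle it. The sketch below assumes that goal; the helper name `balanced_assignment` is hypothetical and not part of the original notebook.

```python
import numpy as np

def balanced_assignment(n, seed=None):
    """Hypothetical helper: n labels split as evenly as possible, in random order."""
    rng = np.random.default_rng(seed)
    n_treat = n // 2  # with odd n, control gets the extra customer
    labels = np.array(["control"] * (n - n_treat) + ["treatment"] * n_treat)
    # Shuffling the fixed label vector randomizes WHO is treated, not HOW MANY
    return rng.permutation(labels)

groups = balanced_assignment(40, seed=42)
print((groups == "control").sum(), (groups == "treatment").sum())
```

In the notebook this would be used as `df_t["group"] = balanced_assignment(len(df_t), seed=42)`. Note that switching the assignment method changes the random draws, so the numbers reported below would no longer be reproduced exactly.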

```python
# Per-group sample size, mean outcome, and standard deviation
df_t.groupby("group")["outcome"].agg(["count", "mean", "std"])
```

| group | count | mean | std |
|---|---:|---:|---:|
| control | 20 | 80.055095 | 11.030143 |
| treatment | 20 | 83.737255 | 11.09962 |


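```python
# Outcome vectors for each arm (defined here so this cell does not depend on a later one)
control = df_t[df_t["group"] == "control"]["outcome"]
treat = df_t[df_t["group"] == "treatment"]["outcome"]

def bootstrap_ci(a, b, n_boot=5000, alpha=0.05):
    """Percentile bootstrap CI for mean(a) - mean(b), resampling each group independently."""
    a, b = np.array(a), np.array(b)
    diffs = []
    for _ in range(n_boot):
        # Resample both groups with replacement and record the difference in means
        diffs.append(np.mean(np.random.choice(a, size=len(a), replace=True)) -
                     np.mean(np.random.choice(b, size=len(b), replace=True)))
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return lo, hi

ci_lo, ci_hi = bootstrap_ci(treat, control)
print(f"95% CI for lift: [{ci_lo:.2f}, {ci_hi:.2f}]")
```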

```
95% CI for lift: [-2.90, 10.33]
```


The estimated lift is positive, but the 95% bootstrap interval is wide and includes zero: the data are compatible both with no effect and with a lift of roughly 10 points. This points to insufficient statistical power rather than to evidence of no effect.
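To put a rough number on "insufficient power", here is a sketch using the normal approximation for a two-sided, two-sample comparison of means with equal arms. The inputs are assumptions read off this notebook: a true lift of 3 points (the simulated `treatment_effect`), an outcome standard deviation of about 11 (the group standard deviations in the summary table), and 20 customers per arm. The helper name `approx_power` is hypothetical, and the normal approximation slightly overstates power compared with an exact t-test at this sample size.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(delta, sigma, n_per_arm, alpha=0.05):
    """Hypothetical helper: normal-approximation power of a two-sided two-sample test."""
    z = NormalDist()  # standard normal distribution
    se = sigma * sqrt(2 / n_per_arm)  # standard error of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    # Probability that the test statistic falls beyond either critical value
    return z.cdf(delta / se - z_crit) + z.cdf(-delta / se - z_crit)

power = approx_power(delta=3, sigma=11, n_per_arm=20)
print(f"Approximate power: {power:.1%}")
```

Under these assumptions the power comes out at roughly 14%, so a real 3-point lift would go undetected in most repetitions of this experiment.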

```python
from scipy import stats

control = df_t[df_t["group"] == "control"]["outcome"]
treat = df_t[df_t["group"] == "treatment"]["outcome"]

# Welch's t-test: does not assume equal variances in the two groups
t_stat, p_value = stats.ttest_ind(treat, control, equal_var=False)

lift = treat.mean() - control.mean()  # absolute lift in score points
lift_pct = lift / control.mean()      # lift relative to the control mean

print(f"Control mean: {control.mean():.2f}")
print(f"Treatment mean: {treat.mean():.2f}")
print(f"Absolute lift: {lift:.2f}")
print(f"Relative lift: {lift_pct:.2%}")
print(f"p-value: {p_value:.4f}")
```


```
Control mean: 80.06
Treatment mean: 83.74
Absolute lift: 3.68
Relative lift: 4.60%
p-value: 0.2993
```
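As a cross-check, Welch's t statistic can be reproduced by hand from the group summary table (20 customers per arm; means 83.737 and 80.055; standard deviations 11.0996 and 11.0301). This is a worked illustration using those rounded table values, with $\nu$ the Welch-Satterthwaite degrees of freedom:

$$
\begin{aligned}
\hat{\Delta} &= \bar{y}_T - \bar{y}_C = 83.737 - 80.055 = 3.682 \\
\mathrm{SE} &= \sqrt{\frac{s_T^2}{n_T} + \frac{s_C^2}{n_C}} = \sqrt{\frac{11.0996^2}{20} + \frac{11.0301^2}{20}} \approx \sqrt{6.160 + 6.083} \approx 3.499 \\
t &= \frac{\hat{\Delta}}{\mathrm{SE}} \approx \frac{3.682}{3.499} \approx 1.05 \\
\nu &\approx \frac{(6.160 + 6.083)^2}{\left(6.160^2 + 6.083^2\right)/19} \approx 38
\end{aligned}
$$

On about 38 degrees of freedom, a two-sided p-value for $t \approx 1.05$ is roughly 0.30, in line with the `scipy` output. The matching Welch 95% interval, $3.682 \pm 2.024 \times 3.499$ (using $t_{0.975,\,38} \approx 2.024$), is roughly $[-3.4,\ 10.8]$: a little wider than the bootstrap percentile interval, but it leads to the same conclusion.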


### Interpretation
In this simulated example, the treatment group shows a higher average outcome than control (83.74 vs. 80.06, an absolute lift of 3.68 points), but the difference is not statistically significant at conventional levels (Welch's t-test, p ≈ 0.30).

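The sample is small (20 customers per arm within a single segment) and the outcome is spread out: the standard deviation is about 11 points in each group, most of which comes from customer-to-customer differences in the baseline Spending Score rather than from the added noise (standard deviation 5). This result therefore highlights the importance of statistical power and uncertainty quantification (e.g., confidence intervals) in experiment-driven decisions.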

In a real product setting, segmentation can be used to prioritize high-value groups and to run stratified experiments to reduce variance and improve decision confidence.
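A minimal sketch of the stratified idea, assuming the strata are the existing `Cluster` labels: randomize separately inside each stratum so every segment contributes a balanced split to both arms, which removes between-segment imbalance as a source of variance. The function `stratified_assignment` and the toy DataFrame are hypothetical illustrations, not data from this notebook.

```python
import numpy as np
import pandas as pd

def stratified_assignment(df, stratum_col, seed=None):
    """Hypothetical helper: balanced, shuffled control/treatment split inside each stratum."""
    rng = np.random.default_rng(seed)
    groups = pd.Series(index=df.index, dtype=object)
    # .groups maps each stratum value to the index labels of its rows
    for _, idx in df.groupby(stratum_col).groups.items():
        n = len(idx)
        n_treat = n // 2  # odd-sized strata give control the extra customer
        labels = np.array(["control"] * (n - n_treat) + ["treatment"] * n_treat)
        groups.loc[idx] = rng.permutation(labels)
    return groups

# Hypothetical toy data: three clusters of unequal size
toy = pd.DataFrame({"Cluster": [0] * 10 + [1] * 7 + [2] * 4})
toy["group"] = stratified_assignment(toy, "Cluster", seed=0)
print(pd.crosstab(toy["Cluster"], toy["group"]))
```

Every toy cluster ends up split evenly (the 7-customer cluster as 4 control vs. 3 treatment). When analyzing such a design, the lift is typically estimated within each stratum and the per-stratum estimates are combined with weights proportional to stratum size.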


### Conclusion
This simulated A/B test demonstrates how customer segmentation can be incorporated into experimental design and analysis.

While the estimated treatment effect is positive, the result is not statistically significant, and the confidence interval is wide due to the small sample size within the segment.

In practice, this highlights the importance of adequate sample size, longer experiment duration, or stratified designs when running segment-aware experiments.
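To turn "adequate sample size" into a number, a standard normal-approximation formula for two equal arms is n per arm ≈ 2 (z₁₋α/₂ + z₁₋β)² σ² / δ². The sketch below assumes δ = 3 points (the simulated `treatment_effect`), σ ≈ 11 (the group standard deviations in the summary table), α = 0.05 and 80% power, and ignores any variance reduction from stratification; the helper name `required_n_per_arm` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def required_n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Hypothetical helper: normal-approximation sample size per arm (two-sided test of means)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return ceil(n)  # round up to whole customers

n_needed = required_n_per_arm(delta=3, sigma=11)
print(f"Customers needed per arm: {n_needed}")
```

Under these assumptions the answer is a little over 200 customers per arm, roughly ten times the 20 per arm available in the target segment.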
