# Cookie Cats A/B Test — Should we move the first gate from level 30 → 40?

**Goal:** Decide if moving the first gate later (from level 30 to 40) improves long-term engagement.

- **Group A** = `gate_30` (control)  
- **Group B** = `gate_40` (treatment)  
- **Primary metric:** 7-day retention (`retention_7`)  
- **Guardrail metric:** 1-day retention (`retention_1`)  
- **Test:** two-proportion z-test, 95% CIs  


In [3]:
import sys, os
sys.path.append(os.path.abspath(".."))


In [4]:
import pandas as pd
from src.metrics import sanity_checks, ab_summary, mde_or_sample
from src.viz import bar_with_ci

pd.set_option("display.float_format", lambda v: f"{v:.4f}")


In [5]:
DATA_PATH = "../data/cookie_cats.csv"  # notebook is inside notebooks/, so .. goes to project root

assert os.path.exists(DATA_PATH), f"cookie_cats.csv not found at {DATA_PATH}"

df_raw = pd.read_csv(DATA_PATH)
df, split = sanity_checks(df_raw)  # lowercases, maps gate_30→A, gate_40→B

print("Rows:", len(df))
print("Group split (proportion):")
print(split.round(3))

df.head()


Rows: 90189
Group split (proportion):
group
B   0.5040
A   0.4960
Name: proportion, dtype: float64


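| | userid | version | sum_gamerounds | retention_1 | retention_7 | group |
|---|---|---|---|---|---|---|
| 0 | 116 | gate_30 | 3 | False | False | A |
| 1 | 337 | gate_30 | 38 | True | False | A |
| 2 | 377 | gate_40 | 165 | True | False | B |
| 3 | 483 | gate_40 | 1 | False | False | B |
| 4 | 488 | gate_40 | 179 | True | True | B |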


In [6]:
res_d1 = ab_summary(df, "retention_1")
res_d7 = ab_summary(df, "retention_7")

print("Day-1 retention:", res_d1)
print("Day-7 retention:", res_d7)  # primary metric


Day-1 retention: {'A_success': 20034, 'A_total': 44700, 'A_rate': 0.4481879194630872, 'A_CI': (0.44358236514774474, 0.45280237833639625), 'B_success': 20119, 'B_total': 45489, 'B_rate': 0.44228274967574577, 'B_CI': (0.43772374727049607, 0.44685149948007), 'abs_lift_pp': -0.5905169787341458, 'rel_lift_%': -1.317565585974659, 'z': -1.7840862247974725, 'p': 0.07440965529691913}
Day-7 retention: {'A_success': 8502, 'A_total': 44700, 'A_rate': 0.19020134228187918, 'A_CI': (0.18658979684366292, 0.19386613051747012), 'B_success': 8279, 'B_total': 45489, 'B_rate': 0.18200004396667327, 'B_CI': (0.17848120097823003, 0.18557259139286594), 'abs_lift_pp': -0.8201298315205913, 'rel_lift_%': -4.311903489646016, 'z': -3.164358912748191, 'p': 0.0015542499756143289}
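
The implementation of `ab_summary` is not shown in this notebook. As a minimal sketch, assuming it runs a pooled, two-sided two-proportion z-test, the lift, `z`, and `p` fields can be recomputed from the printed counts using only the standard library. The function name `two_proportion_ztest` is hypothetical; if the pooling assumption holds, the results should land close to the printed values.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Pooled two-sided z-test for H0: p_a == p_b (assumed method, not confirmed)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Under H0 both groups share one rate, so pool the counts for the standard error.
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {
        "abs_lift_pp": (p_b - p_a) * 100,      # difference in percentage points
        "rel_lift_%": (p_b - p_a) / p_a * 100,  # difference relative to control
        "z": z,
        "p": p_value,
    }


# Counts copied from the ab_summary output above.
day7 = two_proportion_ztest(8502, 44700, 8279, 45489)
day1 = two_proportion_ztest(20034, 44700, 20119, 45489)
print("Day-7:", day7)
print("Day-1:", day1)
```

The sign convention is B minus A, so a negative lift means the treatment (`gate_40`) retained fewer players than the control.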


In [7]:
def summarize_row(name, res):
    return {
        "Metric": name,
        "A rate": f"{res['A_rate']*100:.2f}%",
        "B rate": f"{res['B_rate']*100:.2f}%",
        "Abs lift (pp)": f"{res['abs_lift_pp']:.2f}",
        "Rel lift (%)": f"{res['rel_lift_%']:.2f}",
        "p-value": f"{res['p']:.4f}",
        "A 95% CI": f"[{res['A_CI'][0]*100:.2f}, {res['A_CI'][1]*100:.2f}]",
        "B 95% CI": f"[{res['B_CI'][0]*100:.2f}, {res['B_CI'][1]*100:.2f}]"
    }

summary_df = pd.DataFrame([
    summarize_row("Day-1 retention (guardrail)", res_d1),
    summarize_row("Day-7 retention (primary)", res_d7),
])
summary_df


| | Metric | A rate | B rate | Abs lift (pp) | Rel lift (%) | p-value | A 95% CI | B 95% CI |
|---|---|---|---|---|---|---|---|---|
| 0 | Day-1 retention (guardrail) | 44.82% | 44.23% | -0.59 | -1.32 | 0.0744 | [44.36, 45.28] | [43.77, 44.69] |
| 1 | Day-7 retention (primary) | 19.02% | 18.20% | -0.82 | -4.31 | 0.0016 | [18.66, 19.39] | [17.85, 18.56] |
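
The per-group intervals in the table are slightly asymmetric around the observed rates, which is what a Wilson score interval produces. The sketch below assumes that method (the actual code in `src.metrics` is not shown) and recomputes the Day-7 intervals from the raw counts. The helper name `wilson_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_ci(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    # The interval is centred on a rate shrunk slightly toward 0.5.
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Day-7 counts copied from the ab_summary output.
ci_a = wilson_ci(8502, 44700)
ci_b = wilson_ci(8279, 45489)
print(f"A: [{ci_a[0] * 100:.2f}, {ci_a[1] * 100:.2f}]")
print(f"B: [{ci_b[0] * 100:.2f}, {ci_b[1] * 100:.2f}]")
```

At these sample sizes the Wilson and the simpler Wald interval differ only in the fourth or fifth decimal place, so the choice does not affect the conclusion.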


In [8]:
os.makedirs("../results/figures", exist_ok=True)

bar_with_ci(
    groups=["A","B"],
    rates=[res_d1["A_rate"], res_d1["B_rate"]],
    cis=[res_d1["A_CI"], res_d1["B_CI"]],
    title="Day-1 Retention by Group",
    ylabel="Retention rate",
    outpath="../results/figures/day1_retention.png"
)

bar_with_ci(
    groups=["A","B"],
    rates=[res_d7["A_rate"], res_d7["B_rate"]],
    cis=[res_d7["A_CI"], res_d7["B_CI"]],
    title="Day-7 Retention by Group (Primary)",
    ylabel="Retention rate",
    outpath="../results/figures/day7_retention.png"
)

"Saved to results/figures/"


'Saved to results/figures/'

In [9]:
# With the current sample size and baseline, what absolute effect could the test detect (approx.)?
p0 = res_d7["A_rate"]  # baseline: control-group Day-7 retention
n_per_group = int(df.groupby("group").size().mean())  # average group size (the groups are nearly equal)
mde_est = mde_or_sample(p_baseline=p0, n=n_per_group)  # approx. minimum detectable effect, as a proportion
mde_est


{'n_per_group': 45094, 'mde_abs': 0.0073224222652388034}
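
The internals of `mde_or_sample` are not shown. One common approximation for the absolute minimum detectable effect of a two-sided, two-proportion test is `(z_{1-alpha/2} + z_{power}) * sqrt(2 * p * (1 - p) / n)`. The sketch below applies it with an assumed `alpha = 0.05` and `power = 0.80`; both defaults and the function name `mde_two_proportions` are assumptions, not values confirmed by the notebook.

```python
from math import sqrt
from statistics import NormalDist


def mde_two_proportions(p_baseline, n_per_group, alpha=0.05, power=0.80):
    """Approximate absolute MDE for a two-sided two-proportion test.

    Uses the baseline variance for both groups (normal approximation).
    alpha and power are assumed defaults.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    se = sqrt(2 * p_baseline * (1 - p_baseline) / n_per_group)
    return (z_alpha + z_power) * se


# Baseline and sample size taken from the cells above.
mde = mde_two_proportions(p_baseline=8502 / 44700, n_per_group=45094)
print(f"MDE is roughly {mde * 100:.2f} pp")
```

Read against the results above, an MDE near 0.73 pp means the observed Day-7 difference of 0.82 pp is in the range this test was sized to detect.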

## Recommendation

- **Primary (Day-7 retention):** B vs A = **-0.82 pp** (relative -4.31%, p = 0.0016).  
  - Group A (gate_30): 19.02% [18.66, 19.39]  
  - Group B (gate_40): 18.20% [17.85, 18.56]  
  - Interpretation: Statistically significant drop in 7-day retention when the gate moves later. The observed 0.82 pp decline also exceeds the estimated minimum detectable effect of about 0.73 pp.

- **Guardrail (Day-1 retention):** B vs A = **-0.59 pp** (relative -1.32%, p = 0.0744).  
  - Group A: 44.82% [44.36, 45.28]  
  - Group B: 44.23% [43.77, 44.69]  
  - Interpretation: The decline is not significant at the 5% level, though the point estimate moves in the same negative direction as Day-7.

**Decision:** ❌ **Do not move the gate to level 40; keep it at level 30.**  

**Reasoning:** The primary metric, Day-7 retention, shows a statistically significant decrease of 0.82 pp (p = 0.0016). The Day-1 guardrail shows no significant change (p = 0.0744), but its point estimate is also negative, so the test offers no evidence of a benefit at either horizon. Moving the gate to level 40 therefore risks reducing long-term engagement with no offsetting gain.
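
The per-group intervals above do not directly show the uncertainty in the difference itself. As a rough sketch, a Wald (unpooled) 95% interval for B minus A can be computed from the same counts. This is not something the notebook's helpers are known to produce, and `diff_ci` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist


def diff_ci(success_a, n_a, success_b, n_b, confidence=0.95):
    """Wald (unpooled) confidence interval for p_b - p_a."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


# Counts copied from the ab_summary output.
day7_lo, day7_hi = diff_ci(8502, 44700, 8279, 45489)
day1_lo, day1_hi = diff_ci(20034, 44700, 20119, 45489)
print(f"Day-7 diff 95% CI: [{day7_lo * 100:.2f}, {day7_hi * 100:.2f}] pp")
print(f"Day-1 diff 95% CI: [{day1_lo * 100:.2f}, {day1_hi * 100:.2f}] pp")
```

The Day-7 interval lies entirely below zero, while the Day-1 interval straddles zero, which matches the p-values reported above.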
