# AB Testing at WQU

This notebook demonstrates a starter workflow for AB testing at WQU. It uses synthetic applicant data as a stand-in for the MongoDB records and includes:

- Aggregation of applicants by nationality
- Chi-square tests and odds ratios
- Experiment assignment simulation
- Basic plots using Plotly

---



```python
# ------------------------------
# Imports
# ------------------------------
import random

import numpy as np
import pandas as pd
import plotly.express as px
from country_converter import CountryConverter
from statsmodels.stats.contingency_tables import Table2x2
```


```python
# ------------------------------
# Generate synthetic applicant data and save it as CSV
# ------------------------------
np.random.seed(42)
n_samples = 500
df = pd.DataFrame({
    "countryISO2": np.random.choice(["US", "GB", "IN", "KE", "FR"], size=n_samples),
    "admissionsQuiz": np.random.choice(["complete", "incomplete"], size=n_samples),
    # One sign-up per hour; "h" is the current pandas alias ("H" is deprecated)
    "createdAt": pd.date_range(start="2025-01-01", periods=n_samples, freq="h"),
    "group": np.nan,  # filled in later
})

csv_path = "ab_testing_wqu.csv"
df.to_csv(csv_path, index=False)
print(f"✅ Synthetic CSV saved: {csv_path}")
```

```text
✅ Synthetic CSV saved: ab_testing_wqu.csv
```
```python
# ------------------------------
# Load CSV into DataFrame
# ------------------------------
df = pd.read_csv("ab_testing_wqu.csv", parse_dates=["createdAt"])
df.head()
```

| | countryISO2 | admissionsQuiz | createdAt | group |
|---|---|---|---|---|
| 0 | KE | incomplete | 2025-01-01 00:00:00 | NaN |
| 1 | FR | incomplete | 2025-01-01 01:00:00 | NaN |
| 2 | IN | incomplete | 2025-01-01 02:00:00 | NaN |
| 3 | FR | complete | 2025-01-01 03:00:00 | NaN |
| 4 | FR | incomplete | 2025-01-01 04:00:00 | NaN |
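The introduction names MongoDB as the data source, while the cells above read a synthetic CSV. The sketch below shows one way to pull the same three fields from a MongoDB collection into a DataFrame instead. It assumes the documents store `countryISO2`, `admissionsQuiz` and `createdAt` like the CSV; `load_applicants` is a hypothetical helper, and the database and collection names are placeholders. It only calls a pymongo-style `find(filter, projection)`, so any object with that method works, which also makes it testable without a server.

```python
import pandas as pd

# Fields used later in this notebook; "_id" is excluded so the frame
# has no ObjectId column.
PROJECTION = {"_id": 0, "countryISO2": 1, "admissionsQuiz": 1, "createdAt": 1}
COLUMNS = ["countryISO2", "admissionsQuiz", "createdAt"]


def load_applicants(collection, query=None):
    """Load applicant documents into a DataFrame (hypothetical helper).

    `collection` can be a pymongo Collection, e.g.
    MongoClient(host="localhost", port=27017)["<db>"]["<collection>"]
    (database and collection names are placeholders).
    """
    cursor = collection.find(query or {}, PROJECTION)
    df = pd.DataFrame(list(cursor), columns=COLUMNS)
    # Mirror parse_dates=["createdAt"] from the CSV loader
    df["createdAt"] = pd.to_datetime(df["createdAt"])
    return df
```

`query` accepts a standard MongoDB filter document, for example `{"admissionsQuiz": "complete"}` to load only applicants who finished the quiz.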


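```python
# ------------------------------
# Convert ISO2 country codes to short names and ISO3 codes
# ------------------------------
cc = CountryConverter()
codes = df["countryISO2"].tolist()
# Declaring the source classification avoids regex guessing on short codes
df["country_name"] = cc.convert(codes, src="ISO2", to="name_short")
df["country_iso3"] = cc.convert(codes, src="ISO2", to="ISO3")
```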
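country_converter does not raise on codes it cannot match; by default it returns the string `"not found"` for them, and such rows would then go missing from the nationality map. A small check like the one below can flag them right after conversion. `unconverted_codes` is a hypothetical helper, and it assumes the default `not_found` value was left unchanged.

```python
import pandas as pd


def unconverted_codes(df, converted_col="country_iso3",
                      source_col="countryISO2", sentinel="not found"):
    """Return the sorted source codes whose conversion produced `sentinel`."""
    mask = df[converted_col] == sentinel
    return sorted(df.loc[mask, source_col].unique().tolist())
```

In the notebook this could run as `bad = unconverted_codes(df)` followed by `assert not bad, f"Unmatched codes: {bad}"`.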


```python
# ------------------------------
# Aggregate by nationality
# ------------------------------
df_nat = df.groupby("country_iso3").size().reset_index(name="count")
df_nat["count_pct"] = (df_nat["count"] / df_nat["count"].sum()) * 100

# Plot choropleth
fig = px.choropleth(
    data_frame=df_nat,
    locations="country_iso3",
    color="count_pct",
    projection="natural earth",
    color_continuous_scale=px.colors.sequential.Oranges,
    title="Applicants by Nationality"
)
fig.show()
```
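When the data lives in MongoDB, the nationality count can be pushed down to the database with an aggregation pipeline, so only one document per country comes back. The sketch below assumes the `countryISO2` field used throughout this notebook; `nationality_pipeline` and `counts_to_frame` are hypothetical helper names. With a real pymongo collection the results would come from `collection.aggregate(nationality_pipeline())`, and the returned codes are still ISO2, so they need the ISO3 conversion above before plotting.

```python
import pandas as pd


def nationality_pipeline():
    """Aggregation pipeline: one document per country with its applicant count."""
    return [
        {"$group": {"_id": "$countryISO2", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
    ]


def counts_to_frame(results):
    """Turn aggregation output into a DataFrame with count and count_pct columns."""
    df_nat = pd.DataFrame(list(results)).rename(columns={"_id": "countryISO2"})
    df_nat["count_pct"] = df_nat["count"] / df_nat["count"].sum() * 100
    return df_nat
```

`{"$sum": 1}` inside `$group` counts documents per key, the same thing `groupby(...).size()` does in pandas.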


```python
# ==============================
# Randomly assign groups and test for association with quiz completion
# ==============================
# Re-seeding with 42 restarts the same random stream that generated the
# data above; a separate generator (e.g. np.random.default_rng) avoids that.
np.random.seed(42)
df["group"] = np.random.choice(["control", "treatment"], size=len(df))

print(df["group"].value_counts())

# Rows: group (control, treatment); columns: admissionsQuiz (complete, incomplete)
data = pd.crosstab(df["group"], df["admissionsQuiz"])
print(data)

# Table2x2 needs exactly two groups and two outcomes
assert data.shape == (2, 2), f"Expected a 2x2 table, got {data.shape}"
cont_table = Table2x2(data.values)
# Pearson chi-square test without continuity correction
chi_square_test = cont_table.test_nominal_association()
odds_ratio = round(cont_table.oddsratio, 1)

print("Chi-square test:", chi_square_test)
print("Odds ratio:", odds_ratio)
```


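```text
group
treatment    256
control      244
Name: count, dtype: int64
admissionsQuiz  complete  incomplete
group                               
control              135         109
treatment            132         124
Chi-square test: df          1
pvalue      0.39885250459664334
statistic   0.71178308827889
Odds ratio: 1.2
```

The p-value of about 0.40 gives no evidence of an association between group and quiz completion at the 5% level, and the odds ratio of 1.2 (control's odds of completing relative to treatment's) is close to 1. That is expected here, since the groups were assigned at random on synthetic data.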
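To make the statsmodels numbers less of a black box, the sketch below recomputes them by hand from the printed table in plain Python. It assumes a Pearson chi-square without continuity correction, which reproduces the statistic above, and adds a 95% Woolf (log-odds) confidence interval for the odds ratio. `chi_square_2x2` and `odds_ratio_ci` are illustrative names, not statsmodels functions; within statsmodels, `Table2x2.oddsratio_confint()` gives a comparable interval.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) and p-value for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2))
    pvalue = math.erfc(math.sqrt(stat / 2))
    return stat, pvalue


def odds_ratio_ci(table, z=1.96):
    """Odds ratio a*d / (b*c) with a Woolf (log-scale) confidence interval."""
    (a, b), (c, d) = table
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# Counts from the crosstab output: rows control/treatment, columns complete/incomplete
observed = [[135, 109], [132, 124]]
stat, pvalue = chi_square_2x2(observed)
odds_ratio, low, high = odds_ratio_ci(observed)
print(f"chi2={stat:.4f}  p={pvalue:.4f}")
print(f"odds ratio={odds_ratio:.2f}  95% CI=({low:.2f}, {high:.2f})")

# Completion rate per group, often easier to read than an odds ratio
for name, (done, not_done) in zip(["control", "treatment"], observed):
    print(f"{name}: {done / (done + not_done):.1%} completed")
```

The statistic and p-value agree with the `test_nominal_association` output, and the confidence interval for the odds ratio spans 1, which matches the non-significant chi-square result.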


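```python
# ------------------------------
# Experiment assignment simulation: deterministic 50/50 split by row order
# ------------------------------
# This overwrites the random assignment above. The first half of the rows
# (the earliest sign-ups) become control and the rest treatment, so group is
# confounded with sign-up time; a real test needs randomized assignment.
half = len(df) // 2
df["group"] = np.where(np.arange(len(df)) < half, "control", "treatment")

# Display counts
df["group"].value_counts()
```

```text
group
control      250
treatment    250
Name: count, dtype: int64
```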
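Splitting by row order guarantees equal group sizes but ties group to sign-up time, while independent coin flips (as in the earlier cell) can leave the arms unbalanced, such as 256 versus 244. A minimal sketch of a middle ground: shuffle a balanced list of labels with NumPy's `Generator.permutation`, which gives an exact split and a random order. `assign_balanced` is a hypothetical helper, and the seed value is arbitrary.

```python
import numpy as np
import pandas as pd


def assign_balanced(n, seed=None, labels=("control", "treatment")):
    """Randomly assign n units to two groups whose sizes differ by at most one."""
    rng = np.random.default_rng(seed)
    # n // 2 copies of the first label, the remaining n - n // 2 of the second
    balanced = np.repeat(np.array(labels), [n // 2, n - n // 2])
    return rng.permutation(balanced)


groups = pd.Series(assign_balanced(500, seed=2025))
print(groups.value_counts())
# In the notebook: df["group"] = assign_balanced(len(df), seed=2025)
```

Using a dedicated `default_rng` rather than the global `np.random.seed` also keeps the assignment stream separate from the one used to generate the synthetic data.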

---

# End of AB Testing Analysis

This notebook has walked through data aggregation, cohort assignment, and contingency table analysis for AB testing at WQU.

Key takeaways:
- Aggregated applicants by nationality and mapped their shares on a choropleth.
- Built a contingency table of group against admissions-quiz completion.
- Ran a chi-square test of association (p ≈ 0.40) and computed the odds ratio (≈ 1.2); neither shows a difference between control and treatment on this synthetic data.
- Simulated an even control/treatment split as a starting point for further experiments.

✅ All analyses are complete.
