### Growth Analytics Experiment: Metrics & Experiment Setup

This notebook prepares transactional data for growth experimentation by:
- Cleaning and validating raw data
- Creating order- and customer-level metrics
- Assigning experiment groups
- Defining pre/post periods


1. Load & Prepare Raw Data

In [47]:
import pandas as pd
import numpy as np

# Load raw data
df = pd.read_csv(
    "../data/raw/online_retail.csv",
    parse_dates=["InvoiceDate"]
)

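# Basic cleaning: drop rows without a customer and store IDs as integers
df = df.dropna(subset=["CustomerID"]).copy()
df["CustomerID"] = df["CustomerID"].astype(int)

# Set negative-quantity rows (returns/cancellations) aside for optional return-rate analysis
returns = df[df["Quantity"] < 0].copy()

# Keep only sales with a positive quantity and price
df = df[(df["Quantity"] > 0) & (df["UnitPrice"] > 0)]
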
# Sort for time-based logic
df = df.sort_values("InvoiceDate").reset_index(drop=True)
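
The cleaning rules in this cell can be exercised on a handful of rows before running them on the full file. The sketch below wraps them in a hypothetical `clean_transactions` helper (not part of the original notebook) and assumes the same column names as `online_retail.csv`; the toy values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Invented rows using the same columns as online_retail.csv
raw = pd.DataFrame({
    "InvoiceNo": ["536365", "536366", "C536367", "536368", "536369"],
    "CustomerID": [17850.0, np.nan, 17850.0, 13047.0, 13047.0],
    "Quantity": [6, 2, -6, 3, 1],
    "UnitPrice": [2.55, 1.85, 2.55, 0.0, 4.25],
    "InvoiceDate": pd.to_datetime([
        "2010-12-01 08:26", "2010-12-01 08:28", "2010-12-01 09:41",
        "2010-12-01 10:03", "2010-12-01 09:00",
    ]),
})


def clean_transactions(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Hypothetical helper applying the notebook's cleaning rules."""
    df = df.dropna(subset=["CustomerID"]).copy()       # rows without a customer out
    df["CustomerID"] = df["CustomerID"].astype(int)    # float IDs -> int
    returns = df[df["Quantity"] < 0].copy()            # kept for return-rate work
    sales = df[(df["Quantity"] > 0) & (df["UnitPrice"] > 0)]
    sales = sales.sort_values("InvoiceDate").reset_index(drop=True)
    return sales, returns


sales, returns = clean_transactions(raw)
print(sales[["InvoiceNo", "CustomerID", "Quantity", "UnitPrice"]])
print(f"{len(returns)} return row(s) set aside")
```

Here the row without a `CustomerID`, the negative-quantity row and the zero-price row are all excluded from `sales`, while only the negative-quantity row lands in `returns`.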


2. Feature Engineering (GMV + Order Level)

In [48]:
# GMV
df["GMV"] = df["Quantity"] * df["UnitPrice"]

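# Order-level aggregation: one row per invoice.
# Key on InvoiceNo + CustomerID only, so an invoice whose lines carry slightly
# different timestamps is not split into several "orders"; keep its earliest timestamp.
orders = (
    df.groupby(["InvoiceNo", "CustomerID"])
      .agg(
          InvoiceDate=("InvoiceDate", "min"),
          order_gmv=("GMV", "sum"),
          items=("Quantity", "sum")
      )
      .reset_index()
)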
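
To see what the order-level roll-up produces, here is a small sketch on invented line items. It assumes each `InvoiceNo` belongs to a single `CustomerID`, and it keys the aggregation on invoice and customer while taking the earliest `InvoiceDate`, so a multi-line invoice always collapses to one order row.

```python
import pandas as pd

# Invented line items: invoice 536365 contains two products
lines = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "536366"],
    "CustomerID": [17850, 17850, 13047],
    "InvoiceDate": pd.to_datetime(
        ["2010-12-01 08:26", "2010-12-01 08:26", "2010-12-01 08:28"]
    ),
    "Quantity": [6, 8, 2],
    "UnitPrice": [2.55, 3.39, 1.85],
})

# Line-level GMV = quantity x unit price
lines["GMV"] = lines["Quantity"] * lines["UnitPrice"]

# One row per invoice: sum GMV and units across its lines
orders = (
    lines.groupby(["InvoiceNo", "CustomerID"], as_index=False)
    .agg(
        InvoiceDate=("InvoiceDate", "min"),
        order_gmv=("GMV", "sum"),
        items=("Quantity", "sum"),
    )
)
print(orders)
```

Invoice 536365 becomes a single order worth 6 x 2.55 + 8 x 3.39 = 42.42 with 14 items.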


3. Customer-Level Metrics (Core Growth Table)

In [49]:
customers = (
    orders.groupby("CustomerID")
    .agg(
        total_orders=("InvoiceNo", "nunique"),
        total_gmv=("order_gmv", "sum"),
        avg_order_value=("order_gmv", "mean"),
        first_purchase=("InvoiceDate", "min"),
        last_purchase=("InvoiceDate", "max")
    )
    .reset_index()
)

# Customer lifetime: days between first and last purchase (0 for one-time buyers)
customers["customer_lifetime_days"] = (
    customers["last_purchase"] - customers["first_purchase"]
).dt.days
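
The customer table can be sanity-checked on a few toy orders. Note that `customer_lifetime_days` is 0 for one-time buyers, so on its own it cannot separate a customer who bought once from one who bought several times on the same day. The order values below are hypothetical.

```python
import pandas as pd

# Invented orders: customer 1 buys three times, customer 2 once
orders = pd.DataFrame({
    "InvoiceNo": ["A1", "A2", "A3", "B1"],
    "CustomerID": [1, 1, 1, 2],
    "InvoiceDate": pd.to_datetime(
        ["2011-01-01", "2011-01-15", "2011-03-01", "2011-02-10"]
    ),
    "order_gmv": [100.0, 50.0, 150.0, 80.0],
})

customers = (
    orders.groupby("CustomerID")
    .agg(
        total_orders=("InvoiceNo", "nunique"),
        total_gmv=("order_gmv", "sum"),
        avg_order_value=("order_gmv", "mean"),
        first_purchase=("InvoiceDate", "min"),
        last_purchase=("InvoiceDate", "max"),
    )
    .reset_index()
)

# Timedelta between first and last purchase, expressed in whole days
customers["customer_lifetime_days"] = (
    customers["last_purchase"] - customers["first_purchase"]
).dt.days
print(customers)
```

When every row of `orders` is one invoice, `avg_order_value` equals `total_gmv / total_orders`, which is a cheap consistency check on the real table.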


4. Synthetic Experiment Assignment (Control vs Treatment)

In [50]:
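import hashlib
# hash(x) on a positive int ID returns x itself, so hash(x) % 2 only splits by odd/even ID;
# a salted SHA-256 digest ("growth-exp" is a hypothetical salt) is reproducible and ID-independent
customers["experiment_group"] = customers["CustomerID"].map(
    lambda x: "treatment" if hashlib.sha256(f"growth-exp:{x}".encode()).digest()[0] % 2 == 0 else "control"
)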
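
Because the groups are assigned by a deterministic rule rather than randomized at exposure time, it is worth checking that they are close to the intended 50/50 split (a sample ratio mismatch, or SRM, check). This is a minimal sketch using a normal approximation to the binomial; `srm_check` is a hypothetical helper and the group sizes are made up.

```python
import math


def srm_check(n_treatment: int, n_control: int, expected_share: float = 0.5):
    """Two-sided z-test of the treatment share against the intended allocation."""
    n = n_treatment + n_control
    expected = n * expected_share
    sd = math.sqrt(n * expected_share * (1 - expected_share))
    z = (n_treatment - expected) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical group sizes
z_ok, p_ok = srm_check(2200, 2172)    # small imbalance, consistent with chance
z_bad, p_bad = srm_check(2400, 1972)  # large imbalance, points to a bug
print(f"balanced: z={z_ok:.2f}, p={p_ok:.3f}")
print(f"skewed:   z={z_bad:.2f}, p={p_bad:.2e}")
```

In the notebook it could be called as `counts = customers["experiment_group"].value_counts()` followed by `srm_check(counts["treatment"], counts["control"])`; a very small p-value suggests the assignment logic, not chance, is skewing the split.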


5. Pre / Post Period Split (Time-Based Growth Analysis)

In [51]:
# Cutoff at the 70th percentile of line-item timestamps: earlier orders are "pre", later "post"
split_date = df["InvoiceDate"].quantile(0.7)

orders["period"] = np.where(
    orders["InvoiceDate"] <= split_date,
    "pre",
    "post"
)
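
The cutoff is a quantile of timestamps, so it depends on where activity is concentrated, not on calendar time. Note that the notebook takes the quantile over line items in `df`, so invoices with many lines pull the cutoff toward their dates. A toy sketch with one invented order per day:

```python
import numpy as np
import pandas as pd

# One invented order per day for ten days
orders = pd.DataFrame(
    {"InvoiceDate": pd.date_range("2011-01-01", periods=10, freq="D")}
)

# Cutoff at the 70th percentile of timestamps (linear interpolation by default)
split_date = orders["InvoiceDate"].quantile(0.7)
orders["period"] = np.where(orders["InvoiceDate"] <= split_date, "pre", "post")

print(split_date)
print(orders["period"].value_counts())
```

With evenly spaced orders the cutoff falls between the 7th and 8th day, leaving seven orders in `pre` and three in `post`.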


6. Experiment Metrics Table (Affirm-Style)

In [52]:
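experiment_metrics = (
    # Attach each order's group; validate= guards against duplicate customer rows
    orders.merge(customers[["CustomerID", "experiment_group"]], on="CustomerID", validate="many_to_one")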
    .groupby(["experiment_group", "period"])
    .agg(
        orders=("InvoiceNo", "nunique"),
        gmv=("order_gmv", "sum"),
        avg_order_value=("order_gmv", "mean")
    )
    .reset_index()
)
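
With one row per group and period, a natural read-out is a difference-in-differences: the treatment group's pre-to-post change minus the control group's. The sketch below assumes `experiment_metrics` has exactly one row per `experiment_group`/`period` pair; `diff_in_diff` is a hypothetical helper and the numbers are invented. Since the groups here are synthetic and no treatment is actually applied, a real run should land near zero (an A/A check), and a point estimate alone says nothing about significance.

```python
import pandas as pd

# Invented aggregates in the same shape as experiment_metrics
experiment_metrics = pd.DataFrame({
    "experiment_group": ["control", "control", "treatment", "treatment"],
    "period": ["pre", "post", "pre", "post"],
    "orders": [700, 300, 710, 320],
    "gmv": [140000.0, 60000.0, 142000.0, 66000.0],
})
experiment_metrics["avg_order_value"] = (
    experiment_metrics["gmv"] / experiment_metrics["orders"]
)


def diff_in_diff(metrics: pd.DataFrame, value: str) -> float:
    """(treatment post - pre) minus (control post - pre) for one metric column."""
    wide = metrics.pivot(index="experiment_group", columns="period", values=value)
    delta = wide["post"] - wide["pre"]
    return float(delta["treatment"] - delta["control"])


print("AOV DiD:", diff_in_diff(experiment_metrics, "avg_order_value"))
print("GMV DiD:", diff_in_diff(experiment_metrics, "gmv"))
```

Totals such as `gmv` and `orders` scale with group size and with the length of each period, so per-order or per-customer metrics are usually the fairer comparison.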


7. Save Processed Outputs (Production Discipline)

In [53]:
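from pathlib import Path
out_dir = Path("../data/processed")
out_dir.mkdir(parents=True, exist_ok=True)  # to_csv does not create missing folders
customers.to_csv(out_dir / "customers_growth_metrics.csv", index=False)
orders.to_csv(out_dir / "orders_enriched.csv", index=False)
experiment_metrics.to_csv(out_dir / "experiment_metrics.csv", index=False)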
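
CSV files do not store dtypes, so anything that reads these outputs back has to restore them. A quick round-trip sketch on a one-row toy frame, written to a temporary directory rather than `../data/processed`, shows that `InvoiceDate` comes back as text and a numeric-looking `InvoiceNo` comes back as an integer unless `parse_dates` and `dtype` are passed.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Invented one-row stand-in for orders_enriched.csv
orders = pd.DataFrame({
    "InvoiceNo": ["536365"],
    "InvoiceDate": pd.to_datetime(["2010-12-01 08:26"]),
    "order_gmv": [42.42],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "orders_enriched.csv"
    orders.to_csv(path, index=False)
    naive = pd.read_csv(path)  # dtypes inferred from text
    typed = pd.read_csv(path, parse_dates=["InvoiceDate"], dtype={"InvoiceNo": str})

print(naive.dtypes)
print(typed.dtypes)
```

If the processed files are consumed often, a typed format such as Parquet avoids this step, at the cost of an extra dependency.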
