# 03 - Build Snapshot Features (30-day churn window)

This notebook builds **customer-level features** at a single **snapshot time** for churn modeling.

## What it does
- Loads `data/processed/interactions.parquet`
- Picks a **snapshot time** = (max event_time - 30 days)
- Builds features using the **shared feature builder**:
  `src.features.build_features.build_customer_features`
- Saves:
  `data/features/customer_features_snapshot_30d.parquet`

## Why this matters
Features are computed from *history up to the snapshot* only. Labels are defined separately from the *future window* (the 30 days after the snapshot), so no information from the label period leaks into the features.
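To make the history/future split concrete, here is a minimal sketch on toy data. The label rule (a customer churns if they have no events in the future window) and the inclusive `<=` boundary at the snapshot are assumptions for illustration; the project's actual label definition lives elsewhere and may differ.

```python
import pandas as pd

# Toy interaction log; column names mirror the notebook, values are made up.
events = pd.DataFrame({
    "external_customerkey": ["a", "a", "b", "c", "c"],
    "event_time": pd.to_datetime([
        "2024-01-05", "2024-02-20", "2024-01-10", "2024-01-15", "2024-03-01",
    ]),
})

window = pd.Timedelta(days=30)
snapshot_time = events["event_time"].max() - window

# History: what a feature builder may see. Future: what defines the label.
history = events[events["event_time"] <= snapshot_time]
future = events[
    (events["event_time"] > snapshot_time)
    & (events["event_time"] <= snapshot_time + window)
]

# Assumed label rule: a customer known at the snapshot has churned
# if they have no events in the future window.
known = history["external_customerkey"].unique()
active_later = set(future["external_customerkey"])
labels = pd.DataFrame({
    "external_customerkey": known,
    "churned": [int(k not in active_later) for k in known],
})
print(labels)
```

Keeping the two slices disjoint is the point: any row in `future` that reached the feature builder would leak label information.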


In [None]:
from pathlib import Path
import sys
import pandas as pd

# Project root (assumes notebook is inside ./notebooks)
PROJECT_ROOT = Path.cwd().parent
sys.path.append(str(PROJECT_ROOT))

from src.features.build_features import build_customer_features

DATA_PATH = PROJECT_ROOT / "data" / "processed" / "interactions.parquet"
OUT_PATH = PROJECT_ROOT / "data" / "features" / "customer_features_snapshot_30d.parquet"

print("PROJECT_ROOT:", PROJECT_ROOT)
print("DATA_PATH   :", DATA_PATH)
print("OUT_PATH    :", OUT_PATH)


## Load interactions (processed)

This file is produced by Notebook 01 / pipeline `01_make_processed`.


In [None]:
df = pd.read_parquet(DATA_PATH)
df.shape, df.head()


## Choose snapshot time (30-day churn window)

We set:
- `snapshot_time = max(event_time) - 30 days`

This snapshot is used to build **historical features only** (no future leakage).


In [None]:
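CHURN_WINDOW_DAYS = 30

# errors="coerce" turns unparseable timestamps into NaT instead of raising
df["event_time"] = pd.to_datetime(df["event_time"], errors="coerce")
# max() skips NaT, so the snapshot is anchored on the latest valid event
max_time = df["event_time"].max()
snapshot_time = max_time - pd.Timedelta(days=CHURN_WINDOW_DAYS)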

max_time, snapshot_time
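Because `errors="coerce"` converts bad timestamps to `NaT` silently, it can be worth counting them before trusting `max_time`. The sketch below uses a made-up series rather than the real `df`; the same two lines would apply to `df["event_time"]`.

```python
import pandas as pd

# Made-up raw values: two valid timestamps, one garbage string, one missing.
raw = pd.Series(["2024-03-01 10:00:00", "not a date", None, "2024-01-15 08:30:00"])
parsed = pd.to_datetime(raw, errors="coerce")

# Count values that were missing or could not be parsed.
n_bad = int(parsed.isna().sum())
print(f"unparseable or missing timestamps: {n_bad} of {len(parsed)}")

# max() ignores NaT, so the snapshot is still well defined.
max_time = parsed.max()
snapshot_time = max_time - pd.Timedelta(days=30)
print(max_time, snapshot_time)
```

A large `n_bad` would suggest a parsing problem upstream rather than genuinely missing data.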


## Build customer features at snapshot

Uses the unified builder from `src/features/build_features.py`.


In [None]:
features = build_customer_features(df, snapshot_time=snapshot_time)
features.shape, features.head()
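The internals of `build_customer_features` are not shown in this notebook. As a rough illustration of what a snapshot-based builder does, the hypothetical `toy_customer_features` below filters to history and aggregates per customer. The function name, the two features, and the inclusive boundary are all assumptions, not the project's implementation.

```python
import pandas as pd

def toy_customer_features(df, snapshot_time, key="external_customerkey"):
    """Hypothetical stand-in for build_customer_features, for illustration only."""
    # Keep only events at or before the snapshot (assumed inclusive boundary).
    history = df[df["event_time"] <= snapshot_time]
    out = (
        history.groupby(key)["event_time"]
        .agg(n_events="count", last_event="max")
        .reset_index()
    )
    # Days between the customer's last historical event and the snapshot.
    out["recency_days"] = (snapshot_time - out["last_event"]).dt.days
    return out.drop(columns="last_event")

events = pd.DataFrame({
    "external_customerkey": ["a", "a", "a", "b", "c", "c"],
    "event_time": pd.to_datetime([
        "2024-01-05", "2024-01-20", "2024-02-20",
        "2024-01-10", "2024-01-15", "2024-03-01",
    ]),
})
toy = toy_customer_features(events, snapshot_time=pd.Timestamp("2024-01-31"))
print(toy)
```

Events after the snapshot (here `2024-02-20` and `2024-03-01`) do not affect any feature value.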


## Quick sanity checks
- row count vs. unique customers (for a customer-level table these should match)
- missingness overview (15 columns with the highest share of missing values)


In [None]:
print("rows:", len(features))
print("unique customers:", features["external_customerkey"].nunique())
features.isna().mean().sort_values(ascending=False).head(15)
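If the printed row count and unique-customer count differ, it helps to see which keys are duplicated. A minimal sketch, assuming the table should hold one row per `external_customerkey`; the helper name `duplicated_customers` is made up for this example.

```python
import pandas as pd

def duplicated_customers(features, key="external_customerkey"):
    """Return keys that appear more than once (empty list if the table is clean)."""
    counts = features[key].value_counts()
    return counts[counts > 1].index.tolist()

# Toy feature table with one duplicated customer.
toy = pd.DataFrame({
    "external_customerkey": ["a", "b", "b"],
    "n_events": [3, 1, 2],
})
duplicates = duplicated_customers(toy)
print(duplicates)

# A clean table yields an empty list.
clean = duplicated_customers(toy.drop_duplicates("external_customerkey"))
print(clean)
```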


## Save features parquet


In [None]:
OUT_PATH.parent.mkdir(parents=True, exist_ok=True)
features.to_parquet(OUT_PATH, index=False)
print("Wrote:", OUT_PATH, "rows:", len(features), "cols:", features.shape[1])


## Read-back check


In [None]:
check = pd.read_parquet(OUT_PATH)
check.shape, check.head()
