# 03 — Stability / Drift-Aware Checks

This notebook helps answer:
- Do feature distributions shift over time?
- Are segments stable (volume, target rates)?

It uses PSI (Population Stability Index) and a two-sample Kolmogorov-Smirnov (KS) test as a heuristic.
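
For reference, PSI compares the share of rows falling into each bin of a feature between a baseline sample and a later sample: PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ), where eᵢ is the baseline share and aᵢ the later share in bin i. The block below is a minimal standard-library sketch of that formula, assuming decile bins taken from the baseline and a small floor for empty bins. The function name `psi` and its parameters are illustrative, and this is not necessarily how `numeric_drift_report` computes it.

```python
import math
from bisect import bisect_right


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a baseline and a comparison sample.

    Illustrative sketch: bin edges are baseline quantiles, and each bin
    share is floored at `eps` (an assumed value) to avoid log(0).
    """
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")

    base = sorted(expected)
    # Interior cut points at baseline quantiles; ties are collapsed.
    edges = sorted({
        base[min(len(base) - 1, int(len(base) * k / n_bins))]
        for k in range(1, n_bins)
    })

    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect_right(edges, v)] += 1
        return [max(c / len(values), eps) for c in counts]

    e = shares(expected)
    a = shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = list(range(1000))
shifted = [x + 500 for x in range(1000)]
print(psi(baseline, baseline))  # identical samples
print(psi(baseline, shifted))   # strongly shifted sample
```

Identical samples give a PSI of zero, and the value grows as mass moves between bins. The result depends on the binning and on `eps`, so values are only comparable when those are held fixed.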

In [None]:
from src.profiling import load_csv
from src.stability import numeric_drift_report, segment_stability

DATA_PATH = "../data/sample.csv"
df = load_csv(DATA_PATH)
df.head()

## Configure time split

`SPLIT_TIME` (passed as `split_time`) should be a date that separates the earlier (baseline) data from the later (comparison) data.
If your data has a natural event timestamp, use it as `TIME_COL`.
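
To make the split concrete, here is a minimal pandas sketch of dividing a frame at a cutoff date. The helper name `split_by_time` is hypothetical, and the choice to put rows exactly at the cutoff into the later window is an assumption; `numeric_drift_report` may treat the boundary differently. It also assumes timezone-naive timestamps.

```python
import pandas as pd


def split_by_time(df, time_col, split_time):
    """Return (baseline, recent) frames split at `split_time`.

    Assumption: rows at the cutoff go to `recent`. Rows whose timestamp
    cannot be parsed become NaT and are excluded from both frames.
    """
    ts = pd.to_datetime(df[time_col], errors="coerce")
    cutoff = pd.Timestamp(split_time)
    return df[ts < cutoff], df[ts >= cutoff]


example = pd.DataFrame({
    "timestamp": ["2025-06-30", "2025-07-01", "2025-07-02", "not a date"],
    "value": [1.0, 2.0, 3.0, 4.0],
})
baseline_df, recent_df = split_by_time(example, "timestamp", "2025-07-01")
print(len(baseline_df), len(recent_df))
```

Checking the size of both windows before reading the drift report is worthwhile: a very small window on either side makes PSI and KS results noisy.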

In [None]:
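from IPython.display import display

TIME_COL = "timestamp"   # change me
SPLIT_TIME = "2025-07-01" # change me

if TIME_COL in df.columns:
    drift = numeric_drift_report(df, time_col=TIME_COL, split_time=SPLIT_TIME)
    # An expression inside an `if` block is not auto-rendered by the notebook,
    # so display the result explicitly.
    display(drift.head(30))
else:
    print("TIME_COL not found; skipping drift report")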

## Segment stability

Pick a business-relevant segment column (e.g., store_id, region, device_type).
If you have a binary target, you can also check target rates by segment.
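
As a rough picture of what a per-segment summary can look like, the sketch below computes row counts, volume share, and (optionally) the mean of a binary target for each segment. The helper `segment_summary` and its output column names are hypothetical and are not the implementation of `segment_stability`.

```python
import pandas as pd


def segment_summary(df, segment_col, target_col=None):
    """Per-segment row count, share of total rows, and optional target rate."""
    grouped = df.groupby(segment_col, dropna=False)  # keep missing segments visible
    out = grouped.size().rename("n_rows").to_frame()
    out["share"] = out["n_rows"] / len(df)
    if target_col is not None:
        # For a 0/1 target, the mean is the positive rate within the segment.
        out["target_rate"] = grouped[target_col].mean()
    return out.sort_values("n_rows", ascending=False)


example = pd.DataFrame({
    "region": ["a", "a", "b", "a"],
    "target": [1, 0, 1, 1],
})
summary = segment_summary(example, "region", target_col="target")
print(summary)
```

Running the same summary on the earlier and later windows and comparing `share` and `target_rate` side by side is one simple way to spot segments that appear, vanish, or change behavior.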

In [None]:
from IPython.display import display

SEGMENT_COL = None  # e.g., "region"
TARGET_COL = None   # e.g., "target"

if SEGMENT_COL and SEGMENT_COL in df.columns:
    # Display explicitly: expressions inside an `if` block are not auto-rendered.
    display(segment_stability(df, segment_col=SEGMENT_COL, target_col=TARGET_COL).head(20))
else:
    print("Set SEGMENT_COL to run segment stability")

## Interpretation guide

- PSI < 0.10: typically stable
- PSI 0.10–0.25: warning (monitor / investigate)
- PSI > 0.25: likely drift (risk for production reliability)

Record drift risks and any features that might need monitoring or exclusion.
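
The bands above can be encoded as a small helper for tagging each feature when recording results. This is a sketch only: the function name `psi_band` and the label strings are illustrative, and the boundary values 0.10 and 0.25 are assigned to the warning band to match the ranges listed.

```python
import math


def psi_band(psi_value):
    """Map a PSI value to the rule-of-thumb bands from the guide above."""
    if psi_value is None or math.isnan(psi_value):
        return "undefined"      # e.g. a feature with no usable data
    if psi_value < 0.10:
        return "stable"
    if psi_value <= 0.25:
        return "warning"        # monitor / investigate
    return "likely drift"


for value in (0.03, 0.10, 0.25, 0.40):
    print(value, psi_band(value))
```

These thresholds are conventions rather than statistical guarantees, so a "warning" on a high-importance feature may deserve more attention than "likely drift" on a feature the model barely uses.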