# 06 — Postprocess Churn Scores (Risk Buckets & Lists)

This notebook converts **raw churn probabilities** into **business-ready outputs**: a risk bucket for every customer and ranked customer lists that operations teams can act on.

## Inputs
- `outputs/churn_scores_v1.csv`: raw scores written by the inference step; must contain a `churn_probability` column

## Outputs
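- `outputs/churn_scores_v1_with_buckets.csv`: all scores with a `risk_bucket` column, sorted by churn probability (descending)
- `outputs/churn_scores_v1_top5000.csv`: the 5,000 highest-risk customers
- `outputs/churn_scores_v1_critical.csv` and `outputs/churn_scores_v1_high.csv`: the critical-risk and high-risk customer lists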


In [None]:
from pathlib import Path
import pandas as pd

# Assumes the notebook is run from a directory one level below the project root
PROJECT_ROOT = Path.cwd().parent
SCORES_PATH = PROJECT_ROOT / "outputs" / "churn_scores_v1.csv"

OUT_WITH_BUCKETS = PROJECT_ROOT / "outputs" / "churn_scores_v1_with_buckets.csv"
OUT_TOP5000 = PROJECT_ROOT / "outputs" / "churn_scores_v1_top5000.csv"
OUT_CRITICAL = PROJECT_ROOT / "outputs" / "churn_scores_v1_critical.csv"
OUT_HIGH = PROJECT_ROOT / "outputs" / "churn_scores_v1_high.csv"

print("SCORES_PATH:", SCORES_PATH)
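`Path.cwd().parent` only resolves correctly when the notebook is launched exactly one level below the project root. A minimal sketch of a more forgiving lookup, assuming (hypothetically) that the project root is the nearest ancestor directory containing `outputs/`; `find_project_root` is an illustrative helper, not part of this project:

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "outputs") -> Path:
    """Walk up from `start` and return the first directory containing `marker/`.

    Hypothetical helper: assumes the project root is identified by an
    `outputs/` directory, which this notebook does not guarantee.
    """
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"no ancestor of {start} contains {marker!r}/")
```

In the configuration cell this would replace `Path.cwd().parent` with `find_project_root(Path.cwd())`, so the notebook also works when started from the project root itself.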


## Load churn scores

In [None]:
scores = pd.read_csv(SCORES_PATH)
scores.shape, scores.head()
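Before bucketing, it can help to fail fast on a malformed score file, since missing or out-of-range probabilities would otherwise surface only as `NaN` buckets. A sketch, assuming the inference output should hold one non-null `churn_probability` per row in [0, 1]; `validate_scores` is a hypothetical helper:

```python
import pandas as pd


def validate_scores(df: pd.DataFrame, prob_col: str = "churn_probability") -> None:
    """Raise ValueError if the score table cannot be bucketed safely."""
    if prob_col not in df.columns:
        raise ValueError(f"missing column: {prob_col!r}")
    probs = df[prob_col]
    n_missing = int(probs.isna().sum())
    if n_missing:
        raise ValueError(f"{n_missing} rows have no {prob_col}")
    # Series.between is inclusive on both ends by default
    if not probs.between(0.0, 1.0).all():
        raise ValueError(f"{prob_col} must lie in [0, 1]")


# Usage on a small hand-made frame: passes silently
validate_scores(pd.DataFrame({"churn_probability": [0.1, 0.5, 0.99]}))
```

Calling `validate_scores(scores)` right after loading would stop the notebook before any CSVs are written.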

## Assign risk buckets

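Risk buckets are assigned from the churn probability *p* using half-open intervals, so each boundary value belongs to the higher bucket:
- **low**: p < 0.50
- **medium**: 0.50 ≤ p < 0.70
- **high**: 0.70 ≤ p < 0.85
- **critical**: p ≥ 0.85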
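Note that `pd.cut` closes intervals on the right by default (`right=True`), which puts the boundary values 0.50, 0.70 and 0.85 in the *lower* bucket, contradicting the definitions above (for example, low is < 0.50 and critical is ≥ 0.85). The toy comparison below shows both behaviours on hand-picked boundary values; the half-open variant uses `right=False` with an open-ended last bin so that p = 1.0 still lands in "critical" rather than becoming `NaN`:

```python
import pandas as pd

LABELS = ["low", "medium", "high", "critical"]
probs = pd.Series([0.0, 0.49, 0.5, 0.7, 0.85, 1.0])

# Default right=True: [0, 0.5], (0.5, 0.7], (0.7, 0.85], (0.85, 1.0]
# (0 is included only because of include_lowest=True)
right_closed = pd.cut(
    probs, bins=[0, 0.5, 0.7, 0.85, 1.0], labels=LABELS, include_lowest=True
)

# right=False: [0, 0.5), [0.5, 0.7), [0.7, 0.85), [0.85, inf)
left_closed = pd.cut(
    probs, bins=[0.0, 0.5, 0.7, 0.85, float("inf")], labels=LABELS, right=False
)

print(right_closed.tolist())  # ['low', 'low', 'low', 'medium', 'high', 'critical']
print(left_closed.tolist())   # ['low', 'low', 'medium', 'high', 'critical', 'critical']
```

Only the second call agrees with the bucket definitions at every boundary.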


In [None]:
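# right=False makes every bin [left, right), matching the definitions above;
# the open-ended last edge keeps p == 1.0 in "critical" instead of producing NaN.
scores["risk_bucket"] = pd.cut(
    scores["churn_probability"],
    bins=[0.0, 0.5, 0.7, 0.85, float("inf")],
    labels=["low", "medium", "high", "critical"],
    right=False,
)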

scores["risk_bucket"].value_counts()
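`value_counts()` orders buckets by frequency and shows raw counts only. A sketch of a summary in bucket order with each bucket's share of customers, shown on a toy categorical series; in the notebook `buckets` would be `scores["risk_bucket"]`:

```python
import pandas as pd

LABELS = ["low", "medium", "high", "critical"]
buckets = pd.Series(
    pd.Categorical(["low", "low", "high", "critical"], categories=LABELS)
)

# reindex forces bucket order and keeps empty buckets with a count of 0
counts = buckets.value_counts().reindex(LABELS, fill_value=0)
summary = pd.DataFrame({"count": counts, "share": (counts / counts.sum()).round(3)})
print(summary)
```

Keeping empty buckets visible (here "medium") makes it obvious when a threshold change has emptied a tier.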

## Sort by churn risk (descending)

In [None]:
scores_sorted = scores.sort_values(
    "churn_probability", ascending=False
)

scores_sorted.head()
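Sorting on `churn_probability` alone uses pandas' default `quicksort`, which is not stable, so customers with identical probabilities can appear in any order. That matters at the top-5000 cutoff, where ties decide who makes the list. A sketch of a deterministic order, assuming a hypothetical `customer_id` column that the score file may or may not contain:

```python
import pandas as pd

scores = pd.DataFrame({
    "customer_id": ["c3", "c1", "c2", "c4"],  # hypothetical ID column
    "churn_probability": [0.90, 0.90, 0.75, 0.90],
})

# Highest probability first; ties broken by customer_id so head(N) is reproducible.
scores_sorted = scores.sort_values(
    ["churn_probability", "customer_id"], ascending=[False, True]
).reset_index(drop=True)
print(scores_sorted.head(2))
```

Any column that uniquely identifies a customer works as the tie-breaker; the point is that reruns on the same input produce the same top-N file.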

## Write outputs for operations

In [None]:
# Scores with buckets
scores_sorted.to_csv(OUT_WITH_BUCKETS, index=False)

# Top 5000 highest-risk customers
scores_sorted.head(5000).to_csv(OUT_TOP5000, index=False)

# Critical-risk customers
scores_sorted[scores_sorted["risk_bucket"] == "critical"].to_csv(
    OUT_CRITICAL, index=False
)

# High-risk customers
scores_sorted[scores_sorted["risk_bucket"] == "high"].to_csv(
    OUT_HIGH, index=False
)

print("Wrote:")
print("-", OUT_WITH_BUCKETS)
print("-", OUT_TOP5000)
print("-", OUT_CRITICAL)
print("-", OUT_HIGH)
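A few cheap invariants can catch a broken run before the files reach operations: the frame is sorted, every row got a bucket, each list respects its thresholds, and (when there are at most N critical customers) all of them appear in the top-N file. A sketch of such checks; `check_outputs` is a hypothetical helper, shown on a small hand-made frame:

```python
import pandas as pd


def check_outputs(scores_sorted: pd.DataFrame, top_n: int) -> None:
    """Sanity checks on the frames written to CSV; raises AssertionError on failure."""
    p = scores_sorted["churn_probability"]
    assert p.is_monotonic_decreasing, "scores must be sorted descending"
    assert scores_sorted["risk_bucket"].notna().all(), "some rows fell outside the bins"

    critical = scores_sorted[scores_sorted["risk_bucket"] == "critical"]
    high = scores_sorted[scores_sorted["risk_bucket"] == "high"]
    assert (critical["churn_probability"] >= 0.85).all()
    assert high["churn_probability"].between(0.70, 0.85, inclusive="left").all()

    # Critical rows sort first, so they must all fit in the top-N file if few enough
    if len(critical) <= top_n:
        assert critical.index.isin(scores_sorted.head(top_n).index).all()


demo = pd.DataFrame({"churn_probability": [0.95, 0.85, 0.80, 0.70, 0.40]})
demo["risk_bucket"] = pd.cut(
    demo["churn_probability"],
    bins=[0.0, 0.5, 0.7, 0.85, float("inf")],
    labels=["low", "medium", "high", "critical"],
    right=False,
)
check_outputs(demo, top_n=3)  # passes silently
```

In the notebook this would run as `check_outputs(scores_sorted, top_n=5000)` before the CSVs are written.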
