# SaaS Churn Analysis
### Separating true cancellations from plan changes

This notebook walks through the key findings from the churn analytics pipeline.  
All data is synthetic (500 accounts), generated by `sql/00_setup.sql`.

**The core problem:** the billing system records a plan change (upgrade or downgrade) as a cancellation plus a new subscription.  
A naive churn query counts that cancellation, which inflates the churn rate.  
This pipeline detects and excludes those plan-change rows.
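
The SQL that performs this detection is not reproduced in this notebook. As a rough illustration only, the sketch below flags a cancellation as a plan change when the same account starts another subscription within a short window of the cancellation. The field names (`account_id`, `status`, `started_at`, `ended_at`), the sample rows, and the one-day window are all assumptions made for this example, not the pipeline's actual schema or rule.

```python
from datetime import date, timedelta

# Hypothetical subscription rows; field names and values are illustrative only.
subscriptions = [
    {"account_id": 1, "plan": "Starter", "status": "cancelled",
     "started_at": date(2025, 1, 5), "ended_at": date(2025, 9, 10)},
    {"account_id": 1, "plan": "Growth", "status": "active",
     "started_at": date(2025, 9, 10), "ended_at": None},
    {"account_id": 2, "plan": "Starter", "status": "cancelled",
     "started_at": date(2025, 2, 1), "ended_at": date(2025, 9, 15)},
]

def is_plan_change(cancelled, rows, window=timedelta(days=1)):
    """True if the same account starts another subscription within `window` of the cancellation."""
    return any(
        other is not cancelled
        and other["account_id"] == cancelled["account_id"]
        and abs(other["started_at"] - cancelled["ended_at"]) <= window
        for other in rows
    )

cancelled_rows = [r for r in subscriptions if r["status"] == "cancelled"]
true_churn = [r for r in cancelled_rows if not is_plan_change(r, subscriptions)]
plan_changes = [r for r in cancelled_rows if is_plan_change(r, subscriptions)]

print(f"true churn accounts: {[r['account_id'] for r in true_churn]}")
print(f"plan-change accounts: {[r['account_id'] for r in plan_changes]}")
```

Here account 1 is treated as a plan change (its Growth subscription starts the day the Starter one ends), while account 2 counts as true churn.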

---


In [None]:
import matplotlib
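# Non-interactive backend: figures are written to PNG by savefig(); plt.show() will not open a window
matplotlib.use('Agg')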
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import matplotlib.patches as mpatches
import numpy as np
import pandas as pd

# color palette
DARK   = "#1a1a2e"
ACCENT = "#e94560"
BLUE   = "#0f3460"
TEAL   = "#16213e"
LIGHT  = "#f5f5f5"
GRAY   = "#9e9e9e"
GREEN  = "#2ecc71"
PLAN_COLORS = {"Enterprise": "#0f3460", "Growth": "#e94560", "Starter": "#f39c12"}

plt.rcParams.update({
    "figure.facecolor": DARK, "axes.facecolor": TEAL,
    "axes.edgecolor": "#2a2a4a", "axes.labelcolor": LIGHT,
    "xtick.color": GRAY, "ytick.color": GRAY, "text.color": LIGHT,
    "grid.color": "#2a2a4a", "grid.linewidth": 0.8,
    "font.family": "sans-serif", "axes.titlesize": 13,
    "axes.titleweight": "bold", "axes.titlepad": 12,
})
print("Ready.")

## Data
Output from `sql/02_task1_monthly_churn.sql` and `sql/03_task2_churn_by_plan.sql`.  
In production, replace the hardcoded values below with results queried over a live database connection.
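
One way to make that swap is sketched below, using only the standard library. It assumes the query output has been materialized as a table; the table name `monthly_churn` and its columns are hypothetical, and an in-memory SQLite database stands in for the production database so the example runs anywhere. `psycopg2` follows the same DB-API flow of `connect`, `cursor`, `execute`, and `fetchall`.

```python
import sqlite3

# Stand-in database; in production this would be a connection to the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE monthly_churn "
    "(month TEXT, active_start INTEGER, churned INTEGER, churn_rate REAL)"
)
conn.executemany(
    "INSERT INTO monthly_churn VALUES (?, ?, ?, ?)",
    [("Mar 2025", 237, 2, 0.84), ("Apr 2025", 254, 3, 1.18)],  # two rows from the data cell
)

def load_monthly(connection):
    """Return (month, active_start, churned, churn_rate) tuples in insertion order."""
    cur = connection.cursor()
    # rowid is SQLite-specific; order by a real date column against a production table.
    cur.execute(
        "SELECT month, active_start, churned, churn_rate "
        "FROM monthly_churn ORDER BY rowid"
    )
    return cur.fetchall()

monthly_data = load_monthly(conn)
conn.close()
print(monthly_data)
```

The resulting list has the same shape as the hardcoded `monthly_data`, so the `pd.DataFrame(...)` call that consumes it would not need to change.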


In [None]:
# Monthly churn — from sql/02_task1_monthly_churn.sql
monthly_data = [
    ("Mar 2025", 237, 2, 0.84), ("Apr 2025", 254, 3, 1.18),
    ("May 2025", 270, 1, 0.37), ("Jun 2025", 295, 0, 0.00),
    ("Jul 2025", 312, 2, 0.64), ("Aug 2025", 334, 2, 0.60),
    ("Sep 2025", 352, 6, 1.70), ("Oct 2025", 370, 3, 0.81),
    ("Nov 2025", 395, 5, 1.27), ("Dec 2025", 418, 1, 0.24),
    ("Jan 2026", 433, 3, 0.69), ("Feb 2026", 444, 2, 0.45),
]
df = pd.DataFrame(monthly_data, columns=["month", "active_start", "churned", "churn_rate"])
df["rolling_avg"] = df["churn_rate"].rolling(3, min_periods=1).mean().round(2)

# Churn by plan — from sql/03_task2_churn_by_plan.sql
plan_data = [
    ("Sep 2025","Enterprise",110,0,0.00), ("Sep 2025","Growth",112,2,1.79),  ("Sep 2025","Starter",130,4,3.08),
    ("Oct 2025","Enterprise",117,0,0.00), ("Oct 2025","Growth",121,2,1.65),  ("Oct 2025","Starter",132,1,0.76),
    ("Nov 2025","Enterprise",126,1,0.79), ("Nov 2025","Growth",132,2,1.52),  ("Nov 2025","Starter",137,2,1.46),
    ("Dec 2025","Enterprise",133,1,0.75), ("Dec 2025","Growth",143,0,0.00),  ("Dec 2025","Starter",142,0,0.00),
    ("Jan 2026","Enterprise",135,0,0.00), ("Jan 2026","Growth",153,2,1.31),  ("Jan 2026","Starter",145,1,0.69),
    ("Feb 2026","Enterprise",141,1,0.71), ("Feb 2026","Growth",162,1,0.62),  ("Feb 2026","Starter",141,0,0.00),
]
dfp = pd.DataFrame(plan_data, columns=["month","plan","active","churned","churn_rate"])

print(f"Loaded {len(df)} months of data, {len(dfp)} plan-month rows")
df.head()

## 1. Monthly Churn Rate Trend

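With only 0 to 6 churned accounts per month, single-month rates are noisy, so the chart overlays a 3-month rolling average. Read the bars for exact monthly values and the rolling line for trend direction.

Key observation: the September 2025 spike (1.70%) came from 6 churned accounts, 4 of them on Starter. It is worth a CS investigation into what changed that month.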
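
To make the smoothing concrete, here is a small standard-library sketch of a trailing 3-month mean over the first four monthly rates from the data cell. Like `rolling(3, min_periods=1)`, it averages whatever months are available at the start of the series, so the first two values rest on fewer than three points. The helper name `trailing_mean` is hypothetical and not part of the pipeline.

```python
def trailing_mean(values, window=3):
    """Mean of the current value and up to `window - 1` preceding values, rounded to 2 dp."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(round(sum(chunk) / len(chunk), 2))
    return out

rates = [0.84, 1.18, 0.37, 0.00]  # Mar to Jun 2025 churn rates (%)
print(trailing_mean(rates))
```

This prints `[0.84, 1.01, 0.8, 0.52]`: the first entry is just March, the second averages March and April, and only from May onward is a full three-month window in play.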


In [None]:
fig, ax = plt.subplots(figsize=(12, 5))
fig.patch.set_facecolor(DARK)

bars = ax.bar(df["month"], df["churn_rate"], color=ACCENT, alpha=0.55, width=0.6, zorder=2)
ax.plot(df["month"], df["rolling_avg"], color=ACCENT, linewidth=2.5,
        marker="o", markersize=5, zorder=3, label="3-month rolling avg")

for bar, val in zip(bars, df["churn_rate"]):
    if val > 0:
        ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.04,
                f"{val:.2f}%", ha="center", va="bottom", fontsize=8.5, color=LIGHT, alpha=0.85)

ax.yaxis.set_major_formatter(mtick.PercentFormatter())
ax.set_ylabel("Churn Rate")
ax.set_title("Monthly Churn Rate — Rolling 12 Months")
ax.set_ylim(0, max(df["churn_rate"]) * 1.4)
ax.tick_params(axis="x", rotation=35)
ax.legend(framealpha=0, fontsize=9)
ax.grid(axis="y", linestyle="--", alpha=0.4)
ax.set_axisbelow(True)
ax.annotate("Sep spike: 6 accounts\nworth investigating",
            xy=(6, 1.70), xytext=(7.5, 1.55), fontsize=8, color=GRAY,
            arrowprops=dict(arrowstyle="->", color=GRAY, lw=1))

plt.tight_layout()
plt.savefig("chart1_monthly_churn.png", dpi=150, bbox_inches="tight")
plt.show()
print(f"Avg monthly churn rate: {df.churn_rate.mean():.2f}%")
print(f"Peak month: {df.loc[df.churn_rate.idxmax(), 'month']} at {df.churn_rate.max():.2f}%")

## 2. Churn by Plan Tier

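Starter historically shows the highest churn: it has no annual commitment and is the easiest plan to cancel.  
In this six-month window, however, Growth has the highest simple-average rate (about 1.15%), ahead of Starter (about 1.00%, driven largely by the 3.08% spike in Sep 2025, the outlier to watch). Enterprise is the most stable segment (about 0.4%).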
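
A simple mean of six monthly percentages gives each month equal weight regardless of how many accounts were active. As a cross-check, the sketch below computes a pooled rate per plan (total churned divided by total account-months) from the `active` and `churned` counts in the data cell. The pooled rate is an additional view suggested here, not a figure produced by the SQL pipeline.

```python
# (active, churned) per month, Sep 2025 to Feb 2026, copied from plan_data
counts = {
    "Enterprise": [(110, 0), (117, 0), (126, 1), (133, 1), (135, 0), (141, 1)],
    "Growth":     [(112, 2), (121, 2), (132, 2), (143, 0), (153, 2), (162, 1)],
    "Starter":    [(130, 4), (132, 1), (137, 2), (142, 0), (145, 1), (141, 0)],
}

pooled = {}
for plan, rows in counts.items():
    total_active = sum(active for active, _ in rows)
    total_churned = sum(churned for _, churned in rows)
    # Pooled rate in percent: all churned accounts over all account-months
    pooled[plan] = round(100 * total_churned / total_active, 2)
    print(f"{plan}: {total_churned}/{total_active} = {pooled[plan]:.2f}%")
```

On these counts Growth comes out at 1.09%, Starter at 0.97%, and Enterprise at 0.39%, so the ordering for this window is the same as with the simple averages.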


In [None]:
plan_months = ["Sep 2025","Oct 2025","Nov 2025","Dec 2025","Jan 2026","Feb 2026"]
x = np.arange(len(plan_months))
width = 0.25

fig, ax = plt.subplots(figsize=(12, 5))
fig.patch.set_facecolor(DARK)

for i, plan in enumerate(["Enterprise","Growth","Starter"]):
    vals = [dfp[(dfp.month==m) & (dfp.plan==plan)]["churn_rate"].values[0] for m in plan_months]
    offset = (i - 1) * width
    b = ax.bar(x + offset, vals, width=width, label=plan,
               color=PLAN_COLORS[plan], alpha=0.85, zorder=2)
    for bar, v in zip(b, vals):
        if v > 0:
            ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.04,
                    f"{v:.2f}%", ha="center", va="bottom", fontsize=7.5, color=LIGHT, alpha=0.8)

ax.set_xticks(x)
ax.set_xticklabels(plan_months, rotation=25)
ax.yaxis.set_major_formatter(mtick.PercentFormatter())
ax.set_ylabel("Churn Rate")
ax.set_title("Churn Rate by Plan Tier — Last 6 Months")
ax.set_ylim(0, 4.5)
ax.legend(framealpha=0.1, fontsize=9, loc="upper right")
ax.grid(axis="y", linestyle="--", alpha=0.4)
ax.set_axisbelow(True)

plt.tight_layout()
plt.savefig("chart2_churn_by_plan.png", dpi=150, bbox_inches="tight")
plt.show()

# avg by plan
for plan in ["Enterprise","Growth","Starter"]:
    avg = dfp[dfp.plan==plan]["churn_rate"].mean()
    print(f"{plan} avg churn rate: {avg:.2f}%")

## 3. Plan-Change Contamination

This is the core finding. Of the 80 rows with `status = 'cancelled'`, 30 (37.5%) are NOT true churn: they are plan upgrades or downgrades that the billing system records as a cancellation + new subscription pair.

Without this correction, every plan change would be counted as churn, and the 50 true cancellations would be reported as 80 (a 60% overstatement).
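
The arithmetic behind these figures is short enough to check directly. The sketch below uses the two row counts shown in the chart that follows (50 true-churn rows, 30 plan-change rows); the variable names are illustrative.

```python
true_churn_rows = 50   # cancelled rows that are real cancellations
plan_change_rows = 30  # cancelled rows paired with a new subscription

total_cancelled = true_churn_rows + plan_change_rows
# Share of cancelled rows that are plan changes rather than churn
contamination_pct = 100 * plan_change_rows / total_cancelled
# How much a naive count (all cancelled rows) overstates true churn
overstatement_pct = 100 * (total_cancelled - true_churn_rows) / true_churn_rows

print(f"{contamination_pct:.1f}% of {total_cancelled} cancelled rows are plan changes")
print(f"a naive count overstates churn by {overstatement_pct:.0f}%")
```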


In [None]:
fig, ax = plt.subplots(figsize=(6, 6))
fig.patch.set_facecolor(DARK)
ax.set_facecolor(DARK)

sizes   = [50, 30]
labels  = ["True Churn\n50 rows (62.5%)", "Plan Changes\n30 rows (37.5%)"]
colors  = [ACCENT, BLUE]

wedges, texts, autotexts = ax.pie(
    sizes, colors=colors, explode=(0.03, 0.03),
    autopct="%1.1f%%", startangle=90,  # one decimal so wedges read 62.5% / 37.5%, matching the legend
    wedgeprops=dict(width=0.55, edgecolor=DARK, linewidth=2),
    pctdistance=0.75,
)
for at in autotexts:
    at.set_color(LIGHT); at.set_fontsize(13); at.set_fontweight("bold")

ax.text(0, 0.08, "80", ha="center", va="center", fontsize=26, fontweight="bold", color=LIGHT)
ax.text(0, -0.18, "total cancelled\nrows", ha="center", va="center", fontsize=9, color=GRAY)

patches = [mpatches.Patch(color=c, label=l) for c, l in zip(colors, labels)]
ax.legend(handles=patches, loc="lower center", bbox_to_anchor=(0.5, -0.08),
          framealpha=0, fontsize=9)
ax.set_title("Plan-Change Contamination\n37.5% of cancellations were NOT true churn", pad=14, fontsize=11)

plt.tight_layout()
plt.savefig("chart3_contamination.png", dpi=150, bbox_inches="tight")
plt.show()

## 4. Active Account Growth

Account base grew from 237 to 444 (+87%) over the 12-month window.  
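Growth is healthy: the base rose every month, and the 30 accounts churned over the window are small next to the net gain of 207, so churn is not offsetting acquisition.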


In [None]:
fig, ax = plt.subplots(figsize=(12, 4))
fig.patch.set_facecolor(DARK)

ax.fill_between(df["month"], df["active_start"], alpha=0.15, color="#4fc3f7")
ax.plot(df["month"], df["active_start"], color="#4fc3f7", linewidth=2.5,
        marker="o", markersize=5, zorder=3)

start, end = df.active_start.iloc[0], df.active_start.iloc[-1]
ax.annotate(f"{start}", xy=(0, start), xytext=(0.3, start - 18), fontsize=9, color=GRAY)
ax.annotate(f"{end} (+{end-start}, +{round((end-start)/start*100)}%)",
            xy=(11, end), xytext=(9.2, end + 10), fontsize=9, color=GREEN)

ax.set_ylabel("Active Accounts")
ax.set_title("Active Account Base — 12-Month Growth")
ax.tick_params(axis="x", rotation=35)
ax.grid(axis="y", linestyle="--", alpha=0.4)
ax.set_axisbelow(True)
ax.set_ylim(180, 510)

plt.tight_layout()
plt.savefig("chart4_active_growth.png", dpi=150, bbox_inches="tight")
plt.show()

---
## Summary

| Finding | Value |
|---|---|
| Plan-change contamination | 37.5% of cancelled rows |
| Avg monthly churn rate (corrected) | 0.73% |
| Peak churn month | Sep 2025 — 1.70% |
| Highest-churn plan (6-month avg) | Growth (~1.15%); Starter ~1.00%, peaking at 3.08% in Sep 2025 |
| Lowest-churn plan (6-month avg) | Enterprise (~0.4%) |
| Account base growth (12 mo) | +87% |

**Next steps for production:**
- Query Postgres directly instead of using hardcoded data (replace the data cells with a `psycopg2` or `sqlalchemy` connection)
- Add an MRR churn chart (dollar-weighted, not logo count)
- Build a cohort retention heatmap using `started_at` as the cohort month
