# Optimizing Customer Retention Spend Using AI-Driven Churn Risk Intelligence

## Overview
Customer churn is one of the largest hidden sources of revenue loss for e-commerce businesses.  
Many systems predict *who* might churn, but they do not answer the more critical business question:

**Which customers should we act on, and how, given limited retention budgets?**

This project builds an **end-to-end Databricks Lakehouse solution** that not only predicts customer churn but also **translates predictions into prioritized, cost-aware retention actions**.

The solution is designed to be **production-ready, automated, governed, and explainable**.

---

## Business Problem
E-commerce platforms often:
- Lose customers without early warning
- Spend heavily on blanket discounts and campaigns
- Lack clarity on which customers are worth saving

### Objective
Build an AI system that:
- Identifies churn-prone customers
- Estimates revenue at risk
- Recommends targeted retention actions
- Helps businesses optimize retention spend

---

## Why AI (Not Rule-Based Systems)
Rule-based churn detection fails because:
- Customer behavior is non-linear
- Purchase patterns vary across customers
- Fixed thresholds cannot adapt to changing trends

A machine learning model learns behavioral patterns from historical data, so it supports **data-driven decision-making** that adapts as customer behavior changes, instead of relying on static rules.

---

## Solution Architecture
The project follows the **Databricks Lakehouse Medallion Architecture**:

### 🥉 Bronze Layer
- Raw transactional data ingestion
- Immutable, auditable Delta tables
- Ingestion metadata for lineage

### 🥈 Silver Layer
- Cleaned and validated transactions
- Customer-level feature engineering
- Business-driven churn labeling

### 🥇 Gold Layer
- Business-ready churn insights
- Risk segmentation
- Actionable retention recommendations

### 🤖 ML Layer
- Churn prediction model
- MLflow experiment tracking
- Predictions stored back to Delta tables

---

## Feature Engineering
Customer behavior is captured using:
- **Recency** – Days since last purchase
- **Frequency** – Number of purchases
- **Monetary Value** – Total spend
- **Average Order Value** – Spending behavior
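
The four features above can be sketched as a per-customer aggregation. This is a minimal, framework-free illustration, not the project's actual Silver-layer code: the `Transaction` fields, the `build_features` helper, and the `as_of` snapshot date are all assumptions introduced for the example. Average Order Value is taken here as total spend divided by number of purchases.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Transaction:
    # Hypothetical schema; the real Silver table may use different columns.
    customer_id: str
    order_date: date
    amount: float


def build_features(transactions, as_of):
    """Aggregate transactions into per-customer RFM-style features."""
    by_customer = defaultdict(list)
    for txn in transactions:
        by_customer[txn.customer_id].append(txn)

    features = {}
    for customer_id, txns in by_customer.items():
        total_spend = sum(t.amount for t in txns)
        last_purchase = max(t.order_date for t in txns)
        features[customer_id] = {
            "recency_days": (as_of - last_purchase).days,  # Recency
            "frequency": len(txns),                        # Frequency
            "monetary": total_spend,                       # Monetary Value
            "avg_order_value": total_spend / len(txns),    # Average Order Value
        }
    return features


sample = [
    Transaction("c1", date(2024, 1, 10), 40.0),
    Transaction("c1", date(2024, 3, 1), 60.0),
    Transaction("c2", date(2023, 11, 5), 120.0),
]
features = build_features(sample, as_of=date(2024, 3, 31))
```

On a Lakehouse the same aggregation would more likely be a grouped query over the cleaned transactions table; the plain-Python version is only meant to make the definitions concrete.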

### Churn Definition
A customer is labeled as churned if they have not made a purchase in the last **90 days**.  
This threshold is a business assumption and can be tuned.
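
As a sketch, the labeling rule reduces to a single date comparison. The function name, the `as_of` reference date, and the boundary handling (a purchase exactly 90 days ago still counts as active) are assumptions; the text does not specify them.

```python
from datetime import date, timedelta

CHURN_THRESHOLD_DAYS = 90  # business assumption from the text; tune as needed


def is_churned(last_purchase, as_of, threshold_days=CHURN_THRESHOLD_DAYS):
    """Return True when more than threshold_days have passed since the last purchase."""
    # Assumed boundary: exactly threshold_days ago is still treated as active.
    return (as_of - last_purchase).days > threshold_days


as_of = date(2024, 3, 31)
active_customer = is_churned(date(2024, 3, 1), as_of)      # 30 days since last purchase
lapsed_customer = is_churned(date(2023, 11, 5), as_of)     # 147 days since last purchase
boundary_case = is_churned(as_of - timedelta(days=90), as_of)
```

One caveat worth keeping in mind: if this label and the Recency feature are computed from the same snapshot date, the label is a direct function of that feature, so features are usually computed from an earlier observation window than the one used for labeling.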

---

## Machine Learning Approach
- **Problem Type**: Binary Classification
- **Model Used**: Logistic Regression
- **Reasoning**:
  - Interpretable
  - Efficient baseline model
  - Suitable for business explanations
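
To illustrate why logistic regression is easy to explain, the sketch below scores a customer by passing a weighted sum of features through the sigmoid function. The coefficients and bias are invented for illustration only; real values would come from the trained model, and the feature names are assumed.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def churn_probability(features, weights, bias):
    """Logistic regression score: sigmoid(bias + sum of weight * feature)."""
    z = bias + sum(w * features[name] for name, w in weights.items())
    return sigmoid(z)


# Illustrative coefficients only, not fitted values.
weights = {"recency_days": 0.03, "frequency": -0.20}
bias = -1.0

recent_buyer = churn_probability({"recency_days": 10, "frequency": 8}, weights, bias)
lapsed_buyer = churn_probability({"recency_days": 120, "frequency": 1}, weights, bias)

# exp(coefficient) is the multiplicative change in churn odds
# for a one-unit increase in that feature, holding the others fixed.
odds_ratio_per_day = math.exp(weights["recency_days"])
```

Each coefficient can be read directly as an odds ratio, which is what makes the model suitable for business explanations.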

### Evaluation
- Train/Test split (80/20)
- ROC-AUC used as primary metric
- MLflow used for:
  - Parameter tracking
  - Metric logging
  - Model versioning
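
The evaluation steps can be sketched without any ML library. The helpers below are illustrative assumptions, not the project's code: `train_test_split` performs a seeded 80/20 shuffle split, and `roc_auc` computes ROC-AUC as the probability that a randomly chosen churned customer is scored higher than a randomly chosen retained one.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def roc_auc(labels, scores):
    """Pairwise ROC-AUC; ties count as half a win. O(n^2), fine for a sketch."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


train, test = train_test_split(range(10))
auc = roc_auc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6])
```

In the tracked pipeline, a value like `auc` would be recorded inside an MLflow run, for example with `mlflow.log_metric("roc_auc", auc)`, next to the parameters used for training.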

---

## From Prediction to Business Decisions (Key Differentiator)
Instead of stopping at churn prediction, the system:
- Segments customers into **Low / Medium / High Risk**
- Estimates **revenue at risk**
- Recommends **targeted retention actions**

### Example Actions
- **High Risk + High Value** → Discount incentive
- **Medium Risk** → Re-engagement email
- **Low Risk** → Loyalty rewards

This ensures retention efforts are **focused, cost-aware, and impactful**.
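
A minimal sketch of this decision layer follows. Several details are assumptions because the text does not fix them: the probability cutoffs (0.3 and 0.7), the high-value cutoff (500.0), the revenue-at-risk formula (churn probability times historical spend), and the fallback action for high-risk customers who are not high value.

```python
def risk_segment(churn_probability, medium_cutoff=0.3, high_cutoff=0.7):
    """Map a churn probability to Low / Medium / High (cutoffs are assumed)."""
    if churn_probability >= high_cutoff:
        return "High"
    if churn_probability >= medium_cutoff:
        return "Medium"
    return "Low"


def revenue_at_risk(churn_probability, monetary_value):
    """Assumed formula: expected loss = churn probability * historical spend."""
    return churn_probability * monetary_value


def recommend_action(segment, monetary_value, high_value_cutoff=500.0):
    """Apply the example action mapping from the text."""
    if segment == "High":
        if monetary_value >= high_value_cutoff:
            return "Discount incentive"
        # Not covered by the text: assumed cheaper fallback for lower-value customers.
        return "Re-engagement email"
    if segment == "Medium":
        return "Re-engagement email"
    return "Loyalty rewards"


customers = [
    {"customer_id": "c1", "churn_probability": 0.85, "monetary": 1000.0},
    {"customer_id": "c2", "churn_probability": 0.50, "monetary": 200.0},
    {"customer_id": "c3", "churn_probability": 0.10, "monetary": 200.0},
]

plan = []
for c in customers:
    segment = risk_segment(c["churn_probability"])
    plan.append({
        "customer_id": c["customer_id"],
        "segment": segment,
        "revenue_at_risk": revenue_at_risk(c["churn_probability"], c["monetary"]),
        "action": recommend_action(segment, c["monetary"]),
    })

# Prioritize the customers with the most revenue at stake.
plan.sort(key=lambda row: row["revenue_at_risk"], reverse=True)
```

Sorting by revenue at risk gives a simple priority order, so a limited retention budget can be spent from the top of the list down.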

---

## Analytics & Insights
SQL-based analytics answer key business questions:
- How many customers are at high churn risk?
- How much revenue is at risk?
- Who should be prioritized for retention?

These insights can directly support marketing and business teams.
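
The three questions map onto short SQL queries. The sketch below runs them against an in-memory SQLite database purely so the example is self-contained; on Databricks the same standard SQL would target a Gold Delta table. The table name `gold_churn_insights`, its columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE gold_churn_insights (
        customer_id       TEXT PRIMARY KEY,
        risk_segment      TEXT,
        churn_probability REAL,
        revenue_at_risk   REAL
    )
    """
)
conn.executemany(
    "INSERT INTO gold_churn_insights VALUES (?, ?, ?, ?)",
    [
        ("c1", "High", 0.85, 850.0),
        ("c2", "Medium", 0.50, 100.0),
        ("c3", "High", 0.75, 300.0),
        ("c4", "Low", 0.10, 20.0),
    ],
)

# How many customers are at high churn risk?
high_risk_count = conn.execute(
    "SELECT COUNT(*) FROM gold_churn_insights WHERE risk_segment = 'High'"
).fetchone()[0]

# How much revenue is at risk?
total_revenue_at_risk = conn.execute(
    "SELECT SUM(revenue_at_risk) FROM gold_churn_insights"
).fetchone()[0]

# Who should be prioritized for retention?
priority_list = [
    row[0]
    for row in conn.execute(
        "SELECT customer_id FROM gold_churn_insights "
        "ORDER BY revenue_at_risk DESC LIMIT 3"
    )
]
```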

---

## Orchestration
- End-to-end pipeline automated using **Databricks Jobs**
- Tasks executed in sequence:
  - Bronze → Silver → Gold → ML Training
- The pipeline can be re-run to retrain the model and is designed for production use
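
The task ordering can be expressed as a small dependency graph. This sketch uses Python's standard `graphlib` module rather than the Databricks Jobs API, and the task names are hypothetical; it only shows how the sequence above follows from declaring each task's upstream dependency.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
dependencies = {
    "bronze_ingest": set(),
    "silver_features": {"bronze_ingest"},
    "gold_insights": {"silver_features"},
    "ml_training": {"gold_insights"},
}

run_order = list(TopologicalSorter(dependencies).static_order())


def run_pipeline(tasks, order):
    """Run task callables in dependency order, stopping at the first failure."""
    completed = []
    for name in order:
        tasks[name]()  # an exception here halts downstream tasks
        completed.append(name)
    return completed


# Placeholder callables standing in for the notebooks each task would run.
log = []
tasks = {name: (lambda n=name: log.append(n)) for name in dependencies}
completed = run_pipeline(tasks, run_order)
```

In a Databricks job the same structure is declared through each task's upstream dependencies, and the scheduler handles the ordering.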

---

## Governance
- **Unity Catalog** used for data organization
- Logical separation of:
  - Raw data
  - Processed data
  - Business insights
  - ML artifacts
- Designed with access control and data lineage in mind

---

## Limitations & Future Enhancements
- Churn threshold is a business assumption
- Cost-sensitive modeling can further optimize decisions
- A/B testing can validate retention strategies
- Real-time inference can be added
- Additional features (customer tenure, category diversity) can improve accuracy

---

## Reproducibility
- Modular notebooks
- Delta Lake storage
- Automated orchestration
- Clear data flow and documentation

---

## Conclusion
This project demonstrates how AI can move beyond prediction and support **real business decisions**.  
By combining Databricks Lakehouse architecture with machine learning and analytics, the system helps businesses **retain the right customers at the right cost**.
