
# Crisis_recovery_Model_Training_and_Lifecycle

## Purpose

This notebook manages the **machine learning lifecycle**
for the **Crisis Recovery Lakehouse**.

It is responsible for:
- Training churn and risk prediction models
- Validating model performance
- Registering models into MLflow
- Managing model versions and stages (Staging → Production)

This notebook converts **analytical features**
into **predictive intelligence** used for proactive crisis response.

---

## Business Context

During a crisis, **reactive dashboards are not enough**.

Leadership and operations teams need:
- Early warnings about customer churn
- Signals of worsening customer experience
- Prioritization of intervention before losses occur

Machine learning enables the organization to:
- Predict customer risk instead of reacting late
- Target recovery actions efficiently
- Measure crisis impact quantitatively

This notebook operationalizes that intelligence.

---

## Inputs and Outputs

### Inputs (from Gold / Feature Engineering Layer)

| Source Table | Purpose |
|-------------|--------|
| `ml_churn_features` | Model-ready customer features |
| `gold_customer_churn_risk` (optional baseline) | Benchmark comparison |

All features are:
- Cleaned
- Aggregated
- Free of raw text
- Safe for ML consumption

---

### Outputs

| Artifact | Business Purpose |
|--------|------------------|
| Trained ML model | Predict customer churn risk |
| MLflow experiment | Track model runs & metrics |
| Registered model | Enable deployment & governance |
| Model version history | Auditability & rollback |

---

## ML Design Principles

- Models must be **interpretable**
- Features must be **business-explainable**
- Performance must be **measured on unseen data**
- Every model must be **tracked and versioned**
- No model training occurs on raw or Silver data

In [0]:

%sql
CREATE VOLUME IF NOT EXISTS workspace.food_delivery.mlflow_models_temp;

## 1. Feature Definition

### Business Problem

To predict customer churn during a crisis, the model must rely on
**behavioral, experiential, and engagement signals** rather than raw events.

Poor or irrelevant features lead to:
- Unstable models
- False churn predictions
- Low trust from business teams

---

### Approach

We explicitly define a **curated feature set** that captures:
- Customer activity (`total_orders`)
- Experience quality (`avg_star_rating`, `late_order_ratio`)
- Crisis sensitivity (`crisis_exposure_index`, `sentiment_velocity`)
- Value & engagement (`rfm_score`, `segment_index`)

These features are already:
- Aggregated
- Normalized upstream
- Approved for ML usage

## 2. Load Training Dataset

### Business Problem

Machine learning models must be trained on
**clean, stable, and business-aligned data**.

Training directly from Silver or raw tables would:
- Mix analytical and ML concerns
- Increase leakage risk
- Reduce reproducibility

---

### Approach

We load data from the **ML feature layer** (`ml_churn_features`),
which represents the **single source of truth for model training**.

This table is:
- Feature-engineered upstream
- Validated for nulls and ranges
- Safe for repeated model experiments


## 3. Train–Test Split

### Business Problem

A model that performs well only on historical data
cannot be trusted in production.

Without a proper split:
- Metrics become misleading
- Overfitting goes undetected
- Real churn risk is underestimated

---

### Approach

We randomly split the dataset into:
- **~80% training data** → model learning
- **~20% test data** → unbiased evaluation

The proportions are approximate because rows are assigned at random.

A fixed seed (`seed=42`) ensures:
- Reproducibility
- Comparable experiment runs
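
As a rough illustration of why the fixed seed matters, the plain-Python sketch below mimics a seeded random split on a list of made-up customer IDs. It is not Spark's `randomSplit` implementation; the helper `seeded_split` and the IDs are hypothetical. Each row is assigned independently, so the split sizes come out close to 80/20 rather than exact.

```python
import random

def seeded_split(rows, train_fraction=0.8, seed=42):
    """Assign each row to train or test using a seeded RNG (illustrative only)."""
    rng = random.Random(seed)
    train, test = [], []
    for row in rows:
        # Each row is drawn independently, so the split sizes are approximate.
        (train if rng.random() < train_fraction else test).append(row)
    return train, test

customer_ids = list(range(1000))  # hypothetical customer IDs

# Two calls with the same seed and the same input give identical splits.
train_a, test_a = seeded_split(customer_ids)
train_b, test_b = seeded_split(customer_ids)
```

In Spark, the same guarantee additionally depends on the input DataFrame being unchanged between runs.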

## 4. Feature Vector Assembly

### Business Problem

Spark ML algorithms require features to be represented
as a **single vector column**.

Passing raw columns directly to the model
is not supported and breaks pipeline execution.

---

### Approach

We use `VectorAssembler` to:
- Combine all selected feature columns
- Produce a unified `features` vector

This step is **technical, not analytical** —
no feature transformation logic happens here.
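
Conceptually, the assembly step does nothing more than place the selected columns of each row side by side in a fixed order. The sketch below shows that idea in plain Python for a single row; `assemble_features` is a hypothetical stand-in for illustration, not the `VectorAssembler` API, and the row values are made up.

```python
def assemble_features(row, input_cols):
    """Concatenate the named columns of one row into a single ordered list."""
    return [float(row[col]) for col in input_cols]

# A subset of the notebook's feature columns, with made-up values.
feature_cols = ["total_orders", "avg_star_rating", "late_order_ratio"]
row = {"total_orders": 12, "avg_star_rating": 4.5, "late_order_ratio": 0.25, "label": 0}

# The label is left out; only the listed columns enter the vector.
features = assemble_features(row, feature_cols)
```

The position of each value in the vector follows the order of the column list, which is why the same list must be used at training and inference time.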

## 5. Model Definition

### Business Problem

Churn behavior during crises is:
- Non-linear
- Influenced by multiple interacting factors
- Sensitive to recent negative experiences

A simple linear model may fail to capture these patterns.

---

### Approach

We use a **Gradient-Boosted Trees (GBT) classifier** because it:
- Handles non-linear relationships well
- Captures interactions between features
- Performs reliably on tabular behavioral data

Tree depth is capped (`maxDepth=5`) to limit overfitting.

## 6. ML Pipeline Construction

### Business Problem

Manually managing multiple ML steps increases:
- Operational complexity
- Risk of inconsistent transformations
- Deployment errors

---

### Approach

We build a Spark ML **Pipeline** that:
- Assembles features
- Trains the classifier
- Applies transformations consistently

This ensures the **same logic** is used during:
- Training
- Evaluation
- Future inference

## 7. MLflow Artifact Storage Configuration

### Business Problem

Distributed Spark models require a
**shared and accessible storage location**
for logging artifacts.

Incorrect paths cause:
- Logging failures
- Incomplete model registration
- Broken lineage

---

### Approach

We define a Unity Catalog–compatible
temporary storage path used by MLflow
to stage model artifacts before upload.


## 8. MLflow Experiment Setup

### Business Problem

Without structured experiment tracking:
- Model results cannot be compared
- Improvements cannot be justified
- Governance audits become difficult

---

### Approach

We explicitly set the MLflow experiment so that:
- All runs are grouped logically
- Metrics are comparable across versions
- Model history remains auditable

## 9. Model Training Run

### Business Problem

Each training attempt must be:
- Traceable
- Reproducible
- Measurable

Ad-hoc model fitting without tracking
breaks the ML lifecycle.

---

### Approach

We start an MLflow run to:
- Train the pipeline
- Generate predictions on unseen data
- Capture metrics and parameters

This run represents **one model version candidate**.

## 10. Model Evaluation (Recall Focus)

### Business Problem

In churn prediction, **missing a churner**
is more costly than incorrectly flagging a loyal customer.

Therefore, recall on churned users is critical.

---

### Approach

We evaluate the model using:
- **Recall for label = 1 (churn)**

This directly measures the model’s ability to:
- Detect at-risk customers
- Support proactive retention actions
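
For reference, recall for the churn class is the share of actual churners that the model flags: true positives divided by true positives plus false negatives. The sketch below computes it in plain Python on a tiny made-up set of labels and predictions; `recall_for_label` is a hypothetical helper, not the Spark evaluator.

```python
def recall_for_label(labels, predictions, positive_label=1):
    """Recall = true positives / (true positives + false negatives)."""
    pairs = list(zip(labels, predictions))
    true_pos = sum(1 for y, p in pairs if y == positive_label and p == positive_label)
    false_neg = sum(1 for y, p in pairs if y == positive_label and p != positive_label)
    actual_pos = true_pos + false_neg
    # With no actual positives recall is undefined; 0.0 is returned here by convention.
    return true_pos / actual_pos if actual_pos else 0.0

# Made-up example: three actual churners, two of them detected.
labels      = [1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 1, 0]
churn_recall = recall_for_label(labels, predictions)
```

Note that the false alarm on the fourth customer does not lower recall; it would only show up in precision.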


## 11. MLflow Logging

### Business Problem

Model metrics without context
cannot be trusted or reused.

We need to know:
- How the model performed
- Which features were used
- Under which configuration it was trained

---

### Approach

We log:
- Recall metric for churn detection
- Feature set description as a parameter

This metadata enables:
- Model comparison
- Governance review
- Informed promotion decisions
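
The notebook records the feature set as a free-text label. One possible extension, which is an assumption and not part of the original notebook, is to also record the exact column list and key settings so that two runs can be compared field by field. The sketch below only builds such a dictionary; `build_run_params` and its key names are hypothetical.

```python
def build_run_params(feature_cols, max_depth, seed, train_fraction):
    """Collect run configuration into a flat dict suitable for parameter logging."""
    return {
        "feature_cols": ",".join(feature_cols),
        "n_features": len(feature_cols),
        "max_depth": max_depth,
        "split_seed": seed,
        "train_fraction": train_fraction,
    }

params = build_run_params(
    feature_cols=[
        "total_orders", "avg_star_rating", "late_order_ratio", "segment_index",
        "crisis_exposure_index", "sentiment_velocity", "rfm_score",
    ],
    max_depth=5,
    seed=42,
    train_fraction=0.8,
)
# Inside an active run, this dict could be passed to mlflow.log_params(params).
```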


In [0]:
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.classification import GBTClassifier
import mlflow
import mlflow.spark

# 1. Define Features
feature_cols = [
    "total_orders",
    "avg_star_rating",
    "late_order_ratio",
    "segment_index",
    "crisis_exposure_index",
    "sentiment_velocity",
    "rfm_score"
]

# 2. Load Data
df = spark.table("food_delivery.ml_churn_features")

# 3. Train/Test Split
# Random split: roughly 80% for training, 20% for testing (proportions are approximate).
# seed=42 keeps the split reproducible for an unchanged input DataFrame.
train_df, test_df = df.randomSplit([0.8, 0.2], seed=42)

# 4. Feature Vector Assembly
# The VectorAssembler takes the list of columns and combines them into
# a single vector column named "features", which Spark ML models require.
assembler = VectorAssembler(
    inputCols=feature_cols,
    outputCol="features"
)

# 5. Define Model
# Using Spark's native Gradient-Boosted Trees (GBT) classifier.
# maxDepth=5 limits the depth of each tree to prevent overfitting.
gbt = GBTClassifier(
    labelCol="label",
    featuresCol="features",
    maxDepth=5
)

# 6. Build Pipeline
# Chains the assembler and the model together. 
# When we call fit(), data flows through assembler -> gbt.
pipeline = Pipeline(stages=[assembler, gbt])

# 7. Define MLflow Storage Path
# This path is required for distributed logging to Unity Catalog Volumes.
uc_model_temp_path = "/Volumes/workspace/food_delivery/mlflow_models_temp"

# 8. Set Experiment
# Sets the context so all runs are grouped under this experiment name.
mlflow.set_experiment("/Shared/QuickBite_Churn_Prediction")

# 9. Start Training Run
with mlflow.start_run(run_name="GBT_Crisis_Aware_Pipeline"):

    # Train the pipeline on the training data
    pipeline_model = pipeline.fit(train_df)
    
    # Generate predictions on the test data (automatically transforms features)
    predictions = pipeline_model.transform(test_df)

    # 10. Evaluation
    # We use MulticlassClassificationEvaluator to calculate recall specifically for label 1.
    # churn=1 is the "positive" class we care about detecting.
    evaluator = MulticlassClassificationEvaluator(
        labelCol="label",
        metricName="recallByLabel"
    )

    # The metricLabel parameter (a float) selects the class whose recall is reported:
    # here 1.0 (churn), not the weighted average or the recall for 0.
    recall = evaluator.evaluate(
        predictions,
        {evaluator.metricLabel: 1.0}
    )

    # 11. Logging
    # Log the numeric metric
    mlflow.log_metric("recall_churn", recall)
    
    # Log a parameter to describe the features used in this run
    mlflow.log_param("feature_set", "baseline + crisis + rfm")
    
    # Log the trained pipeline model. 
    # dfs_tmpdir is used to stage the model artifact before upload.
    mlflow.spark.log_model(
        pipeline_model, 
        "churn_pipeline_model",
        dfs_tmpdir=uc_model_temp_path
    )

    # print(f"Recall (churn=1): {recall:.3f}")

## Model Signature & Production Logging

### Business Problem

A trained model is not production-ready unless downstream systems know:
- What inputs the model expects
- What outputs it produces
- How to validate requests at inference time

Without a model signature:
- Inference pipelines break silently
- Feature mismatches cause runtime errors
- Governance and review processes fail
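
To make the failure mode concrete, the sketch below shows the kind of check a signature makes explicit: every expected feature column must be present and numeric. This is a hand-written illustration, not how MLflow enforces signatures; `validate_request` and `EXPECTED_COLUMNS` are hypothetical names, with the column list taken from this notebook's feature set.

```python
EXPECTED_COLUMNS = [
    "total_orders", "avg_star_rating", "late_order_ratio", "segment_index",
    "crisis_exposure_index", "sentiment_velocity", "rfm_score",
]

def validate_request(record, expected_columns=EXPECTED_COLUMNS):
    """Return a list of problems found in one scoring request (empty if valid)."""
    problems = []
    for col in expected_columns:
        if col not in record:
            problems.append(f"missing column: {col}")
        elif isinstance(record[col], bool) or not isinstance(record[col], (int, float)):
            problems.append(f"non-numeric value for column: {col}")
    return problems

# A complete request passes; an incomplete one is reported column by column.
valid_request = {col: 1.0 for col in EXPECTED_COLUMNS}
issues = validate_request({"total_orders": 3})
```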

### Production Logging

The trained Spark ML pipeline is logged to MLflow with:
- Model artifact (`churn_pipeline_model`)
- Inferred signature
- Input example for testing and documentation

Artifacts are staged through a Unity Catalog–compatible
temporary volume to support distributed logging.

This step finalizes the model as a **deployable, governed asset**.

In [0]:
# Import MLflow's signature inference utility
from mlflow.models.signature import infer_signature

# Take a small sample of rows (feature columns only) to use as the input example
sample_input = train_df.select(feature_cols).limit(5)

# Run the sample through the fitted pipeline to obtain an output example
sample_output = pipeline_model.transform(sample_input).select("prediction")

# Temporary storage path for MLflow model artifacts (same volume as in the training cell)
uc_model_temp_path = "/Volumes/workspace/food_delivery/mlflow_models_temp"

# Convert the input sample to pandas once and reuse it for the signature and the input example
sample_input_pd = sample_input.toPandas()

# Infer the model signature from the sample input and output
signature = infer_signature(sample_input_pd, sample_output.toPandas())

# Log the trained Spark ML pipeline model to MLflow with signature and input example.
# Note: this call sits outside the `with mlflow.start_run(...)` block above, so
# MLflow logs it under a new run unless a run is still active.
mlflow.spark.log_model(
    spark_model=pipeline_model,
    artifact_path="churn_pipeline_model",
    dfs_tmpdir=uc_model_temp_path,
    signature=signature,
    input_example=sample_input_pd
)

In [0]:
pipeline_model.transform(test_df).printSchema()

> - Under a business-realistic churn definition (30+ days of inactivity), the churn rate is only ≈0.12%. With class imbalance this extreme, supervised churn prediction was not feasible without artificial resampling or a redesigned proxy label. The analysis instead highlights strong customer retention and shows how heavily the churn definition determines whether predictive modeling is viable.
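
A short back-of-the-envelope calculation shows why this imbalance is so limiting. The churn rate and the 80/20 split come from this notebook; the population size is a hypothetical round number chosen only to make the arithmetic easy to follow.

```python
churn_rate = 0.0012      # ≈0.12% churn rate, as stated above
n_customers = 100_000    # hypothetical population size, for illustration only
test_fraction = 0.2      # matches the 80/20 split used in this notebook

# Expected number of churners overall and in the held-out test set.
expected_churners = n_customers * churn_rate
expected_test_churners = expected_churners * test_fraction

# A trivial model that always predicts "no churn":
baseline_accuracy = 1 - churn_rate  # correct on every non-churner
baseline_recall = 0.0               # never detects a single churner
```

Under these assumptions only about 120 customers churn, of whom roughly 24 land in the test set, so recall estimates rest on a couple of dozen examples. The always-negative baseline also reaches about 99.88% accuracy with zero recall, which is why accuracy alone says little here.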


## Summary

This notebook establishes a **production-grade machine learning lifecycle**
for crisis-aware churn prediction by:

- Training a supervised churn model on curated, business-aligned features
- Capturing non-linear customer behavior using Gradient-Boosted Trees
- Evaluating performance with **recall-first metrics** aligned to churn risk
- Tracking experiments, metrics, and parameters using MLflow
- Registering a fully governed Spark ML pipeline with a formal model signature

It acts as the **decision intelligence layer** of the Crisis Recovery Lakehouse,
transforming engineered features into **predictive, auditable outcomes**.

This ensures that churn models do not merely predict risk,
but do so in a way that is:
- Reproducible
- Interpretable
- Safe to deploy in production


## Downstream Dependencies

The Model Training & Lifecycle layer feeds the following systems:

### Churn Prediction & Scoring Jobs
- Batch churn scoring pipelines
- Daily / weekly customer risk refresh
- Crisis-period churn monitoring

---

### MLflow Model Registry
- Model versioning (v1, v2, …)
- Stage transitions (Staging → Production)
- Rollback and audit support
- Governance and compliance workflows

---

### Business & Decision Systems
- Retention and CRM targeting
- Incentive and offer allocation
- Crisis recovery prioritization
- Executive churn risk reporting

---

Any failure or misalignment in this notebook directly impacts:
- Model accuracy
- Prediction reliability
- Business decision quality

This is why model training, evaluation, and registration
must remain **precise, explainable, and reproducible**.
