
# Crisis_recovery_MLFlow_Registration

## Purpose

This notebook is responsible for **registering trained machine learning models**
into the **MLflow Model Registry** and managing their **lifecycle states**.

It acts as the **governance and control layer** of the Crisis Recovery ML system,
ensuring that only validated, traceable models are eligible for deployment.

This notebook does **not train models**.
It formalizes, versions, and promotes models that were already trained and logged.

---

## Business Context

During a crisis, predictive models directly influence:
- Customer retention actions
- Incentive allocation
- Executive decision-making

Deploying an ungoverned or incorrect model can:
- Trigger wrong interventions
- Waste recovery budget
- Erode trust in analytics and AI systems

Therefore, model registration and promotion must be:
- Explicit
- Auditable
- Reversible

This notebook enforces those controls.

---

## Inputs and Outputs

### Inputs

| Artifact | Purpose |
|-------|--------|
| MLflow training run | Source of trained model artifacts |
| Model URI (`runs:/...`) | Identifies the exact model to register |
| Evaluation metrics | Support promotion decisions |

---

### Outputs

| Output | Business Purpose |
|------|------------------|
| Registered MLflow model | Centralized, discoverable model |
| Model version | Versioned snapshot of model state |
| Lifecycle stage | Controls deployment eligibility |

---

## Design Principles

- **No training logic** in this notebook
- Models are registered **only after evaluation**
- Every model version must be traceable to a run
- Lifecycle stages control production access
- Rollback must always be possible

---

## 1: Connect to MLflow Registry

### Business Problem

Without a centralized registry:
- Teams deploy inconsistent models
- Version history is lost
- Governance audits fail

---

### Approach

We connect to the MLflow Model Registry to:
- Discover existing models
- Register new model versions
- Manage lifecycle stages

The registry acts as the **single source of truth**
for all deployable ML models.
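
A minimal sketch of the connection and discovery step, assuming a Unity Catalog registry (the `databricks-uc` URI is an assumption that matches the three-level model name used later in this notebook). The helper names `connect_registry` and `list_model_names` are illustrative and not part of MLflow.

```python
def connect_registry(registry_uri="databricks-uc"):
    """Return an MlflowClient bound to the given registry URI.

    The import is deferred so that list_model_names can be exercised
    with a stand-in client when MLflow is not installed.
    """
    from mlflow.tracking import MlflowClient

    return MlflowClient(registry_uri=registry_uri)


def list_model_names(client):
    """Return the sorted names from the first page of registered models."""
    return sorted(m.name for m in client.search_registered_models())
```

In a notebook this would typically be called as `list_model_names(connect_registry())` to confirm that the target model name is visible before registering anything.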

---

## 2: Identify Model Artifact to Register

### Business Problem

Registering the wrong artifact:
- Breaks inference pipelines
- Produces invalid predictions
- Creates silent failures

---

### Approach

We explicitly reference the model using:
- MLflow run ID
- Artifact path (`churn_pipeline_model`)

This guarantees:
- Exact reproducibility
- Clear lineage from training → deployment
- Traceability for audits
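
The run ID and artifact path combine into a `runs:/` URI. A small sketch of that step follows; `build_model_uri` is a hypothetical helper, and the default artifact path is the one named above.

```python
def build_model_uri(run_id, artifact_path="churn_pipeline_model"):
    """Build a runs:/ URI from an MLflow run ID and an artifact path.

    Rejects empty values so that a malformed URI is caught here
    rather than at registration time.
    """
    run_id = run_id.strip()
    artifact_path = artifact_path.strip("/ ")
    if not run_id or not artifact_path:
        raise ValueError("run_id and artifact_path must both be non-empty")
    return f"runs:/{run_id}/{artifact_path}"
```

The artifact path must match the one used when the model was logged during training; otherwise registration will point at a location that does not exist.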

---

## 3: Register Model Version

### Business Problem

Models must evolve safely over time.

Overwriting existing models:
- Destroys historical context
- Prevents rollback
- Breaks comparison analysis

---

### Approach

We register the model as a **new version**
under a stable model name.

Each registration:
- Creates an immutable version
- Preserves training metadata
- Enables side-by-side comparison with older models

---

## 4: Model Version Validation

### Business Problem

Not every trained model is production-worthy.

Promoting weak models can:
- Increase false churn alerts
- Miss high-risk customers
- Damage business outcomes

---

### Approach

Before promotion, we validate:
- Evaluation metrics (recall-first focus)
- Feature set consistency
- Training data alignment

Only models that meet predefined criteria
are considered for lifecycle promotion.
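
One way to express such a gate is sketched below. The threshold values and the function name `passes_promotion_gate` are assumptions for illustration; the notebook does not state the actual criteria, and the training data alignment check is not covered here.

```python
# Hypothetical thresholds; replace with the team's agreed criteria.
MIN_RECALL = 0.70
MIN_PRECISION = 0.20


def passes_promotion_gate(metrics, expected_features, model_features,
                          min_recall=MIN_RECALL, min_precision=MIN_PRECISION):
    """Return (ok, reasons) for a candidate model.

    Recall is checked first because missing a high-risk customer is
    treated as the costlier error; precision is a secondary floor.
    """
    reasons = []

    recall = metrics.get("recall")
    if recall is None or recall < min_recall:
        reasons.append(f"recall {recall} is below {min_recall}")

    precision = metrics.get("precision")
    if precision is None or precision < min_precision:
        reasons.append(f"precision {precision} is below {min_precision}")

    # Feature names must match exactly, regardless of order.
    if sorted(model_features) != sorted(expected_features):
        reasons.append("feature set differs from the expected set")

    return (not reasons, reasons)
```

The `metrics` dictionary could be taken from `client.get_run(run_id).data.metrics`, assuming the training run logged metrics under the keys `recall` and `precision`.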

---

## 5: Lifecycle Stage Management

### Business Problem

Production systems must never consume:
- Experimental models
- Unvalidated models
- Incomplete artifacts

---

### Approach

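We manage model stages explicitly:
- `None` → newly registered
- `Staging` → validated, pre-production
- `Production` → approved for inference

Unity Catalog, which the registration code in this notebook targets, does not support fixed stages. The same roles are expressed there as aliases (for example `staging`) that point to a specific model version.

Stage and alias transitions are intentional
and reversible, enabling safe experimentation.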
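
Since the registration cell in this notebook assigns an alias rather than a stage, a transition can be sketched as a guarded alias assignment. `assign_alias` and `ALLOWED_ALIASES` are hypothetical names, and `production` is an assumed alias that the notebook itself never sets.

```python
# "staging" is used in this notebook; "production" is an assumed counterpart.
ALLOWED_ALIASES = ("staging", "production")


def assign_alias(client, model_name, version, alias):
    """Point an approved alias at a specific model version.

    `client` is expected to be an MlflowClient. Rejecting unknown
    aliases keeps ad hoc labels out of the registry.
    """
    if alias not in ALLOWED_ALIASES:
        raise ValueError(f"alias '{alias}' is not one of {ALLOWED_ALIASES}")
    client.set_registered_model_alias(
        name=model_name, alias=alias, version=str(version)
    )
```

Because an alias is only a pointer, assigning it to a different version later reverses the transition without touching the model versions themselves.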

---

## 6: Versioning & Rollback Strategy

### Business Problem

Crisis conditions evolve rapidly.
Models may degrade or become invalid.

Without rollback:
- Failures persist
- Customer trust erodes
- Recovery actions misfire

---

### Approach

MLflow versioning enables:
- Instant rollback to a prior version
- Comparison of historical performance
- Controlled deprecation of outdated models

This ensures resilience under changing crisis dynamics.
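
A rollback under an alias-based registry can be sketched as re-pointing the alias at the newest older version. `rollback_alias` is a hypothetical helper; it assumes `client` is an `MlflowClient` and that a single page of `search_model_versions` results covers every version of the model.

```python
def rollback_alias(client, model_name, alias):
    """Re-point `alias` at the newest version older than its current target.

    Returns the version number the alias now points to.
    """
    current = int(client.get_model_version_by_alias(model_name, alias).version)

    versions = client.search_model_versions(f"name='{model_name}'")
    older = [int(v.version) for v in versions if int(v.version) < current]
    if not older:
        raise RuntimeError(
            f"no version older than v{current} exists for '{model_name}'"
        )

    target = max(older)
    client.set_registered_model_alias(model_name, alias, str(target))
    return target
```

No model version is deleted or modified by this operation, so the rollback can itself be undone by assigning the alias to the newer version again.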




```python
import mlflow
from mlflow.exceptions import MlflowException
from mlflow.tracking import MlflowClient

# -------------------------------------------------------
# 0. Setup
# -------------------------------------------------------
EXPERIMENT_NAME = "/Shared/QuickBite_Churn_Prediction"
MODEL_NAME = "workspace.default.quickbite_churn_predictor"

client = MlflowClient()

# -------------------------------------------------------
# 1. Get latest successful run (training artifact)
# -------------------------------------------------------
# Restrict to FINISHED runs and sort newest first, so a failed or
# still-running attempt is never selected.
runs = mlflow.search_runs(
    experiment_names=[EXPERIMENT_NAME],
    filter_string="attributes.status = 'FINISHED'",
    order_by=["attributes.start_time DESC"],
    max_results=1
)

if runs.empty:
    raise RuntimeError(
        f"No finished runs found in experiment '{EXPERIMENT_NAME}'"
    )

run_id = runs.iloc[0].run_id

# IMPORTANT: must match artifact_path used in log_model
model_uri = f"runs:/{run_id}/churn_pipeline_model"

print(f"Latest run_id: {run_id}")

# -------------------------------------------------------
# 2. Check if registered model already exists (UC-safe)
# -------------------------------------------------------
# Any MLflow error from the lookup is treated as "not found";
# other exception types are allowed to propagate.
try:
    client.get_registered_model(MODEL_NAME)
    model_exists = True
    print(f"Registered model '{MODEL_NAME}' already exists.")
except MlflowException:
    model_exists = False
    print(f"Registered model '{MODEL_NAME}' does NOT exist yet.")

# -------------------------------------------------------
# 3. If model exists, check if THIS run is already registered
# -------------------------------------------------------
already_registered = False

if model_exists:
    existing_versions = client.search_model_versions(
        f"name='{MODEL_NAME}'"
    )

    already_registered = any(
        v.run_id == run_id for v in existing_versions
    )

# -------------------------------------------------------
# 4. Register model ONLY if needed
# -------------------------------------------------------
if already_registered:
    print("Model from this run is already registered. Skipping registration.")
else:
    print(f"Registering model '{MODEL_NAME}' from run {run_id}")

    model_version = mlflow.register_model(
        model_uri=model_uri,
        name=MODEL_NAME
    )

    # ---------------------------------------------------
    # 5. Assign alias (Unity Catalog uses aliases, NOT stages)
    # ---------------------------------------------------
    client.set_registered_model_alias(
        name=MODEL_NAME,
        alias="staging",
        version=model_version.version
    )

    # ---------------------------------------------------
    # 6. Governance tags (enterprise requirement)
    # ---------------------------------------------------
    client.set_model_version_tag(
        name=MODEL_NAME,
        version=model_version.version,
        key="deployment_mode",
        value="batch_inference_only"
    )

    client.set_model_version_tag(
        name=MODEL_NAME,
        version=model_version.version,
        key="known_limitation",
        value="extreme_class_imbalance"
    )

    print(
        f"Model {MODEL_NAME} v{model_version.version} "
        f"registered and assigned alias 'staging'"
    )
```


This model predicts churn risk using behavioral and crisis-aware features.
Because of extreme class imbalance, it is intended for batch analysis and segmentation, not real-time decisioning. The new version is therefore assigned the `staging` alias and tagged `deployment_mode = batch_inference_only` rather than being promoted to production.

## Summary

This notebook establishes **model governance and lifecycle control**
for crisis-aware churn prediction by:

- Registering trained models into a centralized registry
- Creating immutable, versioned model snapshots
- Managing promotion to Staging and Production
- Enabling auditability, rollback, and safe deployment

It represents the **control plane** of the ML system,
ensuring that predictive intelligence remains
trustworthy, explainable, and production-safe.


## Downstream Dependencies

The MLflow Registration layer feeds:

### Model Inference Pipelines
- Batch churn scoring jobs
- Scheduled crisis-period predictions
- Real-time or near-real-time inference systems



### Deployment & Serving Infrastructure
- Databricks Model Serving
- Batch scoring workflows
- Feature store–backed inference jobs

### Governance & Monitoring Systems
- Model audit reviews
- Performance drift monitoring
- Compliance and documentation workflows



Any error in this layer directly affects:
- Production model behavior
- Customer-facing decisions
- Business trust in AI systems

This is why **model registration must be explicit, disciplined, and auditable**.
