# Task 4.3.6: Model Registry & Version Control

**Module:** 4.3 - MLOps & Experiment Tracking  
**Time:** 2 hours  
**Difficulty:** ⭐⭐⭐ (Intermediate)

---

## Learning Objectives

By the end of this notebook, you will:
- [ ] Understand model versioning concepts and best practices
- [ ] Use MLflow Model Registry for version control
- [ ] Manage model lifecycle stages (staging, production)
- [ ] Implement model promotion and rollback workflows
- [ ] Use Hugging Face Hub for model storage

---

## Prerequisites

- Completed: Task 4.3.5 (Drift Detection)
- Knowledge of: MLflow basics, Git concepts

---

## Real-World Context

Imagine deploying a new model version that performs worse than the old one. Your service is degrading. Users are complaining. What do you do?

**Rollback!**

But you can only roll back if:
1. You saved the previous model
2. You know which version was running before
3. You can deploy it quickly

Model registries solve all these problems. Companies like Netflix, Uber, and Airbnb use them to manage thousands of models in production.

---

## ELI5: What is a Model Registry?

> **Imagine you're a chef creating new recipes.**
>
> You don't just throw away your old recipe when you make a new one! Instead, you:
> - Keep a **recipe book** with all versions
> - Mark which recipe is **currently served** at the restaurant
> - Note which ones are being **tested** by the kitchen staff
> - Archive old recipes that are **retired**
>
> If the new recipe flops, you can quickly go back to the old one!
>
> **In AI terms:** A model registry is a centralized store for all your model versions. It tracks which version is in production, which is being tested, and lets you quickly switch between them.

---

## Model Lifecycle Stages

```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│    None      │ --> │   Staging    │ --> │  Production  │ --> │   Archived   │
│  (Training)  │     │  (Testing)   │     │   (Live)     │     │  (Retired)   │
└──────────────┘     └──────────────┘     └──────────────┘     └──────────────┘
```

| Stage | Purpose | Actions Allowed |
|-------|---------|----------------|
| **None** | Just trained | Test, evaluate, promote |
| **Staging** | Pre-production testing | A/B test, validate, promote |
| **Production** | Serving traffic | Monitor, compare, archive |
| **Archived** | No longer active | Reference, restore |
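
The table can be read as a small state machine. The sketch below is a toy, pure-Python illustration rather than anything MLflow provides: `ALLOWED` and `transition` are hypothetical names, and the permitted moves are one reasonable reading of the "Actions Allowed" column. MLflow itself does not enforce an ordering between stages, so a guard like this would have to live in your own promotion code.

```python
# Toy state machine for the lifecycle table above (illustrative names only).
ALLOWED = {
    "None": {"Staging", "Archived"},
    "Staging": {"Production", "Archived", "None"},
    "Production": {"Archived"},
    "Archived": {"Staging"},  # "restore" an archived version for re-testing
}


def transition(current: str, target: str) -> str:
    """Return the new stage, or raise if the move is not listed in ALLOWED."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"Transition {current} -> {target} is not allowed")
    return target


stage = "None"
for nxt in ("Staging", "Production", "Archived"):
    stage = transition(stage, nxt)
print(stage)  # Archived
```

With this mapping, `transition("None", "Production")` raises a `ValueError`, which is the "skipping staging" case a stricter team might want to block.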

## Part 1: MLflow Model Registry Setup

In [None]:
import mlflow
from mlflow.tracking import MlflowClient
import torch
import torch.nn as nn
import numpy as np
import os
from pathlib import Path

print(f"MLflow version: {mlflow.__version__}")

# Configure MLflow
TRACKING_DIR = "./mlruns"
os.makedirs(TRACKING_DIR, exist_ok=True)
# Path.as_uri() yields a valid file:// URI on every OS, including Windows
mlflow.set_tracking_uri(Path(TRACKING_DIR).resolve().as_uri())

print(f"Tracking URI: {mlflow.get_tracking_uri()}")

In [None]:
# Create a simple model for demonstration
class SentimentModel(nn.Module):
    """Simple sentiment classifier."""
    
    def __init__(self, input_dim: int = 100, hidden_dim: int = 64, version: str = "1.0"):
        super().__init__()
        self.version = version
        self.network = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Linear(hidden_dim // 2, 1),
            nn.Sigmoid()
        )
    
    def forward(self, x):
        return self.network(x)

# Create some model versions with different "performance"
def create_model_version(version: str, hidden_dim: int) -> SentimentModel:
    """Create a model version."""
    return SentimentModel(hidden_dim=hidden_dim, version=version)

print("Model class defined!")

---

## Part 2: Registering Models

Let's train and register multiple model versions.

In [None]:
def train_and_register_model(
    model_name: str,
    hidden_dim: int,
    learning_rate: float,
    epochs: int = 10,
    registered_name: str = "SentimentClassifier"
):
    """
    Train a model and register it in MLflow Model Registry.
    """
    mlflow.set_experiment("Model-Registry-Demo")
    
    with mlflow.start_run(run_name=model_name) as run:
        # Log parameters
        mlflow.log_params({
            "hidden_dim": hidden_dim,
            "learning_rate": learning_rate,
            "epochs": epochs,
            "model_version": model_name
        })
        
        # Create and "train" model (simplified for demo)
        model = create_model_version(model_name, hidden_dim)
        
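        # Simulate training metrics
        # Seed from a stable checksum: built-in hash() of a str varies between Python processes
        import zlib
        np.random.seed(zlib.crc32(model_name.encode("utf-8")))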
        for epoch in range(epochs):
            loss = 1.0 / (epoch + 1) + np.random.random() * 0.1
            acc = 0.5 + 0.05 * epoch + np.random.random() * 0.05
            mlflow.log_metrics({
                "train_loss": loss,
                "val_accuracy": min(acc, 0.95)
            }, step=epoch)
        
        # Final metrics (clamp once so the logged and printed values agree)
        final_accuracy = min(0.8 + (hidden_dim / 256) * 0.15 + np.random.random() * 0.05, 0.98)
        mlflow.log_metric("final_accuracy", final_accuracy)
        
        # Log model to registry
        model_info = mlflow.pytorch.log_model(
            model,
            artifact_path="model",
            registered_model_name=registered_name
        )
        
        print(f"Registered {model_name}")
        print(f"  Run ID: {run.info.run_id}")
        print(f"  Model URI: {model_info.model_uri}")
        print(f"  Final accuracy: {final_accuracy:.4f}")
        
        return run.info.run_id, model_info

In [None]:
# Register multiple model versions
registered_name = "SentimentClassifier"

# Version 1: Small model
run_id_v1, _ = train_and_register_model(
    model_name="v1-small",
    hidden_dim=32,
    learning_rate=0.01
)

# Version 2: Medium model
run_id_v2, _ = train_and_register_model(
    model_name="v2-medium",
    hidden_dim=64,
    learning_rate=0.005
)

# Version 3: Large model
run_id_v3, _ = train_and_register_model(
    model_name="v3-large",
    hidden_dim=128,
    learning_rate=0.001
)

---

## Part 3: Managing Model Versions

In [None]:
# Initialize MLflow client
client = MlflowClient()

# List all versions of our model
print(f"Model: {registered_name}")
print("="*60)

try:
    versions = client.search_model_versions(f"name='{registered_name}'")
    
    for version in versions:
        print(f"\nVersion {version.version}:")
        print(f"  Status: {version.status}")
        print(f"  Stage: {version.current_stage}")
        print(f"  Run ID: {version.run_id}")
        print(f"  Created: {version.creation_timestamp}")
except Exception as e:
    print(f"Error listing versions: {e}")

In [None]:
# Transition model versions through stages

def promote_model(model_name: str, version: int, stage: str):
    """
    Promote a model version to a new stage.
    
    Args:
        model_name: Registered model name
        version: Version number to promote
        stage: Target stage (Staging, Production, Archived)
    """
    client.transition_model_version_stage(
        name=model_name,
        version=version,
        stage=stage,
        archive_existing_versions=(stage == "Production")
    )
    print(f"Model {model_name} v{version} -> {stage}")

# Example workflow:
# 1. V1 was our first production model
# 2. V2 is now in staging (testing)
# 3. V3 is newly trained

try:
    print("\nPromoting models through lifecycle:")
    promote_model(registered_name, 1, "Production")
    promote_model(registered_name, 2, "Staging")
    print("\nLifecycle stages updated!")
except Exception as e:
    print(f"Note: {e}")

In [None]:
# Get model by stage
def get_production_model(model_name: str):
    """
    Get the current production model.
    """
    try:
        versions = client.get_latest_versions(model_name, stages=["Production"])
        if versions:
            return versions[0]
        return None
    except Exception as e:
        print(f"Error: {e}")
        return None

def get_staging_model(model_name: str):
    """
    Get the current staging model.
    """
    try:
        versions = client.get_latest_versions(model_name, stages=["Staging"])
        if versions:
            return versions[0]
        return None
    except Exception as e:
        print(f"Error: {e}")
        return None

prod_model = get_production_model(registered_name)
staging_model = get_staging_model(registered_name)

if prod_model:
    print(f"Production model: v{prod_model.version}")
if staging_model:
    print(f"Staging model: v{staging_model.version}")
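
Once a version has a number or a stage, it can be addressed by a `models:/` URI instead of a run ID. MLflow accepts `models:/<name>/<version>` and `models:/<name>/<stage>`, and releases that support aliases also accept `models:/<name>@<alias>`. Recent MLflow releases mark stages as deprecated in favor of aliases, so check the documentation for your installed version. The `registry_uri` helper below is a hypothetical convenience for building these strings, not part of MLflow, and the alias name `champion` is only an example.

```python
from typing import Optional


def registry_uri(name: str, version: Optional[int] = None,
                 stage: Optional[str] = None, alias: Optional[str] = None) -> str:
    """Build a models:/ URI from exactly one of version, stage, or alias."""
    given = [x for x in (version, stage, alias) if x is not None]
    if len(given) != 1:
        raise ValueError("Pass exactly one of version, stage, or alias")
    if alias is not None:
        return f"models:/{name}@{alias}"
    return f"models:/{name}/{version if version is not None else stage}"


print(registry_uri("SentimentClassifier", version=2))
print(registry_uri("SentimentClassifier", stage="Production"))
print(registry_uri("SentimentClassifier", alias="champion"))
```

A URI built this way can be passed to a loader such as `mlflow.pyfunc.load_model`, so serving code can ask for "whatever is in Production" without hard-coding a version number.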

---

## Part 4: Model Promotion Workflow

Let's implement a proper promotion workflow with validation.

In [None]:
from dataclasses import dataclass
from typing import Optional, List

@dataclass
class PromotionCheck:
    """Result of a promotion validation check."""
    name: str
    passed: bool
    message: str

class ModelPromoter:
    """
    Manages model promotion with validation checks.
    
    Example:
        promoter = ModelPromoter(client, "SentimentClassifier")
        success = promoter.promote_to_production(
            version=2,
            min_accuracy=0.85,
            require_staging=True
        )
    """
    
    def __init__(self, client: MlflowClient, model_name: str):
        self.client = client
        self.model_name = model_name
    
    def validate_for_production(
        self,
        version: int,
        min_accuracy: float = 0.8,
        require_staging: bool = True
    ) -> List[PromotionCheck]:
        """
        Run validation checks before promotion.
        """
        checks = []
        
        # Get model version info
        try:
            model_version = self.client.get_model_version(self.model_name, str(version))
        except Exception as e:
            checks.append(PromotionCheck(
                name="version_exists",
                passed=False,
                message=f"Version {version} not found: {e}"
            ))
            return checks
        
        checks.append(PromotionCheck(
            name="version_exists",
            passed=True,
            message=f"Version {version} exists"
        ))
        
        # Check that the version is currently in Staging
        if require_staging:
            was_staged = model_version.current_stage == "Staging"
            checks.append(PromotionCheck(
                name="staging_tested",
                passed=was_staged,
                message=f"Current stage: {model_version.current_stage}"
            ))
        
        # Check accuracy from run metrics
        run_id = model_version.run_id
        try:
            run = self.client.get_run(run_id)
            accuracy = run.data.metrics.get('final_accuracy', 0)
            checks.append(PromotionCheck(
                name="accuracy_threshold",
                passed=accuracy >= min_accuracy,
                message=f"Accuracy {accuracy:.4f} (min: {min_accuracy})"
            ))
        except Exception as e:
            checks.append(PromotionCheck(
                name="accuracy_threshold",
                passed=False,
                message=f"Could not get metrics: {e}"
            ))
        
        # Check that the registry can resolve the model's artifact location
        try:
            model_uri = f"models:/{self.model_name}/{version}"
            # Don't load the weights; just ask the registry where the artifacts live
            source = self.client.get_model_version_download_uri(self.model_name, str(version))
            checks.append(PromotionCheck(
                name="model_loadable",
                passed=bool(source),
                message=f"Model URI: {model_uri}"
            ))
        except Exception as e:
            checks.append(PromotionCheck(
                name="model_loadable",
                passed=False,
                message=f"Cannot load model: {e}"
            ))
        
        return checks
    
    def promote_to_production(
        self,
        version: int,
        min_accuracy: float = 0.8,
        require_staging: bool = True,
        dry_run: bool = False
    ) -> bool:
        """
        Promote a model to production with validation.
        """
        print(f"\nPromotion Request: {self.model_name} v{version} -> Production")
        print("="*60)
        
        # Run validation
        checks = self.validate_for_production(version, min_accuracy, require_staging)
        
        print("\nValidation Checks:")
        all_passed = True
        for check in checks:
            icon = "✅" if check.passed else "❌"
            print(f"  {icon} {check.name}: {check.message}")
            if not check.passed:
                all_passed = False
        
        if not all_passed:
            print("\n❌ Promotion BLOCKED: Validation failed")
            return False
        
        if dry_run:
            print("\n🔍 DRY RUN: Would promote to Production")
            return True
        
        # Perform promotion
        try:
            self.client.transition_model_version_stage(
                name=self.model_name,
                version=str(version),
                stage="Production",
                archive_existing_versions=True
            )
            print(f"\n✅ SUCCESS: v{version} is now in Production")
            return True
        except Exception as e:
            print(f"\n❌ ERROR: {e}")
            return False
    
    def rollback(self, to_version: int) -> bool:
        """
        Rollback to a previous version.
        """
        print(f"\n⚠️  ROLLBACK: {self.model_name} to v{to_version}")
        
        try:
            # Look up the current production version via this instance's client
            current = self.client.get_latest_versions(
                self.model_name, stages=["Production"]
            )
            
            # Promote the target first and let MLflow archive the old production
            # version, so a failed promotion never leaves Production empty
            self.client.transition_model_version_stage(
                name=self.model_name,
                version=str(to_version),
                stage="Production",
                archive_existing_versions=True
            )
            for old in current:
                if str(old.version) != str(to_version):
                    print(f"  Archived v{old.version}")
            print(f"  Promoted v{to_version} to Production")
            print("\n✅ Rollback complete!")
            return True
        except Exception as e:
            print(f"\n❌ Rollback failed: {e}")
            return False

In [None]:
# Test the promotion workflow
promoter = ModelPromoter(client, registered_name)

# Try to promote v2 to production (dry run first)
promoter.promote_to_production(
    version=2,
    min_accuracy=0.75,
    require_staging=False,  # Skip staging requirement for demo
    dry_run=True
)

In [None]:
# Now actually promote
success = promoter.promote_to_production(
    version=2,
    min_accuracy=0.75,
    require_staging=False,
    dry_run=False
)

---

## Part 5: Hugging Face Hub Integration

For LLMs, the Hugging Face Hub is often a better fit: it keeps weights, tokenizer files, and the model card together in one Git-backed repository.

In [None]:
# Hugging Face Hub usage pattern
hf_example = '''
from huggingface_hub import HfApi, create_repo, upload_folder
from transformers import AutoModelForCausalLM, AutoTokenizer

# Initialize API
api = HfApi()

# Create a private model repository
repo_id = "your-username/my-finetuned-llm"
create_repo(repo_id, private=True, exist_ok=True)

# Save and push model
# (`model` and `tokenizer` are your fine-tuned objects)
model.save_pretrained("./my-model")
tokenizer.save_pretrained("./my-model")

# Upload to Hub
api.upload_folder(
    folder_path="./my-model",
    repo_id=repo_id,
    commit_message="v1.0: Initial fine-tuned model"
)

# Tag the commit so "v1.0" can be used as a revision later
api.create_tag(repo_id, tag="v1.0")

# Version with branches
api.create_branch(repo_id, branch="v1.1")
api.upload_folder(
    folder_path="./my-model-v1.1",
    repo_id=repo_id,
    revision="v1.1",
    commit_message="v1.1: Improved training"
)

# Load specific version
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    revision="v1.0"  # or "v1.1", or "main"
)
'''

print("Hugging Face Hub Model Versioning:")
print("="*60)
print(hf_example)

In [None]:
# Model card template for documentation
model_card_template = '''
---
license: apache-2.0
language:
  - en
tags:
  - sentiment-analysis
  - fine-tuned
  - dgx-spark
datasets:
  - custom-sentiment
metrics:
  - accuracy
  - f1
model-index:
  - name: SentimentClassifier-v2
    results:
      - task:
          name: Text Classification
          type: text-classification
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.92
---

# SentimentClassifier v2

## Model Description

Fine-tuned sentiment classifier trained on custom dataset.

## Training Details

- **Base Model:** microsoft/phi-2
- **Training Hardware:** DGX Spark (128GB unified memory)
- **Training Time:** 2 hours
- **Dataset Size:** 50,000 examples

## Usage

```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis", model="your-username/sentiment-v2")
result = classifier("This product is amazing!")
print(result)  # [{"label": "POSITIVE", "score": 0.98}]
```

## Version History

| Version | Date | Changes | Accuracy |
|---------|------|---------|----------|
| v2.0 | 2024-01 | Improved training data | 92% |
| v1.0 | 2023-12 | Initial release | 85% |
'''

print("Model Card Template:")
print(model_card_template)

---

## Part 6: Dataset Versioning with DVC

Models aren't the only thing that needs versioning - data does too!
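
The core idea behind DVC is that Git tracks a small pointer file containing a content hash, while the data itself lives in separate storage. The sketch below is a simplified stand-in that uses only the standard library: `make_pointer` is a hypothetical helper, and the dictionary it returns only loosely mirrors what a real `.dvc` file records.

```python
import hashlib
import os
import tempfile


def make_pointer(path: str, chunk_size: int = 1 << 20) -> dict:
    """Hash a file in chunks and return a small, Git-friendly description of it."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "path": os.path.basename(path),
        "md5": digest.hexdigest(),
        "size": os.path.getsize(path),
    }


with tempfile.TemporaryDirectory() as tmp:
    data_file = os.path.join(tmp, "training_dataset.csv")
    with open(data_file, "w") as f:
        f.write("text,label\ngreat,1\n")
    v1 = make_pointer(data_file)

    with open(data_file, "a") as f:  # the dataset changes
        f.write("awful,0\n")
    v2 = make_pointer(data_file)

print(v1["md5"] != v2["md5"])  # True: new content, new version identifier
```

Because the pointer is tiny and deterministic, committing it to Git ties each code commit to an exact dataset state without storing the data in the repository.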

In [None]:
# DVC (Data Version Control) workflow
dvc_workflow = '''
# Initialize DVC in your project
dvc init

# Track a large dataset
dvc add data/training_dataset.parquet
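# Creates data/training_dataset.parquet.dvc (small metadata file)
# Commit the .dvc file to git; the actual data goes to the DVC remote
git add data/training_dataset.parquet.dvc data/.gitignore
git commit -m "Track training data with DVC"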

# Configure remote storage (S3, GCS, local, etc.)
dvc remote add -d myremote s3://my-bucket/dvc-storage

# Push data to remote
dvc push

# Pull data on another machine
git clone <your-repo>
dvc pull  # Downloads the actual data

# Version data with git
# When dataset changes:
dvc add data/training_dataset.parquet
git add data/training_dataset.parquet.dvc
git commit -m "Update training data v2"
dvc push

# Checkout old data version (assumes a git tag or commit named v1.0 exists)
git checkout v1.0
dvc checkout  # Gets the data that matches that commit
'''

print("DVC Data Versioning Workflow:")
print("="*60)
print(dvc_workflow)

---

## Try It Yourself

Create a complete versioning workflow that:
1. Trains 3 model versions with different hyperparameters
2. Registers all versions in MLflow
3. Promotes the best one to staging, then production
4. Simulates a rollback scenario

<details>
<summary>Hint</summary>

Use a loop to train multiple versions:
```python
# Keys must match the parameter names of train_and_register_model
configs = [
    {"hidden_dim": 32, "learning_rate": 0.01},
    {"hidden_dim": 64, "learning_rate": 0.005},
    {"hidden_dim": 128, "learning_rate": 0.001},
]

for i, config in enumerate(configs):
    train_and_register_model(f"v{i+1}", **config)
```

</details>

In [None]:
# YOUR CODE HERE
# Create your versioning workflow

# Your workflow code...

---

## Common Mistakes

### Mistake 1: Skipping Staging

```python
# Wrong - straight to production
promote(model, "Production")  # Risky!

# Right - test in staging first
promote(model, "Staging")
tests_pass = run_integration_tests()
if tests_pass:
    promote(model, "Production")
```
**Why:** Staging catches issues before they affect users.

### Mistake 2: Not Keeping Old Versions

```python
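# Wrong - deleting old models
client.delete_model_version(name="MyModel", version="1")
client.delete_model_version(name="MyModel", version="2")

# Right - archive instead
client.transition_model_version_stage(name="MyModel", version="1", stage="Archived")
client.transition_model_version_stage(name="MyModel", version="2", stage="Archived")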
```
**Why:** You may need to roll back or compare.

### Mistake 3: No Validation Before Promotion

```python
# Wrong - blind promotion
promote_to_production(new_model)

# Right - validate first
if validate_model(new_model):
    if compare_to_current(new_model) > 0:  # Better than current
        promote_to_production(new_model)
```
**Why:** Automated checks prevent human error.
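
The "validate, then compare" gate above can be sketched as a single decision function. This is a minimal illustration under stated assumptions: `should_promote` and the `min_gain` parameter are hypothetical, and the metric key `final_accuracy` is borrowed from the runs logged earlier in this notebook.

```python
from typing import Dict, Optional


def should_promote(candidate: Dict[str, float],
                   current: Optional[Dict[str, float]],
                   min_accuracy: float = 0.8,
                   min_gain: float = 0.0) -> bool:
    """Promote only if the candidate clears the floor and beats the current model."""
    accuracy = candidate.get("final_accuracy", 0.0)
    if accuracy < min_accuracy:
        return False  # fails absolute validation
    if current is None:
        return True  # nothing in production yet
    return accuracy - current.get("final_accuracy", 0.0) > min_gain


print(should_promote({"final_accuracy": 0.91}, {"final_accuracy": 0.88}))  # True
print(should_promote({"final_accuracy": 0.86}, {"final_accuracy": 0.88}))  # False
print(should_promote({"final_accuracy": 0.75}, None))                      # False
```

Keeping the rule in one pure function makes it easy to unit test and to tighten later, for example by raising `min_gain` so that marginal improvements do not trigger a deployment.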

### Mistake 4: Inconsistent Naming

```python
# Wrong - inconsistent names
register(model, "sentiment_classifier")
register(model, "SentimentClassifier")
register(model, "sentiment-classifier-v2")

# Right - consistent naming convention
register(model, "SentimentClassifier")  # PascalCase for model names
# Use version numbers, not name suffixes
```
**Why:** Inconsistent naming causes confusion and errors.
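
A naming convention is easier to keep when it is checked automatically before registration. The sketch below assumes one possible convention (PascalCase, letters and digits only, no version suffix); the patterns and the `check_model_name` helper are hypothetical and should be adapted to your team's rules.

```python
import re

# Hypothetical convention: PascalCase, letters and digits only, no version suffix.
PASCAL_CASE = re.compile(r"^(?:[A-Z][a-z0-9]*)+$")
VERSION_SUFFIX = re.compile(r"[-_]?[vV]\d+$")


def check_model_name(name: str) -> bool:
    """Return True if the name follows the convention sketched above."""
    if VERSION_SUFFIX.search(name):
        return False  # versions belong in the registry, not in the name
    return bool(PASCAL_CASE.match(name))


for candidate in ["SentimentClassifier", "sentiment_classifier", "sentiment-classifier-v2"]:
    print(candidate, check_model_name(candidate))
```

Only the first name passes; the other two fail on casing and on the embedded version suffix respectively.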

---

## Checkpoint

You've learned:
- How model registries manage version lifecycle
- How to use MLflow Model Registry
- How to implement promotion workflows with validation
- How to roll back safely to previous versions
- Basics of data versioning with DVC

---

## Challenge (Optional)

Build a complete CI/CD pipeline that:
1. Automatically trains and registers new model versions on code changes
2. Runs automated benchmarks
3. Promotes to staging if benchmarks pass
4. Requires manual approval for production
5. Monitors for drift and auto-triggers retraining

---

## Further Reading

- [MLflow Model Registry](https://mlflow.org/docs/latest/model-registry.html)
- [Hugging Face Hub](https://huggingface.co/docs/hub/)
- [DVC Documentation](https://dvc.org/doc)
- [Neptune.ai Model Registry](https://docs.neptune.ai/model_registry/)

---

## Cleanup

In [None]:
# Clean up
import shutil

# Keep mlruns for future use, but you can clean if needed:
# shutil.rmtree("./mlruns", ignore_errors=True)

print("Cleanup complete!")
print("Note: MLflow data preserved in ./mlruns for future notebooks.")

---

## Next Steps

The final notebook in this module covers **reproducibility** - ensuring your experiments can be exactly replicated. This is crucial for scientific validity and debugging production issues.

**Continue to:** [07-reproducibility-audit.ipynb](07-reproducibility-audit.ipynb)