# Lab 4.2.4: Model Registry & Version Control

**Module:** 4.2 - Benchmarking, Evaluation & MLOps  
**Time:** 2 hours  
**Difficulty:** ⭐⭐⭐

---

## 🎯 Learning Objectives

By the end of this notebook, you will:
- [ ] Understand model versioning and why it matters
- [ ] Use MLflow Model Registry to manage model versions
- [ ] Implement stage transitions (Staging → Production)
- [ ] Version datasets with DVC
- [ ] Set up a complete versioning workflow

---

## 📚 Prerequisites

- Completed: Lab 4.2.3 (MLflow Setup)
- Knowledge of: Git basics, MLflow tracking
- Hardware: Any system

---

## 🌍 Real-World Context

**Imagine you're a software company shipping a new AI feature.** Questions you need to answer:

- "Which model is currently in production?"
- "What changed between v1.2 and v1.3?"
- "The new model has issues - can we rollback?"
- "Which dataset was used to train the production model?"

**Without proper versioning, you're flying blind!** Versioning matters most where models reach real users, for example:
- **Uber:** Models serving millions of users
- **Spotify:** Knowing which recommendation model is live
- **Tesla:** Models that control actual cars!

Model versioning is just as important as code versioning.

---

## 🧒 ELI5: Model Registry

> **Imagine you're a baker with many cookie recipes.** You need to keep track of:
>
> - **Version 1:** Original chocolate chip recipe
> - **Version 2:** Added sea salt (customers love it!)
> - **Version 3:** Made vegan (testing with small group)
>
> You also have **stages**:
> - **Development:** New recipes you're testing
> - **Staging:** Recipes getting customer feedback
> - **Production:** The recipe you actually sell
>
> **A Model Registry is the same thing!** It's a catalog that tells you:
> - All versions of a model
> - Which version is in production
> - What changed between versions
> - Who approved each version
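
To make the analogy concrete, here is a minimal in-memory sketch of what a registry keeps track of. `ToyRegistry` and `ModelVersion` are hypothetical names made up for illustration; this is not how MLflow is implemented, only the shape of the bookkeeping it does for you.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    version: int
    run_id: str            # which training run produced this version
    stage: str = "None"    # new versions start without a stage
    description: str = ""


@dataclass
class ToyRegistry:
    """In-memory stand-in for a model registry (illustration only)."""
    models: dict = field(default_factory=dict)

    def register(self, name: str, run_id: str, description: str = "") -> ModelVersion:
        # Version numbers are assigned automatically, starting at 1
        versions = self.models.setdefault(name, [])
        mv = ModelVersion(version=len(versions) + 1, run_id=run_id, description=description)
        versions.append(mv)
        return mv

    def in_stage(self, name: str, stage: str) -> list:
        # Answers questions like "which version is in Production?"
        return [mv for mv in self.models.get(name, []) if mv.stage == stage]


registry = ToyRegistry()
registry.register("CookieRecipe", run_id="run-a", description="Original chocolate chip")
v2 = registry.register("CookieRecipe", run_id="run-b", description="Added sea salt")
v2.stage = "Production"

print([mv.version for mv in registry.models["CookieRecipe"]])
print([mv.version for mv in registry.in_stage("CookieRecipe", "Production")])
```

A real registry adds persistence, access control, and links to stored artifacts on top of this catalog.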

---

## Part 1: MLflow Model Registry

Let's set up the MLflow Model Registry.

In [None]:
import mlflow
import mlflow.pytorch
from mlflow.tracking import MlflowClient
import os
import torch
import torch.nn as nn

# Set up MLflow
MLFLOW_DIR = os.path.abspath("../mlflow")
os.makedirs(MLFLOW_DIR, exist_ok=True)
mlflow.set_tracking_uri(f"file://{MLFLOW_DIR}")

# Create client for registry operations
client = MlflowClient()

print(f"MLflow tracking URI: {mlflow.get_tracking_uri()}")
print(f"MLflow version: {mlflow.__version__}")

In [None]:
# Create a simple model for demonstration
class SentimentClassifier(nn.Module):
    """Simple sentiment classification model."""
    
    def __init__(self, vocab_size=10000, embedding_dim=128, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 2)  # Binary classification
        self.version = "1.0.0"  # Track version in model
        
    def forward(self, x):
        embedded = self.embedding(x)
        _, (hidden, _) = self.lstm(embedded)
        return self.fc(hidden[-1])

# Create model
model_base = SentimentClassifier()
print(f"Created base model with {sum(p.numel() for p in model_base.parameters()):,} parameters")

In [None]:
# Set up experiment
experiment_name = "Sentiment-Analysis-Models"
mlflow.set_experiment(experiment_name)

# Register model name
MODEL_NAME = "SentimentClassifier"

print(f"Experiment: {experiment_name}")
print(f"Model name for registry: {MODEL_NAME}")

In [None]:
# Train and register Version 1
with mlflow.start_run(run_name="sentiment-v1") as run:
    
    # Log parameters
    mlflow.log_params({
        "vocab_size": 10000,
        "embedding_dim": 128,
        "hidden_dim": 256,
        "model_version": "1.0.0",
        "training_data": "imdb_reviews_v1"
    })
    
    # Simulate training metrics
    mlflow.log_metrics({
        "accuracy": 0.85,
        "f1_score": 0.84,
        "eval_loss": 0.42
    })
    
    # Log and register the model
    sample_input = torch.randint(0, 10000, (1, 50))
    
    mlflow.pytorch.log_model(
        model_base,
        artifact_path="model",
        registered_model_name=MODEL_NAME,  # This registers it!
        input_example=sample_input.numpy()
    )
    
    run_id_v1 = run.info.run_id
    print(f"✅ Model v1 registered!")
    print(f"   Run ID: {run_id_v1}")

In [None]:
# View registered models
print("\n📋 Registered Models:")
print("=" * 50)

for rm in client.search_registered_models():
    print(f"\nModel: {rm.name}")
    print(f"  Created: {rm.creation_timestamp}")
    
    # Get all versions
    for version in client.search_model_versions(f"name='{rm.name}'"):
        print(f"  Version {version.version}:")
        print(f"    Stage: {version.current_stage}")
        print(f"    Run ID: {version.run_id[:8]}...")

---

## Part 2: Managing Model Versions

Let's create and manage multiple versions of our model.

In [None]:
# Create an improved model (v2)
class SentimentClassifierLarge(nn.Module):
    """Improved sentiment classifier with attention."""
    
    def __init__(self, vocab_size=10000, embedding_dim=128, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Simple attention mechanism
        self.attention = nn.Linear(hidden_dim * 2, 1)
        self.fc = nn.Linear(hidden_dim * 2, 2)
        self.version = "2.0.0"
        
    def forward(self, x):
        embedded = self.embedding(x)
        lstm_out, _ = self.lstm(embedded)
        # Attention weights
        attn_weights = torch.softmax(self.attention(lstm_out), dim=1)
        context = torch.sum(attn_weights * lstm_out, dim=1)
        return self.fc(context)

model_large = SentimentClassifierLarge()
print(f"Created large model with {sum(p.numel() for p in model_large.parameters()):,} parameters")
print(f"(Large model has more parameters due to bidirectional LSTM and attention)")

In [None]:
# Register Version 2
with mlflow.start_run(run_name="sentiment-v2") as run:
    
    mlflow.log_params({
        "vocab_size": 10000,
        "embedding_dim": 128,
        "hidden_dim": 256,
        "model_version": "2.0.0",
        "bidirectional": True,
        "attention": True,
        "training_data": "imdb_reviews_v2"  # Updated dataset
    })
    
    # Better metrics!
    mlflow.log_metrics({
        "accuracy": 0.89,
        "f1_score": 0.88,
        "eval_loss": 0.35
    })
    
    sample_input = torch.randint(0, 10000, (1, 50))
    
    mlflow.pytorch.log_model(
        model_large,
        artifact_path="model",
        registered_model_name=MODEL_NAME,
        input_example=sample_input.numpy()
    )
    
    run_id_v2 = run.info.run_id
    print(f"✅ Model v2 registered!")

In [None]:
# View all versions
print(f"\n📋 All versions of {MODEL_NAME}:")
print("=" * 60)

versions = client.search_model_versions(f"name='{MODEL_NAME}'")

for v in versions:
    # Get the run to access metrics
    run_data = mlflow.get_run(v.run_id)
    accuracy = run_data.data.metrics.get('accuracy', 'N/A')
    
    print(f"\nVersion {v.version}:")
    print(f"  Stage: {v.current_stage}")
    print(f"  Accuracy: {accuracy}")
    print(f"  Created: {v.creation_timestamp}")
    print(f"  Run ID: {v.run_id[:16]}...")

---

## Part 3: Stage Transitions

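Models move through stages: None → Staging → Production → Archived. (MLflow 2.9 and later deprecate stages in favor of model version aliases; the stage API used in this lab still works but may emit a deprecation warning.)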

### 🧒 ELI5: Model Stages

> **Think of a restaurant kitchen:**
>
> 1. **None (Development):** Chef is experimenting with new dishes
> 2. **Staging:** Dish is being tested on a few customers
> 3. **Production:** Dish is on the main menu, served to everyone
> 4. **Archived:** Old dish, kept for reference but not served
>
> Moving through stages requires **approval** - you don't want an untested dish on the main menu!
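
The effect of a stage transition, including the `archive_existing_versions` flag used in the cells below, can be sketched as a plain function over a `{version: stage}` mapping. This is a toy model under the assumption that archiving only affects other versions already in the target stage; the `transition` helper is hypothetical and is not part of MLflow.

```python
VALID_STAGES = {"None", "Staging", "Production", "Archived"}


def transition(stages: dict, version: int, stage: str,
               archive_existing_versions: bool = False) -> dict:
    """Return a new {version: stage} mapping after moving `version` to `stage`.

    With archive_existing_versions=True, any other version already in the
    target stage is moved to "Archived".
    """
    if stage not in VALID_STAGES:
        raise ValueError(f"Unknown stage: {stage!r}")
    if version not in stages:
        raise KeyError(f"Unknown version: {version}")

    updated = dict(stages)
    if archive_existing_versions:
        for other, current in stages.items():
            if other != version and current == stage:
                updated[other] = "Archived"
    updated[version] = stage
    return updated


# Replay the sequence this part of the lab walks through
stages = {1: "None", 2: "None"}
stages = transition(stages, 1, "Staging")
stages = transition(stages, 1, "Production", archive_existing_versions=True)
stages = transition(stages, 2, "Staging")
stages = transition(stages, 2, "Production", archive_existing_versions=True)
print(stages)
```

At the end of the sequence, version 2 holds Production and version 1 has been archived, which is the state the notebook reaches after its final promotion.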

In [None]:
# Transition v1 to Staging
client.transition_model_version_stage(
    name=MODEL_NAME,
    version="1",
    stage="Staging",
    archive_existing_versions=False  # Don't archive other staging versions
)

print(f"✅ {MODEL_NAME} v1 transitioned to Staging")

In [None]:
# After testing, promote v1 to Production
client.transition_model_version_stage(
    name=MODEL_NAME,
    version="1",
    stage="Production",
    archive_existing_versions=True  # Archive any existing production version
)

print(f"✅ {MODEL_NAME} v1 promoted to Production!")

In [None]:
# Now test v2 in Staging
client.transition_model_version_stage(
    name=MODEL_NAME,
    version="2",
    stage="Staging"
)

print(f"✅ {MODEL_NAME} v2 moved to Staging for testing")

In [None]:
# View current stage status
print(f"\n📊 Current Stage Status for {MODEL_NAME}:")
print("=" * 50)

for stage in ["None", "Staging", "Production", "Archived"]:
    versions = client.get_latest_versions(MODEL_NAME, stages=[stage])
    if versions:
        for v in versions:
            run = mlflow.get_run(v.run_id)
            acc = run.data.metrics.get('accuracy', 'N/A')
            print(f"  {stage}: Version {v.version} (accuracy: {acc})")
    else:
        print(f"  {stage}: (none)")

In [None]:
# Simulate: v2 passes tests, promote to Production (replaces v1)
print("\n🚀 Promoting v2 to Production...")

client.transition_model_version_stage(
    name=MODEL_NAME,
    version="2",
    stage="Production",
    archive_existing_versions=True  # This archives v1
)

print("\n📊 Updated Stage Status:")
for stage in ["Staging", "Production", "Archived"]:
    versions = client.get_latest_versions(MODEL_NAME, stages=[stage])
    if versions:
        for v in versions:
            print(f"  {stage}: Version {v.version}")
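
The rollback question from the introduction ("can we rollback?") follows directly from the same call: re-promote the earlier version and let the registry archive the faulty one. A minimal sketch, assuming `client` is an `MlflowClient` (or any object exposing the same `transition_model_version_stage` method); the `rollback_production` helper is a hypothetical name.

```python
def rollback_production(client, name: str, to_version: str) -> None:
    """Re-promote an earlier version to Production and archive the current one.

    `client` is expected to expose transition_model_version_stage with the
    keyword arguments used earlier in this lab.
    """
    client.transition_model_version_stage(
        name=name,
        version=to_version,
        stage="Production",
        archive_existing_versions=True,  # moves the current Production version to Archived
    )


# Hypothetical usage with the objects defined in this notebook:
# rollback_production(client, MODEL_NAME, to_version="1")
```

Because serving code loads by stage, a rollback done this way needs no change on the serving side beyond reloading the model.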

---

## Part 4: Loading Models by Stage

In production, you load models by stage, not version number, so promoting a new version requires no change to the serving code.

In [None]:
# Load the production model
production_model = mlflow.pytorch.load_model(
    f"models:/{MODEL_NAME}/Production"
)

print(f"Loaded production model: {type(production_model).__name__}")
print(f"Model version attribute: {getattr(production_model, 'version', 'N/A')}")

In [None]:
# Test inference with production model
test_input = torch.randint(0, 10000, (2, 50))  # Batch of 2

with torch.no_grad():
    output = production_model(test_input)
    predictions = torch.softmax(output, dim=1)

print(f"\nInference Results:")
print(f"  Input shape: {test_input.shape}")
print(f"  Output shape: {output.shape}")
print(f"  Predictions:")
for i, pred in enumerate(predictions):
    sentiment = "Positive" if pred[1] > pred[0] else "Negative"
    confidence = pred.max().item()
    print(f"    Sample {i+1}: {sentiment} ({confidence:.2%} confidence)")

In [None]:
# Different ways to load models
print(f"""
📚 Model Loading Options:
{'='*50}

# By stage (recommended for production):
model = mlflow.pytorch.load_model("models:/ModelName/Production")
model = mlflow.pytorch.load_model("models:/ModelName/Staging")

# By specific version:
model = mlflow.pytorch.load_model("models:/ModelName/1")
model = mlflow.pytorch.load_model("models:/ModelName/2")

# By run ID:
model = mlflow.pytorch.load_model("runs:/abc123/model")

# From local path (after downloading):
model = mlflow.pytorch.load_model("/path/to/model")
""")
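
Since the stage and version forms differ only in the last path segment, a small helper can build them and reject ambiguous requests. This is a sketch; `registry_uri` is a hypothetical helper, not an MLflow function, and it covers only the two `models:/` forms listed above.

```python
from typing import Optional, Union


def registry_uri(name: str, stage: Optional[str] = None,
                 version: Optional[Union[int, str]] = None) -> str:
    """Build a models:/ URI from a model name and exactly one of stage or version."""
    if (stage is None) == (version is None):
        raise ValueError("Pass exactly one of `stage` or `version`")
    selector = stage if stage is not None else str(version)
    return f"models:/{name}/{selector}"


print(registry_uri("SentimentClassifier", stage="Production"))
print(registry_uri("SentimentClassifier", version=2))
```

The resulting string can be passed to `mlflow.pytorch.load_model` exactly as in the cells above.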

---

## Part 5: Adding Model Descriptions and Tags

Good documentation is essential for model management.

In [None]:
# Update model description
client.update_registered_model(
    name=MODEL_NAME,
    description="""Sentiment Analysis Model
    
Classifies text as positive or negative sentiment.
Trained on IMDB movie reviews dataset.

Input: Tokenized text (max 50 tokens)
Output: 2-class probability distribution

Owner: ML Team
Contact: ml-team@company.com
"""
)

print("✅ Model description updated!")

In [None]:
# Add version-specific description
client.update_model_version(
    name=MODEL_NAME,
    version="2",
    description="""Version 2.0.0 - Major Update
    
Changes:
- Added bidirectional LSTM
- Added attention mechanism
- Trained on larger dataset (v2)

Improvements:
- Accuracy: 85% → 89% (+4 percentage points)
- F1 Score: 84% → 88% (+4 percentage points)

Approved by: Senior ML Engineer
"""
)

print("✅ Version description updated!")

In [None]:
# Add tags to model version
client.set_model_version_tag(
    name=MODEL_NAME,
    version="2",
    key="validation_status",
    value="approved"
)

client.set_model_version_tag(
    name=MODEL_NAME,
    version="2",
    key="approved_by",
    value="ml-lead@company.com"
)

client.set_model_version_tag(
    name=MODEL_NAME,
    version="2",
    key="dataset_version",
    value="imdb_reviews_v2"
)

print("✅ Version tags added!")

In [None]:
# View model with all metadata
print(f"\n📋 {MODEL_NAME} - Full Details")
print("=" * 60)

model_info = client.get_registered_model(MODEL_NAME)
print(f"\nDescription:\n{model_info.description}")

print("\n\nVersions:")
for v in client.search_model_versions(f"name='{MODEL_NAME}'"):
    print(f"\n  Version {v.version} ({v.current_stage}):")
    if v.description:
        # Show first 100 chars
        desc_preview = v.description[:100] + "..." if len(v.description) > 100 else v.description
        print(f"    Description: {desc_preview}")
    if v.tags:
        print(f"    Tags:")
        for key, val in v.tags.items():
            print(f"      {key}: {val}")

---

## Part 6: Dataset Versioning

Models are trained on data. To reproduce results, we need to version data too!

---

### ⚠️ DVC on ARM64 (DGX Spark)

**Important:** DVC (Data Version Control) has limited ARM64 support and may not work reliably on DGX Spark's ARM-based CPU.

**Recommended approach for DGX Spark:**
1. Use the **hash-based versioning** approach shown in this notebook
2. Store data manifests with SHA256 checksums
3. Log data versions as MLflow parameters

This provides the same reproducibility benefits without DVC dependencies.

In [None]:
# Install DVC (Data Version Control)
# Note: DVC may have limited ARM64 support. The hash-based versioning 
# shown below is a good alternative for DGX Spark.

import subprocess
import sys

try:
    import dvc
    print(f"DVC already installed: {dvc.__version__}")
except ImportError:
    print("Attempting to install DVC...")
    print("Note: DVC on ARM64 may have limited support.")
    print("Consider using the hash-based versioning approach shown below.")
    try:
        subprocess.check_call([sys.executable, "-m", "pip", "install", "dvc", "-q"])
        import dvc
        print(f"DVC installed: {dvc.__version__}")
    except Exception as e:
        print(f"DVC installation failed: {e}")
        print("Using hash-based versioning instead (see next cells)")

In [None]:
import os

# Create a sample data directory
DATA_DIR = os.path.abspath("../data/versioned")
os.makedirs(DATA_DIR, exist_ok=True)

print(f"Data directory: {DATA_DIR}")

In [None]:
# Create sample training data
import json

# Version 1 of dataset
train_data_v1 = [
    {"text": "This movie was amazing!", "label": 1},
    {"text": "Terrible waste of time.", "label": 0},
    {"text": "Great performances all around.", "label": 1},
    {"text": "Boring and predictable.", "label": 0},
    # ... more samples
]

with open(f"{DATA_DIR}/train_v1.json", 'w') as f:
    json.dump(train_data_v1, f, indent=2)

print(f"Created training data v1: {len(train_data_v1)} samples")

In [None]:
# DVC commands for versioning (run in terminal)
print(f"""
📚 DVC Data Versioning Workflow
{'='*50}

# 1. Initialize DVC in your project
dvc init

# 2. Track your data file
dvc add data/train.json
# This creates:
#   - data/train.json.dvc (small pointer file, committed to git)
#   - .gitignore (ignores the actual data file)

# 3. Commit to git
git add data/train.json.dvc .gitignore
git commit -m "Add training data v1"
git tag v1.0-data

# 4. Push data to remote storage
dvc remote add -d myremote s3://mybucket/dvc
dvc push

# 5. Update data and create new version
# ... modify train.json ...
dvc add data/train.json
git add data/train.json.dvc
git commit -m "Update training data v2"
git tag v2.0-data
dvc push

# 6. Switch between versions
git checkout v1.0-data
dvc checkout
# Now you have v1 of the data!

git checkout v2.0-data  
dvc checkout
# Now you have v2 of the data!
""")

In [None]:
# Alternative: Simple hash-based versioning without DVC
import hashlib

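def compute_data_hash(file_path: str) -> str:
    """Return the first 12 hex characters of a file's SHA256 digest."""
    sha256 = hashlib.sha256()
    with open(file_path, 'rb') as f:
        # Read in chunks so large files are never loaded fully into memory
        for chunk in iter(lambda: f.read(4096), b''):
            sha256.update(chunk)
    # Truncated for readability; keep the full digest if collisions matter
    return sha256.hexdigest()[:12]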

# Create a data manifest
def create_data_manifest(data_dir: str) -> dict:
    """Create a manifest of all data files with hashes."""
    manifest = {
        "version": "1.0",
        "files": {}
    }
    
    for root, _, files in os.walk(data_dir):
        for file in files:
            if file.endswith('.json'):
                file_path = os.path.join(root, file)
                rel_path = os.path.relpath(file_path, data_dir)
                manifest["files"][rel_path] = {
                    "hash": compute_data_hash(file_path),
                    "size": os.path.getsize(file_path)
                }
    
    return manifest

manifest = create_data_manifest(DATA_DIR)
print("\n📋 Data Manifest:")
print(json.dumps(manifest, indent=2))

In [None]:
# Log data version with model in MLflow
with mlflow.start_run(run_name="model-with-data-tracking"):
    
    # Log data manifest
    data_manifest_path = "/tmp/data_manifest.json"
    with open(data_manifest_path, 'w') as f:
        json.dump(manifest, f, indent=2)
    
    mlflow.log_artifact(data_manifest_path, "data")
    
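    # Log the hash of the first file in the manifest as a parameter
    # (the full manifest, covering every file, is logged as an artifact above)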
    data_hash = list(manifest["files"].values())[0]["hash"] if manifest["files"] else "none"
    mlflow.log_param("data_hash", data_hash)
    mlflow.log_param("data_version", manifest["version"])
    
    print(f"✅ Logged data manifest with hash: {data_hash}")
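
A manifest only helps reproducibility if it is checked again later, for example before retraining. The sketch below compares files on disk against a manifest's `files` mapping. `sha256_of` and `verify_manifest` are hypothetical helpers; comparing by hash prefix is an assumption made here so that the check also works with truncated hashes like the 12-character ones used above.

```python
import hashlib
import json
import os
import tempfile


def sha256_of(path: str) -> str:
    """Full SHA256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(data_dir: str, files: dict) -> list:
    """Return relative paths that are missing or whose hash prefix differs.

    `files` maps relative path -> {"hash": <hex prefix>, ...}.
    """
    changed = []
    for rel_path, info in files.items():
        path = os.path.join(data_dir, rel_path)
        if not os.path.isfile(path) or not sha256_of(path).startswith(info["hash"]):
            changed.append(rel_path)
    return changed


# Demo in a throwaway directory so no real data is touched
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "train_v1.json")
    with open(sample, "w") as f:
        json.dump([{"text": "This movie was amazing!", "label": 1}], f)

    files = {"train_v1.json": {"hash": sha256_of(sample)[:12]}}
    unchanged = verify_manifest(tmp, files)  # nothing modified yet

    with open(sample, "a") as f:
        f.write("\n")  # simulate an edit to the data
    modified = verify_manifest(tmp, files)

print(unchanged, modified)
```

In this notebook the equivalent call would pass `DATA_DIR` and `manifest["files"]`; a non-empty result means the data no longer matches what the model was trained on.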

---

## ✋ Try It Yourself: Exercise

**Task:** Create a complete model lifecycle workflow.

1. Create 3 versions of a simple model (vary architecture)
2. Register all versions in MLflow
3. Promote the best one to Production
4. Add descriptions and tags
5. Write a function to load the current production model

<details>
<summary>💡 Hint</summary>

Use a loop to create versions with different hyperparameters:
```python
for hidden_size in [128, 256, 512]:
    model = MyModel(hidden_size=hidden_size)
    # Train and register...
```

</details>

In [None]:
# YOUR CODE HERE

# Step 1: Create 3 model versions

# Step 2: Register all versions

# Step 3: Compare and promote best to Production

# Step 4: Add descriptions and tags

# Step 5: Write load function

---

## ⚠️ Common Mistakes

### Mistake 1: Not Versioning Data with Models

In [None]:
# ❌ Wrong: Only versioning models
# mlflow.pytorch.log_model(model, "model", registered_model_name="MyModel")
# # Where's the training data?

# ✅ Right: Version both model AND data
# mlflow.log_param("data_version", "v1.0")
# mlflow.log_param("data_hash", compute_data_hash(data_path))
# mlflow.log_artifact(data_manifest_path)  # Log data info
# mlflow.pytorch.log_model(model, "model", registered_model_name="MyModel")

print("Always log data version/hash with your model!")

### Mistake 2: Skipping the Staging Phase

In [None]:
# ❌ Wrong: Pushing directly to Production
# client.transition_model_version_stage(name="Model", version="5", stage="Production")
# # What if it has bugs?

# ✅ Right: Test in Staging first
# client.transition_model_version_stage(name="Model", version="5", stage="Staging")
# # Run tests, validate metrics...
# if all_tests_pass:
#     client.transition_model_version_stage(name="Model", version="5", stage="Production")

print("Always test in Staging before Production!")
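
The `all_tests_pass` condition in the commented example is left undefined. One way to compute it is a metric gate that compares the Staging candidate against the current Production model. This is a sketch only: `passes_promotion_gate` is a hypothetical helper, and the thresholds are illustrative values, not recommendations.

```python
def passes_promotion_gate(candidate: dict, production: dict,
                          min_accuracy: float = 0.80,
                          max_regression: float = 0.01) -> bool:
    """Decide whether a Staging candidate may replace the Production model.

    Both arguments are metric dicts such as {"accuracy": 0.89, "f1_score": 0.88}.
    """
    # Absolute floor: reject candidates below the minimum accuracy
    if candidate.get("accuracy", 0.0) < min_accuracy:
        return False
    # Relative check: reject if a shared metric drops by more than the allowed regression
    for metric in ("accuracy", "f1_score"):
        if metric in candidate and metric in production:
            if candidate[metric] < production[metric] - max_regression:
                return False
    return True


# Metrics logged for v1 and v2 earlier in this lab
v1_metrics = {"accuracy": 0.85, "f1_score": 0.84}
v2_metrics = {"accuracy": 0.89, "f1_score": 0.88}
all_tests_pass = passes_promotion_gate(v2_metrics, v1_metrics)
print(all_tests_pass)
```

In practice the metric dicts would come from `mlflow.get_run(v.run_id).data.metrics`, as in the version listing cell in Part 2.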

---

## 🎉 Checkpoint

You've learned:
- ✅ Using MLflow Model Registry
- ✅ Managing model versions
- ✅ Stage transitions (Staging → Production)
- ✅ Loading models by stage
- ✅ Dataset versioning basics

---

## 📖 Further Reading

- [MLflow Model Registry](https://mlflow.org/docs/latest/model-registry.html)
- [DVC Documentation](https://dvc.org/doc)
- [HuggingFace Hub](https://huggingface.co/docs/hub)

---

## 🧹 Cleanup

In [None]:
import gc
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()