# 2.4 Introduction to MLOps: Model and Experiment Versioning

MLOps (Machine Learning Operations) is a set of practices that aims to streamline the deployment, monitoring, and management of machine learning models in production environments. One of the key aspects of MLOps is **versioning**—tracking changes to models, datasets, and experiments to ensure reproducibility and collaboration.

## Learning Objectives

- Understand the motivation and need for MLOps in professional environments.
- Learn about model and experiment versioning techniques.
- Explore popular tools and architectures used in real-world MLOps workflows.
- Compare different approaches and discuss their pros and cons.
- Implement basic versioning using open-source tools.

---

## Why MLOps and Versioning?

In professional environments, machine learning projects involve multiple stakeholders, frequent changes, and the need for reproducibility. Without proper versioning, it is difficult to:

- Track which data, code, and parameters produced a given model
- Collaborate across teams
- Reproduce results
- Deploy and monitor models reliably

![MLOps Lifecycle](https://ml-ops.org/img/ml-engineering.jpg)

*Figure: The MLOps lifecycle covers data, code, model, and deployment versioning*

---

## Key Concepts in MLOps Versioning


- **Model Versioning**: Tracking different versions of trained models, including their parameters, code, and metadata.
- **Experiment Tracking**: Logging hyperparameters, metrics, and artifacts for each experiment run.
- **Data Versioning**: Ensuring the exact dataset used for training/testing is tracked and reproducible.

These concepts are essential for reproducibility, auditing, and collaboration in real-world ML projects.
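The sketch below makes the three concepts concrete using only the Python standard library. It shows the record a versioning tool keeps for one training run: a content hash that identifies the exact dataset, the code version, the hyperparameters, and the resulting metrics. The helper names, the record's field names, and all values are hypothetical placeholders. Tools such as MLflow and DVC store richer metadata in their own formats.

```python
import hashlib
import json


def dataset_fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest that identifies the exact dataset contents."""
    return hashlib.sha256(data).hexdigest()


def make_run_record(data: bytes, params: dict, metrics: dict, code_version: str) -> dict:
    """Bundle what is needed to trace a model back to its inputs (hypothetical schema)."""
    return {
        "data_sha256": dataset_fingerprint(data),  # data versioning
        "code_version": code_version,              # e.g. a Git commit hash
        "params": dict(params),                    # experiment tracking: inputs
        "metrics": dict(metrics),                  # experiment tracking: outputs
    }


# Placeholder dataset contents, parameters, metrics, and commit hash for illustration only
csv_v1 = b"sepal_length,label\n5.1,0\n7.0,1\n"
csv_v2 = csv_v1 + b"6.3,2\n"  # one extra row -> a new data version

run = make_run_record(csv_v1, {"n_estimators": 100}, {"accuracy": 0.97}, code_version="abc1234")
print(json.dumps(run, indent=2, sort_keys=True))
print(dataset_fingerprint(csv_v1) == dataset_fingerprint(csv_v2))  # False: any change alters the hash
```

Identifying data by its contents rather than by its file name is what lets a tool detect that "the same" file has changed between two runs.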

## Real-World MLOps Architectures and Tools


Professional MLOps workflows use a combination of tools and platforms to manage versioning, deployment, and monitoring. Common architectures include:


- **Local Development + Git + MLflow/DVC**: Suitable for small teams and research projects.
- **Cloud Platforms (AWS SageMaker, Azure ML, GCP Vertex AI)**: Provide integrated experiment tracking, model registry, and deployment.
- **Hybrid Solutions**: Combine open-source tools (MLflow, DVC, Kubeflow) with cloud storage and CI/CD pipelines.

| Tool         | Model Versioning | Experiment Tracking | Data Versioning | Deployment | Notes |
|--------------|------------------|--------------------|-----------------|-----------|-------|
| MLflow       | Yes              | Yes                | Limited         | Yes       | Popular, open-source |
| DVC          | Yes              | Limited            | Yes             | No        | Git-based, open-source |
| Kubeflow     | Yes              | Yes                | Yes             | Yes       | Kubernetes-native |
| SageMaker    | Yes              | Yes                | Yes             | Yes       | AWS cloud |
| Azure ML     | Yes              | Yes                | Yes             | Yes       | Azure cloud |
| Vertex AI    | Yes              | Yes                | Yes             | Yes       | GCP cloud |

---

## Example: Experiment Tracking and Model Versioning with MLflow


[MLflow](https://mlflow.org/) is a popular open-source platform for managing the ML lifecycle. It provides tools for experiment tracking, model registry, and deployment.

**Workflow:**

1. Log parameters, metrics, and artifacts for each experiment run.
2. Register models in the MLflow Model Registry.
3. Deploy models to production from the registry.
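Steps 2 and 3 depend on a registry that does two things:

- It assigns increasing version numbers under a model name.
- It lets deployment code ask for "whatever is currently in production" instead of a hard-coded file path.

The in-memory class below is a hypothetical toy written only to show that bookkeeping. It is not MLflow's API. MLflow's registry is used through functions such as `mlflow.register_model` and model URIs of the form `models:/<name>/<version>`.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    name: str
    version: int
    source: str  # where the logged model artifact lives (placeholder paths below)


@dataclass
class ToyModelRegistry:
    """Hypothetical in-memory registry illustrating model versions and aliases."""
    versions: dict = field(default_factory=dict)  # name -> list of ModelVersion
    aliases: dict = field(default_factory=dict)   # (name, alias) -> version number

    def register(self, name: str, source: str) -> ModelVersion:
        # Each registration of the same name gets the next version number
        history = self.versions.setdefault(name, [])
        mv = ModelVersion(name=name, version=len(history) + 1, source=source)
        history.append(mv)
        return mv

    def set_alias(self, name: str, alias: str, version: int) -> None:
        if not 1 <= version <= len(self.versions.get(name, [])):
            raise ValueError(f"{name} has no version {version}")
        self.aliases[(name, alias)] = version

    def get(self, name: str, alias: str) -> ModelVersion:
        # Deployment code resolves an alias instead of hard-coding a version number
        return self.versions[name][self.aliases[(name, alias)] - 1]


registry = ToyModelRegistry()
registry.register("iris-classifier", "runs/run-001/model")
registry.register("iris-classifier", "runs/run-002/model")
registry.set_alias("iris-classifier", "production", 1)
print(registry.get("iris-classifier", "production").source)  # runs/run-001/model

# Promote version 2: anything that resolves the alias now gets the new model
registry.set_alias("iris-classifier", "production", 2)
print(registry.get("iris-classifier", "production").version)  # 2
```

Old versions are never overwritten, so rolling back means pointing the alias at an earlier version number.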

![MLflow Architecture](https://mlflow.org/docs/2.7.0/_images/mlflow-overview.png)

*Figure: MLflow architecture for experiment tracking and model versioning*

```python
# Example: MLflow experiment tracking
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load data and hold out 20% for evaluation (fixed seed so the split is reproducible)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# A small batch of inputs that MLflow stores with the model and can use to infer its signature
input_example = X_train[:2]

# Keep hyperparameters in one dict so the logged values always match the trained model
params = {"n_estimators": 100, "random_state": 42}

with mlflow.start_run():
    clf = RandomForestClassifier(**params)
    clf.fit(X_train, y_train)
    acc = clf.score(X_test, y_test)

    mlflow.log_params(params)
    mlflow.log_metric("accuracy", acc)
    # `name=` is the MLflow 3.x argument; MLflow 2.x uses `artifact_path="model"` instead
    mlflow.sklearn.log_model(clf, name="model", input_example=input_example)
    print(f"Logged model with accuracy: {acc:.4f}")
```

```text
Logged model with accuracy: 1.0000
```


---

## Example: Data Versioning with DVC


[DVC](https://dvc.org/) (Data Version Control) is an open-source tool for versioning datasets and models, built on top of Git. It enables reproducible pipelines and collaborative ML development.

**Workflow:**

1. Track datasets and model files with DVC commands.
2. Store large files in remote storage (S3, GCS, etc.).
3. Share and reproduce experiments across teams.
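Step 1 works by content addressing. `dvc add` does three things:

1. It hashes the file.
2. It stores the data in DVC's cache under a path derived from that hash.
3. It writes a small `.dvc` metafile, which Git versions in place of the data.

The sketch below approximates what DVC 3.x records for a single file: an MD5 digest and the file size in the metafile, and a cache path of the form `.dvc/cache/files/md5/<first two hex chars>/<rest>`. The function names are hypothetical. DVC's real metafiles and cache layout may contain additional fields or differ between DVC versions.

```python
import hashlib
from pathlib import PurePosixPath


def dvc_style_metafile(data: bytes, filename: str) -> str:
    """Approximate the YAML body of a DVC 3.x .dvc file for one tracked file."""
    digest = hashlib.md5(data).hexdigest()
    return (
        "outs:\n"
        f"- md5: {digest}\n"
        f"  size: {len(data)}\n"
        "  hash: md5\n"
        f"  path: {filename}\n"
    )


def dvc_style_cache_path(data: bytes) -> PurePosixPath:
    """Approximate where DVC 3.x keeps the file's contents in the local cache."""
    digest = hashlib.md5(data).hexdigest()
    return PurePosixPath(".dvc/cache/files/md5") / digest[:2] / digest[2:]


# Placeholder file contents for illustration
content = b"sepal_length,label\n5.1,0\n"
print(dvc_style_metafile(content, "dataset.csv"))
print(dvc_style_cache_path(content))
```

Identical contents map to the same cache path, so unchanged data is stored only once across versions. Checking out an older `.dvc` file with Git and then running `dvc checkout` restores the matching data.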

![DVC Architecture](https://dvc.org/studio-architecture-diagram-722d060ac956da93c0839ddc736a27a0.svg)

*Figure: DVC architecture for data and model versioning*


**Comparison:**

- MLflow focuses on experiment tracking and model registry.
- DVC specializes in data and pipeline versioning.
- Both can be integrated for end-to-end MLOps workflows.
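One simple way to integrate the two tools is sketched below under stated assumptions. Read the data hash that DVC wrote into the `.dvc` file and record it on the MLflow run, so that every run names the exact data version it was trained on. `read_dvc_md5` is a hypothetical helper. It handles only the simple single-output layout that `dvc add` produces for one file, where `- md5: ...` appears under `outs:`. A real project would parse the file as YAML.

```python
import re
import tempfile
from pathlib import Path


def read_dvc_md5(dvc_file: Path) -> str:
    """Return the md5 recorded for the first output in a .dvc file (simple layout only)."""
    match = re.search(r"^-?\s*md5:\s*([0-9a-f]{32})\s*$", dvc_file.read_text(), re.MULTILINE)
    if match is None:
        raise ValueError(f"no md5 entry found in {dvc_file}")
    return match.group(1)


# Self-contained demo: write a placeholder .dvc file in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    meta = Path(tmp) / "dataset.csv.dvc"
    meta.write_text(
        "outs:\n"
        "- md5: 0123456789abcdef0123456789abcdef\n"
        "  size: 25\n"
        "  hash: md5\n"
        "  path: dataset.csv\n"
    )
    data_version = read_dvc_md5(meta)
    print(data_version)  # 0123456789abcdef0123456789abcdef

# Inside an MLflow run, the hash could then be recorded alongside the hyperparameters, e.g.:
#   mlflow.log_param("data_md5", data_version)
```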

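The commands below are meant for a terminal; in a Jupyter notebook, prefix each one with `!`.

```bash
# Run from the root of an existing Git repository: dvc init expects one
dvc init
git add .dvc .dvcignore
git commit -m "Initialize DVC"

# Track the dataset: DVC writes data/dataset.csv.dvc and lists the CSV in data/.gitignore
dvc add data/dataset.csv
git add data/dataset.csv.dvc data/.gitignore
git commit -m "Track dataset with DVC"

# Register a default remote (the S3 backend requires installing dvc[s3]), commit the config, then upload
dvc remote add -d myremote s3://mybucket/dvcstore
git add .dvc/config
git commit -m "Configure DVC remote"
dvc push
```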

---

## Summary and Practical Recommendations

- Use experiment tracking tools (MLflow, SageMaker, Azure ML) to log parameters, metrics, and artifacts.
- Version datasets and models with DVC or cloud-native solutions.
- Register and deploy models using model registries for traceability.
- Integrate CI/CD pipelines for automated testing and deployment.
- Choose tools and architectures based on team size, project scale, and compliance needs.
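For the CI/CD recommendation, a common pattern is a promotion gate. A pipeline step compares the candidate model's logged metrics with those of the current production model and blocks deployment unless the candidate is good enough. The function, its threshold, and the metric values below are hypothetical and tool-agnostic. In practice, the two metric dictionaries would come from your tracking server or model registry.

```python
def should_promote(candidate: dict, production: dict, metric: str = "accuracy",
                   min_improvement: float = 0.0) -> bool:
    """Return True if the candidate beats production on `metric` by at least `min_improvement`.

    Assumes higher values of `metric` are better.
    """
    if metric not in candidate:
        raise KeyError(f"candidate run did not log {metric!r}")
    if metric not in production:
        # No production model yet (or it never logged this metric): allow the first promotion
        return True
    return candidate[metric] - production[metric] >= min_improvement


# Placeholder metrics; a real pipeline would fetch these from its tracking server or registry
decision = should_promote({"accuracy": 0.96}, {"accuracy": 0.95}, min_improvement=0.005)
print("promote" if decision else "reject")  # promote
```

In CI, the script can exit with a non-zero status when the decision is `False` (for example with `raise SystemExit(1)`). Most CI systems treat that as a failed step, so the deployment stage does not run.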

**Professional Tip:**
- Combine MLflow (experiment/model tracking) with DVC (data/pipeline versioning) for robust, reproducible workflows.
- For enterprise projects, leverage cloud platforms for scalability and security.

---