# Chapter 61: Introduction to MLOps

## Learning Objectives

By the end of this chapter, you will be able to:

- Understand what MLOps is and why it is essential for deploying and maintaining machine learning systems in production
- Distinguish between MLOps and DevOps and appreciate the unique challenges of operationalising ML
- Identify the key principles of MLOps: automation, reproducibility, versioning, testing, monitoring, collaboration, and continuous training and delivery
- Describe the end‑to‑end MLOps lifecycle, from data collection to model monitoring
- Recognise the typical roles and team structures in an MLOps‑enabled organisation
- Survey the MLOps tooling landscape and select appropriate tools for different stages of the lifecycle
- Assess the maturity of an ML system using MLOps maturity models
- Plan the initial steps to introduce MLOps practices in a project like the NEPSE prediction system

---

## Introduction

Building a high‑accuracy machine learning model for NEPSE stock prediction is a significant achievement, but it is only the beginning. To deliver real value, the model must be deployed into a production environment, integrated with other systems, and maintained over time. This is where **MLOps** (Machine Learning Operations) comes into play.

MLOps is a set of practices that combines machine learning, DevOps, and data engineering to reliably and efficiently deploy and maintain ML systems in production. It aims to automate and streamline the entire ML lifecycle, from data preparation to model monitoring and retraining. Without MLOps, ML projects often fail to deliver long‑term business value due to technical debt, manual processes, and lack of reproducibility.

In this chapter, we will introduce the core concepts of MLOps, its principles, and its lifecycle. We will use the NEPSE prediction system as a concrete example to illustrate how MLOps practices can be applied. By the end, you will have a clear understanding of what it takes to operationalise a time‑series prediction system and the first steps to get there.

---

## 61.1 What is MLOps?

MLOps is a discipline that aims to unify ML system development (Dev) and ML system operation (Ops). It standardises and streamlines the lifecycle of ML models, from inception to retirement, ensuring that models are reliable, scalable, and maintainable in production.

### 61.1.1 The Need for MLOps

Developing an ML model in a Jupyter notebook is very different from running it in production. In development, the focus is on experimentation: trying different algorithms, features, and hyperparameters. The environment is often isolated, data is static, and reproducibility is limited to the notebook itself.

In production, the model must:

- Handle varying data volumes and velocities.
- Integrate with other systems via APIs.
- Be monitored for performance degradation (drift).
- Be updated or retrained without downtime.
- Scale to meet demand.
- Comply with security and regulatory requirements.

MLOps provides the practices and tools to bridge this gap, reducing the time from model development to deployment while maintaining quality and governance.

### 61.1.2 MLOps vs. DevOps

MLOps is inspired by DevOps, which brought similar principles to software development. Both emphasise automation, continuous integration and delivery (CI/CD), testing, and monitoring. However, ML systems introduce additional complexities:

- **Data must be versioned alongside code**: Models depend on data, which changes over time. Reproducing a model requires the exact data version it was trained on.
- **Experimentation is non‑deterministic**: Model training involves randomness; reproducibility requires seeding and logging.
- **Model performance degrades over time**: Data drift and concept drift mean models must be monitored and retrained.
- **Multiple artifacts**: Besides code, we have datasets, feature definitions, model files, and evaluation metrics.
- **Model explainability and fairness**: These are often required for compliance.

Thus, MLOps extends DevOps to address these unique challenges.
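
The point about seeding and logging can be illustrated with a small sketch. The function `run_experiment` and its toy "training" step are hypothetical stand-ins, not part of the NEPSE code; the idea is only that a run which records its seed and settings can be repeated exactly.

```python
import json
import random


def run_experiment(seed: int, n_samples: int = 5) -> dict:
    """Toy stand-in for a training run: averages draws from a seeded RNG."""
    rng = random.Random(seed)  # local RNG, so global state cannot leak between runs
    draws = [rng.random() for _ in range(n_samples)]
    # Record everything needed to repeat the run next to its result.
    return {"seed": seed, "n_samples": n_samples, "score": sum(draws) / n_samples}


first = run_experiment(seed=42)
second = run_experiment(seed=42)
other = run_experiment(seed=7)

print(json.dumps(first, indent=2))
print("same seed, same result:", first == second)
print("different seed, different score:", first["score"] != other["score"])
```

In a real project the logged record would also include the data version and the library versions, since a seed alone does not pin those down.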

---

## 61.2 MLOps Principles

The core principles of MLOps are:

1. **Automation**: Automate as much of the ML lifecycle as possible, including data validation, training, testing, deployment, and monitoring. This reduces manual errors and speeds up iterations.

2. **Reproducibility**: Every model should be reproducible. This means versioning code, data, and environment, and tracking all parameters and metrics.

3. **Versioning**: All artifacts (code, data, features, models) must be versioned to enable rollbacks, audits, and collaboration.

4. **Testing**: ML systems require testing at multiple levels: data tests (e.g., schema validation), model tests (e.g., accuracy on holdout set), and infrastructure tests (e.g., API response time).

5. **Monitoring**: Continuously monitor model performance, data drift, and system health. Set up alerts for anomalies.

6. **Collaboration**: Enable data scientists, engineers, and operations teams to work together with shared tools and processes.

7. **Continuous Training and Delivery**: Models should be retrained automatically when new data arrives or when performance drops, and new versions should be deployed seamlessly.

---

## 61.3 The MLOps Lifecycle

The MLOps lifecycle encompasses all stages from data collection to model retirement. A typical high‑level view is:

```
Data Collection → Data Validation → Feature Engineering → Model Training → Model Validation → Model Deployment → Model Monitoring → (Feedback loop back to Data Collection)
```

Let's break down each stage with the NEPSE system in mind.

### 61.3.1 Data Collection and Ingestion

Raw data from various sources (e.g., NEPSE CSV files, live feeds) is ingested into a data lake or warehouse. This step must be reliable and handle both batch and streaming data.

**MLOps considerations**: Automate data ingestion pipelines, validate schema and freshness, and version the raw data.
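
As a rough sketch of the freshness and versioning checks, the snippet below fingerprints a raw file by content and tests how old its newest row is. The helper names and the five-day threshold are assumptions chosen for illustration; a tool such as DVC does the fingerprinting for you in practice.

```python
import hashlib
from datetime import date, timedelta


def fingerprint(raw_bytes: bytes) -> str:
    """Content hash identifying exactly which raw data was ingested."""
    return hashlib.sha256(raw_bytes).hexdigest()


def is_fresh(latest_row_date: date, today: date, max_age_days: int = 5) -> bool:
    """True if the newest row is recent enough (5 days is an assumed threshold)."""
    return (today - latest_row_date) <= timedelta(days=max_age_days)


# Tiny in-memory stand-in for a raw NEPSE CSV file.
raw = b"Date,Close\n2024-01-01,2050.5\n2024-01-02,2061.0\n"

print("fingerprint prefix:", fingerprint(raw)[:12])
print("fresh two days later:", is_fresh(date(2024, 1, 2), today=date(2024, 1, 4)))
print("fresh eighteen days later:", is_fresh(date(2024, 1, 2), today=date(2024, 1, 20)))
```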

### 61.3.2 Data Validation

Check for data quality issues: missing values, outliers, schema changes. For NEPSE, we might validate that the 'Close' column is numeric and within a reasonable range.

**MLOps considerations**: Write automated data tests that run before training or inference.
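
A minimal sketch of such a test for the 'Close' column follows. It is kept to the standard library so it runs anywhere; in the project itself the same checks would more likely be written against a pandas DataFrame. The function name and the bounds `low` and `high` are assumptions, not real NEPSE limits.

```python
def validate_close(rows, low=0.0, high=100_000.0):
    """Return a list of human-readable problems found in the 'Close' column.

    `rows` is a list of dicts such as csv.DictReader yields.
    """
    problems = []
    for i, row in enumerate(rows):
        raw = row.get("Close")
        if raw is None or str(raw).strip() == "":
            problems.append(f"row {i}: missing Close")
            continue
        try:
            value = float(raw)
        except (TypeError, ValueError):
            problems.append(f"row {i}: non-numeric Close {raw!r}")
            continue
        # NaN fails this comparison too, so it is reported as out of range.
        if not (low < value < high):
            problems.append(f"row {i}: Close {value} outside ({low}, {high})")
    return problems


rows = [
    {"Date": "2024-01-01", "Close": "2050.5"},
    {"Date": "2024-01-02", "Close": "n/a"},
    {"Date": "2024-01-03", "Close": "-5"},
    {"Date": "2024-01-04", "Close": ""},
]
problems = validate_close(rows)
for problem in problems:
    print(problem)
```

A pipeline would typically stop before training or inference whenever the returned list is non-empty.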

### 61.3.3 Feature Engineering

Transform raw data into features (e.g., lags, technical indicators). This step must be consistent between training and serving to avoid training‑serving skew.

**MLOps considerations**: Use a feature store to define and serve features consistently. Version feature definitions.
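
Even without a feature store, the simplest guard against training‑serving skew is to route both paths through one function. The sketch below assumes a hypothetical `build_features` helper that computes lagged closes and a simple moving average for predicting the next day; the feature names and window sizes are illustrative only.

```python
def build_features(closes, n_lags=2, window=3):
    """Lagged closes plus a simple moving average, as of the latest price.

    `closes` is ordered oldest to newest. Both the training job and the
    serving code are meant to call this one function.
    """
    needed = max(n_lags, window)
    if len(closes) < needed:
        raise ValueError(f"need at least {needed} prices, got {len(closes)}")
    features = {f"lag_{k}": closes[-k] for k in range(1, n_lags + 1)}
    features[f"sma_{window}"] = sum(closes[-window:]) / window
    return features


history = [100.0, 102.0, 101.0, 105.0]

# Training: one feature row per historical position with enough history.
training_rows = [build_features(history[:end]) for end in range(3, len(history) + 1)]

# Serving: the same function applied to the latest prices.
serving_row = build_features(history)

print(training_rows[0])
print("serving row matches last training row:", serving_row == training_rows[-1])
```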

### 61.3.4 Model Training

Train models using the latest features. This may involve hyperparameter tuning and cross‑validation.

**MLOps considerations**: Automate training runs, track experiments (parameters, metrics), and store model artifacts with metadata.

### 61.3.5 Model Validation

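Evaluate the trained model on a holdout set. For time‑series data such as NEPSE prices, the holdout should be later in time than the training data so that the evaluation does not leak future information. Compare against a baseline, and check fairness and explainability where they are required. A/B tests complement this after deployment by comparing models on live traffic.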

**MLOps considerations**: Define validation gates that must be passed before deployment.
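
A validation gate can be as small as a function that returns a decision plus its reasons. In this sketch the function name, the metric dictionaries, and both thresholds are assumptions; real gates usually check several metrics.

```python
def passes_gate(candidate, baseline, min_accuracy=0.55, max_regression=0.01):
    """Decide whether a candidate model may be deployed.

    Returns (ok, reasons). Thresholds are illustrative, not recommendations.
    """
    reasons = []
    if candidate["accuracy"] < min_accuracy:
        reasons.append(
            f"accuracy {candidate['accuracy']:.2f} below minimum {min_accuracy:.2f}"
        )
    if candidate["accuracy"] < baseline["accuracy"] - max_regression:
        reasons.append(
            f"accuracy {candidate['accuracy']:.2f} regresses from "
            f"baseline {baseline['accuracy']:.2f}"
        )
    return (not reasons, reasons)


baseline_metrics = {"accuracy": 0.58}
good = passes_gate({"accuracy": 0.60}, baseline_metrics)
bad = passes_gate({"accuracy": 0.50}, baseline_metrics)

print("good candidate:", good)
print("bad candidate:", bad)
```

A deployment pipeline would call the gate after training and exit with a non-zero status when `ok` is false, which stops any later deployment step.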

### 61.3.6 Model Deployment

Deploy the validated model to a serving environment (e.g., REST API, batch job). Use deployment strategies like blue‑green or canary to minimise risk.

**MLOps considerations**: Automate deployment pipelines (CI/CD). Ensure rollback capability.
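
To make the canary idea concrete, here is a sketch of deterministic traffic splitting. Canary routing is normally handled by a load balancer or service mesh rather than application code; the function `choose_version` and the request-id scheme are hypothetical and only show the principle that a fixed share of traffic reaches the new model.

```python
import hashlib


def choose_version(request_id: str, canary_percent: int) -> str:
    """Route a request to 'canary' or 'stable' using a stable hash of its id."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # the same id always lands in the same bucket
    return "canary" if bucket < canary_percent else "stable"


request_ids = [f"req-{i}" for i in range(1000)]
canary_share = sum(
    choose_version(rid, 10) == "canary" for rid in request_ids
) / len(request_ids)

print(f"share routed to canary at 10%: {canary_share:.3f}")
```

Setting `canary_percent` back to 0 acts as an immediate rollback, and raising it to 100 completes the rollout.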

### 61.3.7 Model Monitoring

Monitor the model's performance in production: track prediction drift, data drift, latency, and error rates. Set up alerts.

**MLOps considerations**: Log predictions and inputs for later analysis. Use monitoring tools to detect issues.
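
One common way to quantify data drift from logged inputs is the Population Stability Index (PSI), which compares the binned distribution of a feature in production against a reference sample. The chapter does not prescribe a specific metric, so treat this as one possible choice; the bin count and the small floor used for empty bins are assumptions.

```python
import math


def psi(reference, current, n_bins=5):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant reference

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1  # clamp out-of-range values
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


reference = [float(v) for v in range(100)]
shifted = [v + 40.0 for v in reference]

print("PSI, identical samples:", psi(reference, reference))
print("PSI, shifted sample:", round(psi(reference, shifted), 2))
```

A frequently quoted rule of thumb treats a PSI above roughly 0.2 as a shift worth investigating, but the right alert threshold depends on the feature and should be tuned on real data.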

### 61.3.8 Continuous Retraining

When performance degrades or new data arrives, automatically trigger retraining. The new model goes through the same validation and deployment process.

**MLOps considerations**: Orchestrate retraining pipelines (e.g., Airflow, Kubeflow). Decide on retraining triggers (time‑based, drift‑based).
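
The two trigger types can be combined in a single decision function that an orchestrator task calls before launching a retraining run. This is a sketch under assumed names and thresholds; `drift_score` stands for whatever drift metric the monitoring stage produces.

```python
from datetime import date


def should_retrain(last_trained, today, drift_score,
                   max_age_days=7, drift_threshold=0.2):
    """Return the list of triggers that fired; an empty list means no retraining."""
    reasons = []
    if (today - last_trained).days >= max_age_days:
        reasons.append("time")   # time-based trigger: model is too old
    if drift_score >= drift_threshold:
        reasons.append("drift")  # drift-based trigger: inputs have shifted
    return reasons


print(should_retrain(date(2024, 1, 1), date(2024, 1, 3), drift_score=0.05))
print(should_retrain(date(2024, 1, 1), date(2024, 1, 10), drift_score=0.05))
print(should_retrain(date(2024, 1, 1), date(2024, 1, 3), drift_score=0.35))
```

Returning the reasons rather than a bare boolean makes it easy to log why a retraining run was started.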

---

## 61.4 Team Structure and Roles

MLOps requires collaboration across multiple roles:

- **Data Scientists**: Develop models, experiment with features, define validation criteria.
- **Data Engineers**: Build and maintain data pipelines, ensure data quality.
- **ML Engineers**: Bridge data science and engineering; implement ML pipelines, deploy models, set up monitoring.
- **DevOps Engineers**: Manage infrastructure, CI/CD, monitoring, and security.
- **Business Stakeholders**: Define requirements, evaluate business impact.

In a small team, one person may wear multiple hats. In a larger organisation, these roles are distinct.

For the NEPSE project, you might start with a single ML engineer/data scientist, but as the system grows, you'll need dedicated data engineers and DevOps support.

---

## 61.5 MLOps Tooling Landscape

The MLOps ecosystem is vast and rapidly evolving. Tools can be categorised by lifecycle stage:

- **Data versioning**: DVC, LakeFS, Delta Lake.
- **Feature stores**: Feast, Tecton, Hopsworks.
- **Experiment tracking**: MLflow, Weights & Biases, Neptune, Comet.
- **Orchestration**: Apache Airflow, Prefect, Dagster, Kubeflow Pipelines.
- **Model serving**: TensorFlow Serving, TorchServe, Seldon, BentoML, KServe.
- **Monitoring**: Prometheus + Grafana, Evidently, WhyLabs, Arize.
- **CI/CD for ML**: Jenkins, GitLab CI, GitHub Actions, Kubeflow.
- **End‑to‑end platforms**: SageMaker, Vertex AI, Azure ML, Databricks.

For the NEPSE system, a pragmatic stack might include:

- **DVC** or **Git‑LFS** for data versioning.
- **MLflow** for experiment tracking and model registry.
- **Airflow** for orchestrating retraining pipelines.
- **FastAPI** for serving (with Docker and Kubernetes).
- **Prometheus/Grafana** for monitoring.

We will explore many of these tools in later chapters.

---

## 61.6 MLOps Maturity Models

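Organisations typically progress through levels of MLOps maturity. Published models differ in the number and naming of levels (Google's describes three, Microsoft's five); a simplified four‑level view is: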

- **Level 0: Manual Process** – Data scientists hand over a notebook to engineers who manually deploy. No automation, no monitoring.
- **Level 1: ML Pipeline Automation** – Automated training and deployment pipelines. Models are retrained on a schedule.
- **Level 2: CI/CD Pipeline Automation** – Full CI/CD for ML, including automated testing of data and models. Rapid, reliable deployments.
- **Level 3: Automated Operations** – Models are automatically retrained and deployed based on monitoring triggers. Continuous improvement.

The NEPSE project likely starts at Level 0. The goal is to progress through the levels to ensure reliability and efficiency.

---

## 61.7 Getting Started with MLOps

Introducing MLOps to an existing project like NEPSE can be daunting. Start small and iterate:

1. **Version control everything**: Put all code (including notebooks) in Git. Use DVC for data.
2. **Automate the training pipeline**: Write a script that trains the model, logs parameters and metrics (e.g., with MLflow), and saves the model.
3. **Establish a model registry**: Use MLflow or a simple file system to store models with version tags.
4. **Set up a deployment pipeline**: Automate the deployment of the model as a REST API using CI/CD (e.g., GitHub Actions).
5. **Add monitoring**: Log predictions and inputs, set up basic dashboards.

As you gain confidence, add more advanced practices: feature stores, A/B testing, automated retraining.
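
For step 3, a file‑system registry can be sketched in a few lines. The directory layout (`v1/`, `v2/`, each holding `model.pkl` and `meta.json`) and the function names are assumptions for illustration; MLflow's model registry provides the same idea with far more features. The stored "model" here is a plain dictionary standing in for a fitted estimator.

```python
import json
import pickle
import tempfile
from pathlib import Path


def register_model(registry_dir: Path, model, metrics: dict) -> int:
    """Save `model` under the next version number and return that number."""
    registry_dir.mkdir(parents=True, exist_ok=True)
    existing = [
        int(p.name[1:]) for p in registry_dir.glob("v*") if p.name[1:].isdigit()
    ]
    version = max(existing, default=0) + 1
    version_dir = registry_dir / f"v{version}"
    version_dir.mkdir()
    (version_dir / "model.pkl").write_bytes(pickle.dumps(model))
    (version_dir / "meta.json").write_text(
        json.dumps({"version": version, "metrics": metrics})
    )
    return version


def load_model(registry_dir: Path, version: int):
    """Load a specific version, which is what a rollback amounts to."""
    return pickle.loads((registry_dir / f"v{version}" / "model.pkl").read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    registry = Path(tmp) / "registry"
    v1 = register_model(registry, {"weights": [0.1, 0.2]}, {"accuracy": 0.58})
    v2 = register_model(registry, {"weights": [0.3, 0.4]}, {"accuracy": 0.61})
    restored = load_model(registry, v1)
    print("versions:", v1, v2)
    print("restored v1:", restored)
```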

---

## 61.8 Example: First MLOps Steps for NEPSE

Let's walk through a concrete example of taking the NEPSE prediction model from a notebook to a minimal MLOps setup.

### 61.8.1 Versioning Code and Data

We create a Git repository for the project. We add a `data/` directory and use DVC to track the CSV files.

```bash
git init
dvc init
dvc add data/nepse_raw.csv
git add data/nepse_raw.csv.dvc data/.gitignore
git commit -m "Add raw NEPSE data"
```

**Explanation:**  
DVC commits a small metadata file (`.dvc`) to Git, while the data itself lives in DVC's local cache and, once a remote is configured and `dvc push` is run, in remote storage such as a cloud bucket. This allows versioning large datasets without bloating the Git repository.

### 61.8.2 Automating Training with MLflow

We refactor the training code into a Python script `train.py` that uses MLflow to track parameters and metrics.

```python
# train.py
import argparse

import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

parser = argparse.ArgumentParser()
parser.add_argument('--n_estimators', type=int, default=100)
parser.add_argument('--max_depth', type=int, default=10)
args = parser.parse_args()

with mlflow.start_run():
    # Log parameters
    mlflow.log_param("n_estimators", args.n_estimators)
    mlflow.log_param("max_depth", args.max_depth)

    # Load the feature table (tracked with DVC; assumed to be sorted by date
    # and to contain only numeric feature columns plus 'target')
    df = pd.read_csv('data/nepse_features.csv')
    X = df.drop(columns=['target'])
    y = df['target']

    # Chronological split: train on the first 80% of rows and evaluate on the
    # most recent 20%, so the score is not measured on the training data
    split = int(len(df) * 0.8)
    X_train, X_test = X.iloc[:split], X.iloc[split:]
    y_train, y_test = y.iloc[:split], y.iloc[split:]

    # Train
    model = RandomForestClassifier(n_estimators=args.n_estimators,
                                   max_depth=args.max_depth,
                                   random_state=42)
    model.fit(X_train, y_train)

    # Evaluate on the held-out, most recent rows
    preds = model.predict(X_test)
    acc = accuracy_score(y_test, preds)
    mlflow.log_metric("accuracy", acc)

    # Save model
    mlflow.sklearn.log_model(model, "model")

print("Training complete. Run 'mlflow ui' to view results.")
```

**Explanation:**  
The script logs parameters, metrics, and the model artifact to MLflow through explicit `log_param`, `log_metric`, and `log_model` calls. Each execution creates a run that can be inspected and compared with others in the MLflow UI.

### 61.8.3 Automating with a Script

We can now run `python train.py --n_estimators 200 --max_depth 15` and have the result tracked. Next, we could add a shell script to run training with different hyperparameters and compare runs.
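
The sweep could equally be driven from Python instead of shell. The sketch below builds one `train.py` command per hyperparameter combination; the grid values are arbitrary examples, and the call that actually launches the runs is left commented out because it needs the project's `train.py` and data to be present.

```python
import itertools
import subprocess
import sys


def build_commands(grid: dict) -> list:
    """One `train.py` command line per combination of hyperparameters."""
    names = list(grid)
    commands = []
    for values in itertools.product(*(grid[name] for name in names)):
        cmd = [sys.executable, "train.py"]
        for name, value in zip(names, values):
            cmd += [f"--{name}", str(value)]
        commands.append(cmd)
    return commands


def run_sweep(grid: dict) -> None:
    """Run every combination in sequence; stops at the first failing run."""
    for cmd in build_commands(grid):
        subprocess.run(cmd, check=True)


grid = {"n_estimators": [100, 200], "max_depth": [5, 10, 15]}
for cmd in build_commands(grid):
    print(" ".join(cmd[1:]))

# run_sweep(grid)  # uncomment inside a project that contains train.py
```

Each run then appears as a separate entry in the MLflow UI, where the logged parameters and accuracy can be compared side by side.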

### 61.8.4 CI/CD with GitHub Actions

We set up a GitHub Actions workflow that trains the model on every push to `main` and on a weekly schedule, then deploys it if training succeeds. This is a simplified example.

```yaml
# .github/workflows/train.yml
name: Train and Deploy

on:
  push:
    branches: [ main ]
  schedule:
    - cron: '0 2 * * 0'  # weekly, Sundays at 02:00 UTC

jobs:
  train:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: '3.9'
    - name: Install dependencies
      run: pip install -r requirements.txt
    - name: Pull data from DVC
      run: dvc pull  # assumes dvc is listed in requirements.txt and a remote (with credentials) is configured
    - name: Train model
      run: python train.py
    - name: Deploy model
      run: python deploy.py  # script that updates the production API
```

**Explanation:**  
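This workflow triggers on every push to `main` and once a week. Steps run in order and a failing step stops the job, so the model is deployed only if training succeeds. The `deploy.py` script would update, for example, a Kubernetes deployment with a new model version. Two gaps remain in this simplified version: there is no validation gate between training and deployment, and on a hosted runner the MLflow run is written to the runner's local disk and discarded with it unless a remote tracking server is configured (for example via the `MLFLOW_TRACKING_URI` environment variable).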

---

## Chapter Summary

In this introductory chapter to MLOps, we covered:

- The definition and necessity of MLOps for production ML systems.
- The key differences between MLOps and DevOps.
- The core principles: automation, reproducibility, versioning, testing, monitoring, collaboration, and continuous training.
- The end‑to‑end MLOps lifecycle with the NEPSE prediction system as a running example.
- Typical team structures and roles.
- A survey of the MLOps tooling landscape.
- MLOps maturity models to assess and guide progress.
- Practical first steps to introduce MLOps to a project.

MLOps is not a single tool but a culture and set of practices that ensure ML systems deliver long‑term value. For the NEPSE system, adopting MLOps principles will transform a one‑off experiment into a reliable, maintainable, and continuously improving service.

In the next chapter, we will dive deeper into **CI/CD for Machine Learning**, exploring how to build automated pipelines that test and deploy models with confidence.

---

**End of Chapter 61**

