# 01 - Fundamentals of MLOps
---

### **Introduction**
Just as DevOps is the practice of managing and deploying code, MLOps is the practice of managing and deploying machine learning models. However, MLOps presents a different set of challenges. With DevOps, changes in the behaviour of deployed code are predominantly driven by changes to the code itself: a developer introduces a bug by mistake, and the deployed code misbehaves. Changes in the behaviour of a machine learning system, however, can also be driven by:
- Data or distribution changes
- Labels arriving late
- Stochastic experiments
- Divergence of training and inference environments

MLOps exists to:
- Make experiments reproducible
- Track what produced a model
- Make deployments safe
- Detect degradation
- Recover safely

### **The ML Lifecycle**
The ML lifecycle consists of the following phases:
1. **Problem Framing**
- Clearly define the objective
- Translate into a machine learning task
- Define success metrics
- Identify constraints
2. **Data**
- Collect data and perform quality checks
- Conduct feature engineering
- Split into train, validation and test sets
3. **Model Development**
- Train models in a reproducible way
- Conduct hyperparameter tuning
- Track experiments (metrics, parameters, artifacts etc.)
4. **Model Validation**
- Cross-validation
- Bias & fairness checks
- Check performance against business metrics
5. **Packaging & Deployment**
- Containerise model and deploy (either batch or real-time serving)
- CI/CD for ML
6. **Monitoring in Production**
- Monitor predictions
- Monitor latency if required
- Monitor input data and detect data drift
- Detect concept drift
7. **Retraining & Continuous Improvement**
- Trigger-based retraining (e.g. time, drift, performance drop)
- Automated retraining pipelines
- Track model versions with rollback mechanisms
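
The trigger-based retraining mentioned in phase 7 can be sketched as a small policy check. This is a minimal illustration, not a recommended configuration: the `RetrainPolicy` class, the `retrain_reasons` function and every threshold value are hypothetical, and the drift score is assumed to come from whatever drift detector the monitoring phase uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrainPolicy:
    # All thresholds are illustrative assumptions, not recommended values.
    max_age_days: int = 30          # time-based trigger
    max_drift_score: float = 0.2    # drift-based trigger
    max_metric_drop: float = 0.05   # performance-based trigger


def retrain_reasons(policy, age_days, drift_score, baseline_metric, current_metric):
    """Return the list of triggers that fired (an empty list means no retraining)."""
    reasons = []
    if age_days >= policy.max_age_days:
        reasons.append("time")
    if drift_score >= policy.max_drift_score:
        reasons.append("drift")
    if baseline_metric - current_metric >= policy.max_metric_drop:
        reasons.append("performance drop")
    return reasons


policy = RetrainPolicy()
print(retrain_reasons(policy, age_days=12, drift_score=0.31,
                      baseline_metric=0.91, current_metric=0.90))  # ['drift']
```

Returning the reasons rather than a bare boolean means the cause of each retraining run can be logged alongside the resulting model version.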


### **Reproducibility**
**Reproducibility** in ML is the ability to reliably recreate a model exactly as it was originally trained. This is important because:
- It enables debugging of production issues
- It allows safe rollback to a previous model version
- It ensures fair comparison between experiments
- It enables collaboration across teams
- It supports audit and regulatory requirements

In practice, reproducibility means you must be able to recreate:
- The data             
- The code
- The feature pipeline
- The hyperparameters
- The random seed
- The execution environment

Modern ML platforms such as Azure ML, AWS SageMaker, and Google Vertex AI provide built-in mechanisms to help manage these components in a structured and reproducible way. They share many of the same fundamental ideas about what should be tracked, although their implementations differ. The core ideas behind how reproducibility is achieved are discussed below.

**Data Reproducibility**
Data is made reproducible through **data versioning**, where specific snapshots of the data are saved as named versions. These named versions are used as inputs for training jobs. Because each version is immutable, the exact dataset used to train a model can always be retrieved and reused.
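
One simple way to check that a snapshot really is unchanged is to store a content hash next to the version name. The sketch below assumes the dataset is a single file; the `dataset_fingerprint` helper and the file name are hypothetical, and managed platforms have their own versioning mechanisms that this does not represent.

```python
import hashlib
import tempfile
from pathlib import Path


def dataset_fingerprint(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "customers_v1.csv"    # hypothetical snapshot name
    snapshot.write_text("id,age\n1,34\n2,51\n")
    v1 = dataset_fingerprint(snapshot)

    snapshot.write_text("id,age\n1,34\n2,52\n")  # one value changed
    v2 = dataset_fingerprint(snapshot)

print(v1 == v2)  # False: any change to the data yields a different fingerprint
```

Recording the fingerprint with each training run makes it possible to verify later that the data retrieved under a version name is byte-for-byte what the model was trained on.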

**Code Reproducibility**
Code is made reproducible by using version control tools such as Git. By recording the repository, branch and commit ID, the exact training logic can be reconstructed at any point in time.
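
As a rough sketch, the repository, branch and commit ID can be captured at the start of a training script by calling the `git` command line. The helper names `git_output` and `code_version` are hypothetical, and the code assumes `git` is on the `PATH`; when it is not, or when the script runs outside a repository, the fields are simply `None`.

```python
import subprocess


def git_output(*args):
    """Run a git command and return its trimmed output, or None on failure."""
    try:
        result = subprocess.run(
            ["git", *args], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # git not installed, or not inside a repository
    return result.stdout.strip()


def code_version():
    """Collect the identifiers needed to find the training code again."""
    status = git_output("status", "--porcelain")
    return {
        "repository": git_output("config", "--get", "remote.origin.url"),
        "branch": git_output("rev-parse", "--abbrev-ref", "HEAD"),
        "commit": git_output("rev-parse", "HEAD"),
        # Uncommitted changes mean the commit ID alone does not describe the code.
        "dirty": None if status is None else bool(status),
    }


print(code_version())
```

The `dirty` flag is a design choice worth keeping: a run launched from a working tree with uncommitted changes cannot be reconstructed from the commit ID alone.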

**Feature Pipeline Reproducibility**
Feature engineering is the process of turning raw data into processed data ready for training, and it is often one of the largest sources of inconsistency. To ensure reproducibility:
- Feature transformations must be defined in code (not performed manually)
- Preprocessing steps must be part of the training pipeline
- The exact feature set used for training must be recorded (i.e. via data versioning)
- Training and inference pipelines must apply identical feature transformations
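
The points above can be illustrated with a toy standardisation step. In this sketch the scaling parameters are learned once from the training data and the same `transform` function is reused at inference time; both function names and the example values are hypothetical.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn scaling parameters from the training data only."""
    return {"mean": mean(values), "std": pstdev(values) or 1.0}


def transform(values, params):
    """Apply the stored parameters; used unchanged by training and inference."""
    return [(v - params["mean"]) / params["std"] for v in values]


train_ages = [20, 30, 40, 50]
params = fit_scaler(train_ages)          # saved alongside the model
train_features = transform(train_ages, params)

# At inference time the same function and the same saved parameters are reused.
print(transform([35], params))  # [0.0]
```

If inference recomputed the mean and standard deviation from incoming data instead of reusing `params`, the model would see differently scaled inputs from those it was trained on.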

**Hyperparameter Reproducibility**
Hyperparameters can significantly alter the performance of a model, so recording the values used in a training run is essential for reproducibility. You should record all hyperparameter values, along with the search strategy (e.g. grid search, random search, Bayesian optimisation) and the search space. Many experiment tracking tools can log these automatically.
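
A minimal sketch of what such a record might contain for a random search is shown below. The structure of `search_config`, the parameter names and the candidate values are all assumptions made for illustration; a real tracking tool will have its own schema.

```python
import json
import random

# Hypothetical search definition: strategy, seed, budget and search space.
search_config = {
    "strategy": "random_search",
    "seed": 7,
    "n_trials": 3,
    "space": {
        "learning_rate": [0.001, 0.01, 0.1],
        "max_depth": [3, 5, 8],
    },
}

# Seeding the sampler makes the sequence of sampled trials repeatable.
rng = random.Random(search_config["seed"])
trials = [
    {name: rng.choice(options) for name, options in search_config["space"].items()}
    for _ in range(search_config["n_trials"])
]

# Store the search definition together with the values each trial actually used.
record = {"search": search_config, "trials": trials}
print(json.dumps(record, indent=2, sort_keys=True))
```

Keeping the search space and seed with the sampled values means both the winning configuration and the process that found it can be revisited.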

**Randomness Control**
Many ML algorithms incorporate some form of randomness. For example, with gradient descent the parameters are generally initialised with random values. A random seed is the starting value given to a pseudo-random number generator; using the same seed makes the generator produce the same sequence of “random” numbers each time. Setting and recording the random seed brings the stochastic elements of an ML algorithm under control, so the training process can be recreated despite the randomness. Exact recreation also relies on the data, code and environment being held fixed.
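
The effect of a seed can be seen with Python's standard `random` module. The `init_weights` function is a hypothetical stand-in for the random parameter initialisation described above; ML frameworks have their own seeding functions, which are not shown here.

```python
import random


def init_weights(seed, n=3):
    """Draw n initial weights from a generator seeded with `seed`."""
    rng = random.Random(seed)  # local generator, avoids hidden global state
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


first = init_weights(seed=42)
second = init_weights(seed=42)
other = init_weights(seed=43)

print(first == second)  # True: same seed, same sequence
print(first == other)   # False: a different seed gives different values
```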

**Environment Reproducibility**
The environment an ML training pipeline is run in consists of the following:
- The programming language version (e.g. Python 3.10)
- The packages and their versions (e.g. pandas 2.2.2)
- The operating system and its version (e.g. Ubuntu 24.04.1)
- Hardware drivers and their versions (e.g. NVIDIA CUDA 12.2)

Each of these elements can influence the output of a model training pipeline and so needs to be controlled. Environment reproducibility is most effectively achieved through Docker containerisation, which packages the elements listed above in a consistent and portable way. Containerisation is discussed separately in further detail.
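
Alongside containerisation, it can be useful to record what the environment actually was when a run executed. The sketch below only captures a snapshot and does not pin or rebuild anything; `capture_environment` is a hypothetical helper, and the package names passed to it are examples.

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment(packages=()):
    """Snapshot interpreter, OS and selected package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": sys.implementation.name,
        "os": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }


print(json.dumps(capture_environment(["pandas", "numpy"]), indent=2))
```

Hardware driver versions are not visible through these standard-library calls and would need to be collected separately.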

### **Experiment Tracking**
During the model development phase, many models are usually trained using different data versions, feature sets, algorithms and hyperparameters. This necessitates a structured method for recording the training process, including its inputs, settings and outputs. **Experiment tracking** provides a systematic way to log and compare different training runs.

At a minimum, each experiment run should record:
- Parameters (e.g. learning rate, number of trees, max depth)
- Metrics (e.g. accuracy, RMSE, precision, recall)
- Data version
- Code version (commit ID)
- Artifacts (trained model files, plots, feature importance outputs)
- Environment information

Doing so enables:
- Fair comparison between models
- Reproducibility of past results
- Identification of performance regressions
- Collaboration across teams
- Selection of a production-ready model

Many ML platforms provide built-in functionality for experiment tracking. This usually involves:
- A central experiment store
- A UI for comparing runs
- Automatic logging of metrics and parameters
- Integration with model registries
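
To make the idea concrete, the minimum run record listed earlier in this section can be sketched as an append-only JSON Lines file. This is a toy stand-in for a central experiment store, not how any particular platform works; `log_run`, `best_run`, the field names and the example values are all hypothetical.

```python
import json
import platform
import tempfile
import time
import uuid
from pathlib import Path


def log_run(store, params, metrics, data_version, commit, artifacts=()):
    """Append one run record to a JSON Lines file and return the record."""
    record = {
        "run_id": uuid.uuid4().hex,
        "timestamp": time.time(),
        "params": params,
        "metrics": metrics,
        "data_version": data_version,
        "commit": commit,
        "artifacts": list(artifacts),
        "environment": {"python": platform.python_version()},
    }
    with open(store, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record


def best_run(store, metric, higher_is_better=True):
    """Read every run back and pick the best one by a single metric."""
    with open(store, encoding="utf-8") as handle:
        runs = [json.loads(line) for line in handle if line.strip()]
    pick = max if higher_is_better else min
    return pick(runs, key=lambda run: run["metrics"][metric])


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "runs.jsonl"
    log_run(store, {"max_depth": 3}, {"accuracy": 0.81}, "v1", "abc123")
    log_run(store, {"max_depth": 5}, {"accuracy": 0.86}, "v1", "abc123")
    winner = best_run(store, "accuracy")

print(winner["params"])  # {'max_depth': 5}
```

Because both runs share the same data version and commit, the difference in accuracy can be attributed to the change in `max_depth`, which is the kind of fair comparison experiment tracking is meant to support.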

### **Model Registry**
The model development process typically produces many different models but eventually one is chosen to be the production model. However, over time new data may become available or the performance of the deployed model may deteriorate and so a newer version of the model needs to be trained and deployed. **Model registries** are centralised systems used to:
- Store trained models
- Version models
- Track metadata about each model
- Manage promotion to production

Model registries make it easy to track which model version was deployed at a given time and to roll back to a previous version if needed. Typically, the output of an experiment run is registered as a model version, and that version can then be promoted through to production. Inference endpoints can be configured to serve either the latest version or a specific version. This enables A/B testing and safe rollback. It also enables canary deployments, where a new version of a model is deployed on a small percentage of traffic and that share is gradually increased until the older model is phased out.
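
The registration, promotion, rollback and canary ideas can be sketched with an in-memory registry. Everything here is illustrative: the `ModelRegistry` class, the `route` function, the run IDs and the 10% canary share are assumptions, and real registries add persistent storage, stages and access control.

```python
import random


class ModelRegistry:
    """In-memory stand-in for a model registry (illustrative only)."""

    def __init__(self):
        self.versions = {}      # version number -> metadata
        self.history = []       # production versions, most recent last

    def register(self, run_id, metrics):
        version = len(self.versions) + 1
        self.versions[version] = {"run_id": run_id, "metrics": metrics}
        return version

    def promote(self, version):
        if version not in self.versions:
            raise KeyError(f"unknown model version: {version}")
        self.history.append(version)

    def rollback(self):
        if len(self.history) < 2:
            raise RuntimeError("no earlier production version to roll back to")
        self.history.pop()
        return self.production

    @property
    def production(self):
        return self.history[-1] if self.history else None


def route(stable, canary, canary_share, rng):
    """Pick the version serving one request; canary_share is a fraction in [0, 1]."""
    return canary if rng.random() < canary_share else stable


registry = ModelRegistry()
v1 = registry.register("run-a", {"accuracy": 0.81})
v2 = registry.register("run-b", {"accuracy": 0.86})
registry.promote(v1)

# Canary phase: on average about 10% of requests are served by the new version.
rng = random.Random(0)
served = [route(v1, v2, canary_share=0.1, rng=rng) for _ in range(1000)]
print(served.count(v2))

registry.promote(v2)
print(registry.production)   # 2
print(registry.rollback())   # 1
```

Keeping the promotion history, rather than only the current production version, is what makes rollback a single, low-risk operation.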