# MLOps Zoomcamp
## Putting ML to Production
MLOps is a set of best practices and tools for deploying machine learning in production.

Imagine trying to predict the duration of a given taxi trip.

The main steps in developing such a machine learning project:
1. Design
> Is machine learning the right approach?
2. Train
> Try and compare different models and pick the most appropriate one
3. Deploy and maintain
> Make the service available to the public through an API

MLOps assists us in each of the above steps by enabling automation and collaboration.

## The issue with notebooks
Notebooks are used for experimentation, but often result in disorganized and sloppy code.
Cells are run out of order and code is often not modular => no software engineering best practices.

### Experiment tracking
When trying to find the best model with optimal parameters, the history of experiments is often lost (or relies on the discipline of the data scientists). An experiment tracking tool can help with this.
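
To make the idea concrete, here is a minimal sketch of what an experiment tracker records: each run's parameters and metrics are appended to a JSON Lines file, and the best run can be looked up later. The names `log_run` and `best_run`, the file `runs.jsonl`, and the `rmse` metric are illustrative assumptions, not part of any specific tool. A dedicated tracking tool would add a UI, artifact storage, and more.

```python
import json
import time
from pathlib import Path

# Hypothetical log location; any writable path works.
RUNS_FILE = Path("runs.jsonl")


def log_run(params: dict, metrics: dict, path: Path = RUNS_FILE) -> dict:
    """Append one experiment run (parameters + metrics) to a JSON Lines file."""
    record = {"timestamp": time.time(), "params": params, "metrics": metrics}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def best_run(metric: str, path: Path = RUNS_FILE) -> dict:
    """Return the logged run with the lowest value of `metric`."""
    with path.open(encoding="utf-8") as f:
        runs = [json.loads(line) for line in f if line.strip()]
    return min(runs, key=lambda run: run["metrics"][metric])
```

Calling `log_run({"alpha": 0.1}, {"rmse": 7.5})` after every training run keeps the history independent of anyone's memory or discipline.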

### Model saving
Models and their performance metrics are often saved locally in an ad hoc way. A model registry can help with this.
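
As a rough sketch of what a registry does, the snippet below stores each model in its own versioned directory next to its metrics. The function names `register_model` and `load_model` and the `v1`, `v2`, ... directory layout are assumptions made for illustration. A real model registry would also handle access control, stages, and lineage.

```python
import json
import pickle
from pathlib import Path


def register_model(model, metrics: dict, registry_dir: Path) -> Path:
    """Save `model` and its metrics under the next free version directory."""
    registry_dir.mkdir(parents=True, exist_ok=True)
    # Existing versions are directories named v1, v2, ...
    existing = [
        int(d.name[1:])
        for d in registry_dir.glob("v*")
        if d.is_dir() and d.name[1:].isdigit()
    ]
    version_dir = registry_dir / f"v{max(existing, default=0) + 1}"
    version_dir.mkdir()
    with (version_dir / "model.pkl").open("wb") as f:
        pickle.dump(model, f)
    (version_dir / "metrics.json").write_text(json.dumps(metrics))
    return version_dir


def load_model(version_dir: Path):
    """Load a model and its recorded metrics from a version directory."""
    with (version_dir / "model.pkl").open("rb") as f:
        model = pickle.load(f)  # only unpickle files you trust
    metrics = json.loads((version_dir / "metrics.json").read_text())
    return model, metrics
```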

### Model pipelines
Training a machine learning model consists of a series of sequential steps:
1. load and prepare data
2. vectorize and format the data
3. train the models

When training multiple models and trying different parameters, the above pipeline has to be run many times. To save time, it is crucial that the workflow is streamlined and parametrized, allowing for quick iterations when:

- updating the training data
- adjusting parameters for a given model
- switching to a new training algorithm entirely

Tools such as Kestra or Prefect can help us with this.
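
The pipeline steps above can be sketched in plain Python (this is not the API of any orchestration tool). Everything that changes between runs lives in a single config object, so a new iteration is just a new config. The `TrainConfig` class, the `pickup_zone` and `duration_min` field names, and the mean-per-group "model" are hypothetical stand-ins chosen to keep the example dependency-free. It assumes Python 3.9+.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TrainConfig:
    """Everything that may change between runs lives in one place."""
    max_duration_min: float = 60.0  # drop trips longer than this (outliers)
    feature: str = "pickup_zone"    # categorical feature to group by


def prepare(rows: list[dict], cfg: TrainConfig) -> list[dict]:
    """Step 1: keep only trips with a plausible duration."""
    return [r for r in rows if 0 < r["duration_min"] <= cfg.max_duration_min]


def train(rows: list[dict], cfg: TrainConfig) -> dict:
    """Steps 2-3: group by the chosen feature and fit a mean-per-group baseline."""
    groups: dict[str, list[float]] = {}
    for r in rows:
        groups.setdefault(r[cfg.feature], []).append(r["duration_min"])
    return {key: mean(values) for key, values in groups.items()}


def run_pipeline(rows: list[dict], cfg: TrainConfig) -> dict:
    """Run all steps end to end for one configuration."""
    return train(prepare(rows, cfg), cfg)
```

Re-running with `TrainConfig(max_duration_min=120)` or with a different `rows` list covers two of the iteration cases without touching the pipeline code. Swapping the algorithm would mean replacing `train`.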

### Model serving
When deployed, a model is typically put in a publicly accessible service. Users can send requests and receive the predictions output by the model.
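
A minimal sketch of such a service, using only Python's standard library: the client POSTs JSON features and gets a JSON prediction back. The `MODEL` dictionary, the `pickup_zone` field, the fallback value, and port `9696` are all assumptions for illustration. A production service would add input validation, error handling, and authentication.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in "model": a hypothetical mean duration (minutes) per pickup zone.
MODEL = {"A": 15.0, "B": 5.0}
DEFAULT_DURATION = 12.0  # fallback for zones the model has not seen


def predict(features: dict) -> float:
    """Return the predicted trip duration in minutes."""
    return MODEL.get(features.get("pickup_zone"), DEFAULT_DURATION)


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        """Read a JSON body, run the model, and reply with JSON."""
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length))
        body = json.dumps({"duration_min": predict(features)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9696) -> None:
    """Start the service; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

Keeping `predict` separate from the HTTP handler makes the model logic testable without starting a server.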

### Model monitoring
Models need to be maintained as behaviors change. When a model no longer performs at the expected level, a new, updated model needs to be trained and deployed.
This process can be automated such that no human is involved.
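
One simple way to automate the check, sketched under assumptions: compare the error on recent predictions against the error measured at training time, and flag the model for retraining when it degrades past a tolerance. The RMSE metric, the function names, and the `tolerance` factor of 1.2 are illustrative choices, not a prescribed method.

```python
from math import sqrt


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean squared error between actual and predicted durations."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def needs_retraining(
    y_true: list[float],
    y_pred: list[float],
    baseline_rmse: float,
    tolerance: float = 1.2,  # hypothetical threshold: allow 20% degradation
) -> bool:
    """Flag the model when live error exceeds the baseline times `tolerance`."""
    return rmse(y_true, y_pred) > baseline_rmse * tolerance
```

A scheduled job could call `needs_retraining` on the latest labeled trips and trigger the training pipeline when it returns `True`.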

The overall idea behind MLOps is to provide tools and best practices to automate part of the machine learning project workflow.
It borrows from DevOps and overall software engineering best practices to ensure proper documentation, testing, and maintenance.

Data scientists rarely work in isolation. In addition to engineering best practices, it is important to establish proper processes for communication, documentation, and code ownership, so that we know what problem we are solving and why.

## MLOps Maturity Model
### Level 1: No MLOps/Automation
- notebooks only
- no tracking of models and metadata
- data scientists work alone
- good for POC projects only
### Level 2: DevOps but no MLOps
- some automation (engineers involved)
- automated releases like web apps
- CI/CD, operations metrics
- not ML specific
- no experiment tracking
- DS separated from engineers
### Level 3: Automated Training
- parametrized training pipeline
- experiment tracking and registry
- deployment is not automated yet but low friction
- DS and engineers work together
- good when you have multiple models in prod
### Level 4: Automated deployment operations
- deployment needs no human involvement, or is very easy
- ML platform for deployment through API
- A/B test capabilities to compare models
- model monitoring
### Level 5: Full MLOps automation
- automated training and deployment in an integrated environment

Which level you should be at:
- when testing ideas (proof of concept), level 1 is enough for early results
- deployment requires at least level 2
- multiple models in production are best served by level 3
- even in mature organizations, not all models need to be fully automated
- pragmatism: only go to levels 4 and 5 if required (high infrastructure cost)
- you may want some level of human intervention in the process