# MLflow

## Introduction
> __MLflow is an open-source platform for managing the end-to-end ML lifecycle.__

Its offerings for managing ML experiments and deployment include the following:

- ML-code packaging for reproducibility/sharing (dependencies, models, etc.).
- Experiment tracking (compare results between runs).
- Model deployment to various inference platforms.
- A central store for MLflow models (model versioning, stage transitions and annotations).

> __MLflow integrates with popular frameworks, such as `sklearn`, `pytorch` and `spark`.__

> __It provides APIs for `python`, `R` and `java`__, as well as a REST API and a CLI that can be used from other languages and from the shell.

Here, we will focus on Python and shell usage.
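
As a first taste of the experiment-tracking offering listed above, here is a minimal sketch of logging a few runs from Python. The helper names `build_runs` and `log_runs`, the experiment name `intro-demo` and the placeholder `val_loss` formula are assumptions made for illustration; only the `mlflow.set_experiment`, `mlflow.start_run`, `mlflow.log_params` and `mlflow.log_metrics` calls belong to MLflow itself, and they require the `mlflow` package to be installed.

```python
def build_runs(learning_rates):
    """Return one (params, metrics) pair per hypothetical training run."""
    runs = []
    for lr in learning_rates:
        params = {"learning_rate": lr}
        # Placeholder metric: a stand-in for a real validation score.
        metrics = {"val_loss": round((lr - 0.1) ** 2, 6)}
        runs.append((params, metrics))
    return runs


def log_runs(runs, experiment_name="intro-demo"):
    """Log each run to MLflow so that the runs can be compared later."""
    import mlflow  # imported lazily; needs `pip install mlflow`

    mlflow.set_experiment(experiment_name)
    for params, metrics in runs:
        # Each `with` block opens one run and closes it on exit.
        with mlflow.start_run():
            mlflow.log_params(params)
            mlflow.log_metrics(metrics)
```

Calling `log_runs(build_runs([0.01, 0.1, 0.5]))` records three runs in the tracking store; running `mlflow ui` in the shell afterwards should let you compare them in the browser.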


## Development Operations (Reminder)

Development Operations (DevOps) is concerned with streamlining the building, testing and deployment of high-quality applications.

### Key DevOps practices
- Continuous integration.
- Single source of truth for code and artifacts.
- Continuous testing.
- Defining infrastructure as code.
- Observability and monitoring.
- Bug reporting.
- Security ops (DevSecOps).

### Benefits of DevOps
- Promotes automation (e.g. CI/CD).
- Promotes the definition of repetitive processes (e.g. code review).
- Standardises development (e.g. code style guide).

## DevOps vs MLOps

> ML systems are fundamentally different from most other software systems because the solution itself (the trained model) often changes, for example whenever the data changes.

As a result, ML projects face challenges that are often absent from traditional software projects:

- The programmer must keep track of which data produced which models.
- The model outputs in production must be monitored.
- Data drift must be monitored.
- Training is a necessity.
- Retraining is required.
- Experiments must be shared across teams.
- Features must be shared across teams.
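
To make the data-drift item above concrete, here is a minimal sketch of a drift check. It is only one simple heuristic, not a standard MLOps API: it measures how far the mean of a live batch has moved from the training data, in units of the training standard deviation. The function names, the threshold of `2.0` and the sample values are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def mean_shift_score(reference, live):
    """Shift of the live mean, measured in reference standard deviations."""
    ref_std = stdev(reference)
    if ref_std == 0:
        raise ValueError("reference data has zero variance")
    return abs(mean(live) - mean(reference)) / ref_std


def has_drifted(reference, live, threshold=2.0):
    """Flag drift when the score exceeds a (hypothetical) threshold."""
    return mean_shift_score(reference, live) > threshold


# Made-up feature values: customer ages seen in training vs. in production.
training_ages = [30, 32, 34, 36, 38]
similar_batch = [31, 33, 35, 35, 36]
shifted_batch = [50, 52, 54, 56, 58]

print(has_drifted(training_ages, similar_batch))  # False: same mean as training
print(has_drifted(training_ages, shifted_batch))  # True: mean moved by about 6 std
```

A production monitor would typically compare whole distributions rather than means alone, but the workflow is the same: keep a reference sample, score each new batch against it, and alert or retrain when the score crosses a threshold.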

These different parts of the AI stack can be streamlined through processes, documentation and tooling, which are the core functions of MLOps.

MLOps is still in its infancy; thus, its definition is expected to evolve. Every tool used in the ML stack _could_, in theory, be considered an MLOps tool in that it streamlines ML-system development. Currently, MLOps tools facilitate easy and rapid iteration through the ML model lifecycle, particularly after the model is put into production.

Consider the examples below.
- __Feature store__: a single location to store features that might be used across teams. Some features can, however, be costly to compute (for example, the average purchase size this year).
- __Model registry__: like the one MLflow provides, it allows models to be moved into and out of production, for example with a simple dropdown menu.
- __Data versioning__: tools such as DVC help you to repeat experiments and track what worked best for different datasets as they evolve.
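
The idea behind data versioning can be sketched in a few lines: give every dataset a fingerprint derived from its content, and store that fingerprint with each experiment so that you always know which data produced which result. This is an illustrative sketch only, not how DVC is implemented; the function names, the sample rows and the scores are made up.

```python
import hashlib
import json


def dataset_fingerprint(rows):
    """Return a short content hash that changes whenever the data changes."""
    # Serialise deterministically so that equal data always gives an equal hash.
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def record_experiment(log, rows, model_name, score):
    """Append one record that ties a model and its score to its exact dataset."""
    log.append(
        {
            "data_version": dataset_fingerprint(rows),
            "model": model_name,
            "score": score,
        }
    )


experiments = []
v1 = [{"customer": 1, "purchase": 20.0}, {"customer": 2, "purchase": 35.5}]
v2 = v1 + [{"customer": 3, "purchase": 12.0}]  # the dataset evolves

# The scores below are invented placeholders, not real results.
record_experiment(experiments, v1, "baseline", 0.81)
record_experiment(experiments, v2, "baseline", 0.84)

for entry in experiments:
    print(entry)
```

Because the two records carry different `data_version` values, a change in score can be traced back to a change in the data rather than mistaken for a change in the model.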

### Benefits of MLOps

1. Relatively rapid project delivery:
    - rapid handovers between teams.
    - rapid onboarding of new team members.
    - rapid issue resolution.
    - rapid project start-up.
<br><br>
2. Standardisation, which promotes:
    - repeatability (everyone has a common language and uses the same ground-truth sources for data, features, models, etc.).
    - explainability (it is easy to explain the pipeline if everyone is familiar with the same one).
    - auditability (it is easy to identify weak points in a well-known pipeline).
    - compliance (it is easy to check for anything if you know where to look).
    - technical-debt reduction (the build-up of technical debt can be significantly reduced).
<br><br>
3. Breaking down silos between teams:
    - When MLOps processes are implemented, data scientists can focus on data science.
    - In some companies, data scientists take on the roles of data engineers and cloud engineers, simply because MLOps processes are lacking.
    - In other companies, data scientists 'throw the model over the wall' to the ML engineers by simply sending a Jupyter notebook, which the engineers must then deploy. Jupyter notebooks are generally not suitable for production as they are. Further, some ML engineers complain about having to 'start from scratch' even when the data scientist has already developed a proof of concept, which is highly inefficient. These inefficiencies can be eliminated by defining processes and specifications for how these teams interact.

In the subsequent lessons, we will explore the above in depth.


## Conclusion
At this point, you should have a good understanding of:
- MLflow and its benefits.
- the differences between DevOps and MLOps.