# Introduction to MLOps

## The Machine Learning workflow

[![ML workflow by Red Hat](images/wiidii_ml_workflow.png)](https://www.redhat.com/files/summit/session-assets/2019/T957A0.pdf)

### Codifying problems and metrics

- Main questions:

    - What is the business objective?
    - How to measure success?
    - What are the technical, temporal and organisational constraints?

- Possible solutions: communicating with the product owner (PO) and stakeholders, knowing the product and the client's needs.

### Data collection and cleaning

- Main questions:

    - Which data?
    - Is it free/in adequate quantity/noisy/labelled/biased?
    - Is it stable or evolving?
    
- Possible solutions: [public datasets](https://github.com/awesomedata/awesome-public-datasets), [DVC](https://dvc.org/), [Doccano](https://github.com/doccano/doccano), manual work.
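
Some of the questions above (quantity, missing values, labelling, class balance) can be partly answered with a quick audit script. The following is a minimal sketch, assuming records are plain dictionaries with a hypothetical `label` key; the `audit` helper and the sample rows are illustrative and not part of any tool listed above.

```python
from collections import Counter

def audit(rows, label_key="label"):
    """Summarize missing values and label balance for a list of dict records."""
    n = len(rows)
    # Count missing (None) values per field across all rows.
    missing = Counter()
    for row in rows:
        for key, value in row.items():
            if value is None:
                missing[key] += 1
    # Distribution of labels among the rows that actually have one.
    labels = Counter(
        row.get(label_key) for row in rows if row.get(label_key) is not None
    )
    labelled = sum(labels.values())
    return {
        "n_rows": n,
        "missing_per_field": dict(missing),
        "labelled_fraction": labelled / n if n else 0.0,
        "label_counts": dict(labels),
    }

# Hypothetical sample data, for illustration only.
rows = [
    {"text": "great product", "label": "pos"},
    {"text": "terrible", "label": "neg"},
    {"text": None, "label": "pos"},
    {"text": "not sure", "label": None},
]
report = audit(rows)
print(report)
```

A report like this only flags obvious problems; bias and noise usually still require manual inspection.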

### Feature engineering

- Main questions:

    - What is the format of my input data?
    - What features could potentially be useful for my models?

- Possible solutions: data pipelines, feature stores, domain experts.

[![Feature store](images/feature_store.png)](https://www.tecton.ai/blog/what-is-a-feature-store/)
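
As an illustration of going from an input format to candidate features, here is a minimal sketch of a single feature-extraction step. The record layout (`timestamp`, `text`) and the chosen features are assumptions made for the example; in practice they would come from domain experts and the data at hand.

```python
from datetime import datetime

def extract_features(record):
    """Turn one raw record into a flat dict of numeric features."""
    ts = datetime.fromisoformat(record["timestamp"])
    text = record["text"]
    return {
        "hour": ts.hour,                        # time-of-day signal
        "is_weekend": int(ts.weekday() >= 5),   # Saturday=5, Sunday=6
        "n_words": len(text.split()),           # crude length feature
        "has_question": int("?" in text),       # simple lexical flag
    }

# Hypothetical raw record, for illustration only.
raw = {"timestamp": "2024-03-09T14:30:00", "text": "Where is my order?"}
features = extract_features(raw)
print(features)
```

Keeping each transformation in a small, pure function like this makes it easier to reuse the same logic at training and serving time, which is one of the problems a feature store is meant to address.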

### Model training and tuning

- Main questions:

    - Which model(s)?
    - How to optimize its performance?
    - How to track model versions?

- Possible solutions: starting simple, hyperparameter tuning, [MLflow](https://mlflow.org).
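
"Starting simple" and hyperparameter tuning can be sketched without any ML library. The example below is a toy illustration under stated assumptions: the data is made up, the baseline always predicts the majority class, and the tuned model is a hand-written one-dimensional k-nearest-neighbours classifier whose only hyperparameter is `k`.

```python
from collections import Counter

# Hypothetical toy data: (feature, label) pairs, with one noisy point at 0.65.
train = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0), (0.65, 0),
         (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
valid = [(0.15, 0), (0.35, 0), (0.66, 1), (0.85, 1)]

def accuracy(predict, data):
    """Fraction of examples in `data` that `predict` labels correctly."""
    return sum(predict(x) == y for x, y in data) / len(data)

# Step 1: the simplest possible baseline, always predict the majority class.
majority = Counter(y for _, y in train).most_common(1)[0][0]
baseline_acc = accuracy(lambda x: majority, valid)

# Step 2: a k-NN classifier, with k chosen by grid search on the validation set.
def knn_predict(x, k):
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(y for _, y in neighbours).most_common(1)[0][0]

grid = [1, 3, 5]  # odd values avoid tied votes
scores = {k: accuracy(lambda x, k=k: knn_predict(x, k), valid) for k in grid}
best_k = max(scores, key=scores.get)  # first k with the highest score

print(baseline_acc, scores, best_k)
```

Recording `scores` and `best_k` alongside the code and data version is the kind of bookkeeping that an experiment tracker such as MLflow automates.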

### Model validation

- Main questions:

    - Does the model address the business objective?
    - How to measure its performance?
    - Are there uptime constraints for my model?

- Possible solutions: testing set, [continuous integration](https://en.wikipedia.org/wiki/Continuous_integration), [memoization](https://en.wikipedia.org/wiki/Memoization).
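
Two of the ideas above can be sketched with the standard library alone: a reproducible held-out testing set, and memoization of predictions to reduce repeated work. This is a minimal sketch; `train_test_split` here is a hand-written helper (not the scikit-learn function of the same name), and `predict` is a stand-in for an expensive model call with a hypothetical decision rule.

```python
import random
from functools import lru_cache

def train_test_split(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of `items` reproducibly and hold out a test fraction."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

calls = {"count": 0}  # counts how often the "model" is really invoked

@lru_cache(maxsize=1024)
def predict(x):
    """Stand-in for an expensive model call; results are cached per input."""
    calls["count"] += 1
    return x >= 50  # hypothetical decision rule

data = list(range(100))
train, test = train_test_split(data)

# Repeated inputs hit the cache instead of recomputing the prediction.
predictions = [predict(x) for x in test + test]
print(len(train), len(test), calls["count"])
```

Fixing the seed keeps the split stable across runs, which matters when the same test runs in continuous integration. Caching only helps when identical inputs recur and the model is deterministic.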

### Model deployment

- Main questions:

    - How to serve my model?
    - How to handle model versioning?
    - How to handle scaling?

- Possible solutions: [FastAPI](https://fastapi.tiangolo.com/), [Docker](https://www.docker.com/), [Kubernetes](https://kubernetes.io/), [Cortex](https://www.cortex.dev/), [Databricks](https://databricks.com/), stress tests.
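
The serving and versioning questions can be illustrated with a framework-agnostic request handler. The sketch below uses only the standard library; the `MODELS` registry, the payload fields (`x`, `version`) and the status codes are assumptions chosen for the example. In a real service, a web framework such as FastAPI would wrap a function like this in an HTTP route.

```python
from typing import Callable, Dict

# Hypothetical in-memory registry mapping version names to predict functions.
MODELS: Dict[str, Callable[[float], int]] = {
    "v1": lambda x: int(x >= 0.5),
    "v2": lambda x: int(x >= 0.4),
}
DEFAULT_VERSION = "v2"

def handle_request(payload: dict) -> dict:
    """Pick a model version, validate the input, and return a prediction."""
    version = payload.get("version", DEFAULT_VERSION)
    if version not in MODELS:
        return {"status": 404, "error": f"unknown model version: {version}"}
    x = payload.get("x")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(x, bool) or not isinstance(x, (int, float)):
        return {"status": 422, "error": "field 'x' must be a number"}
    return {"status": 200, "version": version, "prediction": MODELS[version](x)}

print(handle_request({"x": 0.45}))                   # served by the default version
print(handle_request({"x": 0.45, "version": "v1"}))  # pinned to an older version
```

Returning the version with every prediction makes it possible to trace a result back to the model that produced it.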

### Monitoring

- Main questions:

    - How to check model performance in production?
    - How to detect and mitigate [model drift](https://c3.ai/glossary/data-science/model-drift/)?
    - How to explain model results?
    
- Possible solutions: [A/B testing](https://en.wikipedia.org/wiki/A/B_testing), [canary release](https://martinfowler.com/bliki/CanaryRelease.html), [explainability tools](https://github.com/EthicalML/awesome-production-machine-learning#explaining-black-box-models-and-datasets).
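
One common way to check for drift in production is to compare the distribution of a feature (or of model scores) between a reference sample and live traffic. The sketch below computes the Population Stability Index (PSI) with equal-width bins. It is an illustrative technique that the list above does not name; the bin count, the `eps` floor and the sample data are assumptions.

```python
import math

def psi(expected, actual, n_bins=4, eps=1e-6):
    """Population Stability Index between a reference and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the reference range into the edge bins.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor at eps so empty bins do not cause log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))

# Hypothetical samples: live traffic has shifted towards higher values.
reference = [i / 100 for i in range(100)]
live = [0.5 + i / 200 for i in range(100)]

print(psi(reference, reference))  # identical distributions give 0.0
print(psi(reference, live))       # a large value signals a shift
```

A frequently quoted rule of thumb treats a PSI above roughly 0.2 as a significant shift, but the threshold should be calibrated per feature and per use case.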