# Introduction to <img src="MLflow-logo-black.png" width="40%" height="40%" align="middle"> Managing the ML lifecycle from experimentation to deployment

Workshop by Sebastian Herold

# Survey, expectations

* What experience do you have with model management?
* Which tools do you use?
* What issues do you face during model development, or across the whole machine learning lifecycle, that model management could help with?
* What functionalities do you expect from a model management tool?

# Overview of MLflow
* Open source platform for managing the end-to-end machine learning lifecycle
* Three primary functions:
  * Tracking experiments to record and compare parameters and results
  * Packaging ML code in a reusable, reproducible form
  * Managing and deploying models
* Library-agnostic: works with any ML library and any programming language; the project provides a REST API as well as Python, R, and Java APIs

In this workshop, we will discuss MLflow's three components: *MLflow Tracking*, *MLflow Projects*, and *MLflow Models*.

# MLflow Tracking
API and UI for logging parameters, code versions, metrics, and artifacts when running machine learning code and for later visualizing the results

## Setup
* e.g., on Windows
  
    ```bat
    mlflow_server\Scripts\activate
    mlflow server ^
        --host 0.0.0.0 ^
        --port 5000 ^
        --backend-store-uri file:///%cd%\mlruns ^
        --default-artifact-root file:/%cd%\mlruns
    ```
* Unix, Docker: see repo
* Backends, artifact stores: [MLflow docs](https://www.mlflow.org/docs/latest/tracking.html#mlflow-tracking-servers)
* UI: http://localhost:5000

## Python API Logging Functions
### High-level

`import mlflow`

Tracking URI  
`mlflow.set_tracking_uri("<URI>")`  
`mlflow.get_tracking_uri()`

Experiments  
`mlflow.create_experiment("<EXPERIMENT_NAME>")`  
`mlflow.set_experiment("<EXPERIMENT_NAME>")`

Runs  
`mlflow.start_run()`  
`mlflow.end_run()`

## Python API Logging Functions
### High-level

Logging and tags  
`mlflow.log_param("<PARNAME>", "<PARVALUE>")`  
`mlflow.log_params(<PARDICT>)`

`mlflow.log_metric("<METRICNAME>", <METRICVALUE>)`  
`mlflow.log_metrics(<METRICDICT>)`

`mlflow.set_tag("<TAGNAME>", "<TAGVALUE>")`  
`mlflow.set_tags(<TAGDICT>)`

`mlflow.log_artifact("<LOCALPATH>")`  
`mlflow.log_artifacts("<LOCALDIR>")`

## Python API Logging Functions
### Low-level

`import mlflow.tracking`

`mlflow.tracking.MlflowClient()`  
  `.create_experiment()`  
  `.create_run()`  
  `.download_artifacts()`  
  `.get_experiment()` and `.get_experiment_by_name()`  
  `.get_run()`  
  `.list_artifacts()`  
  `.list_experiments()`  
  `.log_artifact()` and `.log_artifacts()`  
  `.log_metric()`  
  `.log_param()`  
  `.log_batch()`

## So let's get started

# MLflow Projects

Standard format for packaging reusable data science code.

## Setup
* A directory with code or a Git repository, containing either a descriptor file (`MLproject`) or following a simple convention, to specify the dependencies and how to run the code
* [Template Folder](https://github.com/MynherVanKoek/AMLD_2020_MLflow/tree/master/20_projects/200_mlflow_project_template)
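
To make the layout concrete, the following sketch scaffolds a small project directory from Python. The project name, the `alpha` parameter, the pinned Python version, and the content of `train.py` are assumptions for illustration and do not reproduce the template folder linked above.

```python
import tempfile
from pathlib import Path

# Descriptor file: project name, environment, and entry points.
MLPROJECT = """\
name: demo_project

conda_env: conda.yaml

entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.5}
    command: "python train.py --alpha {alpha}"
"""

# Hypothetical environment definition; versions are placeholders.
CONDA_ENV = """\
name: demo_project
channels:
  - defaults
dependencies:
  - python=3.8
  - pip
  - pip:
      - mlflow
"""

# Hypothetical training script that only logs its parameter.
TRAIN_SCRIPT = """\
import argparse

import mlflow

parser = argparse.ArgumentParser()
parser.add_argument("--alpha", type=float, default=0.5)
args = parser.parse_args()

with mlflow.start_run():
    mlflow.log_param("alpha", args.alpha)
"""

project_dir = Path(tempfile.mkdtemp()) / "demo_project"
project_dir.mkdir()
files = {"MLproject": MLPROJECT, "conda.yaml": CONDA_ENV, "train.py": TRAIN_SCRIPT}
for filename, content in files.items():
    (project_dir / filename).write_text(content)

created = sorted(p.name for p in project_dir.iterdir())
print(project_dir, created)
```

With such a folder in place, parameters of the `main` entry point can be overridden on the command line with `-P`, e.g. `-P alpha=0.3`.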

## Running the project

`mlflow run <FOLDER> [<ARGUMENTS>]`

**Windows Users:** Follow additional instructions prior to running code.

## Hands-on, part two

# MLflow Models
Convention for packaging machine learning models in multiple flavors, plus a variety of tools to deploy these models

## MLflow's way of storing models
```bash
# Directory written by mlflow.sklearn.save_model(model, "my_model")
my_model/
├── MLmodel
└── model.pkl
```

`MLmodel` file
```yaml
utc_time_created: '2019-12-28 07:13:54.587332'
run_id: 5c37ed12cb5a4cd68ddeec596482db6f
artifact_path: model
flavors:
  python_function:
    data: model.pkl
    env: conda.yaml
    loader_module: mlflow.sklearn
    python_version: 3.6.5
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: '0.22'
```

## Flavors
For example:
* Python Function (`python_function`)
* Keras (`keras`)
* PyTorch (`pytorch`)
* Scikit-learn (`sklearn`)
* Spark MLlib (`spark`)
* TensorFlow (`tensorflow`)

## MLflow Model Python API

* `mlflow.<FLAVOR>.save_model(<MODEL>, "<LOCALPATH>")`
* `mlflow.<FLAVOR>.log_model(<MODEL>, "<ARTIFACTPATH>")`
* `mlflow.<FLAVOR>.load_model("<MODELURI>")`
* `mlflow.<FLAVOR>.autolog()`

## Deployment
MLflow provides tools for deploying MLflow models on a local machine or to several production environments. Currently supported:
* local deployment of MLflow models as a REST API server
* deployment of a `python_function` model on Microsoft Azure ML
* deployment of a `python_function` model on Amazon SageMaker
* export of a `python_function` model as an Apache Spark UDF

## Deployment of MLflow Models

* Creating a REST API server:
  `mlflow models serve [<ARGUMENTS>]`
  
  
* Getting predictions: send a POST request to the `/invocations` endpoint, e.g., with `curl`:
    ```bash
    # split-oriented
    curl http://localhost:<PORT>/invocations \
        -H 'Content-Type: application/json; format=pandas-split' \
        -d '{
        "columns": ["a", "b", "c"],
        "data": [[1, 2, 3], [4, 5, 6]]
        }'
    ```
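
The same request can be built from Python with the standard library only. This is a sketch under assumptions: the helper `build_invocation_request` is hypothetical, and port `1234` stands in for `<PORT>`. The payload and content type mirror the split-oriented `curl` call above.

```python
import json
import urllib.request


def build_invocation_request(url, columns, rows):
    """Build a split-oriented POST request for the /invocations endpoint."""
    payload = json.dumps({"columns": columns, "data": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json; format=pandas-split"},
        method="POST",
    )


# Assumption: a model server is listening on port 1234.
request = build_invocation_request(
    "http://localhost:1234/invocations",
    columns=["a", "b", "c"],
    rows=[[1, 2, 3], [4, 5, 6]],
)
print(request.get_method(), request.full_url)

# Sending the request requires a running model server:
# with urllib.request.urlopen(request) as response:
#     predictions = json.load(response)
```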

## So, let's deploy some MLflow Models    