# MLflow Demo

## Resources:
* https://mlflow.org/docs/latest/quickstart.html

This notebook summarizes basic MLflow use cases to help visualize and understand its benefits.

This notebook mostly uses shell commands to interact with an mlflow runtime in docker. The demo plays with the mlflow examples, which include scikit-learn and pytorch models as well as environment configurations in both conda and docker.

## MLflow Overview
One way to summarize mlflow's value proposition is as follows:
* Enables and facilitates the environment configuration needed to run experiments (via conda and docker).
* Enables and facilitates experiment tracking.
* Enables and facilitates model storage and serving for quick ad hoc experiments.
* It does the above with few additions to the project, and it integrates with popular ML frameworks and even other monitoring tools like tensorboard.

## MLflow Runtime
MLflow uses a tracking server to do just that: tracking experiments, storing models, providing some basic reporting and managing the runtime environment for each experiment. The tracking server can be either local or remote. A local tracking server is best for local experiments, e.g., during development, while a remote server is better suited to heavy training. It is important to clarify that this server only tracks runs and stores artifacts; it is not a runtime for experiments. Each experiment can run anywhere; the experiment runtime just needs access to the tracking server to report progress.

In local runtime, mlflow can be used to recreate experiments from scratch without needing to manually configure anything.

### Installation
Within a python or conda environment, install mlflow with `pip install mlflow`. The CLI tool gets installed and you can start working with mlflow. The github repo has a lot of examples: https://github.com/mlflow/mlflow.

### Running a simple experiment
For illustration purposes, let's write a simple program to log dummy metrics:

In [1]:
import os
from random import random, randint
from mlflow import log_metric, log_param, log_artifacts

# log a parameter
log_param("param1", randint(0, 100))

# log a metric
log_metric("foo", random())
log_metric("foo", random() + 1)
log_metric("foo", random() + 2)

# log an artifact
outputs = "/tmp/mlflow-demo/outputs"
if not os.path.exists(outputs):
    os.makedirs(outputs)
with open(os.path.join(outputs, "test.txt"), "w") as f:
    f.write("hello world!")
log_artifacts(outputs)

The above code does three things:

1. It logs a parameter, for instance a model hyperparameter such as the learning rate or the number of epochs.
2. It logs metrics. During training you can log metrics that change over time, such as accuracy or loss, to follow their progress.
3. It logs artifacts, i.e., files such as the model and its weights, which are persisted and sent to the server for later use.

The above code also produces a folder called `mlruns` in your working directory, where mlflow keeps track of everything you decide to log.

In [4]:
!tree ./mlruns/

./mlruns/
└── 0
    ├── 0732020976b14911a7eada8185d82a87
    │   ├── artifacts
    │   │   └── test.txt
    │   ├── meta.yaml
    │   ├── metrics
    │   │   └── foo
    │   ├── params
    │   │   └── param1
    │   └── tags
    │       ├── mlflow.source.name
    │       ├── mlflow.source.type
    │       └── mlflow.user
    └── meta.yaml

6 directories, 8 files
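
As a rough illustration of what lives in that folder, the sketch below reads the `params` and `metrics` files of one run with plain Python. It assumes that each param file holds just the value and that each metric file holds one line per logged value, with a timestamp, the value and optionally a step separated by spaces. This is an assumption about the local file layout and may differ across mlflow versions. The helper `read_run` and the temporary demo folder (with made-up values) are hypothetical and only there so the snippet runs anywhere.

```python
import tempfile
from pathlib import Path


def read_run(run_dir: Path) -> dict:
    """Collect params and metric histories from one local run directory."""
    params = {
        p.name: p.read_text().strip()
        for p in sorted((run_dir / "params").iterdir())
    }
    metrics = {}
    for metric_file in sorted((run_dir / "metrics").iterdir()):
        history = []
        for line in metric_file.read_text().splitlines():
            # Assumed line format: "<timestamp> <value> [<step>]"
            parts = line.split()
            step = int(parts[2]) if len(parts) > 2 else 0
            history.append((step, float(parts[1])))
        metrics[metric_file.name] = history
    return {"params": params, "metrics": metrics}


# Build a tiny stand-in for ./mlruns/0/<run_id>/ with made-up values.
demo_root = Path(tempfile.mkdtemp())
run_dir = demo_root / "0" / "demo-run"
(run_dir / "params").mkdir(parents=True)
(run_dir / "metrics").mkdir()
(run_dir / "params" / "param1").write_text("42")
(run_dir / "metrics" / "foo").write_text(
    "1602700000000 0.25 0\n1602700000001 1.25 0\n1602700000002 2.25 0\n"
)

run = read_run(run_dir)
print(run["params"])
print(run["metrics"]["foo"])
```

To try it on a real run, pass the path of a run folder such as the one shown in the tree above instead of the demo folder.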


We can save the exact same code as a python script and execute it as a standalone program:

In [7]:
!python mlflow_tracking.py

Everything runs locally. To visualize the results, we can start the UI with `mlflow ui`, which runs a server listening on `http://localhost:5000` by default. Run this command from the parent folder of `mlruns` so the UI can read it and display the experiments, as shown in the picture. There we can see the experiments we just ran, along with their metrics. Clicking on a run also shows the artifact we saved.

![mlflow-ui](images/mlflow-ui.png)

## MLflow projects
An MLflow project is like a package containing everything needed to run an experiment, including the runtime environment definition (a conda environment or a Dockerfile) and an additional file that tells MLflow about the project's entry points, input parameters, etc. This is necessary to enable running experiments against an external tracking server.

We can run experiments directly from a github repository or a local folder. In local mode the results are again stored in `mlruns` and can be viewed with the ui.

For example, let's run one of the mlflow examples at https://github.com/mlflow/mlflow-example.git, a simple scikit-learn model that predicts wine quality. It consists only of the project definition file, a conda environment file, the data set in csv format and the python script that trains the model.

In [9]:
!mlflow run https://github.com/mlflow/mlflow-example.git -P alpha=5.0

2020/10/14 17:50:00 INFO mlflow.projects.utils: === Fetching project from https://github.com/mlflow/mlflow-example.git into /tmp/tmpborqt582 ===
2020/10/14 17:50:04 INFO mlflow.utils.conda: === Creating conda environment mlflow-1abc00771765dd9dd15731cbda4938c765fbb90b ===
Collecting package metadata (repodata.json): done
Solving environment: done


  current version: 4.8.4
  latest version: 4.8.5

Please update conda by running

    $ conda update -n base -c defaults conda



Downloading and Extracting Packages
pip-20.2.3           | 1.7 MB    | ##################################### | 100% 
setuptools-50.3.0    | 710 KB    | ##################################### | 100% 
pandas-1.1.3         | 8.1 MB    | ##################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Installing pip dependencies: / Ran pip subprocess with arguments:
['/home/ohtar10/miniconda3/envs/mlflow-1abc00771765dd9dd15731cbda4938c765fbb90b/bin/python

The first time we run the project via mlflow, it automatically fetches all the files from the repository and uses the conda environment file to create the local conda environment to run the experiment. Then, mlflow runs the experiment as defined and logs the declared parameters, metrics and artifacts. If we explore the content of this project, the vast majority of it relates to scikit-learn and the model itself, and very little relates to mlflow.

The project file (`MLproject`) is another yaml file that describes the project metadata, including the running environment specs and the entry points:

```yaml
name: tutorial

conda_env: conda.yaml

entry_points:
  main:
    parameters:
      alpha: float
      l1_ratio: {type: float, default: 0.1}
    command: "python train.py {alpha} {l1_ratio}"
```

Through `conda_env` we tell mlflow which conda environment file we need to use to create the appropriate runtime environment before running the model. If the project is run again, the preexisting conda environment will be reused.
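
To make the role of `parameters` and `command` more concrete, here is a minimal sketch of how the declared parameters, their defaults and the values passed with `-P` could be combined into the final command line. The helper `render_command` is hypothetical and is not mlflow's implementation, which likely does more (for example type checks and quoting); it only mirrors the substitution idea.

```python
def render_command(template: str, declared: dict, supplied: dict) -> str:
    """Fill a command template from declared parameters and supplied values."""
    values = {}
    for name, spec in declared.items():
        # A spec is either a bare type name ("float") or a dict with type/default.
        default = spec.get("default") if isinstance(spec, dict) else None
        if name in supplied:
            values[name] = supplied[name]
        elif default is not None:
            values[name] = default
        else:
            raise ValueError(f"missing required parameter: {name}")
    return template.format(**values)


# Same declaration as the project file above, written as a Python dict.
declared = {"alpha": "float", "l1_ratio": {"type": "float", "default": 0.1}}

# Equivalent of passing only `-P alpha=5.0`; l1_ratio falls back to its default.
command = render_command("python train.py {alpha} {l1_ratio}", declared, {"alpha": 5.0})
print(command)
```

With only `alpha` supplied, `l1_ratio` takes its default, which is why the earlier `mlflow run ... -P alpha=5.0` call did not need to mention it.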

### Serving models
Now let's suppose that after training the model we want to publish it for ad hoc testing. We can create endpoints from the executed experiments and make http requests to them.

First, we need to execute an experiment:

In [11]:
!python ~/git/mlflow/examples/sklearn_logistic_regression/train.py

Score: 0.6666666666666666
Model saved in run 263aa26bfccd434b818c18938591ceeb


Now, we can create an endpoint to the model with:
```
mlflow models serve -m ./notebooks/mlruns/0/263aa26bfccd434b818c18938591ceeb/artifacts/model -p 1234
```

Notice that in local mode we specify the path to the model artifacts, which includes the run id we obtained above. We can also specify a custom port to expose. Finally, if no conda environment exists yet for this service, mlflow will create it automatically and then serve the model, which can be consumed via `curl`.

In [12]:
!curl -d '{"columns":["x"], "data":[[1], [-1]]}' -H 'Content-Type: application/json; format=pandas-split' -X POST http://localhost:1234/invocations

[1, 0]
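
The same request can be prepared from Python with only the standard library. The sketch below mirrors the header and payload of the `curl` call above; the helper name `build_invocation_request` is hypothetical, and whether a given mlflow version accepts this exact content type is not checked here. Sending is left commented out because it needs the model server to be running.

```python
import json
import urllib.request


def build_invocation_request(url, columns, rows):
    """Prepare a POST request carrying a pandas-split style JSON body."""
    body = json.dumps({"columns": columns, "data": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; format=pandas-split"},
        method="POST",
    )


request = build_invocation_request(
    "http://localhost:1234/invocations", ["x"], [[1], [-1]]
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# Sending requires the model server started above to be listening:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```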

### Working with docker environments
So far, I have demonstrated the usage of mlflow projects to recreate the runtime environment via conda for both training and serving. However, we can also use docker instead. We just need to create the Dockerfile and image and specify it in the mlflow project instead of a conda environment. The runtime will create a container using that image and run the experiment. The rest of the process is exactly the same.

#### Dockerfile example
We can have anything we want in the Dockerfile. For instance, this is a simple docker image based on miniconda that installs some packages.

```
FROM continuumio/miniconda:4.5.4

# Quote the specifier so the shell does not treat ">" as an output redirect.
RUN pip install "mlflow>=1.0" \
    && pip install azure-storage-blob==12.3.0 \
    && pip install numpy==1.14.3 \
    && pip install scipy \
    && pip install pandas==0.22.0 \
    && pip install scikit-learn==0.19.1 \
    && pip install cloudpickle
```

We should build this image and make it available (locally or in some docker registry), since the image identifier is what we need to specify in the mlflow project.

In [13]:
!docker image build -t mlflow-docker-example:latest -f ../test/docker/Dockerfile .


Step 1/2 : FROM continuumio/miniconda:4.5.4
4.5.4: Pulling from continuumio/miniconda
Digest: sha256:19d3eedab8b6301a0e1819476cfc50d53399881612183cf65208d7d43db99cd9
Status: Downloaded newer image for continuumio/miniconda:4.5.4
 ---> 16e4fbac86ce
Step 2/2 : RUN pip install mlflow>=1.0     && pip install azure-storage-blob==12.3.0     && pip install numpy==1.14.3     && pip install scipy     && pip install pandas==0.22.0     && pip install scikit-learn==0.19.1     && pip install cloudpickle
 ---> Running in b541d45bb67c
You are using pip version 10.0.1, however version 20.2.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Collecting azure-storage-blob==12.3.0
  Downloading https://files.pythonhosted.org/packages/ff/76/5ed49519f30636beba06b4fab41e086abc9fa75ee79770e942e14c92136a/azure_storage_blob-12.3.0-py2.py3-none-any.whl (279kB)
Collecting azure-core<2.0.0,>=1.2.2 (from azure-storage-blob==12.3.0)
  Downloading 

The mlflow project file should look like this:
```
name: docker-example

docker_env:
  image:  mlflow-docker-example:latest

entry_points:
  main:
    parameters:
      alpha: float
      l1_ratio: {type: float, default: 0.1}
    command: "python train.py --alpha {alpha} --l1-ratio {l1_ratio}"
```
Notice that instead of a `conda_env` property we provide `docker_env` with the image we just built above.

Now we just need to run the mlflow project. MLflow will read the project file and create the container to run the experiment. Notice the `command` option: it tells mlflow to run the `train.py` script, which is part of the project files, with python.

In [14]:
!mlflow run ../test/docker -P alpha=0.5

2020/10/15 10:34:47 INFO mlflow.projects.docker: === Building docker image docker-example ===
2020/10/15 10:34:47 INFO mlflow.projects.utils: === Created directory /tmp/tmpt83_9vgp for downloading remote URIs passed to arguments of type 'path' ===
2020/10/15 10:34:47 INFO mlflow.projects.backend.local: === Running command 'docker run --rm -v /home/ohtar10/tests/mlflow/notebooks/mlruns:/mlflow/tmp/mlruns -v /home/ohtar10/tests/mlflow/notebooks/mlruns/0/6d47addf45db41efaf17bcafdb82ad4d/artifacts:/home/ohtar10/tests/mlflow/notebooks/mlruns/0/6d47addf45db41efaf17bcafdb82ad4d/artifacts -e MLFLOW_RUN_ID=6d47addf45db41efaf17bcafdb82ad4d -e MLFLOW_TRACKING_URI=file:///mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 docker-example:latest python train.py --alpha 0.5 --l1-ratio 0.1' in run with ID '6d47addf45db41efaf17bcafdb82ad4d' === 
Elasticnet model (alpha=0.500000, l1_ratio=0.100000):
  RMSE: 0.794793101903653
  MAE: 0.6189130834228139
  R2: 0.18411668718221808
2020/10/15 10:34:49 INFO mlflow.pr

If you pay attention to the `docker run` command in the log, mlflow creates the container with shared volumes to preserve the experiment results, and it publishes the results to the specified tracking uri.

## Working with a remote tracking server
All the examples above work perfectly in local environments. However, it is common to use a remote tracking server, because that is where the final experiments are stored and where we can have a model registry to keep track of experiments and versions, expose models as services, etc.

For this part of the demo I have prepared a docker-compose runtime environment with a postgres database as the backend store and a shared volume as the artifact store. We can then run several experiments against this instance and keep track of each one. We just need to point the `MLFLOW_TRACKING_URI` environment variable to where the server is running. Then, all the experiments can run normally.

In this particular case, one of the containers will run an mlflow server (not just a simple ui server) with the following command:
```
mlflow server --backend-store-uri postgresql://mlflow-store:123456@mlflow-store/mlflow-store --default-artifact-root /tmp/mlflow --host 0.0.0.0
```
Where:

* `backend-store-uri` is the postgres database uri.
* `default-artifact-root` is the artifact store, i.e., where the models are actually stored; this can be an s3 bucket or some other external storage. It is recommended that this location be visible from all the nodes that interact with the tracking server, since that is where the local mlflow instance publishes the models.
* `host`: by default, the mlflow server only listens on localhost, so we need to specify the network address we want the server to listen on.
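
To read the backend store uri more easily, the short sketch below splits it into its parts with Python's standard `urllib.parse`. The dictionary keys (`dialect`, `user`, and so on) are just labels chosen for this illustration; the uri itself is the one from the command above.

```python
from urllib.parse import urlsplit

uri = "postgresql://mlflow-store:123456@mlflow-store/mlflow-store"
parts = urlsplit(uri)

# Label each piece of the uri; the key names are only for illustration.
store = {
    "dialect": parts.scheme,
    "user": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "database": parts.path.lstrip("/"),
}
print(store)
```

In this demo the user, the host (the compose service name) and the database all happen to be called `mlflow-store`, which is why the same name appears three times in the uri.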

Once more, we can run experiments as we did above, but specifying the remote tracking server uri.


In [3]:
# Run this before running the compose file to ensure permissions to a common volume
! mkdir -p /tmp/mlflow

In [4]:
%%bash
export MLFLOW_TRACKING_URI=http://localhost:5000/
python mlflow_tracking.py

We can now observe the results in the remote server, including the artifacts!
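
If the experiment is launched from Python rather than from a `%%bash` cell, the same variable can be passed through the child process environment. This is a minimal sketch: the helper `tracking_env` is hypothetical, and the child process here only prints the variable it sees instead of running `mlflow_tracking.py`, so the snippet works without a server.

```python
import os
import subprocess
import sys


def tracking_env(tracking_uri: str) -> dict:
    """Return a copy of the current environment with the tracking uri set."""
    env = dict(os.environ)
    env["MLFLOW_TRACKING_URI"] = tracking_uri
    return env


env = tracking_env("http://localhost:5000/")

# Stand-in for `python mlflow_tracking.py`: the child just echoes the variable.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['MLFLOW_TRACKING_URI'])"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

Copying the environment instead of editing `os.environ` keeps the setting local to the launched process, much like `export` inside a single `%%bash` cell.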

It is good practice to give experiments a name so that their runs are grouped under that name in the tracking server. This makes it easier to track progress and compare metrics between runs of the same experiment.

In [5]:
%%time
%%bash
export MLFLOW_TRACKING_URI=http://localhost:5000/
mlflow run --experiment-name sklearn-en-wine ~/git/mlflow/examples/sklearn_elasticnet_wine/ -P alpha=0.35 -P l1_ratio=0.1

INFO: 'sklearn-en-wine' does not exist. Creating a new experiment
Elasticnet model (alpha=0.350000, l1_ratio=0.100000):
  RMSE: 0.7380094532393624
  MAE: 0.5685087888806625
  R2: 0.22828386798815148
2020/10/16 06:41:08 INFO mlflow.projects.utils: === Created directory /tmp/tmpizzn2vw1 for downloading remote URIs passed to arguments of type 'path' ===
2020/10/16 06:41:08 INFO mlflow.projects.backend.local: === Running command 'source /home/ohtar10/miniconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-6284a367a61b51ccdf445333a216776597fb4efc 1>&2 && python train.py 0.35 0.1' in run with ID '366cbbfbf4e74f2b9c0ec373fc7f322f' === 
Successfully registered model 'ElasticnetWineModel'.
Created version '1' of model 'ElasticnetWineModel'.
2020/10/16 06:41:11 INFO mlflow.projects: === Run (ID '366cbbfbf4e74f2b9c0ec373fc7f322f') succeeded ===
CPU times: user 0 ns, sys: 6.26 ms, total: 6.26 ms
Wall time: 6.9 s


We can launch several runs of the same experiment to compare performance:

In [6]:
%%time
%%bash
export MLFLOW_TRACKING_URI=http://localhost:5000/
mlflow run --experiment-name sklearn-en-wine ~/git/mlflow/examples/sklearn_elasticnet_wine/ -P alpha=0.5 -P l1_ratio=0.1
mlflow run --experiment-name sklearn-en-wine ~/git/mlflow/examples/sklearn_elasticnet_wine/ -P alpha=0.4 -P l1_ratio=0.1
mlflow run --experiment-name sklearn-en-wine ~/git/mlflow/examples/sklearn_elasticnet_wine/ -P alpha=0.5 -P l1_ratio=0.2
mlflow run --experiment-name sklearn-en-wine ~/git/mlflow/examples/sklearn_elasticnet_wine/ -P alpha=0.4 -P l1_ratio=0.2
mlflow run --experiment-name sklearn-en-wine ~/git/mlflow/examples/sklearn_elasticnet_wine/ -P alpha=0.35 -P l1_ratio=0.2

Elasticnet model (alpha=0.500000, l1_ratio=0.100000):
  RMSE: 0.7460550348172179
  MAE: 0.576381895873763
  R2: 0.21136606570632266
Elasticnet model (alpha=0.400000, l1_ratio=0.100000):
  RMSE: 0.7410782793160982
  MAE: 0.5712718681984226
  R2: 0.22185255063708886
Elasticnet model (alpha=0.500000, l1_ratio=0.200000):
  RMSE: 0.7543919979968401
  MAE: 0.5857669727382302
  R2: 0.19364204365178084
Elasticnet model (alpha=0.400000, l1_ratio=0.200000):
  RMSE: 0.7468093030485083
  MAE: 0.5777243300021722
  R2: 0.20977062786327272
Elasticnet model (alpha=0.350000, l1_ratio=0.200000):
  RMSE: 0.7431910168050467
  MAE: 0.5739937604254349
  R2: 0.2174093904435277
2020/10/16 06:41:41 INFO mlflow.projects.utils: === Created directory /tmp/tmpu36s0uc4 for downloading remote URIs passed to arguments of type 'path' ===
2020/10/16 06:41:41 INFO mlflow.projects.backend.local: === Running command 'source /home/ohtar10/miniconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-6284a367a61b51ccdf4
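
Typing one `mlflow run` line per combination gets tedious, so the lines can also be generated. The sketch below builds the command strings for a small grid with `itertools.product`, using the same project path, experiment name and `-P` flags as the cells above. The helper `run_commands` is hypothetical, and the full grid it produces (six combinations) is slightly different from the five runs executed above. It only prints the commands; they can be pasted into a `%%bash` cell.

```python
import itertools

PROJECT = "~/git/mlflow/examples/sklearn_elasticnet_wine/"  # path used in the cells above
EXPERIMENT = "sklearn-en-wine"


def run_commands(alphas, l1_ratios):
    """Build one `mlflow run` command line per (alpha, l1_ratio) pair."""
    commands = []
    for alpha, l1_ratio in itertools.product(alphas, l1_ratios):
        commands.append(
            f"mlflow run --experiment-name {EXPERIMENT} {PROJECT} "
            f"-P alpha={alpha} -P l1_ratio={l1_ratio}"
        )
    return commands


commands = run_commands([0.35, 0.4, 0.5], [0.1, 0.2])
for command in commands:
    print(command)
```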

### Serving models from the registry
The above example not only runs an experiment but also registers a model in mlflow. We can then serve registered models for ad hoc tests. In the above example we ran the experiment 6 times; hence, we end up with 6 versions of `ElasticnetWineModel`. To serve one of them, we simply do:

```
export MLFLOW_TRACKING_URI=http://localhost:5000/
mlflow models serve -m "models:/ElasticnetWineModel/6" -p 1234
```

This will create the service instance at port 1234 using version 6 of the model. We can then execute a curl call.

In [10]:
!curl -d '{"columns":["fixed acidity", "volatile acidity", "citric acid", "residual sugar", "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density", "pH", "sulphates", "alcohol"], "data":[[7, 0.27, 0.36, 20.7, 0.045, 45, 150, 1.001, 3, 0.45, 8.8]]}' -H 'Content-Type: application/json; format=pandas-split' -X POST http://localhost:1234/invocations

[4.950647762564361]
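
Writing the `columns`/`data` payload by hand is error prone with eleven features. As a small sketch, the hypothetical helper `to_pandas_split` below converts a list of records (one dict per row) into that shape, using the same wine sample as the `curl` call above.

```python
import json


def to_pandas_split(records):
    """Turn a list of row dicts into a {"columns": [...], "data": [[...]]} payload."""
    columns = list(records[0].keys())
    data = [[record[column] for column in columns] for record in records]
    return {"columns": columns, "data": data}


# Same sample as in the curl call above.
wine = {
    "fixed acidity": 7,
    "volatile acidity": 0.27,
    "citric acid": 0.36,
    "residual sugar": 20.7,
    "chlorides": 0.045,
    "free sulfur dioxide": 45,
    "total sulfur dioxide": 150,
    "density": 1.001,
    "pH": 3,
    "sulphates": 0.45,
    "alcohol": 8.8,
}

payload = to_pandas_split([wine])
print(json.dumps(payload))
```

The column order follows the keys of the first record, so every record should use the same keys.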

## Pytorch with tensorboard
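This example runs the pytorch project from the mlflow examples against the remote tracking server. As the output below shows, it writes TensorBoard events locally and uploads them as a run artifact.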

In [7]:
%%time
%%bash
export MLFLOW_TRACKING_URI=http://localhost:5000/
mlflow run --experiment-name pytorch-mnist ~/git/mlflow/examples/pytorch/ -P lr=0.05 -P epochs=10 -P momentum=0.5

INFO: 'pytorch-mnist' does not exist. Creating a new experiment
Writing TensorBoard events locally to /tmp/tmpevkxoefe


Test set: Average loss: 4.7327, Accuracy: 9695/10000 (97%)


Test set: Average loss: 4.6956, Accuracy: 9763/10000 (98%)


Test set: Average loss: 4.6816, Accuracy: 9821/10000 (98%)


Test set: Average loss: 4.6736, Accuracy: 9831/10000 (98%)


Test set: Average loss: 4.6662, Accuracy: 9849/10000 (98%)


Test set: Average loss: 4.6652, Accuracy: 9846/10000 (98%)


Test set: Average loss: 4.6619, Accuracy: 9857/10000 (99%)


Test set: Average loss: 4.6603, Accuracy: 9869/10000 (99%)


Test set: Average loss: 4.6589, Accuracy: 9859/10000 (99%)


Test set: Average loss: 4.6573, Accuracy: 9880/10000 (99%)

Uploading TensorBoard events as a run artifact...

Launch TensorBoard with:

tensorboard --logdir=/tmp/mlflow/2/389c11d3da7c4ae0af2d23cc5ec61653/artifacts/events

Logging the trained model as a run artifact...

The model is logged at:
/tmp/mlflow/2/389c11d3da7c4ae0af2d2
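
The per-epoch lines printed above can also be turned into numbers for a quick comparison outside the mlflow UI. This is only a sketch: the regular expression assumes the exact `Test set: ...` line format shown in this output, the helper `parse_epochs` is hypothetical, and the sample text is the first three lines of the run above.

```python
import re

# Matches lines such as:
# "Test set: Average loss: 4.7327, Accuracy: 9695/10000 (97%)"
LINE = re.compile(r"Test set: Average loss: ([\d.]+), Accuracy: (\d+)/(\d+)")


def parse_epochs(log_text):
    """Return (epoch, loss, correct, total) tuples, one per matching line."""
    epochs = []
    for index, match in enumerate(LINE.finditer(log_text), start=1):
        loss, correct, total = match.groups()
        epochs.append((index, float(loss), int(correct), int(total)))
    return epochs


log_text = """
Test set: Average loss: 4.7327, Accuracy: 9695/10000 (97%)
Test set: Average loss: 4.6956, Accuracy: 9763/10000 (98%)
Test set: Average loss: 4.6816, Accuracy: 9821/10000 (98%)
"""

epochs = parse_epochs(log_text)
best = max(epochs, key=lambda epoch: epoch[2])  # most correct predictions
print(epochs)
print("best epoch:", best[0])
```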

Now let's run some other experiments to compare:

In [8]:
%%time
%%bash
export MLFLOW_TRACKING_URI=http://localhost:5000/
mlflow run --experiment-name pytorch-mnist ~/git/mlflow/examples/pytorch/ -P lr=0.01 -P epochs=20 -P momentum=0.5
mlflow run --experiment-name pytorch-mnist ~/git/mlflow/examples/pytorch/ -P lr=0.01 -P epochs=20 -P momentum=0.9
mlflow run --experiment-name pytorch-mnist ~/git/mlflow/examples/pytorch/ -P lr=0.001 -P epochs=50 -P momentum=0.9

16748

Test set: Average loss: 4.6723, Accuracy: 9835/10000 (98%)


Test set: Average loss: 4.6709, Accuracy: 9837/10000 (98%)


Test set: Average loss: 4.6700, Accuracy: 9841/10000 (98%)


Test set: Average loss: 4.6691, Accuracy: 9842/10000 (98%)


Test set: Average loss: 4.6663, Accuracy: 9832/10000 (98%)


Test set: Average loss: 4.6665, Accuracy: 9844/10000 (98%)


Test set: Average loss: 4.6653, Accuracy: 9849/10000 (98%)


Test set: Average loss: 4.6637, Accuracy: 9854/10000 (99%)


Test set: Average loss: 4.6645, Accuracy: 9846/10000 (98%)


Test set: Average loss: 4.6654, Accuracy: 9860/10000 (99%)


Test set: Average loss: 4.6635, Accuracy: 9858/10000 (99%)


Test set: Average loss: 4.6623, Accuracy: 9865/10000 (99%)


Test set: Average loss: 4.6617, Accuracy: 9858/10000 (99%)


Test set: Average loss: 4.6624, Accuracy: 9855/10000 (99%)


Test set: Average loss: 4.6616, Accuracy: 9855/10000 (99%)


Test set: Average loss: 4.6625, Accuracy: 9864/10000 (99%)


Test set: Average