# MLflow tutorial with Anaconda virtual environments on localhost

*Based on https://www.youtube.com/watch?v=d60SAK4OOJY&t=906s

MLflow manages the machine learning life cycle: it records parameters, metrics, and library dependencies so that experiments can be reproduced. It is roughly composed of the following four components: MLflow Tracking, MLflow Projects, MLflow Models, and MLflow Registry. All of them build on the log data recorded by MLflow Tracking.



![MLflow Overview](mlflow_overview.png)

## 1, MLflow Tracking

Logging ML experiments with MLflow Tracking is the basis of MLflow. Each log lets you check the performance of the corresponding experiment.
You only have to add a few calls to log parameters. If the code is relatively simple, it is better to customize the logs by hand, but for deep learning, logging automatically is more convenient.

![Mlflow Data Logging](mlflow_logs.png)

### Example: regression of wine data

Please first install necessary libraries. 

In [None]:
!pip install -r requirements.txt

The most basic code in the official MLflow tutorial uses ElasticNet from scikit-learn. The model predicts the quality of a wine from the features in the data below.

In [2]:
import pandas as pd
csv_url = (
        "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
    )
data = pd.read_csv(csv_url, sep=";")
data.head()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |


In [1]:
!touch requirements.txt

The regression is run with train_wine_regression.py, which logs parameters, metrics, and the trained model with mlflow.log_param(), mlflow.log_metric(), and mlflow.sklearn.log_model(), respectively.
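The script itself is not reproduced in this tutorial, so the following is only a minimal sketch of how such a script might combine the three logging calls. The helper names `eval_metrics` and `log_run` and the lazy import are assumptions made for illustration; the metric formulas are the standard definitions of RMSE, MAE, and R2.

```python
import math


def eval_metrics(actual, predicted):
    """Return (rmse, mae, r2) for two equal-length sequences of numbers.

    Assumes the actual values are not all identical (otherwise R2 is undefined).
    """
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    residual_ss = sum(e * e for e in errors)
    rmse = math.sqrt(residual_ss / n)
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    total_ss = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - residual_ss / total_ss
    return rmse, mae, r2


def log_run(model, alpha, l1_ratio, actual, predicted):
    """Record one run: hyperparameters, metrics, and the fitted model."""
    # Imported here so eval_metrics stays usable without MLflow installed.
    import mlflow
    import mlflow.sklearn

    rmse, mae, r2 = eval_metrics(actual, predicted)
    with mlflow.start_run():
        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("mae", mae)
        mlflow.log_metric("r2", r2)
        # Stores the scikit-learn model under the run's artifacts/model directory.
        mlflow.sklearn.log_model(model, "model")
```

Here `model` would be a fitted scikit-learn estimator such as ElasticNet, and `actual` and `predicted` the test targets and the model's predictions for them.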


In [12]:
!python train_wine_regression.py

Elasticnet model (alpha=0.500000, l1_ratio=0.500000):
  RMSE: 0.7931640229276851
  MAE: 0.6271946374319586
  R2: 0.10862644997792614
Model saved in run 16ba6a65177b4b0d81bf44a8078d7294


In [13]:
!python train_wine_regression.py 0.6 0.4

Elasticnet model (alpha=0.600000, l1_ratio=0.400000):
  RMSE: 0.7928446872861473
  MAE: 0.626666444473971
  R2: 0.10934405701835759
Model saved in run 7cf97a38e2504b62bf449da7c931719a


The logs are recorded one after another in './mlruns/0', and the directory of each run is structured as shown below.

*There are many options for where to store this tracking data, for example other locations on localhost, databases, an HTTP server, or a Databricks workspace, so I keep the examples here as basic as possible. For further details and options, please check the scenarios in https://www.mlflow.org/docs/latest/tracking.html#where-runs-are-recorded

In [15]:
!tree mlruns #Installing tree is optional

mlruns
└── 0
    ├── 16ba6a65177b4b0d81bf44a8078d7294
    │   ├── artifacts
    │   │   └── model
    │   │       ├── MLmodel
    │   │       ├── conda.yaml
    │   │       ├── model.pkl
    │   │       ├── python_env.yaml
    │   │       └── requirements.txt
    │   ├── meta.yaml
    │   ├── metrics
    │   │   ├── mae
    │   │   ├── r2
    │   │   └── rmse
    │   ├── params
    │   │   ├── alpha
    │   │   └── l1_ratio
    │   └── tags
    │       ├── mlflow.log-model.history
    │       ├── mlflow.source.git.commit
    │       ├── mlflow.source.name
    │       ├── mlflow.source.type
    │       └── mlflow.user
    ├── 7cf97a38e2504b62bf449da7c931719a
    │   ├── artifacts
    │   │   └──
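Because the local store is just files, a run can also be inspected without the UI. The following is a minimal sketch using only the standard library. It assumes the layout in the tree listing above, that each file under params holds the raw value, and that each line of a file under metrics has the form `<timestamp> <value> <step>`. That line format is an assumption about the local file store and may differ between MLflow versions. `read_run` is a hypothetical helper, not part of the MLflow API.

```python
from pathlib import Path


def read_run(run_dir):
    """Collect the params and the latest metric values from one run directory."""
    run_path = Path(run_dir)

    params = {}
    for param_file in sorted((run_path / "params").glob("*")):
        params[param_file.name] = param_file.read_text().strip()

    metrics = {}
    for metric_file in sorted((run_path / "metrics").glob("*")):
        lines = [line for line in metric_file.read_text().splitlines() if line.strip()]
        if not lines:
            continue
        # Assumed line format: "<timestamp> <value> <step>"; keep the last entry.
        fields = lines[-1].split()
        metrics[metric_file.name] = float(fields[1])

    return {"params": params, "metrics": metrics}
```

For example, `read_run("mlruns/0/16ba6a65177b4b0d81bf44a8078d7294")` would return the alpha and l1_ratio values as strings and the mae, r2, and rmse values as floats for the first run.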

Start the UI with the following command, then check the logs by accessing http://127.0.0.1:<port number> (port 5000 in the output below).

In [37]:
!mlflow ui

[2022-06-15 14:04:30 +0900] [33446] [INFO] Starting gunicorn 20.1.0
[2022-06-15 14:04:30 +0900] [33446] [INFO] Listening at: http://127.0.0.1:5000 (33446)
[2022-06-15 14:04:30 +0900] [33446] [INFO] Using worker: sync
[2022-06-15 14:04:30 +0900] [33448] [INFO] Booting worker with pid: 33448
^C
[2022-06-15 14:04:30 +0900] [33446] [INFO] Handling signal: int
[2022-06-15 14:04:30 +0900] [33448] [INFO] Worker exiting (pid: 33448)


![Mlflow UI 1](mlflow_UI.png)

### Example: neural network

When you run deep learning code, logging automatically with mlflow.keras.autolog() is convenient.
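sample_keras.py is not reproduced in this tutorial, so the following is only a minimal sketch of how such a script might read its flags and switch on automatic logging. The flag names follow the second run in this section and the default of 3 epochs matches the first run; the batch-size default, the helper names, and the lazy import are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the training flags; argv=None reads them from the command line."""
    parser = argparse.ArgumentParser(description="Train a small Keras classifier")
    # The batch-size default is an assumption; the tutorial does not state it.
    parser.add_argument("--batch_size", type=int, default=64)
    parser.add_argument("--train_epochs", type=int, default=3)
    return parser.parse_args(argv)


def train(model, x_train, y_train, args):
    """Fit a compiled Keras model with MLflow autologging enabled."""
    # Imported lazily so the argument parsing can be tried without MLflow installed.
    import mlflow.keras

    # Call once before fit(). This is the function named in the text (MLflow 1.26
    # in this demo); other releases may expose it as mlflow.tensorflow.autolog().
    mlflow.keras.autolog()
    model.fit(x_train, y_train, batch_size=args.batch_size, epochs=args.train_epochs)
```

With autologging there are no explicit log_param() or log_metric() calls; the single autolog() call replaces them.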


In [22]:
!python sample_keras.py 

Loading data...
8982 train sequences
2246 test sequences
46 classes
Vectorizing sequence data...
x_train shape: (8982, 1000)
x_test shape: (2246, 1000)
Convert class vector to binary class matrix (for use with categorical_crossentropy)
y_train shape: (8982, 46)
y_test shape: (2246, 46)
Building model...
2022-06-15 13:33:18.375569: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022/06/15 13:33:18 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '72691d8862f04bf2bf363e4bbfdb1c34', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current tensorflow workflow
Epoch 1/3
Epoch 2/3
Epoch 3/3
Test score: 0.8933367729187012
Test accuracy: 0.79162955284

In [25]:
!python sample_keras.py --batch_size 32 --train_epochs 5

Loading data...
8982 train sequences
2246 test sequences
46 classes
Vectorizing sequence data...
x_train shape: (8982, 1000)
x_test shape: (2246, 1000)
Convert class vector to binary class matrix (for use with categorical_crossentropy)
y_train shape: (8982, 46)
y_test shape: (2246, 46)
Building model...
2022-06-15 13:42:06.152310: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022/06/15 13:42:06 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '44ef713a6960438484c7630e7eb8e1fb', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current tensorflow workflow
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test score: 0.8923482894897461
Test acc

In [26]:
!mlflow ui

[2022-06-15 13:42:32 +0900] [32341] [INFO] Starting gunicorn 20.1.0
[2022-06-15 13:42:32 +0900] [32341] [INFO] Listening at: http://127.0.0.1:5000 (32341)
[2022-06-15 13:42:32 +0900] [32341] [INFO] Using worker: sync
[2022-06-15 13:42:32 +0900] [32343] [INFO] Booting worker with pid: 32343
^C
[2022-06-15 13:43:55 +0900] [32341] [INFO] Handling signal: int
[2022-06-15 13:43:56 +0900] [32343] [INFO] Worker exiting (pid: 32343)


![Keras Metrics](keras_metric_logs.png)

## 2, MLflow Projects


MLflow Projects lets you package an experimental environment so that other people can easily reproduce your experiments. The only new file you have to prepare is the MLproject file, because the "conda.yaml" file is generated automatically by MLflow Tracking. The "conda.yaml" file describes the version dependencies of the libraries, and the "MLproject" file describes which command to run and which parameters it takes.
As long as these files are prepared, the experimental settings can easily be reproduced, including directly from GitHub.

![Mlflow Projects](mlflow_projects_overview.png)

The GitHub repository used below has the necessary files for packaging an experimental environment, so we can set up the environment and run the code with the following command.

*This command takes a while because it sets up a virtual environment.

In [27]:
!mlflow run https://github.com/mlflow/mlflow-example -P alpha=0.6

2022/06/15 13:48:32 INFO mlflow.projects.utils: === Fetching project from https://github.com/mlflow/mlflow-example into /var/folders/n3/2_y2z9ns7js3zfs6xp0ls65m0000gn/T/tmpc1y5d11x ===
2022/06/15 13:48:35 INFO mlflow.utils.conda: === Creating conda environment mlflow-1abc00771765dd9dd15731cbda4938c765fbb90b ===
Collecting package metadata (repodata.json): done
Solving environment: done
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Installing pip dependencies: \ Ran pip subprocess with arguments:
['/Users/tamurataito/opt/anaconda3/envs/mlflow-1abc00771765dd9dd15731cbda4938c765fbb90b/bin/python', '-m', 'pip', 'install', '-U', '-r', '/var/folders/n3/2_y2z9ns7js3zfs6xp0ls65m0000gn/T/tmpc1y5d11x/condaenv.08y8_x67.requirements.txt']
Pip subprocess output:
Collecting mlflow
  Using cached mlflow-1.26.1-py3-none-any.whl (17.8 MB)
Collecting protobuf>=3.12.0
  Using cached protobuf-4.21.1-cp37-abi3-macosx_10_9_universal2.whl (483 kB)
Collecting requests>=2.

  from collections import Sequence
  from collections import Iterable
  from collections import Mapping, namedtuple, defaultdict, Sequence
Elasticnet model (alpha=0.600000, l1_ratio=0.100000):
  RMSE: 0.7985733780987151
  MAE: 0.6202991221099381
  R2: 0.17633705471063543
2022/06/15 13:50:56 INFO mlflow.projects: === Run (ID 'bdef03f084d34dbd8c8470cc34facbc5') succeeded ===
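When several parameter values need to be tried, the same CLI call can be scripted. The following is a minimal sketch using only the standard library; `build_run_command` and `run_project` are hypothetical helper names, and the only thing assumed about the CLI is the `mlflow run <uri> -P key=value` form used above.

```python
import subprocess


def build_run_command(uri, parameters):
    """Return the argument list for `mlflow run <uri> -P key=value ...`."""
    command = ["mlflow", "run", uri]
    for key, value in parameters.items():
        command += ["-P", f"{key}={value}"]
    return command


def run_project(uri, parameters):
    """Run the project and raise CalledProcessError if the CLI exits non-zero."""
    return subprocess.run(build_run_command(uri, parameters), check=True)
```

Calling `run_project("https://github.com/mlflow/mlflow-example", {"alpha": 0.6})` issues the same command as the cell above, so it takes just as long the first time the environment is created.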


You can see that the new tracking data is stored in mlruns/0 (as bdef03f084d34dbd8c8470cc34facbc5 in my case).

In [30]:
!ls mlruns/0

16ba6a65177b4b0d81bf44a8078d7294 8cbf0a32935b43759e93adc5c09a6ccf
44ef713a6960438484c7630e7eb8e1fb bdef03f084d34dbd8c8470cc34facbc5
72691d8862f04bf2bf363e4bbfdb1c34 meta.yaml
7cf97a38e2504b62bf449da7c931719a


You can check that a virtual environment was created with the command below.

In [None]:
!conda info --envs

## 3, MLflow Models


MLflow Models deploys trained models for a variety of downstream tasks, for example real-time serving through a REST API or batch inference on Apache Spark. Again, there are many possible scenarios. I am going to take a simple example: setting up an API with the wine regression model you trained and sending it a test input through HTTP.

![Mlflow Models](mlflow_models_overview.png)

We will consider the case of predicting wine quality for the data below with the model we trained.

In [35]:
{"columns": ["fixed acidity", "volatile acidity", "citric acid", "residual sugar", "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density", "pH", "sulphates", "alcohol"], 
 "data": [[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]]}


{'columns': ['fixed acidity',
  'volatile acidity',
  'citric acid',
  'residual sugar',
  'chlorides',
  'free sulfur dioxide',
  'total sulfur dioxide',
  'density',
  'pH',
  'sulphates',
  'alcohol'],
 'data': [[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]]}

Please check that one of your tracked runs contains all the files necessary for MLflow Models by looking at the directories under mlruns. Replace the run ID "7cf97a38e2504b62bf449da7c931719a" below with one of your own.





In [33]:
!ls mlruns/0/7cf97a38e2504b62bf449da7c931719a/artifacts/model

MLmodel          model.pkl        requirements.txt
conda.yaml       python_env.yaml


The trained model can be served on, for example, port 5001 with the command below.
This creates another virtual environment.

In [34]:
!mlflow models serve -m ./mlruns/0/7cf97a38e2504b62bf449da7c931719a/artifacts/model -p 5001

2022/06/15 13:56:24 INFO mlflow.models.cli: Selected backend for flavor 'python_function'
2022/06/15 13:56:25 INFO mlflow.utils.conda: === Creating conda environment mlflow-5484d1ed727ebeea670316c57c44ad16123db937 ===
Collecting package metadata (repodata.json): done
Solving environment: done


  current version: 4.11.0
  latest version: 4.13.0

Please update conda by running

    $ conda update -n base conda


Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Installing pip dependencies: | Ran pip subprocess with arguments:
['/Users/tamurataito/opt/anaconda3/envs/mlflow-5484d1ed727ebeea670316c57c44ad16123db937/bin/python', '-m', 'pip', 'install', '-U', '-r', '/Users/tamurataito/Documents/mlflow_datanomiq_demo_2/mlruns/0/7cf97a38e2504b62bf449da7c931719a/artifacts/model/condaenv.9ak3quxa.requirements.txt']
Pip subprocess output:
Collecting mlflow
  Using cached mlflow-1.26.1-py3-none-any.whl (17.8 MB)
Collecting cloudpickle==2.1.0
  Using cached cloudpi

^C
[2022-06-15 14:00:46 +0900] [33360] [INFO] Handling signal: int
[2022-06-15 14:00:46 +0900] [33369] [INFO] Worker exiting (pid: 33369)


While the command above is running, please open another terminal window and enter the command below on the command line.

In [None]:
curl -X POST -H "Content-Type:application/json; format=pandas-split"  --data '{"columns": ["fixed acidity", "volatile acidity", "citric acid", "residual sugar", "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density", "pH", "sulphates", "alcohol"], "data": [[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]]}' http://localhost:5001/invocations 


Then you will get the wine quality predicted by the regression model. In the case below, the predicted quality is 5.118.

![Mlflow Models Inference](inference_API.png)
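The same request can also be sent from Python instead of curl. The following is a minimal sketch using only the standard library; `build_request` and `predict` are hypothetical helper names, and the header and body format are copied from the curl command above (MLflow 1.26 in this demo). Other MLflow releases may expect a different body format, so treat the format as an assumption.

```python
import json
import urllib.request

COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]
# The same test row as in the curl command.
ROW = [12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]


def build_request(url, columns, rows):
    """Build a POST request carrying a pandas-split JSON body."""
    body = json.dumps({"columns": columns, "data": rows}).encode("utf-8")
    headers = {"Content-Type": "application/json; format=pandas-split"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def predict(url, columns, rows):
    """Send the request to a running model server and return the decoded JSON."""
    with urllib.request.urlopen(build_request(url, columns, rows)) as response:
        return json.loads(response.read().decode("utf-8"))
```

While the server from the previous step is running, `predict("http://localhost:5001/invocations", COLUMNS, [ROW])` sends the same test input as the curl command.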

## 4, MLflow Registry

MLflow Registry makes versioning of models easier, as in the window below. However, to use MLflow Registry the tracking data has to be saved in a database, so let me skip this topic in this demo.

![Mlflow Registry Versioning](oss_registry_3_overview.png)

## Further things to consider
- Considering more use cases with Docker, Databricks, etc. so that several people can work together.
- Logging and deploying trained deep learning models.
- More practical deployment of trained models, for example inputting a CSV file and running regression on the data in it.
- Versioning of those trained models.

Please let me know what you would like to know in more detail.