# MLflow Intro

## ML development challenges
<ul>
    <li><h4>Lots of development tools</h4></li>
    <li><h4>Hard to track and reproduce results</h4></li>
</ul>

## MLflow
<ul>
    <li><h4>Open-source machine learning platform</h4></li>
    <li><h4>Works with any ML library</h4></li>
    <li><h4>Runs the same everywhere</h4></li>
    <li><h4>Allows for collaboration</h4></li>
</ul>

<img src="https://databricks.com/wp-content/uploads/2018/06/mlflow.png">

python -m ipykernel install --user --name myenv --display-name "Python (workshop)"

Install MLflow using pip: <br>


In [60]:
#!pip install mlflow

# MLflow Tracking

<h3>Key components in tracking</h3>
<ul>
    <li><h4>Parameters - input to code</h4></li>
    <li><h4>Metrics - can change over time</h4></li>
    <li><h4>Artifacts - files, including models</h4></li>
    <li><h4>Source - what produced the run</h4></li>
</ul>
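
To make the four components concrete, here is a conceptual sketch in plain Python. `RunRecord` and its fields are hypothetical names used only for illustration, and this is not MLflow's internal representation. The point it tries to show is that a parameter holds a single value per run, while a metric keeps a history of values over steps. The RMSE values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """Hypothetical, simplified picture of what one tracked run holds."""
    source: str                                    # what produced the run
    params: dict = field(default_factory=dict)     # inputs to the code, one value per key
    metrics: dict = field(default_factory=dict)    # key -> list of (step, value) pairs
    artifacts: list = field(default_factory=list)  # paths of output files, including models

    def log_param(self, key, value):
        self.params[key] = value

    def log_metric(self, key, value, step=0):
        # Metrics can change over time, so every logged value is kept.
        self.metrics.setdefault(key, []).append((step, value))

    def log_artifact(self, path):
        self.artifacts.append(path)


run = RunRecord(source="introduction.ipynb")
run.log_param("alpha", 0.2)
# Illustrative values only, not results from the wine data.
for step, rmse_value in enumerate([0.90, 0.85, 0.82]):
    run.log_metric("rmse", rmse_value, step=step)
run.log_artifact("wine-quality.csv")

print(run.params)
print(run.metrics["rmse"])
```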

## imports


In [61]:
import mlflow
import mlflow.sklearn

import numpy as np
import pandas as pd

from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

Working with a remote tracking server:

In [62]:
#!mlflow server -h 0.0.0.0 -p 5050
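
Once a server like the one above is running, the client has to be pointed at it. One way is the `MLFLOW_TRACKING_URI` environment variable, which MLflow reads when no URI has been set explicitly. Calling `mlflow.set_tracking_uri(...)` is the in-code alternative. A minimal sketch, assuming the server runs on the same machine and on port 5050 as in the command above:

```python
import os

# Assumption: the server from the cell above is reachable on this machine.
# It binds to 0.0.0.0 (all interfaces), so a local client connects via localhost.
TRACKING_HOST = "localhost"
TRACKING_PORT = 5050

tracking_uri = f"http://{TRACKING_HOST}:{TRACKING_PORT}"

# Set this before starting any runs so they are sent to the server
# rather than being stored locally.
os.environ["MLFLOW_TRACKING_URI"] = tracking_uri
print(tracking_uri)
```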

In [63]:
def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2
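
As a quick sanity check on what these three numbers mean, the same quantities can be computed by hand. The sketch below uses only the standard library and a tiny made-up set of values (not the wine data). `manual_metrics` is a hypothetical helper, not part of scikit-learn.

```python
import math


def manual_metrics(actual, pred):
    """Compute RMSE, MAE and R2 from their textbook definitions."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, pred)]
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    rmse = math.sqrt(ss_res / n)                          # root of the mean squared error
    mae = sum(abs(e) for e in errors) / n                 # mean absolute error
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return rmse, mae, r2


# Made-up values, chosen only to keep the arithmetic easy to follow.
actual = [5, 6, 7, 6]
pred = [5, 5, 7, 7]
demo_rmse, demo_mae, demo_r2 = manual_metrics(actual, pred)
print(demo_rmse, demo_mae, demo_r2)
```

Here two of the four predictions are off by one, so MAE is 0.5 and RMSE is the square root of 0.5. R2 comes out as 0, meaning these predictions do no better than always predicting the mean of `actual`.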

In [64]:
def build_model(alpha, l1_ratio):
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio)
    return model

## Data

In [65]:
df = pd.read_csv("wine-quality.csv")
df.describe()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 |
| mean | 6.854788 | 0.278241 | 0.334192 | 6.391415 | 0.045772 | 35.308085 | 138.360657 | 0.994027 | 3.188267 | 0.489847 | 10.514267 | 5.877909 |
| std | 0.843868 | 0.100795 | 0.12102 | 5.072058 | 0.021848 | 17.007137 | 42.498065 | 0.002991 | 0.151001 | 0.114126 | 1.230621 | 0.885639 |
| min | 3.8 | 0.08 | 0.0 | 0.6 | 0.009 | 2.0 | 9.0 | 0.98711 | 2.72 | 0.22 | 8.0 | 3.0 |
| 25% | 6.3 | 0.21 | 0.27 | 1.7 | 0.036 | 23.0 | 108.0 | 0.991723 | 3.09 | 0.41 | 9.5 | 5.0 |
| 50% | 6.8 | 0.26 | 0.32 | 5.2 | 0.043 | 34.0 | 134.0 | 0.99374 | 3.18 | 0.47 | 10.4 | 6.0 |
| 75% | 7.3 | 0.32 | 0.39 | 9.9 | 0.05 | 46.0 | 167.0 | 0.9961 | 3.28 | 0.55 | 11.4 | 6.0 |
| max | 14.2 | 1.1 | 1.66 | 65.8 | 0.346 | 289.0 | 440.0 | 1.03898 | 3.82 | 1.08 | 14.2 | 9.0 |


In [66]:
train, test = train_test_split(df)

X_train = train.drop(['quality'], axis=1)
X_test = test.drop(['quality'], axis=1)
y_train = train['quality']
y_test = test['quality']

## Hyperparameters

In [67]:
ALPHA = 0.2
L1_RATIO = 0.6

## Main workflow 

In [68]:
# nested=True lets this cell start a run even if another run is still active
with mlflow.start_run(nested=True):
    model = build_model(ALPHA, L1_RATIO)
    model.fit(X_train, y_train)

    # Log the hyperparameters used for this run
    mlflow.log_param('alpha', ALPHA)
    mlflow.log_param('l1-ratio', L1_RATIO)

    y_predicted = model.predict(X_test)
    (rmse, mae, r2) = eval_metrics(y_test, y_predicted)

    # Log the evaluation metrics computed on the test set
    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)
    mlflow.log_metric("mae", mae)

    # Log the input data file as an artifact
    mlflow.log_artifact('wine-quality.csv')

    # Log the fitted model under the artifact path "model"
    mlflow.sklearn.log_model(model, "model")

In [69]:
!ls 

Untitled.ipynb     conda.yaml~        mlflow             mlruns
conda.yaml         introduction.ipynb mlflow2            wine-quality.csv



View the logged run in the MLflow UI (start it with `mlflow ui`) at http://localhost:5000

Cleanup

In [70]:
!rm -rf mlruns
!ls

Untitled.ipynb     conda.yaml~        mlflow             wine-quality.csv
conda.yaml         introduction.ipynb mlflow2


# MLflow Projects

<p>MLflow Projects provide a high-level format for packaging code so that runs can be reproduced on different platforms.</p>

<b>setup</b>

In [71]:
!git clone https://github.com/greghop/mlflow.git
!ls mlflow

fatal: destination path 'mlflow' already exists and is not an empty directory.
MLproject        conda.yaml       wine-quality.csv
README.md        train.py


<b>MLproject</b>

In [72]:
!cat mlflow/MLproject

name: tutorial

conda_env: conda.yaml

entry_points:
  main:
    parameters:
      alpha: float
      l1_ratio: {type: float, default: 0.1}
    command: "python train.py {alpha} {l1_ratio}"
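
To see how the `parameters` and `command` fields fit together, the sketch below fills the command template by hand: values passed by the user override defaults, and a parameter with no default must be supplied. `render_command` is a hypothetical helper written for this illustration, not MLflow's implementation, and it skips any type handling.

```python
# Parameter declarations copied from the MLproject file above.
PARAMETERS = {
    "alpha": {"type": "float"},                     # no default: must be supplied
    "l1_ratio": {"type": "float", "default": 0.1},
}
COMMAND = "python train.py {alpha} {l1_ratio}"


def render_command(command, declared, supplied):
    """Fill a command template from user-supplied values and declared defaults."""
    values = {}
    for name, spec in declared.items():
        if name in supplied:
            values[name] = supplied[name]
        elif "default" in spec:
            values[name] = spec["default"]
        else:
            raise ValueError(f"no value supplied for required parameter '{name}'")
    return command.format(**values)


print(render_command(COMMAND, PARAMETERS, {"alpha": 0.6, "l1_ratio": 0.7}))
print(render_command(COMMAND, PARAMETERS, {"alpha": 0.5}))
```

The first call produces `python train.py 0.6 0.7`; the second falls back to the declared default for `l1_ratio` and produces `python train.py 0.5 0.1`.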


<b>conda.yaml</b>

In [73]:
!cat mlflow/conda.yaml

name: tutorial
channels:
  - defaults
dependencies:
  - cloudpickle=0.6.1
  - python=3.6
  - numpy=1.14.3
  - pandas=0.22.0
  - scikit-learn=0.19.1
  - pip:
    - mlflow


http://localhost:5000

In [76]:
!python mlflow/train.py 0.8 0.4

Traceback (most recent call last):
  File "mlflow/train.py", line 10, in <module>
    import mlflow
ModuleNotFoundError: No module named 'mlflow'


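<b>Running an MLflow project</b> <br><br>
mlflow run URI [OPTIONS] <br>

<ul>
<li>-P name=value: set a parameter of the entry point (repeat the option for each parameter)</li>
<li>-e NAME: choose the entry point to run (defaults to main)</li>
</ul>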
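
The two options map naturally onto a small helper that assembles the command line: each parameter becomes one `-P name=value` pair and the entry point is passed with `-e`. `build_run_command` is a hypothetical helper for illustration; only the `mlflow run` syntax itself comes from the text above.

```python
import shlex


def build_run_command(uri, params=None, entry_point=None):
    """Assemble an `mlflow run` command line as a list of arguments."""
    args = ["mlflow", "run", uri]
    if entry_point is not None:
        args += ["-e", entry_point]
    for name, value in (params or {}).items():
        args += ["-P", f"{name}={value}"]
    return args


cmd = build_run_command("mlflow/", params={"alpha": 0.6, "l1_ratio": 0.7})
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` when the run should be launched from Python rather than from a shell cell.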

In [79]:

!mlflow run mlflow/ -P alpha=0.6 -P l1_ratio=0.7

2019/04/10 14:29:57 INFO mlflow.projects: === Created directory /var/folders/nh/p1tcdzns0ns99399dh81m56r0000gn/T/tmpp93piraw for downloading remote URIs passed to arguments of type 'path' ===
2019/04/10 14:29:57 INFO mlflow.projects: === Running command 'source activate mlflow-b93852916f9be8ee2359db52b5dfab5589743459 && python train.py 0.6 0.7' in run with ID 'f119f41166db48369bd85725edfb674d' === 
  env = yaml.load(_conda_header)
Elasticnet fantastic 3 model (alpha=0.600000, l1_ratio=0.700000):
  RMSE: 0.8591606388045732
  MAE: 0.648352939480482
  R2: 0.04661433685958705
2019/04/10 14:30:02 INFO mlflow.projects: === Run (ID 'f119f41166db48369bd85725edfb674d') succeeded ===


In [80]:
!mlflow run https://github.com/greghop/mlflow.git -P alpha=0.1 -P l1_ratio=0.5

2019/04/10 14:31:19 INFO mlflow.projects: === Fetching project from https://github.com/greghop/mlflow.git into /var/folders/nh/p1tcdzns0ns99399dh81m56r0000gn/T/tmpgkrhu_x4 ===
2019/04/10 14:31:22 INFO mlflow.projects: === Created directory /var/folders/nh/p1tcdzns0ns99399dh81m56r0000gn/T/tmpn2spbovb for downloading remote URIs passed to arguments of type 'path' ===
2019/04/10 14:31:22 INFO mlflow.projects: === Running command 'source activate mlflow-b93852916f9be8ee2359db52b5dfab5589743459 && python train.py 0.1 0.5' in run with ID 'efc009cf47b74603bd89ef39710ab103' === 
  env = yaml.load(_conda_header)
Elasticnet fantastic 3 model (alpha=0.100000, l1_ratio=0.500000):
  RMSE: 0.7845017946547458
  MAE: 0.6150949836730213
  R2: 0.2051086790093516
2019/04/10 14:31:24 INFO mlflow.projects: === Run (ID 'efc009cf47b74603bd89ef39710ab103') succeeded ===


Cleanup

In [83]:
!rm -rf mlruns


# Multistep workflow

In [84]:
!git clone https://github.com/greghop/mlflow2.git
!ls mlflow2

fatal: destination path 'mlflow2' already exists and is not an empty directory.
MLproject        conda.yaml       main.py          wine-quality.csv
README.md        etl.py           train.py


In [85]:
!cat mlflow2/MLproject

name: multistep

conda_env: conda.yaml

entry_points:
  etl:
    parameters:
      scaler: {type: int, default: 1}
    command: "python etl.py --scaler {scaler}"
 
  train:
    parameters:
      run-id: string
      alpha: {type: float, default: 0.1}
      l1-ratio: {type: float, default: 0.1}
    command: "python train.py --run-id {run-id} --alpha {alpha} --l1-ratio {l1-ratio}"

  main: 
    parameters:
      alpha: {type: float, default: 0.1}
      l1-ratio: {type: float, default: 0.1}
    command: "python main.py  --alpha {alpha} --l1-ratio {l1-ratio}"
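
The `train` entry point takes a `run-id`, which suggests that `main` runs `etl` first and hands its run ID to `train`. The contents of `main.py` are not shown here, so the following is only a sketch of that pattern under this assumption. `run_workflow` and `fake_run_step` are hypothetical names. In a real project, `run_step` would wrap a call such as `mlflow.run(uri, entry_point, parameters=...)` and return the ID of the run it created.

```python
def run_workflow(run_step, alpha=0.1, l1_ratio=0.1):
    """Run the etl step, then train on its output. Returns both run IDs.

    `run_step(entry_point, parameters)` must launch one entry point and
    return the ID of the run it created.
    """
    etl_run_id = run_step("etl", {"scaler": 1})
    train_run_id = run_step(
        "train",
        {"run-id": etl_run_id, "alpha": alpha, "l1-ratio": l1_ratio},
    )
    return etl_run_id, train_run_id


# Stand-in launcher so the sketch runs without MLflow: it only records calls.
calls = []


def fake_run_step(entry_point, parameters):
    calls.append((entry_point, parameters))
    return f"{entry_point}-run"


print(run_workflow(fake_run_step, alpha=0.5, l1_ratio=0.2))
```

Passing the launcher in as an argument keeps the chaining logic separate from MLflow itself, which makes the order of steps easy to check in isolation.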

In [87]:
!mlflow run mlflow2/ -e etl 

2019/04/08 17:32:22 INFO mlflow.projects: === Created directory /var/folders/nh/p1tcdzns0ns99399dh81m56r0000gn/T/tmpfp3d7zyv for downloading remote URIs passed to arguments of type 'path' ===
2019/04/08 17:32:22 INFO mlflow.projects: === Running command 'source activate mlflow-df90610eb3183421bcbf1eef16dc332bc5193c11 && python etl.py --scaler 1' in run with ID 'e492821cb17b405e982250006a1651d7' === 
2019/04/08 17:32:24 INFO mlflow.projects: === Run (ID 'e492821cb17b405e982250006a1651d7') succeeded ===


In [None]:
!mlflow run mlflow2/ -e etl -P scaler='robust'


<h3>To inspect the runs, go to http://localhost:5000</h3>