# MLflow Tracking

<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/mlflow-tracking.png" width="600" />

## What you will learn in this course 🧐🧐

Now that we have a remote server up and running, it's time to have some fun! In this course we will learn how to:

* Monitor your ML workflow 
* Collaborate on ML projects
* Understand what MLflow Models are
* Log a model to MLflow Tracking

## Use MLflow Tracking

### Reminder 💡

Before you dive deeper into this course, remember that we set up a remote tracking server on Heroku. Go back to that course if you haven't set it up yet. Each time you run your code, simply refresh the page of the remote tracking server to see the changes.

**Optional (but definitely advised)**: Although you don't have to run MLflow in a Docker environment, we definitely advise you to do so! It will help you standardize your workflow and make your work easier later on in the course. If you haven't built an image yet, use `jedha/sample-mlflow-server`. Most likely, the command to run your container will look like this:

```bash
docker run -it \
 -p 4000:4000 \
 -v "$(pwd):/home/app" \
 -e APP_URI="$APP_URI" \
 -e AWS_ACCESS_KEY_ID="$AWS_ACCESS_KEY_ID" \
 -e AWS_SECRET_ACCESS_KEY="$AWS_SECRET_ACCESS_KEY" \
 jedha/sample-mlflow-server python train.py
```

### Our project

Let's start by simply loading <a href="https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html#sklearn.datasets.load_iris" target="_blank">some data</a> from `sklearn`:

In [5]:
import mlflow
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load Iris dataset
iris = load_iris()

# Split dataset into X features and Target variable
X = pd.DataFrame(data = iris["data"], columns= iris["feature_names"])
y = pd.Series(data = iris["target"], name="target")

# Split our training set and our test set 
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Visualize dataset 
X_train.head()

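|     | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|-----|-------------------|------------------|-------------------|------------------|
| 84  | 5.4               | 3.0              | 4.5               | 1.5              |
| 3   | 4.6               | 3.1              | 1.5               | 0.2              |
| 66  | 5.6               | 3.0              | 4.5               | 1.5              |
| 135 | 7.7               | 3.0              | 6.1               | 2.3              |
| 95  | 5.7               | 3.0              | 4.2               | 1.2              |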


To track your training, first point MLflow to your tracking server, then wrap your training code in an MLflow run:

In [6]:
import os

os.environ["APP_URI"] = "https://testmlflowserverrailway-production.up.railway.app"
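If you run your code in the Docker container described earlier, `APP_URI` is already passed in with `-e`, so you may prefer to read it rather than overwrite it. Here is a minimal sketch of that idea. The helper name `resolve_tracking_uri` and the `./mlruns` fallback are our own assumptions for illustration, not part of the course setup:

```python
import os


def resolve_tracking_uri(default="./mlruns"):
    """Return APP_URI when it is set and non-empty, otherwise the fallback."""
    # Hypothetical helper: the "./mlruns" fallback is an assumption
    return os.environ.get("APP_URI") or default


print(resolve_tracking_uri())
```

You could then call `mlflow.set_tracking_uri(resolve_tracking_uri())` instead of indexing `os.environ` directly.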

In [None]:
# Set your variables for your environment
EXPERIMENT_NAME="my-first-mlflow-experiment"

# Set tracking URI to your Heroku application
mlflow.set_tracking_uri(os.environ["APP_URI"])

# Set experiment's info 
mlflow.set_experiment(EXPERIMENT_NAME)

# Get our experiment info
experiment = mlflow.get_experiment_by_name(EXPERIMENT_NAME)

with mlflow.start_run(experiment_id = experiment.experiment_id):

    # Instantiate and fit the model
    lr = LogisticRegression()
    lr.fit(X_train.values, y_train.values)

    # Store metrics 
    predicted_qualities = lr.predict(X_test.values)
    accuracy = lr.score(X_test.values, y_test.values)

    # Print results 
    print("LogisticRegression model")
    print("Accuracy: {}".format(accuracy))

## Log Metrics 

As of now, we don't have much in our MLflow UI or in our project directory. This is because we haven't logged anything yet. Let's start by logging a metric.

Do you remember? A metric is something you use to assess the performance of your model. In our case, we use the `accuracy`.
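As a quick refresher, accuracy is simply the share of predictions that match the true labels. The sketch below uses made-up labels (not the iris data) to show the arithmetic:

```python
# Made-up labels, only to illustrate the computation
y_true = [0, 1, 2, 2, 1, 0, 1, 2]
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]

# Count the positions where the prediction equals the true label
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

print(accuracy)  # 6 correct out of 8 -> 0.75
```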

To log a metric, we call:

```python
mlflow.log_metric("METRIC_NAME", metric)
```

That's all. Here is how it looks once added to the code above:

In [None]:
# Set your variables for your environment
EXPERIMENT_NAME="my-first-mlflow-experiment"

# Set tracking URI to your Heroku application
mlflow.set_tracking_uri(os.environ["APP_URI"])

# Set experiment's info 
mlflow.set_experiment(EXPERIMENT_NAME)

# Get our experiment info
experiment = mlflow.get_experiment_by_name(EXPERIMENT_NAME)

with mlflow.start_run(experiment_id = experiment.experiment_id):
    # Instantiate and fit the model
    lr = LogisticRegression()
    lr.fit(X_train.values, y_train.values)

    # Store metrics 
    predicted_qualities = lr.predict(X_test.values)
    accuracy = lr.score(X_test.values, y_test.values)

    # Print results 
    print("LogisticRegression model")
    print("Accuracy: {}".format(accuracy))

    # Log Metric 
    mlflow.log_metric("Accuracy", accuracy)

## Log Parameters 

You can also log your model's parameters to see which ones were useful in improving its performance. Just as you would with metrics, you can log a parameter using:

```python
mlflow.log_param("PARAM_NAME", param)
```

In [None]:
# Set your variables for your environment
EXPERIMENT_NAME="my-first-mlflow-experiment"

# Set tracking URI to your Heroku application
mlflow.set_tracking_uri(os.environ["APP_URI"])

# Set experiment's info 
mlflow.set_experiment(EXPERIMENT_NAME)

# Get our experiment info
experiment = mlflow.get_experiment_by_name(EXPERIMENT_NAME)

with mlflow.start_run(experiment_id = experiment.experiment_id):
    # Specified Parameters
    c = 0.5

    # Instantiate and fit the model
    lr = LogisticRegression(C=c)
    lr.fit(X_train.values, y_train.values)

    # Store metrics 
    predicted_qualities = lr.predict(X_test.values)
    accuracy = lr.score(X_test.values, y_test.values)

    # Print results 
    print("LogisticRegression model")
    print("Accuracy: {}".format(accuracy))

    # Log Metric 
    mlflow.log_metric("Accuracy", accuracy)

    # Log Param
    mlflow.log_param("C", c)

LogisticRegression model
Accuracy: 1.0


If you go to your MLFlow UI again, you should see the following screen: 

![](https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/mlflow-ui-log-params.png)

Congratulations! You know how to track your models. This could be useful for your future projects 😉.

## MLflow Models

Let's talk about MLflow Models. The goal of this component is to provide a standard format for packaging your Machine Learning models. This is especially useful when you want to deploy them later on whatever platform your company uses.

<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/mlflow-models.png" width="600" />

### MLflow flavors

Before going into more detail on MLflow Models, let's talk about ***flavors***. A flavor is basically an MLflow module tailored to one of the most widely known Machine Learning libraries. Here are the main built-in flavors:

<ul class="simple">
<li><a href="https://mlflow.org/docs/latest/models.html#python-function-python-function">Python Function (<code>python_function</code>)</a></li>
<li><a href="https://mlflow.org/docs/latest/models.html#r-function-crate">R Function (<code>crate</code>)</a></li>
<li><a href="https://mlflow.org/docs/latest/models.html#h2o-h2o">H<sub>2</sub>O (<code>h2o</code>)</a></li>
<li><a href="https://mlflow.org/docs/latest/models.html#keras-keras">Keras (<code>keras</code>)</a></li>
<li><a href="https://mlflow.org/docs/latest/models.html#mleap-mleap">MLeap (<code>mleap</code>)</a></li>
<li><a href="https://mlflow.org/docs/latest/models.html#pytorch-pytorch">PyTorch (<code>pytorch</code>)</a></li>
<li><a href="https://mlflow.org/docs/latest/models.html#scikit-learn-sklearn">Scikit-learn (<code>sklearn</code>)</a></li>
<li><a href="https://mlflow.org/docs/latest/models.html#spark-mllib-spark">Spark MLlib (<code>spark</code>)</a></li>
<li><a href="https://mlflow.org/docs/latest/models.html#tensorflow-tensorflow">TensorFlow (<code>tensorflow</code>)</a></li>
<li><a href="https://mlflow.org/docs/latest/models.html#onnx-onnx">ONNX (<code>onnx</code>)</a></li>
<li><a href="https://mlflow.org/docs/latest/models.html#mxnet-gluon-gluon">MXNet Gluon (<code>gluon</code>)</a></li>
<li><a href="https://mlflow.org/docs/latest/models.html#xgboost-xgboost">XGBoost (<code>xgboost</code>)</a></li>
<li><a href="https://mlflow.org/docs/latest/models.html#lightgbm-lightgbm">LightGBM (<code>lightgbm</code>)</a></li>
<li><a href="https://mlflow.org/docs/latest/models.html#catboost-catboost">CatBoost (<code>catboost</code>)</a></li>
<li><a href="https://mlflow.org/docs/latest/models.html#spacy-spacy">spaCy (<code>spacy</code>)</a></li>
<li><a href="https://mlflow.org/docs/latest/models.html#fastai-fastai">Fastai (<code>fastai</code>)</a></li>
<li><a href="https://mlflow.org/docs/latest/models.html#statsmodels-statsmodels">Statsmodels (<code>statsmodels</code>)</a></li>
<li><a href="https://mlflow.org/docs/latest/models.html#prophet-prophet">Prophet (<code>prophet</code>)</a></li>
</ul>

These modules are extremely useful as they come with built-in functions that you can use. We will show two of them in the next sections 👇

### Log your MLflow model ⏲️

If you want to save your model in this standard format, you need to log it to MLflow Tracking. This is actually very easy to do: simply call your flavor's `log_model()` function, that is `mlflow.<flavor_name>.log_model()`. With scikit-learn:

In [None]:
# Set your variables for your environment
EXPERIMENT_NAME="my-first-mlflow-experiment"

# Set tracking URI to your Heroku application
mlflow.set_tracking_uri(os.environ["APP_URI"])

# Set experiment's info 
mlflow.set_experiment(EXPERIMENT_NAME)

# Get our experiment info
experiment = mlflow.get_experiment_by_name(EXPERIMENT_NAME)

with mlflow.start_run(experiment_id = experiment.experiment_id):
    # Specified Parameters
    c = 0.5

    # Instantiate and fit the model
    lr = LogisticRegression(C=c)
    lr.fit(X_train.values, y_train.values)

    # Store metrics 
    predicted_qualities = lr.predict(X_test.values)
    accuracy = lr.score(X_test.values, y_test.values)

    # Print results 
    print("LogisticRegression model")
    print("Accuracy: {}".format(accuracy))

    # Log Metric 
    mlflow.log_metric("Accuracy", accuracy)

    # Log Param
    mlflow.log_param("C", c)

    # Log model 
    mlflow.sklearn.log_model(lr, "model")

## Autologs

As you run experiments, manually logging lots of parameters, metrics and models can be cumbersome. Fortunately, there is a solution to this problem: most MLflow flavors come with a great function, `autolog()`. You can simply call `mlflow.<flavor_name>.autolog()`. Applied to the example above:

In [None]:
# Set your variables for your environment
EXPERIMENT_NAME="my-first-mlflow-experiment"

# Set tracking URI to your Heroku application
mlflow.set_tracking_uri(os.environ["APP_URI"])

# Set experiment's info 
mlflow.set_experiment(EXPERIMENT_NAME)

# Get our experiment info
experiment = mlflow.get_experiment_by_name(EXPERIMENT_NAME)

# Call mlflow autolog
mlflow.sklearn.autolog()

with mlflow.start_run(experiment_id = experiment.experiment_id):
    # Specified Parameters 
    c = 0.5

    # Instantiate and fit the model
    lr = LogisticRegression(C=c)
    lr.fit(X_train.values, y_train.values)

    # Store metrics 
    predicted_qualities = lr.predict(X_test.values)
    accuracy = lr.score(X_test.values, y_test.values)

    # Print results 
    print("LogisticRegression model")
    print("Accuracy: {}".format(accuracy))

You should see much more information on this last run 😉

## Resources 📚📚

* <a href="https://mlflow.org/docs/latest/tutorials-and-examples/tutorial.html" target="_blank">MLflow Tutorial</a>
* <a href="https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html" target="_blank">Logistic Regression</a>
* <a href="https://www.youtube.com/watch?v=859OxXrt_TI" target="_blank">MLflow: An Open Platform to Simplify the Machine Learning Lifecycle</a>
* <a href="https://mlflow.org/docs/latest/tutorials-and-examples/index.html" target="_blank">Tutorials & Examples</a>