<img width="300px" src="https://mlflow.org/docs/latest/_static/MLflow-logo-final-black.png">

# Why MLflow?

A machine learning product does not depend on code alone, as standard software development does. It is a combination of code, input data, and model parameters. Organizations need to:

- Version control the data used to fuel ML models;
- Perform model and experiment tracking and versioning;
- Systematically optimize models through hyperparameter optimization;
- Deploy and monitor models in production environments and keep track of the performance.

MLOps is the name given to the processes and tools developed to manage all these components.  
In recent years, the number of such tools has grown rapidly.  
Many tools cover one or more of these needs; MLflow offers a set of features that helps individuals and teams address some of them.

**MLflow** is an open-source platform, backed by Databricks, for managing the lifecycle of machine learning models through four pillars:

- By tracking experiments to compare results (**Tracking**);
- By allowing data scientists and ML engineers to create reusable machine learning code (**Projects**);
- By defining a standard format for packaging models and sending them to diverse deployment tools (**Models**);
- By providing a centralized repository to collaboratively manage the lifecycle of models through versioning, stage transitions and annotations (**Registry**).

Besides providing SDKs for common languages (Python, R, Java, Julia) and good integration with popular ML frameworks such as Scikit-learn and TensorFlow, among others, it is completely **language and library-agnostic**. One can use it with any framework and programming language, since it also provides a REST interface for exchanging metadata with the server.
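To make the REST interface concrete, here is a minimal sketch that builds (but does not send) a request logging one metric through the tracking server's `runs/log-metric` endpoint. The server address, the run ID placeholder, and the helper name `build_log_metric_request` are assumptions made for illustration.

```python
import json
import time
import urllib.request

# Assumption: a tracking server reachable at this address.
TRACKING_URI = "http://localhost:5000"


def build_log_metric_request(run_id, key, value, step=0):
    """Build a POST request for the MLflow REST endpoint that logs one metric."""
    payload = {
        "run_id": run_id,
        "key": key,
        "value": value,
        "timestamp": int(time.time() * 1000),  # milliseconds since the epoch
        "step": step,
    }
    return urllib.request.Request(
        url=f"{TRACKING_URI}/api/2.0/mlflow/runs/log-metric",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "<RUN_ID>" is a placeholder; a real run ID comes from the tracking server.
request = build_log_metric_request("<RUN_ID>", "f1_macro", 0.93)
print(request.get_method(), request.full_url)

# Sending the request needs a running server and an existing run:
# urllib.request.urlopen(request)
```

Any language with an HTTP client can issue the same call, which is what makes the platform usable outside the official SDKs.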

<img src="https://www.ambiata.com/images/blog/mlops-tools_files/task-scope.png" style="width: 70%">

Nowadays, there are a lot of MLOps tools.

Be aware that even when tools offer features for the same task, they may tackle it at different levels of depth.  
For example, even though Kubeflow offers experiment tracking, it requires a level of DevOps expertise that most data scientists don't have. It sits on top of Kubernetes and can be seen as an orchestrator for common MLOps tools. **MLflow, on the other hand, is simple and well suited to general EDA and ML tracking**.

Weights and Biases excels in tracking and reporting, and is considered a great tool for teams focused more on research than on deliverables. Although it provides deployment capabilities, such as packaging models into Docker containers, deployment is not one of its strong suits.

Comet.ml is another great example of a tool that provides most of the same features as MLflow, with superior tracking capabilities. It is, however, a proprietary and licensed tool, just like W&B.

# Is it perfect?

**NO.**

Although MLflow is a great tool, it still lags behind its competitors in some areas:

- It is not easy to compare different experiments.
- Even though super useful, autolog features are still experimental and sometimes buggy.
- Most plots are not embedded widgets, but stored as artifacts.
- Deployment containers are far from optimized, and they are not 100% reliable for a production environment.
- ACL for registry management is only available through the Databricks-managed version.

**In general, MLflow offers a basic set of features if all you want is experiment tracking.  
On the other hand, it is open source, language- and library-agnostic, and provides an interesting set of model management features.**

# MLflow tracking

MLflow offers an API for tracking and recording the metadata, images, and artifacts of machine learning experiments.  
It also provides a nice UI for browsing and querying what has been recorded.

In [None]:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_text
from sklearn import tree
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix, classification_report, f1_score
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import mlflow
import mlflow.sklearn
from mlflow.tracking import MlflowClient
from mlflow.models.signature import infer_signature
import pandas as pd

### Scikit-learn DecisionTree

First, we need to create or set an MLflow experiment. An experiment groups `runs`, that is, individual model trainings or executions.

In [None]:
# Set an Mlflow experiment


Following that, we need to load our dataset and split it into training and test sets.

In [None]:
# Load iris dataset
iris = load_iris()
df_iris = pd.DataFrame(iris.data, columns=iris.feature_names)
X_train, X_test, y_train, y_test = train_test_split(
        df_iris, iris.target, random_state=0)

In [None]:
df_iris.head(5)

Let's also do a quick EDA to check for missing values.

In [None]:
df_iris.isnull().sum()

In [None]:
df_iris.describe()

In [None]:
pd.Series(iris.target).value_counts()

We decided to use a Decision Tree as our classifier. Given that our feature columns are complete and we are using a tree-based classifier, there is no need for normalization/standardization of the features. Let's proceed by feeding our model the training data and getting the test results.

In [None]:
# Start run

# Define and log hyperparameters
hps = {
    'random_state': 0,
    'max_depth': 2
}

# Fit model
dt = DecisionTreeClassifier(**hps)
dt = dt.fit(X_train, y_train)

# Get test set predictions
y_pred = dt.predict(X_test)
metrics = classification_report(y_test, y_pred, output_dict=True)

# Set Mlflow tags (default and custom)

# Plot confusion matrix and log artifact
cm = confusion_matrix(y_test, y_pred, labels=dt.classes_)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=dt.classes_)


# Plot tree configuration and log figure
f, ax = plt.subplots(1, 1, figsize=(5, 5))
tree.plot_tree(dt, ax=ax)


# Define input and output signatures


# Log model


Open `http://<your-docker-machine-ip>:5000` and we will go through the MLflow tracking UI.

### TensorFlow 

In [None]:
# TensorFlow and tf.keras
import tensorflow as tf
import mlflow.tensorflow
import mlflow.keras

We are going to load the Fashion MNIST data from the TensorFlow Keras datasets and normalize the training and test data.
The dataset contains images of fashion items. The images come as a 3-dimensional tensor (samples × 28 × 28) whose pixel values range from 0 to 255. Neural networks behave best when input values range from 0.0 to 1.0, so we need to rescale them.

<img src="https://www.tensorflow.org/tutorials/keras/classification_files/output_m4VEw8Ud9Quh_0.png">

In [None]:
# Load input data from tf.keras.datasets
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

In [None]:
# Normalize train and test data.
train_images = train_images / 255.0
test_images = test_images / 255.0

In [None]:
# Set a Mlflow experiment and turn on autologging


In [None]:
# Create Keras model

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)
])
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=3)

# Notice these metrics are not logged
test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)

print('\nTest accuracy:', test_acc)

Let's go to the UI again and check how our loss and accuracy behave during the training epochs.

Autologging is an experimental feature and subject to bugs, and not every metric is tracked automatically by MLflow.  
For more information on which metrics are tracked, check the [docs](https://mlflow.org/docs/latest/tracking.html#tensorflow-and-keras-experimental).

## Hyperparameter tuning using XGBoost and HyperOpt

In [None]:
import hyperopt
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
import xgboost as xgb
import numpy as np
import mlflow.xgboost

In [None]:
# Load Iris dataset from scikit-learn and split it into training and test sets
iris = load_iris()
df_iris = pd.DataFrame(iris.data, columns=iris.feature_names)
X_train, X_test, y_train, y_test = train_test_split(
        df_iris, iris.target, random_state=0)

In [None]:
# Set hyperparameter space
space={
    'max_depth': hp.quniform("max_depth", 2, 6, 1),
    'gamma': hp.uniform ('gamma', 1, 9),
    'reg_alpha' : hp.quniform('reg_alpha', 0, 5, 1),
    'reg_lambda' : hp.uniform('reg_lambda', 0, 1),
    'colsample_bytree' : hp.uniform('colsample_bytree', 0.2, 1)
}
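A note on the search space: Hyperopt documents `hp.quniform(label, low, high, q)` as returning a value like `round(uniform(low, high) / q) * q`, delivered as a float. The sketch below imitates that rule with the standard library only, to show why integer-valued hyperparameters such as `max_depth` are cast with `int(...)` before being passed to XGBoost. The helper `quniform_sketch` is hypothetical and is not part of Hyperopt.

```python
import random


def quniform_sketch(rng, low, high, q):
    """Imitate hp.quniform: a uniform draw snapped to multiples of q, as a float."""
    return float(round(rng.uniform(low, high) / q) * q)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [quniform_sketch(rng, 2, 6, 1) for _ in range(5)]
print(samples)

# The values are whole numbers but still floats, hence the cast
# before using them as a tree depth.
max_depth = int(samples[0])
print(max_depth, type(max_depth).__name__)
```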

In [None]:
# Set Mlflow experiment


In [None]:
# Configure training data matrices (always after setting up autolog to infer signature)
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

In [None]:
# Create training function
def hyperparameter_tuning(space):
    
    # specify parameters via map
    param = {
        'max_depth': int(space['max_depth']),
        'gamma': space['gamma'],
        'reg_alpha': space['reg_alpha'],
        'reg_lambda': space['reg_lambda'],
        'colsample_bytree': space['colsample_bytree'],
        'min_child_weight': 3,
        'eta': 0.3,  # learning rate (step size shrinkage)
        'verbosity': 1,  # logging level: 1 prints warnings (0 would be silent)
        'objective': 'multi:softprob',  # multiclass objective that outputs class probabilities
        'eval_metric': 'mlogloss',
        'num_class': pd.Series(iris.target).nunique()
    }
    num_round = 2
    bst = xgb.train(param, dtrain, num_round)

    # make prediction
    preds = bst.predict(dtest)
    best_preds = np.asarray([np.argmax(line) for line in preds])
    f1 = f1_score(y_test, best_preds, average='macro')  # (y_true, y_pred) order

    # Hyperopt minimizes the loss, so return the negated F1 score (change the metric if you like)
    return {'loss': -f1, 'status': STATUS_OK, 'model': bst}

In [None]:
# Run hyperparameter optimization with HyperOpt
trials = Trials()
best = fmin(
    fn=hyperparameter_tuning,
    space=space,
    algo=tpe.suggest,
    max_evals=100,
    trials=trials
)

Let's now access the UI and plot the hyperparameters and metrics.   
Check which hyperparameter ranges had the most impact on the model F1 score.


Finally, we want to retrieve the best model, but we also need to know the ID of our experiment.

In [None]:
# List experiments


In [None]:
# Search through experiment runs


Having access to the `run_id` of the best performing model in our experiment, we can again get the model artifact, as well as other artifacts stored in our repo.

In [None]:
# Get best model artifact


In [None]:
# Compute again the F1-Score against the test set
best_preds = np.asarray([np.argmax(line) for line in bst.predict(dtest)])
f1_score(y_test, best_preds, average='macro')

In [None]:
# Download feature importance image artifact


In [None]:
# Display image artifact


## MLflow Projects

Our next goal is to create a reproducible code base for our model. MLflow provides a format for packaging data science code so that we are able to easily reuse it.

We only need four things:

- A folder with the name of the project;
- A `conda.yml` file or a docker image for the running environment;
- A `.sh` or `.py` entrypoint file;
- An `MLproject` file that contains the project definition.

Having all these, we can simply run in our terminal:

```
mlflow run <PROJECT_NAME> --experiment-name <EXPERIMENT_NAME> [-P parameter1=value1 ...]
```
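As an illustration of the project definition, the sketch below writes a minimal `MLproject` file into a temporary folder. The project name, the `train.py` entry point, and the `test_size` parameter are hypothetical; only the overall layout (`name`, `conda_env`, `entry_points` with `parameters` and `command`) follows the MLproject format.

```python
import tempfile
from pathlib import Path

# Hypothetical project definition: one entry point with one float parameter.
MLPROJECT_TEXT = """\
name: iris_decision_tree

conda_env: conda.yml

entry_points:
  main:
    parameters:
      test_size: {type: float, default: 0.25}
    command: "python train.py --test-size {test_size}"
"""

# The folder name doubles as the project name passed to `mlflow run`.
project_dir = Path(tempfile.mkdtemp()) / "iris_decision_tree"
project_dir.mkdir()

mlproject_path = project_dir / "MLproject"
mlproject_path.write_text(MLPROJECT_TEXT)
print(mlproject_path.read_text())
```

With a matching `conda.yml` and `train.py` in the same folder, a value given as `-P test_size=0.3` on the command line would replace the default in the `command` template.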

## MLflow Models

An MLflow Model is a standard format for packaging machine learning models that can be used in a variety of downstream tools—for example, real-time serving through a REST API or batch inference on Apache Spark. The format defines a convention that lets you save a model in different "flavors" that can be understood by different downstream tools.

Let's check our XGBoost model artifacts.

In [None]:
# Load Python function version of XGBoost model


In [None]:
# Print model


In [None]:
# Predict with sample input


MLflow Models provide a standard format so that one can load a model in different flavors. For example:
- A pickled scikit-learn object that can be loaded into a Scikit-learn pipeline.
- A generic Python function that can be loaded into any compatible Python environment, or any of the available deployment tools.

If none of the flavors that MLflow provides fits your needs, you can define a custom model:

In [None]:
import mlflow.pyfunc

# Define the model class


# Construct and save the model


In [None]:
# Load the model in `python_function` format
loaded_model = mlflow.pyfunc.load_model(model_path)

# Evaluate the model
model_input = pd.DataFrame([range(10)])
model_output = loaded_model.predict(model_input)
assert model_output.equals(pd.DataFrame([range(5, 15)]))

## Built-in deployment tools

MLflow provides a series of built-in deployment tools, so that one can serve a model locally, or remotely in Azure ML, AWS SageMaker, or as an Apache Spark UDF.

The tool builds a Docker image that exposes REST API endpoints backed by MLflow Python functions. The `/invocations` endpoint accepts POST requests with data in several formats:

| Description | Content-Type |
|:------------|:-------------|
| JSON-serialized pandas DataFrames in the split orientation <br /> `pandas_df.to_json(orient='split')` | `application/json` or `application/json; format=pandas-split` |
| JSON-serialized pandas DataFrames in the records orientation | `application/json; format=pandas-records` |
| CSV-serialized pandas DataFrames <br /> `pandas_df.to_csv()` | `text/csv` |
| Tensor input formatted as described in TF Serving’s API docs | `application/json` |
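To show what the two JSON orientations in the table look like on the wire, the sketch below builds both request bodies by hand with the standard library, mirroring the structure that `to_json(orient='split')` and `to_json(orient='records')` produce. The two rows are toy values. Note that newer MLflow releases expect these objects wrapped under a `dataframe_split` or `dataframe_records` key, so check the documentation of the version you run.

```python
import json

# Column names from the Iris dataset; the rows are toy values.
columns = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]
rows = [
    [5.1, 3.5, 1.4, 0.2],
    [6.2, 2.9, 4.3, 1.3],
]

# Split orientation: column names, row index, and values kept separately.
split_payload = {
    "columns": columns,
    "index": list(range(len(rows))),
    "data": rows,
}

# Records orientation: one {column: value} object per row.
records_payload = [dict(zip(columns, row)) for row in rows]

split_body = json.dumps(split_payload)
records_body = json.dumps(records_payload)
print(split_body)
print(records_body)
```

The split orientation states the column order once and explicitly, whereas the records orientation repeats every column name in every row.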

To serve a model locally, one can simply run:

```
mlflow models serve -m "models:/<MODEL_NAME>/<MODEL_VERSION_OR_STAGE>" -p 1234
```

Here `<MODEL_VERSION_OR_STAGE>` is either the model version number or a stage name (Staging, Production).

In [None]:
import requests

r = requests.post(
    'http://127.0.0.1:1234/invocations',
    headers={'Content-Type': 'application/json'},
    data=X_test.to_json(orient='split'),
)

In [None]:
r.json()

In [None]:
# Get latest versions on Production


# Resources and references

- [Mlflow](https://mlflow.org)
- [ML workspace](https://github.com/ml-tooling/ml-workspace)
- [mlflow-docker](https://github.com/Toumash/mlflow-docker)
- [Ambiata - MLOps tools](https://www.ambiata.com/blog/2020-12-07-mlops-tools/)
- [The Cheesy analogy of Mlflow and Kubeflow](https://servian.dev/the-cheesy-analogy-of-mlflow-and-kubeflow-715a45580fbe)
- [Machine learning tools comparison](https://www.netguru.com/blog/machine-learning-tools-comparison)
- [Decision Trees - scikit-learn](https://scikit-learn.org/stable/modules/tree.html#classification)
- [Basic classification: Classify images of clothing](https://www.tensorflow.org/tutorials/keras/classification)
- [HyperParameter Tuning — Hyperopt Bayesian Optimization](https://medium.com/analytics-vidhya/hyperparameter-tuning-hyperopt-bayesian-optimization-for-xgboost-and-neural-network-8aedf278a1c9)