# Training and tracking a Keras classifier with MLflow

This notebook demonstrates how to track experiments with MLflow in Azure Machine Learning, using the popular MNIST problem.

In [None]:
# Ensure you have the dependencies for this notebook
%pip install -r keras_mnist_with_mlflow.txt

## Configuring the experiment

Let's get started. It's always a good idea to begin by configuring the name of the experiment we are working with in MLflow. Experiments allow you to organize runs so you can compare runs that use different parameters and configurations. MLflow places runs in an experiment named "Default" unless you set a different name.

In [None]:
import mlflow

mlflow.set_experiment(experiment_name="keras-mnist-classifier")

## Exploring the data

In [None]:
import tensorflow as tf
import tensorflow.keras as keras
import pandas as pd

In [None]:
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [None]:
x_train.shape

As usual, let's ensure our predictors are normalized to the range [0, 1]:

In [None]:
x_train, x_test = x_train / 255.0, x_test / 255.0
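
The scaling step above can be checked on a tiny array. This is a minimal sketch, assuming 8-bit pixel values as in MNIST; the `pixels` array is a made-up stand-in, not part of the dataset:

```python
import numpy as np

# Hypothetical 2x2 "image" with 8-bit pixel values, standing in for an MNIST digit.
pixels = np.array([[0, 51], [102, 255]], dtype=np.uint8)

# Dividing by a float promotes the integers to floating point and maps 0..255 to 0..1.
scaled = pixels / 255.0

print(scaled.dtype, scaled.min(), scaled.max())
```

Note that the division changes the dtype from `uint8` to a floating-point type, which is what the dense layers expect.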

## Training a model

We are going to use autologging capabilities in MLflow to track parameters and metrics:

In [None]:
mlflow.tensorflow.autolog()

Let's create a simple classifier and train it:

In [None]:
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation="softmax"),
    ]
)

Let's compile this model:

In [None]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])

As soon as the `fit` method is executed, MLflow will start a run in Azure ML to track the experiment's run. However, it is a good idea to start the run manually so you have the run ID at hand quickly. This is not required, though.

> Important: When running training routines in Azure ML as jobs, you don't need to start or end the run in your training code as it is automatically done for you by Azure ML.

In [None]:
run = mlflow.start_run()

In [None]:
model.fit(x_train, y_train, epochs=1)

Let's now evaluate the model:

In [None]:
model.evaluate(x_test, y_test, verbose=2)

Once done with the training, let's end the run:

> Important: Remember that when training with jobs, you should not start/end runs manually.

In [None]:
mlflow.end_run()

## Exploring the experiment with MLflow

To see what has been logged, we can query the run again:

In [None]:
run = mlflow.get_run(run.info.run_id)

Let's explore the parameters that got logged:

In [None]:
pd.DataFrame(data=[run.data.params], index=["Value"]).T

Let's explore the metrics values:

In [None]:
pd.DataFrame(data=[run.data.metrics], index=["Value"]).T

Let's explore the artifacts that got logged in the run. This requires using the MLflow client:

In [None]:
client = mlflow.tracking.MlflowClient()
client.list_artifacts(run_id=run.info.run_id)

As you can see in this example, three artifacts are available in the run:

* `model`: the path where the model is stored. Note that this artifact is a directory.
* `model_summary.txt`: a text summary of the TensorFlow model. This artifact is TensorFlow specific.
* `tensorboard_logs`: the TensorBoard logs. Note that this artifact is a directory.

You can download any artifact using the function `mlflow.artifacts.download_artifacts`:

In [None]:
file_path = mlflow.artifacts.download_artifacts(
    run_id=run.info.run_id, artifact_path="model_summary.txt"
)

Since the artifact is a text file, we can display its contents in the following way:

In [None]:
with open(file_path, "r") as f:
    # Print the file as-is rather than as a list of lines
    print(f.read())

## Loading the model back

`autolog` has also logged the model for us. Let's try to load it back:

In [None]:
classifier = mlflow.keras.load_model(f"runs:/{run.info.run_id}/model")

Note that the type returned by this method is a Keras model:

In [None]:
type(classifier)

You can get predictions back from the model:

In [None]:
classifier.predict(x_test)

We can get the classes with:

In [None]:
classifier.predict(x_test).argmax(axis=-1)
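
To see what `argmax(axis=-1)` does to the model output, here is a minimal sketch on hand-written probabilities. The `probabilities` array is a made-up stand-in for the softmax output, with three classes instead of ten:

```python
import numpy as np

# Hypothetical softmax outputs: one row per sample, one column per class.
probabilities = np.array(
    [
        [0.1, 0.7, 0.2],
        [0.8, 0.1, 0.1],
    ]
)

# argmax over the last axis picks the index of the most probable class per row.
predicted_classes = probabilities.argmax(axis=-1)

print(predicted_classes)  # [1 0]
```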

## Logging models with preprocessing

As can be seen, MLflow automatically logs models for you, but sometimes you need to log a different model, especially when you are doing preprocessing. In this example we applied feature scaling before training the model. The same scaling is also required when the model performs inference.

To make the model work as expected during deployment, we need to ensure those steps are also applied. In the following example, a new model is constructed from the previous one, with the preprocessing and postprocessing steps added to the sequential model:

In [None]:
new_model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Rescaling(1.0 / 255.0),
        classifier,
        tf.keras.layers.Lambda(lambda x: tf.math.argmax(x, axis=-1)),
    ]
)

Let's compile it:

In [None]:
new_model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])

Let's test this new model with the original data:

In [None]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

Run the model:

In [None]:
predictions = new_model.predict(x_test)
predictions
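
The idea behind the wrapped model can be illustrated without TensorFlow. The sketch below is a toy under stated assumptions: `toy_classifier` is a hypothetical stand-in for the trained network (random linear weights plus softmax), and `wrapped_model` mirrors the rescale, predict, argmax chain so that raw pixel values go in and class labels come out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the trained classifier: 4 "pixels" in, 3 classes out.
weights = rng.normal(size=(4, 3))


def toy_classifier(scaled_inputs):
    """Return softmax probabilities for inputs already scaled to [0, 1]."""
    logits = scaled_inputs @ weights
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)


def wrapped_model(raw_pixels):
    """Apply preprocessing, the classifier, and postprocessing in one call."""
    scaled = raw_pixels / 255.0              # preprocessing: same scaling as training
    probabilities = toy_classifier(scaled)   # the original model
    return probabilities.argmax(axis=-1)     # postprocessing: class labels


raw = rng.integers(0, 256, size=(5, 4)).astype(np.float64)

# Scaling by hand and then taking argmax matches what the wrapper does internally.
manual_labels = toy_classifier(raw / 255.0).argmax(axis=-1)
wrapped_labels = wrapped_model(raw)

print(np.array_equal(manual_labels, wrapped_labels))
```

Callers of the wrapped version no longer need to know about the scaling, which is the property that matters at deployment time.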

Then, in the training routine, we can also log this new model manually.