# MLflow test notebook
This notebook contains the code used to test MLflow, visualising the same MNIST models used in the other tests.


In [None]:
# Requirements:

%pip install tensorflow mlflow

In [None]:
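import tensorflow as tf
from tensorflow import keras
from keras.models import Sequential
# keras.utils.np_utils no longer exists in recent Keras versions, and nothing below needs it,
# so that import (and the other unused keras imports) has been removed

import mlflow
import mlflow.tensorflow  # TensorFlow/Keras flavour: autolog() and log_model()
from mlflow import log_params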

In [None]:
# Load MNIST, which already comes split into training and test sets
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()

# Pixel values range from 0 to 255; dividing by 255 scales them to the range [0, 1]
X_train_divided = X_train / 255
X_test_divided = X_test / 255

# Parallel lists of hyperparameters: index i defines the model for run i+1, which makes varying the models easy
first_layer_nodes = [100, 50, 100, 50, 100, 50]
first_layer_activation = ["relu", "sigmoid", "softmax", "relu", "sigmoid", "softmax"]
second_layer_nodes = [10, 15, 25, 25, 15, 10]
second_layer_activation = ["sigmoid", "softmax", "relu", "relu", "sigmoid", "softmax"]
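The four lists above are parallel: index `i` picks one value from each to define run `i+1`. The following is a minimal pure-Python sketch that zips them into one dict per run, which makes the pairing explicit and catches lists of unequal length before any training starts. The helper name `build_run_configs` is hypothetical (it is not part of MLflow or Keras), and the lists are repeated so the block runs on its own.

```python
def build_run_configs(first_nodes, first_act, second_nodes, second_act):
    """Zip the parallel hyperparameter lists into one dict per run (hypothetical helper)."""
    lengths = {len(first_nodes), len(first_act), len(second_nodes), len(second_act)}
    if len(lengths) != 1:
        # zip() would silently drop the extra entries, so fail loudly instead
        raise ValueError(f"hyperparameter lists have different lengths: {sorted(lengths)}")
    configs = []
    for i, (n1, a1, n2, a2) in enumerate(zip(first_nodes, first_act, second_nodes, second_act)):
        configs.append({
            "run_name": f"run-{i + 1}",
            "first layer nodes": n1,
            "first layer activation": a1,
            "second layer nodes": n2,
            "second layer activation": a2,
        })
    return configs


first_layer_nodes = [100, 50, 100, 50, 100, 50]
first_layer_activation = ["relu", "sigmoid", "softmax", "relu", "sigmoid", "softmax"]
second_layer_nodes = [10, 15, 25, 25, 15, 10]
second_layer_activation = ["sigmoid", "softmax", "relu", "relu", "sigmoid", "softmax"]

run_configs = build_run_configs(
    first_layer_nodes, first_layer_activation, second_layer_nodes, second_layer_activation
)
for config in run_configs:
    print(config)
```

In a training loop, `config["run_name"]` could be passed to `mlflow.start_run(run_name=...)` and the remaining keys to `log_params`.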

MLflow serves its tracking UI on localhost. To start the server, run the command `mlflow server` in a terminal. This can't be done from the notebook: a cell containing `!mlflow server` never finishes executing, so the notebook would never reach the subsequent cells, such as the ones with the logging code. \
Once the server is running, its URL has to be set in the code with `mlflow.set_tracking_uri(...)`; by default the server listens on "http://127.0.0.1:5000" (equivalently "http://localhost:5000"). Then the experiment is created (an experiment name can only be used once, so `mlflow.create_experiment` fails if the name already exists) and the runs are logged to it. \
MLflow is not built on TensorFlow, but it provides an autologging integration for TensorFlow/Keras: calling `mlflow.tensorflow.autolog()` automatically logs the parameters, metrics and trained model of each `model.fit()` call, so automatic logging is used here. \
For manual logging, the code would look like this:

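    with mlflow.start_run():
        mlflow.log_params({
            "epochs": 10,
            "first-layer-dense": "relu",
            "second-layer-dense": "sigmoid",
        })
        # return_dict=True makes evaluate() return {"loss": ..., "accuracy": ...} instead of a list
        score = model.evaluate(X_test_divided, y_test, return_dict=True)
        mlflow.log_metrics({
            "accuracy": score["accuracy"],
            "loss": score["loss"],
        })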


You may notice that in the screenshots in the report I forgot to change the names of the runs, but a run's name can be set when the run is created, via the `run_name` argument of `mlflow.start_run()`.

In [None]:
# Setting up client and localhost
mlflow.set_tracking_uri("http://127.0.0.1:5000")
client = mlflow.tracking.MlflowClient()

# Creating and naming experiment...
experiment_name = "MLflow MNIST test"
experiment_id = mlflow.create_experiment(experiment_name)

# Autologging, yay! Enabled once; it then hooks every model.fit() call below
mlflow.tensorflow.autolog()

for i in range(6):
    with mlflow.start_run(experiment_id=experiment_id, run_name=f"run-{i + 1}") as run:
        log_params({  # Logging hyperparameters
            "architecture": "MLP (fully connected)",  # only Dense layers, no convolutions
            "dataset": "keras MNIST dataset",
            "first layer nodes": first_layer_nodes[i],
            "first layer activation": first_layer_activation[i],
            "second layer nodes": second_layer_nodes[i],
            "second layer activation": second_layer_activation[i],
            "optimizer": "adam",
            "loss calculation": "sparse_categorical_crossentropy",
            "epochs": 5,
        })

        # Model creation, compilation and training
        model = Sequential([
            keras.layers.Flatten(input_shape=(28, 28)),
            keras.layers.Dense(first_layer_nodes[i], activation=first_layer_activation[i]),
            keras.layers.Dense(second_layer_nodes[i], activation=second_layer_activation[i]),
        ])
        model.compile(
            optimizer='adam',
            loss='sparse_categorical_crossentropy',
            metrics=['accuracy']
        )
        model.fit(X_train_divided, y_train, epochs=5, verbose=1)
        model.evaluate(X_test_divided, y_test)

If a run is started without a `with` block, it must also be closed explicitly: open it with `mlflow.start_run(...)` and close it with `mlflow.end_run()`. The loop above does not need this, because the `with mlflow.start_run(...)` statement ends each run automatically when the block exits.

Lastly, MLflow can upload models to the model registry from code instead of through the UI. The cell below builds a model worth uploading, based on what the visualisation of the 6 model variations showed. Since no experiment or run is specified, the model is logged to the default experiment. Passing `registered_model_name` to `log_model` also registers the model in the model registry under that name (here "best_model"), creating the registered model if it does not exist yet. Alternatively, a logged model can be registered from the UI: locate the model in its run's artifacts, click the "Register Model" button and choose the registered model name.
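Here the best configuration was picked by inspecting the charts. The same choice can be made in code once each run's test accuracy is known; MLflow itself offers `mlflow.search_runs(order_by=[...])` for querying logged runs from the tracking server. The sketch below sidesteps the server and works on a plain list of `(run name, accuracy)` pairs. The helper `pick_best_run` is hypothetical, and the accuracies in the usage lines are dummy placeholders, not results from this notebook.

```python
from typing import Iterable, Tuple


def pick_best_run(results: Iterable[Tuple[str, float]]) -> Tuple[str, float]:
    """Return the (run name, accuracy) pair with the highest accuracy (hypothetical helper)."""
    results = list(results)
    if not results:
        raise ValueError("no runs to compare")
    # max() returns the first maximal element, so the earlier run wins a tie
    return max(results, key=lambda pair: pair[1])


# Dummy placeholder values for illustration only, NOT measured accuracies
dummy_results = [("run-1", 0.3), ("run-2", 0.1), ("run-3", 0.2)]
best_name, best_accuracy = pick_best_run(dummy_results)
print(f"best run: {best_name} (accuracy {best_accuracy})")
```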

In [None]:
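# Import needed to infer the model's input/output signature before uploading it to the registry
from mlflow.models.signature import infer_signature

# Model creation, compilation and training, based on the best-performing of the 6 configurations visualised so far
# (autologging is still enabled, so this fit() call is also recorded as its own autologged run)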
model_best = Sequential([
    keras.layers.Flatten(input_shape = (28,28)),
    keras.layers.Dense(100, 'relu'),
    keras.layers.Dense(10, 'sigmoid')
])
model_best.compile(
    optimizer = 'adam',
    loss = 'sparse_categorical_crossentropy',
    metrics = ['accuracy']
)
model_best.fit(X_train_divided, y_train, epochs = 5, verbose = 1)
model_best.evaluate(X_test_divided, y_test)

# Model upload: infer the signature from model_best itself, not from the last model trained in the loop
signature = infer_signature(X_test_divided, model_best.predict(X_test_divided))
mlflow.tensorflow.log_model(model_best, "MNIST_model", registered_model_name="best_model", signature=signature)