# MLflow test notebook
This notebook contains the code used to test MLflow. It trains the same MNIST models as the other tests and visualises them in the MLflow UI.

In [None]:
# Requirements:

%pip install tensorflow mlflow

In [None]:
import tensorflow as tf
from tensorflow import keras
# Using the Keras that ships with TensorFlow, so model and layers come from the same package
from tensorflow.keras.models import Sequential

# MLflow tracking API; log_params records hyperparameters for the active run
import mlflow
from mlflow import log_params

In [None]:
# Loading the MNIST dataset, which comes already split into training and testing sets
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()

# Pixel values are integers from 0 to 255; dividing by 255 scales them to the range 0 to 1:
X_train_divided = X_train / 255
X_test_divided = X_test / 255

# Setting a few lists of variables to make varying the models easy (index i describes run i).
first_layer_nodes = [100, 50, 100, 50, 100, 50]
first_layer_activation = ["relu", "sigmoid", "softmax", "relu", "sigmoid", "softmax"]
second_layer_nodes = [10, 15, 25, 25, 15, 10]
second_layer_activation = ["sigmoid", "softmax", "relu", "relu", "sigmoid", "softmax"]
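
The four lists above are read in parallel, with index `i` selecting one model variant. As a small sketch, they could be combined into one dictionary per run, which makes the six variants easy to inspect before training. The name `run_configs` and the length check are illustrative additions, not part of the original notebook.

```python
# The four hyperparameter lists from the cell above.
first_layer_nodes = [100, 50, 100, 50, 100, 50]
first_layer_activation = ["relu", "sigmoid", "softmax", "relu", "sigmoid", "softmax"]
second_layer_nodes = [10, 15, 25, 25, 15, 10]
second_layer_activation = ["sigmoid", "softmax", "relu", "relu", "sigmoid", "softmax"]

lists = [first_layer_nodes, first_layer_activation,
         second_layer_nodes, second_layer_activation]

# Guard against one list being edited without the others
# (zip would otherwise silently stop at the shortest list).
assert len({len(values) for values in lists}) == 1, "hyperparameter lists differ in length"

# run_configs is a hypothetical name: one dictionary per model variant.
run_configs = [
    {
        "first layer nodes": n1,
        "first layer activation": a1,
        "second layer nodes": n2,
        "second layer activation": a2,
    }
    for n1, a1, n2, a2 in zip(*lists)
]

for i, config in enumerate(run_configs):
    print(i, config)
```

The keys match the parameter names logged for each run, so a dictionary like this could be merged straight into the `log_params` call.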

MLflow serves the visualisations from a tracking server on localhost (for example one started with `mlflow ui`, which listens on http://127.0.0.1:5000 by default), so first of all the tracking URI needs to be specified, followed by the creation of the experiment (you cannot use the same name twice) and the runs. MLflow is framework-agnostic but has built-in integration with TensorFlow, so logging for keras models is available automatically, and in this case the automatic logging is used. Also, it should be noted that this time I forgot to name the runs and I'm not going to change the graphs I already put in the report, so you have the occasion to see the default naming system, what an honour.

In [None]:
# Setting up client and localhost
mlflow.set_tracking_uri("http://127.0.0.1:5000")
client = mlflow.tracking.MlflowClient()

# Creating and naming experiment (this raises an error if the name already exists)
experiment_name = "MLflow MNIST test"
experiment_id = mlflow.create_experiment(experiment_name)

for i in range(6):
    with mlflow.start_run(experiment_id = experiment_id) as run:
        log_params({ # Logging hyperparameters
            "architecture": "MLP",  # fully connected network (Flatten + two Dense layers), not a CNN
            "dataset":  "keras MNIST dataset",
            "first layer nodes": first_layer_nodes[i],
            "first layer activation": first_layer_activation[i],
            "second layer nodes": second_layer_nodes[i],
            "second layer activation": second_layer_activation[i],
            "optimizer": "adam",
            "loss calculation": "sparse_categorical_crossentropy",
            "epochs": 5,
        })
        # Autologging, yay!
        mlflow.tensorflow.autolog()

        # Model creation, compilation and training.
        model = Sequential([
            keras.layers.Flatten(input_shape = (28,28)),
            keras.layers.Dense(first_layer_nodes[i], first_layer_activation[i]),
            keras.layers.Dense(second_layer_nodes[i], second_layer_activation[i])
        ])
        model.compile(
            optimizer = 'adam',
            loss = 'sparse_categorical_crossentropy',
            metrics = ['accuracy']
        )
        history = model.fit(X_train_divided, y_train, epochs = 5, verbose = 1)
        score = model.evaluate(X_test_divided, y_test)
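
Because experiment names must be unique, re-running the cell above fails at `mlflow.create_experiment`. A minimal sketch of a get-or-create helper is given below. The function name and the `tracking` parameter are hypothetical and exist only for illustration: `tracking` lets any object with `get_experiment_by_name` and `create_experiment` stand in for the `mlflow` module, which is what it defaults to.

```python
def get_or_create_experiment(name, tracking=None):
    """Return the ID of the experiment called `name`, creating it if needed.

    `tracking` is a hypothetical hook: any object exposing
    get_experiment_by_name() and create_experiment(). By default the
    mlflow module itself is used.
    """
    if tracking is None:
        import mlflow  # imported lazily, only when the real tracking API is needed
        tracking = mlflow

    existing = tracking.get_experiment_by_name(name)  # None if no such experiment
    if existing is not None:
        return existing.experiment_id
    return tracking.create_experiment(name)
```

With this in place, `experiment_id = get_or_create_experiment(experiment_name)` could stand in for the `create_experiment` line. Edge cases such as previously deleted experiments are not handled in this sketch.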

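The metrics are logged to the tracking server on localhost, so the visualisation graphs are only available while that server is running. Each run above is closed automatically when its `with` block exits; calling `mlflow.end_run()` as follows simply makes sure that no run is left active. It does not stop the server itself, which has to be shut down separately by stopping its process once it is no longer needed.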

In [None]:
mlflow.end_run()
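
To make the effect of `mlflow.end_run()` more visible, here is a small sketch that checks whether a run is still active before ending it. The helper name and the `tracking` parameter are hypothetical, added for illustration only; by default the helper uses the `mlflow` module and its `active_run()` and `end_run()` functions.

```python
def end_run_if_active(tracking=None):
    """End the currently active run, if any. Return True if a run was ended."""
    if tracking is None:
        import mlflow  # imported lazily, only when the real tracking API is needed
        tracking = mlflow

    if tracking.active_run() is None:  # active_run() returns None when nothing is open
        return False
    tracking.end_run()
    return True
```

After the training loop above, this would be expected to return `False`, since every run was opened in a `with` block and closed on exit.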

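Lastly, one of the features of MLflow is uploading models through code instead of the UI, so below we will create a model worth uploading, based on the knowledge gained from the visualisation of the 6 model variations. Note that `log_model` on its own stores the model as an artifact of a run; to also add it to the model registry, a `registered_model_name` has to be passed to `log_model`, or `mlflow.register_model` has to be called afterwards.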

In [None]:
# Import needed to infer the model signature (input and output schema) for the upload
from mlflow.models.signature import infer_signature

# Model creation, compilation and training, based on the best performing out of the 6 visualised so far
model = Sequential([
    keras.layers.Flatten(input_shape = (28,28)),
    keras.layers.Dense(100, 'relu'),
    keras.layers.Dense(10, 'sigmoid')
])
model.compile(
    optimizer = 'adam',
    loss = 'sparse_categorical_crossentropy',
    metrics = ['accuracy']
)
history = model.fit(X_train_divided, y_train, epochs = 5, verbose = 1)
score = model.evaluate(X_test_divided, y_test)

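# Model upload: the signature is inferred from the test inputs and the model's predictions.
# This logs the model as a run artifact; passing registered_model_name=... would also register it.
signature = infer_signature(X_test_divided, model.predict(X_test_divided))
mlflow.tensorflow.log_model(model, "MNIST_best_model", signature=signature)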
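
A model logged inside a run can be addressed with a `runs:/<run_id>/<artifact_path>` URI, which is what `mlflow.register_model` takes to add it to the model registry. The sketch below assumes the model was logged under an artifact path within a run, as in the cell above. Both helper names are hypothetical, and whether the registry is available depends on the MLflow version and the backend store behind the tracking server.

```python
def runs_model_uri(run_id, artifact_path):
    """Build a 'runs:/<run_id>/<artifact_path>' URI for a model logged in a run."""
    return f"runs:/{run_id}/{artifact_path.strip('/')}"


def register_logged_model(run_id, artifact_path, registered_name):
    """Register an already logged model under `registered_name` (hypothetical helper)."""
    import mlflow  # imported lazily; requires a reachable tracking server with a registry

    return mlflow.register_model(runs_model_uri(run_id, artifact_path), registered_name)
```

For the model above, the artifact path would be `"MNIST_best_model"` and the run ID can be read from the run's page in the MLflow UI.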