# Tutorial for using MLflow on Puhti

This tutorial will guide you through using MLflow in the Puhti computing environment, offering a streamlined and centralized approach to tracking machine learning experiments. It’s tailored for machine learning practitioners who seek an efficient way to manage and monitor their experiments.

While prior experience with MLflow isn’t necessary, a basic understanding of supercomputing is recommended. We’ll explore the core components of MLflow and demonstrate their application through practical examples. You can follow along with the provided sample code or incorporate your own code into the tutorial.

### What is MLflow?

**MLflow** is an open-source tool for managing machine learning models throughout their life cycle. It has four key components that cover the workflow from experimentation to model deployment:

- **Tracking Server** is the core component used for tracking experiments. Results can be viewed and compared through the user interface or the API.

- **Models** packages models in a unified format, making them easy to move and share.

- **Model Registry** provides tools for registering and versioning models. The registry can also be managed through the UI.

- **Projects** is for packaging entire ML project code, enabling easy sharing and reproducibility.

For more information on the components, see the MLflow documentation: https://mlflow.org/docs/latest/index.html

In [10]:
import pandas as pd
import requests 
import os

from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras import layers
from keras.datasets import mnist
from keras.utils import to_categorical


import mlflow
#from mlflow.keras import log_model, save_model, autolog
#from mlflow.models.signature import infer_signature

In [2]:
"""
Configure MLflow
"""

#project_id = "your_project_1234"
#mlflow.set_tracking_uri(f"/scratch/{project_id}/mlruns")

mlflow.set_tracking_uri("/home/sternade/Nextcloud_Kannu/Opari/mlruns") # where artifacts and metadata are stored
description = "Experimenting with different models to find the best performer on MNIST."
mlflow.set_experiment("MNIST example") # create the experiment if it does not exist and make it active


<Experiment: artifact_location='/home/sternade/Nextcloud_Kannu/Opari/mlruns/581032223001777713', creation_time=1723792570471, experiment_id='581032223001777713', last_update_time=1723792570471, lifecycle_stage='active', name='MNIST example', tags={}>
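
The commented-out lines in the cell above hint at the intended setup on Puhti: a tracking directory under the project's scratch area. Below is a minimal sketch of building that path, assuming the layout `/scratch/<project_id>/mlruns` taken from the comment. The helper `scratch_tracking_dir` and the project id `project_1234` are hypothetical, and the MLflow call is left as a comment so the snippet runs without MLflow and without touching the file system.

```python
from pathlib import PurePosixPath


def scratch_tracking_dir(project_id, root="/scratch"):
    """Build the tracking directory path <root>/<project_id>/mlruns (hypothetical helper)."""
    if not project_id or "/" in project_id:
        raise ValueError(f"invalid project id: {project_id!r}")
    return str(PurePosixPath(root) / project_id / "mlruns")


tracking_dir = scratch_tracking_dir("project_1234")  # hypothetical project id
print(tracking_dir)

# With MLflow installed, the path would then be passed on:
# mlflow.set_tracking_uri(tracking_dir)
```

Keeping the path in one place makes it easier to point both the training code and the UI at the same directory.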

In [8]:
mlflow.tensorflow.autolog(every_n_iter=1, 
                          log_models=True, 
                          log_datasets=True) # https://mlflow.org/docs/latest/python_api/mlflow.tensorflow.html
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

# Load the MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Preprocess the data
X_train = X_train / 255.
X_test = X_test / 255.
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# Build and compile two slightly different models to compare. Use them as-is for studying, or substitute your own model.
model_1 = Sequential(
    [
        layers.Flatten(input_shape=(28, 28)),
        layers.Dense(128, activation='relu'),
        layers.Dense(10, activation='softmax')
    ]
)
model_1.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

model_2 = Sequential(
    [
        layers.Flatten(input_shape=(28, 28)),
        layers.Dense(128, activation='tanh'),
        layers.Dense(64, activation='tanh'),
        layers.Dense(10, activation='softmax')
    ]
)
model_2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Map each run name to its model
models = {"sequential_3layers": model_1,
          "sequential_with_tanh": model_2}

for run_name, model in models.items():

    # Start tracking; the run is closed automatically when the with-block exits
    with mlflow.start_run(run_name=run_name):
        print(f"Run name: {run_name}")
        print(model)

        # Train the model (batch_size=1 means 60000 steps per epoch, which is slow)
        model.fit(X_train, y_train, epochs=10, batch_size=1, validation_data=(X_test, y_test))

        # Infer the model signature
        #signature = infer_signature(X_test, model.predict(X_test))

        # Evaluate the model
        test_loss, test_acc = model.evaluate(X_test, y_test, verbose=2)
        print('\nTest accuracy:', test_acc)

        #mlflow.tensorflow.log_model(model, f"mnist_tensorflow_{run_name}", signature=signature)



Run name: sequential_3layers
<Sequential name=sequential, built=True>


Epoch 1/10
60000/60000 ━━━━━━━━━━━━━━━━━━━━ 1080s 18ms/step - accuracy: 0.9073 - loss: 0.3073 - val_accuracy: 0.9643 - val_loss: 0.1225
Epoch 2/10
60000/60000 ━━━━━━━━━━━━━━━━━━━━ 995s 17ms/step - accuracy: 0.9656 - loss: 0.1249 - val_accuracy: 0.9668 - val_loss: 0.1375
Epoch 3/10
60000/60000 ━━━━━━━━━━━━━━━━━━━━ 709s 11ms/step - accuracy: 0.9737 - loss: 0.0991 - val_accuracy: 0.9680 - val_loss: 0.1423
Epoch 4/10
60000/60000 ━━━━━━━━━━━━━━━━━━━━ 420s 7ms/step - accuracy: 0.9770 - loss: 0.0918 - val_accuracy: 0.9709 - val_loss: 0.1588
Epoch 5/10
60000/60000 ━━━━━━━━━━━━━━━━━━━━ 446s 7ms/step - accuracy: 0.9813 - loss: 0.0793 - val_accuracy: 0.9742 - val_loss: 0.1565
Epoch 6/10
60000/60000 ━━━━━━━━━━━━━━━━━━━━ 474s 8ms/step - accuracy: 0.9831 - loss: 0.0727 - val_accuracy: 0.9731 - val_loss: 0.1777
Epoch 7/10
60000/60000 ━━━━━━━━━━━━━━━━━━━━ 486s 8ms/step - accuracy: 0.9853 - loss: 0.0627 - val_accuracy: 0.9707 - val_loss: 0.2264
Epoch 8/10
60000/60000 ━━━━━━━━━━━━━━━━━━━━ 453s 8ms/step - accuracy: 0.9877 - loss: 0.0590 - val_accuracy: 0.9700 - val_loss: 0.2345
Epoch 9/10
60000/60000 ━━━━━━━━━━━━━━━━━━━━ 482s 8ms/step - accuracy: 0.9875 - loss: 0.0586 - val_accuracy: 0.9725 - val_loss: 0.2278
Epoch 10/10
60000/60000 ━━━━━━━━━━━━━━━━━━━━ 451s 8ms/step - accuracy: 0.9890 - loss: 0.0546 - val_accuracy: 0.9749 - val_loss: 0.2214
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 226ms/step
313/313 - 1s - 4ms/step - accuracy: 0.9749 - loss: 0.2214

Test accuracy: 0.9749000072479248
Run name: sequential_with_tanh
<Sequential name=sequential_1, built=True>


Epoch 1/10
60000/60000 ━━━━━━━━━━━━━━━━━━━━ 474s 8ms/step - accuracy: 0.8949 - loss: 0.3439 - val_accuracy: 0.9564 - val_loss: 0.1467
Epoch 2/10
60000/60000 ━━━━━━━━━━━━━━━━━━━━ 498s 8ms/step - accuracy: 0.9543 - loss: 0.1527 - val_accuracy: 0.9561 - val_loss: 0.1425
Epoch 3/10
60000/60000 ━━━━━━━━━━━━━━━━━━━━ 485s 8ms/step - accuracy: 0.9620 - loss: 0.1251 - val_accuracy: 0.9602 - val_loss: 0.1329
Epoch 4/10
60000/60000 ━━━━━━━━━━━━━━━━━━━━ 1018s 17ms/step - accuracy: 0.9670 - loss: 0.1102 - val_accuracy: 0.9615 - val_loss: 0.1297
Epoch 5/10
60000/60000 ━━━━━━━━━━━━━━━━━━━━ 1257s 21ms/step - accuracy: 0.9684 - loss: 0.1026 - val_accuracy: 0.9644 - val_loss: 0.1243
Epoch 6/10
60000/60000 ━━━━━━━━━━━━━━━━━━━━ 1326s 22ms/step - accuracy: 0.9685 - loss: 0.1024 - val_accuracy: 0.9622 - val_loss: 0.1417
Epoch 7/10
60000/60000 ━━━━━━━━━━━━━━━━━━━━ 1484s 25ms/step - accuracy: 0.9727 - loss: 0.0897 - val_accuracy: 0.9633 - val_loss: 0.1300
Epoch 8/10
60000/60000 ━━━━━━━━━━━━━━━━━━━━ 890s 15ms/step - accuracy: 0.9715 - loss: 0.0934 - val_accuracy: 0.9651 - val_loss: 0.1287
Epoch 9/10
60000/60000 ━━━━━━━━━━━━━━━━━━━━ 645s 11ms/step - accuracy: 0.9741 - loss: 0.0841 - val_accuracy: 0.9663 - val_loss: 0.1268
Epoch 10/10
60000/60000 ━━━━━━━━━━━━━━━━━━━━ 608s 10ms/step - accuracy: 0.9757 - loss: 0.0803 - val_accuracy: 0.9683 - val_loss: 0.1153
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 224ms/step
313/313 - 2s - 6ms/step - accuracy: 0.9683 - loss: 0.1153

Test accuracy: 0.9682999849319458
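
The two runs can also be compared through the API instead of the UI. Below is a minimal sketch of ranking runs by a logged metric. It assumes that autologging stored a metric named `val_accuracy` and that the runs live in the "MNIST example" experiment; the helpers `rank_runs` and `fetch_runs` are hypothetical names. `mlflow.search_runs` returns a pandas DataFrame with one row per run and columns such as `metrics.<name>` and `tags.mlflow.runName`. The stand-in records use the final validation accuracies printed above so that the ranking logic can be tried without MLflow installed.

```python
import math


def rank_runs(rows, metric="metrics.val_accuracy"):
    """Return run records sorted by `metric`, best first; rows without a numeric value are skipped."""
    def has_value(row):
        value = row.get(metric)
        return isinstance(value, (int, float)) and not math.isnan(value)

    return sorted(filter(has_value, rows), key=lambda row: row[metric], reverse=True)


def fetch_runs(experiment_name="MNIST example"):
    """Load all runs of an experiment as a list of dicts (requires MLflow and pandas)."""
    import mlflow  # imported here so rank_runs stays usable without MLflow

    runs = mlflow.search_runs(experiment_names=[experiment_name])
    return runs.to_dict("records")


# Stand-in records; in practice they would come from fetch_runs().
rows = [
    {"tags.mlflow.runName": "sequential_with_tanh", "metrics.val_accuracy": 0.9683},
    {"tags.mlflow.runName": "sequential_3layers", "metrics.val_accuracy": 0.9749},
]

for row in rank_runs(rows):
    print(row["tags.mlflow.runName"], row["metrics.val_accuracy"])
```

When the runs are read from MLflow directly, passing `order_by=["metrics.val_accuracy DESC"]` to `mlflow.search_runs` should give the same ordering without the extra helper.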


In [9]:
!mlflow ui

[2024-08-27 14:55:42 +0300] [200190] [INFO] Starting gunicorn 21.2.0
[2024-08-27 14:55:42 +0300] [200190] [INFO] Listening at: http://127.0.0.1:5000 (200190)
[2024-08-27 14:55:42 +0300] [200190] [INFO] Using worker: sync
[2024-08-27 14:55:42 +0300] [200195] [INFO] Booting worker with pid: 200195
[2024-08-27 14:55:42 +0300] [200196] [INFO] Booting worker with pid: 200196
[2024-08-27 14:55:42 +0300] [200197] [INFO] Booting worker with pid: 200197
[2024-08-27 14:55:42 +0300] [200199] [INFO] Booting worker with pid: 200199
2024/08/27 14:57:16 ERROR mlflow.server: Exception on /get-artifact [GET]
Traceback (most recent call last):
  File "/home/sternade/.local/lib/python3.10/site-packages/flask/app.py", line 2190, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/sternade/.local/lib/python3.10/site-packages/flask/app.py", line 1486, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/sternade/.local/lib/python3.10/site-packages/flask/app.py", li

At this point, we take a look at the UI and go through its features.
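
The log above shows the UI being served at http://127.0.0.1:5000. In MLflow 2.x, `mlflow ui` looks for runs in `./mlruns` under the current directory unless a store is given, so when the notebook is started elsewhere the command can be pointed at the tracking directory explicitly. Below is a minimal sketch that only builds the command. The helper name `mlflow_ui_command` and the example path are illustrative assumptions; `--backend-store-uri`, `--host`, and `--port` are options of the `mlflow ui` command.

```python
import shlex


def mlflow_ui_command(store_path, host="127.0.0.1", port=5000):
    """Return the argument list for launching the MLflow UI against a given store (hypothetical helper)."""
    return [
        "mlflow", "ui",
        "--backend-store-uri", str(store_path),
        "--host", host,
        "--port", str(port),
    ]


# Hypothetical project directory; replace it with your own tracking directory.
cmd = mlflow_ui_command("/scratch/project_1234/mlruns")
print(shlex.join(cmd))

# To actually start the server (blocks until interrupted):
# import subprocess
# subprocess.run(cmd, check=True)
```

The store path should match the one passed to `mlflow.set_tracking_uri`, otherwise the UI opens with an empty experiment list.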

model_2 = Sequential(
    [
        Flatten(input_shape=(28, 28)),     # 28x28 image -> 784-element vector
        Dense(128, activation='tanh'),
        Dense(64, activation='tanh'),
        Dense(10, activation='softmax')    # one output per digit class
    ]
)