# Tutorial for using MLflow on Puhti

This tutorial will guide you through using MLflow in the Puhti computing environment, offering a streamlined and centralized approach to tracking machine learning experiments. It’s tailored for machine learning practitioners who seek an efficient way to manage and monitor their experiments.

While prior experience with MLflow isn’t necessary, a basic understanding of supercomputing is recommended. We’ll explore the core components of MLflow and demonstrate their application through practical examples. You can follow along with the provided sample code or incorporate your own code into the tutorial.

### What is MLflow?

**MLflow** is an open-source tool for managing machine learning models throughout their life cycle. It has four key components that cover the workflow from experimentation to model deployment:

- **Tracking Server** is the core component used for tracking experiments. Results can be viewed and compared through an informative user interface or API.
	
- **Models** is for packaging the models in a unified format, making it easy to move and share them.

- **Model Registry** provides tools for registering and versioning models. The registry can also be managed through the UI.

- **Projects** is for packaging entire ML project code, enabling easy sharing and reproducibility.

By organizing your work into ***experiments*** and ***runs***, MLflow ensures that you can systematically track progress, compare results, and refine your models effectively.

For more information on the components, see the [MLflow documentation](https://mlflow.org/docs/latest/introduction/index.html#what-is-mlflow).

In [None]:
# Let's make sure a sufficiently recent version of MLflow (>2.15.1) is in use. If not, uncomment and run the pip install line below:
!mlflow --version
#!pip install --upgrade mlflow
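
The same check can also be done from Python. The snippet below is a minimal sketch, not part of the original tutorial: the helper names `version_tuple` and `is_recent_enough` are hypothetical, and the threshold is simply the `>2.15.1` mentioned in the cell above. It assumes MLflow is installed under the distribution name `mlflow`.

```python
import re
from importlib import metadata

MINIMUM_VERSION = (2, 15, 1)  # threshold taken from the comment in the cell above


def version_tuple(version_text):
    """Return the leading numeric release parts, e.g. '2.16.0rc1' -> (2, 16, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version_text)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_text!r}")
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def is_recent_enough(version_text, minimum=MINIMUM_VERSION):
    """True if the version is strictly newer than the minimum."""
    return version_tuple(version_text) > minimum


try:
    installed = metadata.version("mlflow")
    print(f"MLflow {installed}: recent enough = {is_recent_enough(installed)}")
except metadata.PackageNotFoundError:
    print("MLflow is not installed in this environment.")
```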

In [3]:
import pandas as pd
import requests 
import os

from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras import layers
from keras.datasets import mnist
from keras.utils import to_categorical

import mlflow

---

## MLflow Tracking Server & Models

### About Storing Artifacts and Metadata

In the example code, we use local storage for both metadata and artifacts. MLflow also supports other options for the backend and artifact stores.

- **Backend Store**: This is where ***metadata*** is stored, including information about each run. By default, MLflow saves this metadata locally in the `mlruns` directory. However, it can be configured to use an external database, such as PostgreSQL or MySQL. You can read more about backend stores here: [MLflow Backend Stores](https://mlflow.org/docs/latest/tracking/backend-stores.html#backend-stores).

- **Artifact Store**: This is where the ***artifacts*** generated during runs—such as model weights, model files, and data files—are stored. The **Models component** is used to package these model files in a standardized format. Similar to the backend store, MLflow defaults to using the local `mlruns` directory for artifact storage, but it can be set to use external storage, such as S3 object storage like Allas. For more information, refer to [MLflow Artifact Stores](https://mlflow.org/docs/latest/tracking/artifacts-stores.html) and the [CSC documentation on Allas](https://docs.csc.fi/computing/allas/) and [using Allas with Python and Boto3](https://docs.csc.fi/data/Allas/using_allas/python_boto3/).
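
To make the options above more concrete, here is a small sketch of what two tracking URIs could look like. The directory name `mlflow_demo` is hypothetical, and SQLite is our own choice of example database (MLflow accepts SQLAlchemy-style database URIs, but it is not the backend used in this tutorial). The snippet only builds the strings; either one could then be passed to `mlflow.set_tracking_uri`.

```python
from pathlib import PurePosixPath

project_id = "project_2001234"  # placeholder project ID, as elsewhere in this tutorial
base_dir = PurePosixPath("/scratch") / project_id / "mlflow_demo"  # hypothetical directory

# Option 1: a plain local directory holds both metadata and artifacts.
file_store_uri = (base_dir / "mlruns").as_uri()

# Option 2 (assumption): a SQLite database file holds the metadata.
# An absolute path leads to four slashes after "sqlite:".
sqlite_store_uri = f"sqlite:///{base_dir / 'mlflow.db'}"

print(file_store_uri)
print(sqlite_store_uri)
```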

In the example code below, a local directory is used for both the backend and artifact storage.

In [4]:
"""
First, we activate the Tracking Server component:

Start by setting the tracking URI, which defines the path where MLflow will create the 'mlruns' directory to store metadata and artifacts generated during runs. If no path is provided, MLflow will create the 'mlruns' directory in the location where the code is executed.

Next, set up an experiment under which the upcoming training runs will be logged. If the experiment does not already exist, it will be created.
"""

# Set tracking URI
project_id = "project_2001234" # Insert your project_id here
mlruns_uri = f"/scratch/{project_id}/path/to/mlruns" # Absolute path to the desired storage location
mlflow.set_tracking_uri(mlruns_uri) # https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_tracking_uri

# Set experiment
experiment_name = "MLflow tutorial"
experiment = mlflow.set_experiment(experiment_name) # https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_experiment
print(f"Artifacts are stored here: {experiment.artifact_location}")


In [None]:
"""
Next, we will run training sessions using MLflow. We'll use the autolog function, which automatically logs the parameters, metrics, and model produced during the run. By default, the model is logged as an artifact, making it easy to access later on and enabling automatic versioning. This and other behavior can be changed through the function's arguments.

In the example code, we perform two training rounds with slightly different models, allowing us to compare the results in the UI.   
"""

mlflow.autolog() # https://mlflow.org/docs/latest/tracking/autolog.html#automatic-logging
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

# Let's compile two slightly different models to compare. You can either use the example code or insert your own.

(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train / 255.
X_test = X_test / 255.
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

model_1 = Sequential(
    [
        layers.Flatten(input_shape=(28, 28)),
        layers.Dense(128, activation='relu'),
        layers.Dense(10, activation='softmax')
    ]
)
model_1.compile(optimizer='adam', 
                loss='categorical_crossentropy', 
                metrics=['accuracy'])

model_2 = Sequential(
    [
        layers.Flatten(input_shape=(28, 28)),
        layers.Dense(128, activation='tanh'),
        layers.Dense(64, activation='tanh'),
        layers.Dense(10, activation='softmax')
    ]
)
model_2.compile(optimizer='adam', 
                loss='categorical_crossentropy', 
                metrics=['accuracy'])

# Map an informative run name to each model.
models = {"sequential_3layers": model_1,
          "sequential_with_tanh": model_2}

for run_name, model in models.items():

    # Start tracking the run: https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.start_run
    # Passing run_name gives the run an informative name; otherwise, a random name is generated.
    with mlflow.start_run(run_name=run_name):
        print(f"Run name: {run_name}")

        model.fit(X_train, y_train, epochs=5, batch_size=1, validation_data=(X_test, y_test))

        test_loss, test_acc = model.evaluate(X_test, y_test, verbose=2)
        print('\nTest accuracy:', test_acc)
        # The run is ended automatically when the with block exits.

### User Interface

Next, we'll launch the MLflow application on Puhti and explore its user interface. You can find the MLflow icon under the Apps menu in Puhti. Ensure that the `pytorch/2.4` module is selected in the Settings section, as it includes a sufficiently recent version of MLflow. Additionally, make sure the file path points to the previously specified `mlruns` directory.

![mlflow_puhti.png](./pics/mlflow_puhti.png)

<small>Figure 1. Setting up MLflow on Puhti</small>

After launching successfully, we can view the previous runs:

![run_1.png](./pics/runs_1.png)

<small>Figure 2. Front page of MLflow UI</small>

**Image caption:**

1. All experiments are listed here. With informative names and optional descriptions, users can organize different runs into easily manageable collections.
2. When an experiment is selected, all associated runs are displayed. Users can sort and group these runs in various ways.

In the Charts view (Figure 3), users can compare the performance of different models using automatically generated graphs. These graphs can be downloaded in formats like CSV. 

![run_2.png](./pics/run_2.png)

<small>Figure 3. Charts view of runs</small>

#### Conclusion

The Tracking Server and Models components together provide an easy way to centrally monitor and store machine learning experiments in a consistent manner. These tools are user-friendly and don’t require deep expertise to get started. However, MLflow also offers the flexibility for more detailed configuration to meet the needs of more demanding use cases.


---

## MLflow Model Registry

When you identify a model through experimentation that is ready for production, you can take advantage of the **[MLflow Model Registry](https://mlflow.org/docs/latest/model-registry.html#mlflow-model-registry)**. The registry offers a centralized platform for managing, validating, and deploying models. You can assign aliases such as "staging" or "production" to your models, making it easy to retrieve the appropriate model from the registry for inference.

The Model Registry can be managed through either the Models tab in the user interface or via the [API](https://mlflow.org/docs/latest/model-registry.html#adding-an-mlflow-model-to-the-model-registry).

![register_model.png](./pics/register_model.png)

<small>Figure 4. Register model in UI</small>

**Image caption:**

1. By opening the details of the desired run, you can register it from the top left corner.
2. Name the new model or select an existing one from the menu. If the model is already registered, a new version will be created.

Once the model is registered, you can view and manage it on the Models tab (Figures 5 and 6).

![model_reg1.png](./pics/model_reg1.png)

<small>Figure 5. Model Registry front page</small>

![model_reg2.png](./pics/model_reg2.png)

<small>Figure 6. Details of a registered model</small>



In [None]:
"""
To register another trained model using the API, you need the run ID of the model you want to register. Since the model previously registered via the UI had a lower accuracy, we will now programmatically find and register the model with the highest validation accuracy (val_accuracy). Additionally, we will retrieve the registered model's name using API calls.
Finally, we assign the alias "challenger" to the new model version for easier identification. To accomplish this, we use the MLflow Client, which allows us to manage model aliases programmatically.
"""

from mlflow import MlflowClient
client = MlflowClient()

# Find the run with the highest validation accuracy: https://mlflow.org/docs/latest/search-runs.html
runs = mlflow.search_runs([experiment.experiment_id])
run_id = runs.loc[runs['metrics.val_accuracy'].idxmax(), 'run_id']

# Look up the name of the model registered earlier (assumed to start with "mnist")
filter_string = "name LIKE 'mnist%'"
model_name = mlflow.search_registered_models(filter_string=filter_string)[0].name

# Register the logged model as a new version: https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.register_model
model_uri = f"runs:/{run_id}/model"  # by default, autolog stores the model under the "model" artifact path
mv = mlflow.register_model(model_uri, model_name)

# https://mlflow.org/docs/latest/python_api/mlflow.client.html#mlflow.client.MlflowClient.set_registered_model_alias
client.set_registered_model_alias(mv.name, "challenger", mv.version)



When multiple versions of a model are available, they can be easily compared through the user interface.

![model_reg3.png](./pics/model_reg3.png)

<small>Figure 7. Model versions to compare</small>

![model_reg4.png](./pics/model_reg4.png)

<small>Figure 8. Comparing registered models</small>

### Deployment and Inference


In [1]:

import mlflow

# Load a registered model by alias. The alias must already be assigned to a model version.
alias = "champion"
model_name = "mnist_sequential"
model = mlflow.pyfunc.load_model(f"models:/{model_name}@{alias}")

print(model)
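
Registry URIs are easy to get slightly wrong, so here is a small sketch of the two forms we rely on: `models:/<name>@<alias>` and `models:/<name>/<version>`. The helper `registry_model_uri` is hypothetical and only builds the string; it does not contact the registry.

```python
def registry_model_uri(name, *, version=None, alias=None):
    """Build a Model Registry URI from a model name and either a version or an alias."""
    if (version is None) == (alias is None):
        raise ValueError("Pass exactly one of 'version' or 'alias'.")
    if alias is not None:
        return f"models:/{name}@{alias}"
    return f"models:/{name}/{version}"


print(registry_model_uri("mnist_sequential", alias="champion"))
print(registry_model_uri("mnist_sequential", version=2))
```

The resulting string can be passed to `mlflow.pyfunc.load_model` in the same way as in the cell above.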

---

## Projects

This component is an interesting one, as I have not used it myself before. This section will cover why, when, and how to use it, followed by example code.