# MLflow Example

## Introduction

This example is an enhanced version of the original in the [MLflow `mlflow.sklearn` documentation](https://mlflow.org/docs/latest/python_api/mlflow.sklearn.html). It demonstrates how to use MLflow to log and visualize the model training process.

In [1]:
import mlflow
from mlflow.models import infer_signature

import pandas as pd


## TensorFlow Example

See the MLflow TensorFlow guide: https://mlflow.org/docs/latest/deep-learning/tensorflow/guide/index.html


## PyTorch Example

See the MLflow PyTorch guide: https://mlflow.org/docs/latest/deep-learning/pytorch/guide/index.html


## Scikit-learn Logistic Regression Example

In [2]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score




### Load dataset and set parameters

In [3]:
# Load the Iris dataset
X, y = datasets.load_iris(return_X_y=True)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

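# Define the model hyperparameters
params = {
    "solver": "lbfgs",
    "max_iter": 1000,
    # Deprecated since scikit-learn 1.5; "auto" is the default behavior anyway
    "multi_class": "auto",
    # Only used by the sag, saga and liblinear solvers; ignored by lbfgs
    "random_state": 8888,
}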


### Prepare model and MLflow tracking

This run logs everything manually (parameters, per-step metrics, a tag, and the model) to show what the tracking API can record.
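The manual-logging cell passes `average="macro"` to `precision_score`. Macro averaging computes precision separately for each class and takes the unweighted mean, so each iris species counts equally regardless of how often it appears. A minimal pure-Python sketch of that calculation (the helper `macro_precision` is hypothetical, written only to illustrate the idea):

```python
from collections import Counter


def macro_precision(y_true, y_pred):
    """Unweighted mean of per-class precision, mirroring average="macro"."""
    # Use every label seen in either the true or the predicted values
    labels = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)
    per_class = []
    for label in labels:
        # A class that is never predicted gets precision 0.0,
        # like scikit-learn's default zero_division handling
        if predicted[label]:
            per_class.append(true_pos[label] / predicted[label])
        else:
            per_class.append(0.0)
    return sum(per_class) / len(per_class)


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
# Per-class precision is 1/2, 2/3 and 1, so the macro average is 13/18
print(macro_precision(y_true, y_pred))
```

For the same inputs, `precision_score(y_true, y_pred, average="macro")` should return the same value.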

In [None]:
# Set the tracking server URI for logging (replace with the address of your own server)
mlflow.set_tracking_uri(uri="http://192.168.30.21:8080")

# Create the experiment if it does not exist yet and make it the active one
mlflow.set_experiment("MLflow Quickstart")

# Start an MLflow run
with mlflow.start_run(run_name="Basic LR Model"):
    # Log the hyperparameters
    mlflow.log_params(params)

    # Train and evaluate repeatedly to demonstrate step-wise metric logging;
    # lbfgs is deterministic, so every step logs the same values here
    for i in range(10):
        # Train the model
        lr = LogisticRegression(**params)
        lr.fit(X_train, y_train)

        # Compute metrics on the training set
        y_train_pred = lr.predict(X_train)
        accuracy_train = accuracy_score(y_train, y_train_pred)
        precision_train = precision_score(y_train, y_train_pred, average="macro")

        # Predict on the test set
        y_pred = lr.predict(X_test)

        # Compute metrics on the test set
        accuracy = accuracy_score(y_test, y_pred)
        precision = precision_score(y_test, y_pred, average="macro")

        # Log train and test metrics for this step
        mlflow.log_metrics(
            {
                "accuracy": accuracy,
                "precision": precision,
                "accuracy_train": accuracy_train,
                "precision_train": precision_train,
            },
            step=i,
        )

    # Set a tag that we can use to remind ourselves what this run was for
    mlflow.set_tag("Training Info", "Basic LR model for iris data")

    # Infer the model signature
    signature = infer_signature(X_train, lr.predict(X_train))

    # Log the model
    model_info = mlflow.sklearn.log_model(
        sk_model=lr,
        artifact_path="iris_model",
        signature=signature,
        input_example=X_train,
        registered_model_name="tracking-quickstart",
    )

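### Use autologging

If you are unsure what to log, use `mlflow.autolog()`. For scikit-learn it records the estimator's parameters, standard training metrics (for classifiers: accuracy, precision, recall, F1 and related scores), and the fitted model, without any explicit logging calls.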

In [None]:
# Enable autologging for all supported libraries, including scikit-learn
mlflow.autolog()

with mlflow.start_run(run_name="Basic LR Model"):
    # Optional: autolog already records the estimator's parameters, so this duplicates them
    mlflow.log_params(params)
    lr = LogisticRegression(**params)
    lr.fit(X_train, y_train)

### Predict and evaluate: log a table

In [7]:
results = pd.DataFrame(X_test, columns=datasets.load_iris().feature_names)
results["actual_class"] = y_test
results["predicted_class"] = lr.predict(X_test)
# Attach the table to the most recent run (the autologging run above)
run_id = mlflow.last_active_run().info.run_id
mlflow.log_table(data=results, artifact_file="evaluation/test_set.json", run_id=run_id)

### Load model from mlflow and infer

In [5]:
# Load the model via the URI in model_info, or look up the URI in the model registry or the MLflow UI
loaded_model = mlflow.pyfunc.load_model(model_info.model_uri)

predictions = loaded_model.predict(X_test)

iris_feature_names = datasets.load_iris().feature_names

result = pd.DataFrame(X_test, columns=iris_feature_names)
result["actual_class"] = y_test
result["predicted_class"] = predictions

result[:4]



|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | actual_class | predicted_class |
|---|---|---|---|---|---|---|
| 0 | 6.1 | 2.8 | 4.7 | 1.2 | 1 | 1 |
| 1 | 5.7 | 3.8 | 1.7 | 0.3 | 0 | 0 |
| 2 | 7.7 | 2.6 | 6.9 | 2.3 | 2 | 2 |
| 3 | 6.0 | 2.9 | 4.5 | 1.5 | 1 | 1 |
