# SageMaker MLflow integration

This notebook shows how to connect to our centralized MLflow Tracking Server, managed by SageMaker, and use it to track metrics and metadata.

This approach works both from your local machine and from within AWS/SageMaker.

## Pre-requisites

To interact with our MLflow Tracking Server, you first need to configure AWS authentication.
Follow [these instructions](https://github.com/elastic/cloud/blob/master/wiki/AWS.md#configuring-okta-awscli-for-cli--api-access) to do so. You will need a YubiKey for MFA; Touch ID on macOS no longer works.

## Installation

First, install the MLflow SDK and the AWS MLflow plugin (`sagemaker-mlflow`):

In [1]:
!pip install mlflow==2.16.2

In [2]:
!pip install sagemaker-mlflow==0.1.0
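
A `!pip install` runs whatever `pip` the shell finds, which is not always the interpreter behind the notebook kernel. To confirm that the kernel actually sees the pinned versions, a small check with the standard library's `importlib.metadata` can help. This is a minimal sketch; `PINNED_VERSIONS` and `find_version_mismatches` are hypothetical names that only mirror the two install cells above.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Hypothetical pins dict: it only mirrors the two install cells above.
PINNED_VERSIONS = {"mlflow": "2.16.2", "sagemaker-mlflow": "0.1.0"}


def find_version_mismatches(pins: dict[str, str]) -> dict[str, str | None]:
    """Map each package whose installed version differs from its pin to the
    installed version (None when the package is not installed at all)."""
    mismatches: dict[str, str | None] = {}
    for package, expected in pins.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[package] = installed
    return mismatches


# An empty dict means the kernel's environment has exactly the pinned versions.
find_version_mismatches(PINNED_VERSIONS)
```

If a package shows up as missing or with another version, the install most likely went into a different environment; IPython's `%pip install` magic installs into the running kernel's environment instead. If `mlflow` was already imported before upgrading, restart the kernel as well.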

## Connect to the MLflow Tracking Server

First, authenticate against our ML AWS account using the `sagemaker` AWS profile:

In [3]:
import boto3

ml_sagemaker_session = boto3.Session(profile_name="sagemaker")

In [4]:
import os

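# Freeze the session's (temporary) credentials and export them as environment variables,
# which boto3's default credential chain checks, so boto3-based clients such as the
# sagemaker-mlflow plugin pick them up.
credentials = ml_sagemaker_session.get_credentials().get_frozen_credentials()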
os.environ["AWS_ACCESS_KEY_ID"] = credentials.access_key
os.environ["AWS_SECRET_ACCESS_KEY"] = credentials.secret_key
os.environ["AWS_SESSION_TOKEN"] = credentials.token
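
The cell above assumes the profile yields temporary credentials. With long-lived IAM user keys the session token is `None`, and `os.environ` only accepts strings, so the last assignment would raise a `TypeError`. A slightly more defensive variant could look like the sketch below; `export_aws_credentials` and the `FrozenCredentials` stand-in are hypothetical names, with the stand-in only mimicking the three fields of the object returned by `get_frozen_credentials()` so the helper can be tried offline.

```python
import os
from typing import MutableMapping, NamedTuple, Optional


class FrozenCredentials(NamedTuple):
    """Offline stand-in with the same three fields as botocore's
    ReadOnlyCredentials (what get_frozen_credentials() returns)."""

    access_key: str
    secret_key: str
    token: Optional[str]


def export_aws_credentials(creds, environ: MutableMapping[str, str] = os.environ):
    """Copy frozen AWS credentials into environment variables.

    The session token is exported only when present; otherwise any stale
    token left over from an earlier session is removed.
    """
    environ["AWS_ACCESS_KEY_ID"] = creds.access_key
    environ["AWS_SECRET_ACCESS_KEY"] = creds.secret_key
    if creds.token:
        environ["AWS_SESSION_TOKEN"] = creds.token
    else:
        environ.pop("AWS_SESSION_TOKEN", None)
    return environ
```

In the notebook you would call `export_aws_credentials(ml_sagemaker_session.get_credentials().get_frozen_credentials())`; passing a plain dict as `environ` lets you inspect the result without touching the real environment.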

To connect to our ML R&D MLflow Tracking Server, pass its ARN as the tracking URI (the `sagemaker-mlflow` plugin is what lets MLflow accept an ARN here):

In [5]:
import mlflow

mlflow_server_arn = (
    "arn:aws:sagemaker:us-east-1:879381254630:mlflow-tracking-server/ml-rd-mlflow-server"
)

mlflow.set_tracking_uri(mlflow_server_arn)
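
The ARN encodes the region, the AWS account and the server name. If you ever need to point at another server (a different region or account), building or validating the ARN is less error-prone than hand-editing the string. A minimal sketch, assuming the `arn:<partition>:sagemaker:<region>:<account-id>:mlflow-tracking-server/<name>` layout of the ARN above; `parse_tracking_server_arn` and `build_tracking_server_arn` are hypothetical helpers.

```python
from typing import NamedTuple


class TrackingServerArn(NamedTuple):
    region: str
    account_id: str
    server_name: str


def parse_tracking_server_arn(arn: str) -> TrackingServerArn:
    """Split arn:<partition>:sagemaker:<region>:<account-id>:mlflow-tracking-server/<name>."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError(f"Not a SageMaker ARN: {arn!r}")
    resource_type, _, server_name = parts[5].partition("/")
    if resource_type != "mlflow-tracking-server" or not server_name:
        raise ValueError(f"Not an MLflow tracking server ARN: {arn!r}")
    return TrackingServerArn(region=parts[3], account_id=parts[4], server_name=server_name)


def build_tracking_server_arn(
    region: str, account_id: str, server_name: str, partition: str = "aws"
) -> str:
    """Inverse of parse_tracking_server_arn (partition defaults to the standard "aws")."""
    return f"arn:{partition}:sagemaker:{region}:{account_id}:mlflow-tracking-server/{server_name}"
```

For example, `mlflow.set_tracking_uri(build_tracking_server_arn("us-east-1", "879381254630", "ml-rd-mlflow-server"))` is equivalent to the cell above.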

And that's it for the setup!

From now on, anything you log to our MLflow Tracking Server will be visible to the whole team.

What to log is up to you, but as a rule, log all the metadata relevant to a model: hyperparameters, metrics, data sources and their exact versions, code (for example, as git commit hashes), artifacts such as vocabularies and tokenizers, and, of course, the trained model itself.
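
Data versions and git commits are the items most often forgotten, because they do not come out of the training code itself. One way to collect them is as string tags, sketched below under a few assumptions: `current_git_commit` and `build_run_tags` are hypothetical helpers, the tag keys are an illustrative convention rather than anything MLflow requires, and the git lookup simply returns `None` when git or a repository is not available.

```python
from __future__ import annotations

import subprocess


def current_git_commit(repo_dir: str = ".") -> str | None:
    """Return the HEAD commit hash of repo_dir, or None if git or the repo is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip() or None


def build_run_tags(data_source: str, data_version: str, repo_dir: str = ".") -> dict[str, str]:
    """Collect reproducibility metadata as string tags.

    The keys "data.source", "data.version" and "git.commit" are a hypothetical
    team convention, not names MLflow requires.
    """
    tags = {"data.source": data_source, "data.version": data_version}
    commit = current_git_commit(repo_dir)
    if commit is not None:
        tags["git.commit"] = commit
    return tags
```

Inside a `with mlflow.start_run():` block you could then call `mlflow.set_tags(build_run_tags("sklearn-iris", sklearn.__version__))`, since the Iris dataset ships with scikit-learn. Depending on how the code is launched, MLflow may also record some source information on its own as `mlflow.source.*` system tags.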

## Example

In [6]:
import mlflow
import mlflow.sklearn
from mlflow.models import infer_signature
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

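Create the MLflow experiment (`set_experiment` creates it if it does not exist yet and makes it the active experiment). Pick a descriptive name, since the server is shared by the whole team: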

In [7]:
mlflow.set_experiment("mlflow-local-experiment")

<Experiment: artifact_location='s3://ml-rd-mlflow-artifact-storage/artifacts/3', creation_time=1733939827486, experiment_id='3', last_update_time=1733939827486, lifecycle_stage='active', name='mlflow-local-experiment', tags={}>

Load the Iris dataset and hold out 20% of it as a test set:

In [8]:
X, y = datasets.load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Train a Logistic Regression model:

In [9]:
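# `multi_class` is deprecated since scikit-learn 1.5; with lbfgs on the 3-class Iris data
# the model is multinomial anyway, so it is left out.
params = {"solver": "lbfgs", "max_iter": 1000, "random_state": 8888}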

lr = LogisticRegression(**params)
lr.fit(X_train, y_train)



Calculate the accuracy on the test set:

In [10]:
y_pred = lr.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
accuracy

1.0

Start an MLflow run and log parameters, metrics, and model artifacts:

In [12]:
with mlflow.start_run():
    # Log the hyperparameters
    mlflow.log_params(params)

    # Log the accuracy metric
    mlflow.log_metric("accuracy", accuracy)

    # Set a tag that we can use to remind ourselves what this run was for
    mlflow.set_tag("Training Info", "Basic LR model for iris data")

    # Infer the model signature
    signature = infer_signature(X_train, lr.predict(X_train))

    # Log the model
    model_info = mlflow.sklearn.log_model(
        sk_model=lr,
        artifact_path="iris_model",
        signature=signature,
        input_example=X_train[:5],  # a few rows are enough as an input example
    )

Downloading artifacts: 100%|██████████| 7/7 [00:00<00:00, 12.17it/s]
2024/12/12 15:52:00 INFO mlflow.tracking._tracking_service.client: 🏃 View run melodic-cat-624 at: https://us-east-1.experiments.sagemaker.aws/#/experiments/3/runs/712d35762f004b46907c54622da770e8.
2024/12/12 15:52:00 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: https://us-east-1.experiments.sagemaker.aws/#/experiments/3.


Open the MLflow UI from SageMaker (or follow the run and experiment links printed in the log above) and check that your experiment and run were created successfully.
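
If you want to print or share those links programmatically, the sketch below rebuilds them. It is an assumption-laden helper: `mlflow_ui_url` is a hypothetical name, and the URL layout is inferred solely from the two log lines above, not from a documented contract, so it may change.

```python
from __future__ import annotations


def mlflow_ui_url(region: str, experiment_id: str, run_id: str | None = None) -> str:
    """Rebuild the experiment/run links that the MLflow client printed in the log above.

    The URL layout is inferred from that log output only; it is not a documented contract.
    """
    url = f"https://{region}.experiments.sagemaker.aws/#/experiments/{experiment_id}"
    if run_id:
        url += f"/runs/{run_id}"
    return url
```

In the notebook, `run = mlflow.last_active_run()` returns the run that just finished, so `mlflow_ui_url("us-east-1", run.info.experiment_id, run.info.run_id)` should reproduce the run link printed above.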

## References

- [MLflow Documentation](https://mlflow.org/docs/latest/index.html)
- [Amazon SageMaker MLflow documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/mlflow.html)