# How to Set Up Amazon SageMaker with MLflow

This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook.

![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-2/sagemaker-mlflow|sagemaker_mlflow_setup.ipynb)

## 1. Updates and Imports

In [1]:
!pip install -U --quiet boto3
!pip install --quiet mlflow==2.13.2 sagemaker-mlflow==0.1.0

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
awscli 1.35.9 requires botocore==1.35.43, but you have botocore 1.35.52 which is incompatible.

In [34]:
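import sys, os

# Add the parent directory to the import path so the local `utils` package can be imported later in this notebook
module_path = ".."
sys.path.append(os.path.abspath(module_path))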

### Imports

In [2]:
import json
import sagemaker
import boto3

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


### Session variables

In [3]:
sess = sagemaker.Session()
bucket_name = sess.default_bucket()
role = sagemaker.get_execution_role()
region = sess.boto_region_name

iam_client = boto3.client("iam")
sts_client = boto3.client("sts")
sm_client = boto3.client("sagemaker")
account_id = sts_client.get_caller_identity()["Account"]

## 2. MLflow Permissions

### 2.1. IAM Permissions of the SageMaker Execution Role for MLflow
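To run the following cells, make sure the IAM role used to run this notebook has permission to create IAM roles. <BR>
The policy of the notebook execution role must allow the iam:CreateRole, iam:CreatePolicy, iam:ListPolicies, and iam:AttachRolePolicy actions. <BR>
If you are running this notebook in SageMaker Studio, you can update the notebook execution role as follows:

1. Go to the AWS console and select the domain you are using.
2. Under the domain, select the user profile you are using. The execution role is listed there.
3. Go to the IAM console, search for the execution role under "Roles", and update the role with a policy that allows the iam:CreateRole, iam:CreatePolicy, iam:ListPolicies, and iam:AttachRolePolicy actions.

If you are not using a SageMaker Studio notebook, make sure the role you configured for the AWS CLI has the appropriate permissions to create IAM roles and attach policies.

Add the `AmazonS3FullAccess` policy to the SageMaker Execution Role. The IAM actions listed above can be granted with an inline policy. <BR>
The following is an example of an inline policy you can add to the role: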

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Statement1",
            "Effect": "Allow",
            "Action": [
                "iam:ListPolicies",
                "iam:CreatePolicy",
                "iam:CreateRole",
                "iam:AttachRolePolicy"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
```
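If you prefer to grant these permissions from code rather than through the IAM console, the IAM `put_role_policy` API can attach the JSON above as an inline policy. The sketch below is a minimal example under these assumptions: it must run as a principal that is already allowed `iam:PutRolePolicy` on the execution role (for example, an administrator session, not the notebook role itself), and the helper `attach_inline_policy` and the policy name `mlflow-notebook-iam-setup` are hypothetical.

```python
import json

# The inline policy shown above, as a Python dict
IAM_SETUP_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Statement1",
            "Effect": "Allow",
            "Action": [
                "iam:ListPolicies",
                "iam:CreatePolicy",
                "iam:CreateRole",
                "iam:AttachRolePolicy",
            ],
            "Resource": ["*"],
        }
    ],
}


def attach_inline_policy(iam_client, role_name, policy_name, policy_document):
    """Attach (or overwrite) an inline policy on an IAM role.

    `iam_client` is expected to be a boto3 IAM client, e.g. boto3.client("iam").
    put_role_policy replaces an existing inline policy with the same name,
    so re-running this call does not create duplicates.
    """
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy_document),
    )


# Usage (run from an administrator session; names are hypothetical):
# import boto3
# attach_inline_policy(boto3.client("iam"), "<your-execution-role-name>",
#                      "mlflow-notebook-iam-setup", IAM_SETUP_POLICY)
```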

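In addition, the SageMaker Execution Role is used by clients that access the MLflow tracking server, so it needs permission to call the MLflow REST APIs (`sagemaker-mlflow:*`). It also needs SageMaker API permissions to create, update, start, stop, and delete the tracking server, and to create presigned URLs for the MLflow UI.
The SageMaker Execution Role therefore needs the following policy: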

```json
{
    "Version": "2012-10-17",    
    "Statement": [        
        {            
            "Effect": "Allow",            
            "Action": [
                "sagemaker-mlflow:*",
                "sagemaker:CreateMlflowTrackingServer",
                "sagemaker:UpdateMlflowTrackingServer",
                "sagemaker:DeleteMlflowTrackingServer",
                "sagemaker:StartMlflowTrackingServer",
                "sagemaker:StopMlflowTrackingServer",
                "sagemaker:CreatePresignedMlflowTrackingServerUrl"
            ],            
            "Resource": "*"        
        }        
    ]
}
```
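After the policy is attached (the cells below do this), you can optionally confirm that the execution role is actually allowed the tracking server actions by calling the IAM policy simulator API, `simulate_principal_policy`. This is a minimal sketch, assuming the principal running it is allowed `iam:SimulatePrincipalPolicy`; `find_denied_actions` is a hypothetical helper name. The simulator evaluates the role's policies and is a quick sanity check, not a replacement for actually calling the APIs.

```python
# Tracking server lifecycle actions from the policy above
TRACKING_SERVER_ACTIONS = [
    "sagemaker:CreateMlflowTrackingServer",
    "sagemaker:UpdateMlflowTrackingServer",
    "sagemaker:DeleteMlflowTrackingServer",
    "sagemaker:StartMlflowTrackingServer",
    "sagemaker:StopMlflowTrackingServer",
    "sagemaker:CreatePresignedMlflowTrackingServerUrl",
]


def find_denied_actions(iam_client, role_arn, actions):
    """Return the actions that the IAM policy simulator does not allow for role_arn.

    `iam_client` is expected to be a boto3 IAM client. `role_arn` must be an
    IAM role ARN (arn:aws:iam::<account>:role/<name>), not an STS assumed-role ARN.
    """
    denied = []
    request = {"PolicySourceArn": role_arn, "ActionNames": list(actions)}
    while True:
        response = iam_client.simulate_principal_policy(**request)
        for result in response["EvaluationResults"]:
            # EvalDecision is "allowed", "explicitDeny", or "implicitDeny"
            if result["EvalDecision"] != "allowed":
                denied.append(result["EvalActionName"])
        # Results can be paginated; follow Marker until IsTruncated is false
        if not response.get("IsTruncated"):
            return denied
        request["Marker"] = response["Marker"]


# Usage in this notebook (after the policy is attached):
# denied = find_denied_actions(iam_client, get_execution_role(), TRACKING_SERVER_ACTIONS)
# print(denied or "All tracking server actions are allowed")
```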

In [16]:
from sagemaker import get_execution_role

In [17]:
sagemaker_role_name = get_execution_role().rsplit('/', 1)[-1]
print(f"SageMaker Execution Role Name: {sagemaker_role_name}")

SageMaker Execution Role Name: AmazonSageMaker-ExecutionRole-20221206T163436
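The cell above takes the role name as the last `/`-separated segment of the ARN. That works for the role ARN returned by `get_execution_role()`, including roles with a path such as `service-role/`. If you run the notebook outside SageMaker and derive the role from `sts_client.get_caller_identity()["Arn"]` instead, that value is an STS assumed-role ARN whose last segment is the session name, not the role name. A minimal sketch that handles both forms (the helper name `role_name_from_arn` is hypothetical):

```python
def role_name_from_arn(arn):
    """Extract the IAM role name from an IAM role ARN or an STS assumed-role ARN."""
    # ARN layout: arn:partition:service:region:account:resource
    parts = arn.split(":", 5)
    if len(parts) != 6:
        raise ValueError(f"Not an ARN: {arn}")
    resource = parts[5]
    if resource.startswith("role/"):
        # arn:aws:iam::<account>:role/<optional/path/>RoleName
        return resource.rsplit("/", 1)[-1]
    if resource.startswith("assumed-role/"):
        # arn:aws:sts::<account>:assumed-role/RoleName/SessionName
        return resource.split("/")[1]
    raise ValueError(f"Not a role or assumed-role ARN: {arn}")


# Usage in this notebook:
# sagemaker_role_name = role_name_from_arn(get_execution_role())
```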


In [20]:
sm_mlflow_execution_policy = {
    "Version": "2012-10-17",    
    "Statement": [        
        {            
            "Effect": "Allow",            
            "Action": [
                "sagemaker-mlflow:*",
                "sagemaker:CreateMlflowTrackingServer",
                "sagemaker:UpdateMlflowTrackingServer",
                "sagemaker:DeleteMlflowTrackingServer",
                "sagemaker:StartMlflowTrackingServer",
                "sagemaker:StopMlflowTrackingServer",
                "sagemaker:CreatePresignedMlflowTrackingServerUrl"
            ],            
            "Resource": "*"        
        }        
    ]
}

sagemaker_role_name = get_execution_role().rsplit('/', 1)[-1]
sagemaker_exe_role_arn = get_execution_role()

sm_mlflow_execution_policy_info = iam_client.create_policy(
    PolicyName="sm-mlflow-execution-policy",
    PolicyDocument=json.dumps(sm_mlflow_execution_policy)
)

sm_mlflow_execution_policy_arn = sm_mlflow_execution_policy_info["Policy"]["Arn"]

# Attach the policy to the SageMaker execution role
iam_client.attach_role_policy(
    RoleName=sagemaker_role_name, PolicyArn=sm_mlflow_execution_policy_arn
)

{'ResponseMetadata': {'RequestId': '349bac16-33f3-4bc5-a51c-b628fea2af9f',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Thu, 31 Oct 2024 02:15:26 GMT',
   'x-amzn-requestid': '349bac16-33f3-4bc5-a51c-b628fea2af9f',
   'content-type': 'text/xml',
   'content-length': '212'},
  'RetryAttempts': 0}}
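`create_policy` fails with an `EntityAlreadyExists` error if a policy with the same name already exists, so re-running the cell above raises an exception. One way to make it re-runnable is to fall back to the existing policy's ARN. The sketch below assumes a customer managed policy created with the default path `/` in the `aws` partition, so its ARN is `arn:aws:iam::<account_id>:policy/<name>`; `get_or_create_policy` is a hypothetical helper. It does not update the document of a policy that already exists.

```python
import json


def get_or_create_policy(iam_client, account_id, policy_name, policy_document):
    """Return the ARN of a customer managed policy, creating it if needed.

    `iam_client` is expected to be a boto3 IAM client. If a policy with the same
    name already exists, its ARN is returned and its document is NOT updated.
    """
    try:
        response = iam_client.create_policy(
            PolicyName=policy_name,
            PolicyDocument=json.dumps(policy_document),
        )
        return response["Policy"]["Arn"]
    except iam_client.exceptions.EntityAlreadyExistsException:
        # Assumes the default path "/" and the "aws" partition
        return f"arn:aws:iam::{account_id}:policy/{policy_name}"


# Usage in this notebook:
# sm_mlflow_execution_policy_arn = get_or_create_policy(
#     iam_client, account_id, "sm-mlflow-execution-policy", sm_mlflow_execution_policy
# )
```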

### 2.2. IAM Role for the MLflow Tracking Server

In [26]:
tracking_server_name = "mlflow-tracking-anomaly-detection"
mlflow_tracking_server_role_name = "mlflow-tracking-server"

In [28]:
mlflow_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": ["sagemaker.amazonaws.com"]},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Create role for MLflow
mlflow_role = iam_client.create_role(
    RoleName=mlflow_tracking_server_role_name,
    AssumeRolePolicyDocument=json.dumps(mlflow_trust_policy)
)
mlflow_role_arn = mlflow_role["Role"]["Arn"]

# Create policy for S3 and SageMaker Model Registry
sm_s3_model_registry_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:Put*",
                "s3:List*",
                "sagemaker:AddTags",
                "sagemaker:CreateModelPackageGroup",
                "sagemaker:CreateModelPackage",
                "sagemaker:UpdateModelPackage",
                "sagemaker:DescribeModelPackageGroup",
            ],
            "Resource": "*",
        }
    ],
}

mlflow_s3_sm_model_registry_iam_policy = iam_client.create_policy(
    PolicyName="mlflow-s3-sm-model-registry", PolicyDocument=json.dumps(sm_s3_model_registry_policy)
)
mlflow_s3_sm_model_registry_iam_policy_arn = mlflow_s3_sm_model_registry_iam_policy["Policy"]["Arn"]

# Attach the policy to the MLflow role
iam_client.attach_role_policy(
    RoleName=mlflow_tracking_server_role_name, 
    PolicyArn=mlflow_s3_sm_model_registry_iam_policy_arn
)

{'ResponseMetadata': {'RequestId': 'e87a1990-78e4-4fbd-90b0-940c6571fd85',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Thu, 31 Oct 2024 02:18:51 GMT',
   'x-amzn-requestid': 'e87a1990-78e4-4fbd-90b0-940c6571fd85',
   'content-type': 'text/xml',
   'content-length': '212'},
  'RetryAttempts': 0}}
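The policy above grants `s3:Get*`, `s3:Put*`, and `s3:List*` on all resources. If you want a tighter policy, one option is to scope the S3 actions to the artifact bucket used by the tracking server (`bucket_name` in this notebook) while keeping the SageMaker Model Registry actions unchanged. This is a sketch under assumptions, not a verified minimal permission set: it assumes the tracking server only needs S3 access to this one bucket, and `build_scoped_mlflow_policy` is a hypothetical helper.

```python
def build_scoped_mlflow_policy(bucket_name):
    """Build a variant of the tracking server policy with S3 access limited to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # S3 access limited to the artifact bucket and the objects in it
                "Effect": "Allow",
                "Action": ["s3:Get*", "s3:Put*", "s3:List*"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            },
            {
                # SageMaker Model Registry actions, unchanged from the policy above
                "Effect": "Allow",
                "Action": [
                    "sagemaker:AddTags",
                    "sagemaker:CreateModelPackageGroup",
                    "sagemaker:CreateModelPackage",
                    "sagemaker:UpdateModelPackage",
                    "sagemaker:DescribeModelPackageGroup",
                ],
                "Resource": "*",
            },
        ],
    }


# Usage (hypothetical): pass json.dumps(build_scoped_mlflow_policy(bucket_name)) as the
# PolicyDocument to iam_client.create_policy instead of sm_s3_model_registry_policy.
```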

## 3. Create MLflow Tracking Server

In [29]:
sm_client.create_mlflow_tracking_server(
    TrackingServerName=tracking_server_name,
    ArtifactStoreUri=f"s3://{bucket_name}/{tracking_server_name}",
    TrackingServerSize="Small",
    MlflowVersion="2.13.2",
    RoleArn=mlflow_role_arn,
    AutomaticModelRegistration=False,
)

{'TrackingServerArn': 'arn:aws:sagemaker:us-west-2:419974056037:mlflow-tracking-server/mlflow-tracking-anomaly-detection',
 'ResponseMetadata': {'RequestId': '7d52402a-09d1-494d-a611-23fb6576eb2e',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '7d52402a-09d1-494d-a611-23fb6576eb2e',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '121',
   'date': 'Thu, 31 Oct 2024 02:18:56 GMT'},
  'RetryAttempts': 0}}

In [30]:
tracking_server_arn = (
    f"arn:aws:sagemaker:{region}:{account_id}:mlflow-tracking-server/{tracking_server_name}"
)

In [31]:
sm_client.describe_mlflow_tracking_server(TrackingServerName=tracking_server_name)

{'TrackingServerArn': 'arn:aws:sagemaker:us-west-2:419974056037:mlflow-tracking-server/mlflow-tracking-anomaly-detection',
 'TrackingServerName': 'mlflow-tracking-anomaly-detection',
 'ArtifactStoreUri': 's3://sagemaker-us-west-2-419974056037/mlflow-tracking-anomaly-detection',
 'TrackingServerSize': 'Small',
 'MlflowVersion': '2.13.2',
 'RoleArn': 'arn:aws:iam::419974056037:role/mlflow-tracking-server',
 'TrackingServerStatus': 'Creating',
 'IsActive': 'Inactive',
 'TrackingServerUrl': 'https://t-gkhhsgmuuavn.us-west-2.experiments.sagemaker.aws',
 'WeeklyMaintenanceWindowStart': 'Fri:06:22',
 'AutomaticModelRegistration': False,
 'CreationTime': datetime.datetime(2024, 10, 31, 2, 18, 56, 720000, tzinfo=tzlocal()),
 'CreatedBy': {},
 'LastModifiedTime': datetime.datetime(2024, 10, 31, 2, 18, 56, 720000, tzinfo=tzlocal()),
 'LastModifiedBy': {},
 'ResponseMetadata': {'RequestId': 'c4ab25be-fbdd-47ff-9e47-578db82caff6',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'c4ab
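The describe output shows `'TrackingServerStatus': 'Creating'`. Creating a tracking server can take a while, and the MLflow test in section 4 only works once the server is ready. A minimal polling sketch, assuming the ready status after creation is `Created` (and `Started` after a restart) and that any status ending in `Failed` is terminal; `wait_for_tracking_server` is a hypothetical helper, and the poll interval and timeout are arbitrary defaults.

```python
import time


def wait_for_tracking_server(
    sm_client,
    tracking_server_name,
    ready_statuses=("Created", "Started"),
    poll_seconds=60,
    timeout_seconds=3600,
    sleep=time.sleep,
):
    """Poll describe_mlflow_tracking_server until the server reaches a ready status.

    `sm_client` is expected to be a boto3 SageMaker client. Raises RuntimeError if
    the status ends in "Failed" and TimeoutError if timeout_seconds elapses first.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        description = sm_client.describe_mlflow_tracking_server(
            TrackingServerName=tracking_server_name
        )
        status = description["TrackingServerStatus"]
        if status in ready_statuses:
            return description
        if status.endswith("Failed"):
            raise RuntimeError(f"Tracking server {tracking_server_name} is in status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Tracking server {tracking_server_name} still {status} after {timeout_seconds}s"
            )
        sleep(poll_seconds)


# Usage in this notebook:
# wait_for_tracking_server(sm_client, tracking_server_name)
```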

* Save the tracking server ARN to Parameter Store

In [35]:
from utils.ssm import parameter_store

In [36]:
strRegionName=boto3.Session().region_name
pm = parameter_store(strRegionName)
strPrefix = pm.get_params(key="PREFIX")

In [37]:
pm = parameter_store(strRegionName)
pm.put_params(key="-".join([strPrefix, "MLFLOW-TRACKING-SERVER-ARN"]), value=tracking_server_arn, overwrite=True)

'Store suceess'
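The `parameter_store` helper comes from the local `utils/ssm.py` module, which is not shown here. If that module is not available, a similar result can be sketched directly with the boto3 SSM client's `put_parameter` and `get_parameter` calls. The helper names and the parameter name below are hypothetical; the naming scheme used by `parameter_store` (built from `strPrefix`) may differ, so values written this way are not necessarily readable through `pm.get_params`.

```python
def save_parameter(ssm_client, name, value):
    """Store a string value in SSM Parameter Store, overwriting any existing value.

    `ssm_client` is expected to be a boto3 SSM client, e.g. boto3.client("ssm").
    """
    ssm_client.put_parameter(Name=name, Value=value, Type="String", Overwrite=True)


def load_parameter(ssm_client, name):
    """Read a string value back from SSM Parameter Store."""
    return ssm_client.get_parameter(Name=name)["Parameter"]["Value"]


# Usage (hypothetical parameter name):
# import boto3
# ssm = boto3.client("ssm")
# save_parameter(ssm, "/mlflow/tracking-server-arn", tracking_server_arn)
# tracking_server_arn = load_parameter(ssm, "/mlflow/tracking-server-arn")
```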

## 4. MLflow tracking test

Connect to the tracking server

In [None]:
import mlflow

mlflow.set_tracking_uri(tracking_server_arn)
print(f'tracking_server_arn: {tracking_server_arn}')

In [None]:
experiment_name = 'test'
mlflow.set_experiment(experiment_name)

Log a metric

In [None]:
with mlflow.start_run():
    mlflow.log_metric("foo", 1)

See the results in the MLflow UI. You can either open the MLflow UI from within SageMaker Studio or generate a presigned URL as follows:

In [None]:
sm_client.create_presigned_mlflow_tracking_server_url(TrackingServerName=tracking_server_name)
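The presigned URL is short-lived. `create_presigned_mlflow_tracking_server_url` also accepts `ExpiresInSeconds` (how long the URL itself stays valid) and `SessionExpirationDurationInSeconds` (how long the MLflow UI session lasts once opened), and returns the link in the `AuthorizedUrl` field. A small sketch that returns just the URL; the helper name and the default values are assumptions, so check the API reference for the allowed ranges.

```python
def get_mlflow_ui_url(sm_client, tracking_server_name, expires_in_seconds=300, session_seconds=1800):
    """Return a presigned URL for the MLflow UI of a SageMaker tracking server.

    `sm_client` is expected to be a boto3 SageMaker client. The defaults are
    illustrative; the service enforces its own limits on both durations.
    """
    response = sm_client.create_presigned_mlflow_tracking_server_url(
        TrackingServerName=tracking_server_name,
        ExpiresInSeconds=expires_in_seconds,
        SessionExpirationDurationInSeconds=session_seconds,
    )
    return response["AuthorizedUrl"]


# Usage in this notebook:
# print(get_mlflow_ui_url(sm_client, tracking_server_name))
```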

## Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.


![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-1/sagemaker-mlflow|sagemaker_mlflow_setup.ipynb)

![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-2/sagemaker-mlflow|sagemaker_mlflow_setup.ipynb)

![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-1/sagemaker-mlflow|sagemaker_mlflow_setup.ipynb)

![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ca-central-1/sagemaker-mlflow|sagemaker_mlflow_setup.ipynb)

![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/sa-east-1/sagemaker-mlflow|sagemaker_mlflow_setup.ipynb)

![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-1/sagemaker-mlflow|sagemaker_mlflow_setup.ipynb)

![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-2/sagemaker-mlflow|sagemaker_mlflow_setup.ipynb)

![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-3/sagemaker-mlflow|sagemaker_mlflow_setup.ipynb)

![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-central-1/sagemaker-mlflow|sagemaker_mlflow_setup.ipynb)

![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-north-1/sagemaker-mlflow|sagemaker_mlflow_setup.ipynb)

![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-1/sagemaker-mlflow|sagemaker_mlflow_setup.ipynb)

![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-2/sagemaker-mlflow|sagemaker_mlflow_setup.ipynb)

![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-1/sagemaker-mlflow|sagemaker_mlflow_setup.ipynb)

![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-2/sagemaker-mlflow|sagemaker_mlflow_setup.ipynb)

![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-south-1/sagemaker-mlflow|sagemaker_mlflow_setup.ipynb)