# SageMaker Experiments Updated

SageMaker Experiments recently updated its [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/experiments.html). The workflow has been simplified to revolve around two main components, described in the documentation as follows:


<b>Experiment</b>: An experiment is a collection of runs. When you initialize a run in your training loop, you include the name of the experiment that the run belongs to. Experiment names must be unique within your AWS account.

<b>Run</b>: A run consists of all the inputs, parameters, configurations, and results for one iteration of model training. Initialize an experiment run for tracking a training job with Run(). Within the run you can log parameters, metrics, and files that are relevant for your experiment.
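
To make the two concepts concrete, here is a minimal sketch of how one run could be opened inside an experiment with the SDK's `Run` context manager. The helper names `make_run_name` and `track_training` and the example values are hypothetical and only for illustration. The SageMaker import is deferred into the function because it needs the `sagemaker` SDK and valid AWS credentials.

```python
from typing import Dict


def make_run_name(prefix: str, index: int) -> str:
    """Build a run name such as 'run-0' (hypothetical naming scheme)."""
    return f"{prefix}-{index}"


def track_training(
    experiment_name: str,
    run_name: str,
    params: Dict[str, float],
    metrics: Dict[str, float],
) -> None:
    """Log the parameters and metrics of one training iteration as a single run."""
    # Imported lazily: requires the sagemaker SDK and valid AWS credentials.
    from sagemaker.experiments.run import Run

    # As far as I understand the SDK, Run creates the experiment and the run
    # if they do not exist yet, and reuses them otherwise.
    with Run(experiment_name=experiment_name, run_name=run_name) as run:
        run.log_parameters(params)
        for name, value in metrics.items():
            run.log_metric(name=name, value=value)


# Example call (hypothetical values, needs AWS access):
# track_training("my-experiment", make_run_name("run", 0),
#                {"n_estimators": 10}, {"RMSE": 60.5})
```

The experiment is only a name that groups runs, so everything specific to one training iteration is logged on the `run` object inside the `with` block.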

In this post we take a model that we are training locally and track it with SageMaker Experiments to better understand our training runs. Please see the official [AWS Blog](https://aws.amazon.com/blogs/machine-learning/next-generation-amazon-sagemaker-experiments-organize-track-and-compare-your-machine-learning-trainings-at-scale/) and [Experiments Code Samples](https://docs.aws.amazon.com/sagemaker/latest/dg/experiments-tutorials.html) for further information.

## Credits/References

I have used the following AWS example as a reference: https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-experiments/local_experiment_tracking/pytorch_experiment.html. Please check this example if you have a PyTorch use-case that you are locally training.

## Setup

For this example I am using the Python 3 (Data Science 3.0) kernel on an ml.t3.medium instance in SageMaker Studio. You can also run this notebook locally in your own IDE if you are properly authenticated, but I would advise using Studio for its SageMaker Experiments UI.

In [2]:
import sys

In [3]:
# update boto3 and sagemaker to ensure latest SDK version
#!{sys.executable} -m pip install --upgrade pip
#!{sys.executable} -m pip install --upgrade boto3
#!{sys.executable} -m pip install --upgrade sagemaker

In [4]:
import os
import boto3
import json
import sagemaker
from sagemaker.session import Session
from sagemaker import get_execution_role
from sagemaker.experiments.run import Run
from sagemaker.experiments import load_run
from sagemaker.utils import unique_name_from_base

In [5]:
sagemaker_session = Session()
boto_sess = boto3.Session()
role = get_execution_role()
default_bucket = sagemaker_session.default_bucket()
sm = boto_sess.client("sagemaker")
region = boto_sess.region_name

## Dataset Processing

This example uses the Petrol Consumption dataset from Kaggle to train a Random Forest regression model.

- [Dataset URL](https://www.kaggle.com/datasets/harinir/petrol-consumption)
- [License: CC0 1.0 (Public Domain)](https://creativecommons.org/publicdomain/zero/1.0/)

In [6]:
import pandas as pd

df = pd.read_csv("petrol_consumption.csv")
df.head()

Unnamed: 0,Petrol_tax,Average_income,Paved_Highways,Population_Driver_licence(%),Petrol_Consumption
0,9.0,3571,1976,0.525,541
1,9.0,4092,1250,0.572,524
2,9.0,3865,1586,0.58,561
3,7.5,4870,2351,0.529,414
4,8.0,4399,431,0.544,410


In [7]:
# split the data into features/target and train/test sets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor

# features are every column except the target
X = df.drop('Petrol_Consumption', axis=1)
y = df['Petrol_Consumption']
# hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# fit the scaler on the training set only, then apply it to both sets
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

## Train Model Locally

In [8]:
#Model Building
samp_estimators = 10
regressor = RandomForestRegressor(n_estimators=samp_estimators)
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

In [9]:
from sklearn.metrics import mean_squared_error
# RMSE = sqrt(MSE); the squared=False argument is deprecated since scikit-learn 1.4
print(mean_squared_error(y_test, y_pred) ** 0.5)

60.50718965544507
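
The value printed above is the root mean squared error (RMSE), which is expressed in the same units as the target column. As a small worked sketch of what the metric computes, here is a plain-Python version; the function name `rmse` and the toy numbers are my own and are not taken from the petrol dataset.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: square root of the mean squared residual."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Residuals are -30 and 40, so the mean squared error is (900 + 1600) / 2 = 1250.
print(rmse([500.0, 600.0], [530.0, 560.0]))  # sqrt(1250), about 35.36
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.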


## SageMaker Experiments Setup

We first set up our experiment name; the experiment will contain all of our training runs, one for each parameter combination. We then pass the experiment name and a run name to `load_run`. In this case we create a run for each of the estimator counts in the list below.

In [10]:
experiment_name = "sm-experiments-sklearn"
estimators = [10, 20, 30, 40, 50]

In [11]:
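for i, n_estimators in enumerate(estimators):
    run_name = f"run-{i}"
    # reuse the session created earlier instead of building a new one per run
    with load_run(
        experiment_name=experiment_name, run_name=run_name, sagemaker_session=sagemaker_session
    ) as run:
        # log the hyperparameter for this run, then train and evaluate
        run.log_parameter("estimators", n_estimators)
        regressor = RandomForestRegressor(n_estimators=n_estimators)
        regressor.fit(X_train, y_train)
        y_pred = regressor.predict(X_test)
        # RMSE = sqrt(MSE); avoids the deprecated squared=False argument
        run.log_metric(name="RMSE", value=mean_squared_error(y_test, y_pred) ** 0.5)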
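
Once the runs are logged, the natural next step is to compare them. The sketch below shows one possible way to do that, under a few assumptions: `best_run` and `fetch_experiment_table` are hypothetical helper names, the RMSE values in `example` are made up, and I am assuming that `ExperimentAnalytics` from the SageMaker Python SDK returns one row per run for experiments created this way. The exact column names in that table may differ, so inspect the DataFrame before relying on them.

```python
from typing import Dict, Tuple


def best_run(results: Dict[str, float]) -> Tuple[str, float]:
    """Return the (run_name, rmse) pair with the lowest RMSE."""
    if not results:
        raise ValueError("no runs to compare")
    name = min(results, key=results.get)
    return name, results[name]


def fetch_experiment_table(experiment_name: str):
    """Return the experiment's logged runs as a pandas DataFrame (needs AWS access)."""
    # Imported lazily: requires the sagemaker SDK and valid AWS credentials.
    from sagemaker.analytics import ExperimentAnalytics

    return ExperimentAnalytics(experiment_name=experiment_name).dataframe()


# Made-up RMSE values for illustration, not results from the runs above.
example = {"run-0": 61.2, "run-1": 58.7, "run-2": 59.9}
print(best_run(example))  # ('run-1', 58.7)
```

In Studio the same comparison is available in the Experiments UI, so a helper like this is mostly useful when you want the results in a script or report.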

