# Train ML models using SageMaker Training

**SageMaker Studio Kernel**: Data Science

In this exercise you will:
 - Run a PyTorch training job with Amazon SageMaker Training

***

# Install Dependencies

Let's install some required dependencies for our environment.

In [None]:
! pip install -U awscli boto3 sagemaker

***

# Step 1 - Import Modules

Here we’ll import some libraries and define some variables.

In [None]:
import boto3
from datetime import datetime
import sagemaker
from sagemaker.estimator import Estimator
import traceback

In [None]:
sagemaker_client = boto3.client("sagemaker")
s3_client = boto3.client("s3")

Create a SageMaker session and store the default region and the execution role in Python variables.

In [None]:
sagemaker_session = sagemaker.Session()
region = boto3.session.Session().region_name
role = sagemaker.get_execution_role()

***

# Step 2 - Run the training job

By using the [SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/), we run the training script in a built-in SageMaker container for PyTorch (see the [PyTorch Estimator](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/sagemaker.pytorch.html) documentation). This container lets us provide our own execution scripts together with a `requirements.txt` file for installing additional dependencies.

To make sure that Amazon SageMaker installs the additional Python modules listed in `requirements.txt`, we compress the contents of the [training](./code/training) folder and upload the archive to the default S3 bucket.
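
The contents of `buildspec.sh` are not shown in this notebook, so the following is only a sketch of what such a packaging step might look like in pure Python. It assumes the goal is a gzip-compressed tarball with `train.py` and `requirements.txt` at the archive root; the helper name `build_source_archive` is hypothetical.

```python
import tarfile
from pathlib import Path
from typing import List


def build_source_archive(source_dir: str, output_path: str) -> List[str]:
    """Pack every file under source_dir into a gzip-compressed tarball.

    Hypothetical helper: files are stored relative to source_dir so that
    train.py and requirements.txt end up at the archive root.
    Returns the member names of the archive that was written.
    """
    source = Path(source_dir)
    output = Path(output_path)
    output.parent.mkdir(parents=True, exist_ok=True)

    # Write the archive, adding files in a stable (sorted) order.
    with tarfile.open(output, "w:gz") as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                archive.add(path, arcname=path.relative_to(source).as_posix())

    # Re-open the archive to report what it actually contains.
    with tarfile.open(output, "r:gz") as archive:
        return archive.getnames()
```

Under these assumptions, `build_source_archive("./code/training", "./code/dist/training/sourcedir.tar.gz")` would produce an archive at the path used in the upload step. The output path should sit outside the source folder so the archive does not include itself.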

In [None]:
bucket_name = sagemaker_session.default_bucket()

In [None]:
! pygmentize ./code/training/train.py

In [None]:
! ./code/buildspec.sh training

Upload the generated `sourcedir.tar.gz` to the default S3 bucket.

In [None]:
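# Remove any previously uploaded archive. delete_object needs the full object key, not just the prefix.
s3_client.delete_object(Bucket=bucket_name, Key="e2e-base/artifact/training/sourcedir.tar.gz")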

code_path = sagemaker_session.upload_data('./code/dist/training/sourcedir.tar.gz', key_prefix="e2e-base/artifact/training")

code_path
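
`upload_data` returns the S3 URI of the uploaded file, while `boto3` calls take the bucket and key as separate arguments. The sketch below shows one way to split such a URI; the helper name `split_s3_uri` is hypothetical and not part of the SageMaker SDK.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3://bucket/key URI into (bucket, key).

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # The path component starts with "/", which is not part of the object key.
    return parsed.netloc, parsed.path.lstrip("/")
```

With the variables defined in this notebook, `bucket, key = split_s3_uri(code_path)` followed by `s3_client.head_object(Bucket=bucket, Key=key)` could be used to confirm that the archive exists.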

## Global Parameters

To allow users to execute the SageMaker Training Job locally, we define the variable `local_mode`. If you want to test the local mode capability, set this variable to `True`.

In [None]:
local_mode = False

In [None]:
processing_output_files_path = "e2e-base/data/output"

training_artifact_path = "e2e-base/artifact/training"
training_artifact_name = "sourcedir.tar.gz"
training_artifact = "s3://{}/{}/{}".format(bucket_name, training_artifact_path, training_artifact_name)
training_output_files_path = "e2e-base/models"
training_instance_count = 1
training_hyperparameters = {
    "epochs": 25,
    "learning_rate": 0.001,
    "batch_size": 100
}

if local_mode:
    training_instance_type = "local"
else:
    training_instance_type = "ml.m5.large"
    
# Look up the built-in PyTorch training image for the chosen region and instance type.
training_image_uri = sagemaker.image_uris.retrieve(
    "pytorch",
    region=region,
    version="1.12",
    image_scope="training",
    instance_type=training_instance_type
)

In [None]:
training_image_uri

In [None]:
training_artifact

Define the `Estimator` object

In [None]:
estimator = Estimator(
    entry_point="train.py",
    image_uri=training_image_uri,
    source_dir="./code/training",
    output_path="s3://{}/{}".format(bucket_name,
                                    training_output_files_path),
    hyperparameters=training_hyperparameters,
    role=role,
    instance_count=training_instance_count,
    instance_type=training_instance_type,
    disable_profiler=True
)
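
The SageMaker framework containers pass the entries of `hyperparameters` to the entry point as command-line arguments, for example `--epochs 25`. The notebook only displays `train.py` through `pygmentize`, so the following is a sketch of how a script could read these values, not the script's actual code. The function name `parse_hyperparameters` is hypothetical, and the defaults are assumptions that mirror the values defined above.

```python
import argparse
from typing import List, Optional


def parse_hyperparameters(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the hyperparameters passed to the training script.

    Hypothetical sketch: the argument names match the keys of
    training_hyperparameters in this notebook.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=25)
    parser.add_argument("--learning_rate", type=float, default=0.001)
    parser.add_argument("--batch_size", type=int, default=100)
    # Ignore any extra arguments the container may append.
    args, _unknown = parser.parse_known_args(argv)
    return args
```

Inside a training script, calling `parse_hyperparameters()` without arguments reads from `sys.argv`, which is what the container populates.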

In [None]:
estimator.fit(
    inputs={
        "train": "s3://{}/{}/train".format(
            bucket_name,
            processing_output_files_path
        ),
        "test": "s3://{}/{}/test".format(
            bucket_name,
            processing_output_files_path
        )
    }
)
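
Each key passed to `inputs` becomes a channel. Inside the training container, SageMaker downloads the channel data to `/opt/ml/input/data/<channel>` and exposes the path through an `SM_CHANNEL_<NAME>` environment variable. The sketch below shows how a training script might resolve these directories; the helper name `channel_dir` is hypothetical.

```python
import os
from typing import Mapping, Optional


def channel_dir(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the local directory that holds the data for a channel.

    Hypothetical helper: prefers the SM_CHANNEL_<NAME> environment variable
    and falls back to the default SageMaker input path.
    """
    env = os.environ if env is None else env
    return env.get(f"SM_CHANNEL_{name.upper()}", f"/opt/ml/input/data/{name}")
```

For the job above, `channel_dir("train")` and `channel_dir("test")` would point at the data copied from the two S3 prefixes given to `fit`.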