# Training an RL Model on SageMaker

This notebook demonstrates how to train a reinforcement learning (RL) model using SageMaker. We'll initialize the SageMaker session, set up the environment for training, define the necessary S3 bucket and Docker image, and then proceed with the training job.

In [None]:
import boto3
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.session import Session
from sagemaker.session import get_execution_role
import time
import os


We start by importing essential libraries:

- boto3 and sagemaker are used to interact with AWS services.
- Estimator, Session, and get_execution_role are required for setting up the SageMaker environment.
- os and time are standard Python libraries for system operations and time tracking.

### Initialize SageMaker Session

Here, we initialize the SageMaker session and retrieve the appropriate execution role using get_execution_role().
- The S3 bucket is where the model artifacts will be stored.
- The ecr_image_uri is the URI of the Docker image that contains the RL environment and is stored in Elastic Container Registry (ECR).

In the code below, replace "my-account-id" with your actual AWS account ID.

In [None]:
# Initialize SageMaker session
sagemaker_session = sagemaker.Session()
s3_client = boto3.client('s3')

# Get the execution role for SageMaker
role = get_execution_role()

# Define your S3 bucket and prefixes
s3_bucket = 'traffic-opimization-<my-account-id>-us-east-1'

# Define the Docker image URI from ECR
ecr_image_uri = 'deep-q-learning-model-<my-account-id>-us-east-1'
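
The image placeholder above is a bare name. Full ECR image URIs in the standard commercial regions generally follow the pattern `<account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. The sketch below assembles one. The helper name `build_ecr_image_uri`, the example account ID, and the `latest` tag are assumptions for illustration; substitute the values from your own ECR repository.

```python
def build_ecr_image_uri(account_id, region, repository, tag="latest"):
    """Assemble an ECR image URI of the form
    <account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>."""
    # AWS account IDs are 12-digit numbers; catch an unreplaced placeholder early.
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12-digit numbers")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Hypothetical values, for illustration only.
example_uri = build_ecr_image_uri("123456789012", "us-east-1", "deep-q-learning-model")
print(example_uri)
```

The length check is only there so that a leftover `<my-account-id>` placeholder fails fast instead of surfacing later as an image pull error.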


We define the environment variables that will be passed to the training script.
- PROCESS_TYPE is set to 'TRAIN' to specify that the job is for training.
- The S3_BUCKET is provided as part of the environment to store training artifacts.

In [None]:
# Environment variables for training
train_env = {
    'PROCESS_TYPE': 'TRAIN',
    'S3_BUCKET': s3_bucket
}
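
SageMaker sets the entries of the `environment` dictionary as ordinary process environment variables inside the training container. The sketch below shows how a training script might read them. The actual `sagemaker_train.py` is not shown in this notebook and may handle this differently; the helper name `read_job_config` and the `TRAIN` default are assumptions.

```python
import os


def read_job_config(env=None):
    """Read the job settings from environment variables.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    # Assumed default: treat a missing PROCESS_TYPE as a training run.
    process_type = env.get("PROCESS_TYPE", "TRAIN").upper()
    bucket = env.get("S3_BUCKET")
    if not bucket:
        raise RuntimeError("S3_BUCKET is not set; nowhere to store artifacts")
    return {"process_type": process_type, "s3_bucket": bucket}


# Example with an explicit dict standing in for the container environment.
config = read_job_config({"PROCESS_TYPE": "train", "S3_BUCKET": "my-example-bucket"})
print(config)
```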

Training a good RL model takes a while, so before we run our smaller 10-episode training job, let's view the output plots of the fully trained model that we will use later for testing.

In [None]:
from PIL import Image
import io
import matplotlib.pyplot as plt

# Initialize S3 client
s3_client = boto3.client('s3')

# Function to list and display images from S3
def display_s3_images(bucket, prefix):
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    if 'Contents' in response:
        for obj in response['Contents']:
            key = obj['Key']
            if key.endswith('.png'):  # Filter for PNG images
                print(f"Displaying image: {key}")
                
                # Get the image from S3
                img_obj = s3_client.get_object(Bucket=bucket, Key=key)
                img_data = img_obj['Body'].read()
                
                # Display the image using PIL and matplotlib
                image = Image.open(io.BytesIO(img_data))
                plt.figure()
                plt.imshow(image)
                plt.axis('off')  # Hide the axes
                plt.title(key)
                plt.show()
    else:
        print(f'No objects found in s3://{bucket}/{prefix}')

# Define the S3 bucket and prefix for the folder
s3_prefix = 'trained_model_plots/'  # Folder where the images are stored

# Display images from the trained_model_plots/ folder
display_s3_images(s3_bucket, s3_prefix)
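
`list_objects_v2` returns at most 1,000 keys per call, so a folder holding more objects than that would be listed only partially. The sketch below uses boto3's paginator to collect every PNG key. `list_png_keys` is a hypothetical helper name, and the client is passed in as an argument so the function works with any object that offers `get_paginator`.

```python
def list_png_keys(s3_client, bucket, prefix):
    """Return all keys under `prefix` that end in .png, across all result pages."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no 'Contents' entry.
        for obj in page.get("Contents", []):
            # Case-insensitive match, so '.PNG' files are included too.
            if obj["Key"].lower().endswith(".png"):
                keys.append(obj["Key"])
    return keys
```

In this notebook it could be called as `list_png_keys(s3_client, s3_bucket, s3_prefix)`, and the returned keys fetched and displayed the same way `display_s3_images` does.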


We are ready to train our own model. 

We create an RL Estimator, which is a SageMaker-specific class used for training RL models. Key parameters include:

- entry_point: The Python script that will be executed to start the training.
- image_uri: The Docker image URI containing the RL environment.
- toolkit and toolkit_version: Specify the RL toolkit and its version (Coach in this case).
- framework: Defines the machine learning framework (TensorFlow).
- instance_type: Specifies the type of EC2 instance for training (ml.c5.4xlarge, a compute-optimized CPU instance, in this case).
- environment: The environment variables, including the S3 bucket and process type.
- instance_count: Specifies how many instances to use for training.

In [None]:
from sagemaker.rl import RLEstimator, RLToolkit, RLFramework

rl_estimator = RLEstimator(
    entry_point="sagemaker_train.py",  # Adjust to your script name
    image_uri=ecr_image_uri,
    toolkit=RLToolkit.COACH,
    toolkit_version='0.11.1',
    framework=RLFramework.TENSORFLOW,
    role=role,
    instance_type='ml.c5.4xlarge',
    instance_count=1,
    environment=train_env,
)

In [None]:
# Start the training job
rl_estimator.fit()
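
By default `fit()` blocks until the training job finishes, so the `time` module imported earlier can be used to record how long the call takes. The sketch below uses a hypothetical `timed` context manager for this; it is not part of the SageMaker SDK.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Print the wall-clock time spent inside the `with` block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label} took {elapsed:.1f} s")


# In the notebook this would wrap the training call:
#     with timed("Training job"):
#         rl_estimator.fit()
with timed("Example block"):
    time.sleep(0.1)
```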

Let's compare our plots with those of the fully trained model. You can see that the delay is still a lot higher when we train for only 10 episodes.

In [None]:
s3_prefix_10ep_model = 'plots/'

display_s3_images(s3_bucket, s3_prefix_10ep_model)