# Part 1: Building and Training Your Container

## System Setup

This notebook is designed to run on **Amazon Linux**, and it includes several setup steps necessary for our project. Our objective is to train a Reinforcement Learning model using the LunarLander-v2 environment. To visually analyze the performance of our lander, we need to create videos of its execution. For this purpose, we'll install several dependencies.

In [1]:
!hostnamectl | grep "Operating System"

  Operating System: Amazon Linux 2


In this section, we will set up all necessary tools and libraries for our reinforcement learning environment. We begin by installing several crucial Python libraries, followed by `ffmpeg` for video processing.

### Installing Python Libraries

We need to install a set of Python libraries that are integral to our project's reinforcement learning environment and model training:

- **`swig`**: a tool that generates Python bindings for C and C++ code. It is needed at install time to build the Box2D physics bindings that `gymnasium[box2d]` depends on.
- **`gymnasium[box2d]`**: provides the LunarLander-v2 environment, our primary simulation tool for this project.
- **`stable-baselines3`**: provides reliable PyTorch implementations of reinforcement learning algorithms (descended from OpenAI Baselines), including the PPO algorithm we use to train our model.
- **`opencv-python`**: used to process video frames, so we can visually assess and document the performance of our trained models.

In [None]:
!pip install opencv-python==4.9.0.80
!pip install swig==4.2.1
!pip install gymnasium[box2d]
!pip install stable-baselines3==2.0.0a5

### Installing ffmpeg

After setting up our Python environment, we install `ffmpeg`, which we use to process the recorded videos of the agent's landing attempts. In particular, ffmpeg encodes the videos with the H.264 codec, a format browsers can play, so the results can be embedded directly in the Jupyter notebook.

Run the installation script in the `utils/` directory from a terminal (or prefix each command with `!` in a notebook cell):

```bash
chmod +x ./utils/install_ffmpeg.sh
sudo ./utils/install_ffmpeg.sh
```

In [3]:
import cv2
import numpy as np
import gymnasium as gym

from utils.video_render import record_simulation

In [4]:
env = gym.make("LunarLander-v2")

## The Challenge

In LunarLander-v2, our agent controls a lander and must bring it to a smooth, safe landing on a designated landing pad, which always sits at coordinates (0, 0). The environment is simulated with the Box2D physics engine, so the agent has to manage the lander's movements carefully to avoid a crash.

Let's safely land a spacecraft on the moon! 🚀🌑✨

### Observation Space

The agent receives an array of eight observations from the environment, giving a comprehensive view of the current state:

- **Positions**: Horizontal and vertical.
- **Velocities**: Horizontal and vertical.
- **Angle**: Orientation of the lander.
- **Angular Velocity**: Speed of rotation.
- **Leg Contact**: Boolean values for each leg indicating contact with the ground.

Each component has declared bounds in the observation space: for example, -90 to 90 for the two position coordinates, -5 to 5 for the velocities, and 0 to 1 for the leg-contact flags (see the printed bounds below).
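To make the eight-element observation easier to read, a small helper can attach a name to each component. This is a sketch: the field names are our own hypothetical labels, and the order follows the Gymnasium documentation for LunarLander (position x/y, velocity x/y, angle, angular velocity, two leg-contact flags).

```python
from typing import Dict, Sequence

# Hypothetical labels, in the order Gymnasium documents for LunarLander observations.
OBS_FIELDS = (
    "x", "y",                          # position relative to the landing pad at (0, 0)
    "vx", "vy",                        # linear velocity
    "angle",                           # orientation in radians
    "angular_velocity",
    "leg_1_contact", "leg_2_contact",  # 1.0 if that leg touches the ground, else 0.0
)


def describe_observation(obs: Sequence[float]) -> Dict[str, float]:
    """Map a raw 8-element observation to named fields."""
    if len(obs) != len(OBS_FIELDS):
        raise ValueError(f"expected {len(OBS_FIELDS)} values, got {len(obs)}")
    return {name: float(value) for name, value in zip(OBS_FIELDS, obs)}
```

For example, `describe_observation(env.reset(seed=0)[0])` labels the first observation of an episode; NumPy arrays work because the helper only needs `len()` and `float()`.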

### Action Space

The action space comprises four discrete actions, indexed from 0:

- `0`: **Do nothing**.
- `1`: **Fire left orientation engine**.
- `2`: **Fire main engine**.
- `3`: **Fire right orientation engine**.

Strategic actions are required to control the lander’s engines effectively, ensuring a gentle touchdown.
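Because the actions are plain integers, it helps to keep a lookup table when reading model outputs. The sketch below uses the action order from the Gymnasium documentation; `ACTION_NAMES` and `action_name` are hypothetical helpers, not part of Gymnasium.

```python
# Discrete action indices for LunarLander, in the order Gymnasium documents them.
ACTION_NAMES = {
    0: "do nothing",
    1: "fire left orientation engine",
    2: "fire main engine",
    3: "fire right orientation engine",
}


def action_name(action) -> str:
    """Return a readable label for a discrete action (accepts ints or NumPy integers)."""
    return ACTION_NAMES[int(action)]
```

With this table, a predicted action of `3` reads as "fire right orientation engine".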

### Goals

Our goal is to train a reinforcement learning **model** that can:

- Achieve a safe landing close to the target.
- Minimize fuel consumption.
- Avoid any crashes.
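The Gymnasium documentation spells out how these goals enter the reward: with discrete actions, each frame the main engine fires costs 0.3 points and each frame a side engine fires costs 0.03 points, a crash or a safe landing adds -100 or +100, and an episode scoring at least 200 points counts as a solution. The sketch below reproduces only the fuel term and the 200-point threshold as an illustration; it is not the environment's full reward function, and the helper names are hypothetical.

```python
from typing import Iterable

MAIN_ENGINE_COST = 0.3   # points lost per frame the main engine fires (action 2)
SIDE_ENGINE_COST = 0.03  # points lost per frame a side engine fires (actions 1 and 3)
SOLVED_THRESHOLD = 200.0


def fuel_penalty(actions: Iterable[int]) -> float:
    """Total reward lost to engine use over a sequence of discrete actions."""
    penalty = 0.0
    for a in actions:
        if a == 2:
            penalty += MAIN_ENGINE_COST
        elif a in (1, 3):
            penalty += SIDE_ENGINE_COST
    return penalty


def is_solved(episode_return: float) -> bool:
    """An episode counts as solved if its total reward reaches 200 points."""
    return episode_return >= SOLVED_THRESHOLD
```

The tenfold gap between the two costs is why a good policy relies on short main-engine bursts and uses the cheaper side engines for attitude corrections.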

<div style="border:2px solid #42A891; padding: 10px; background-color: #E0F2F1; border-radius: 5px;">
    <b>LEARN MORE:</b> If you're interested in learning more about Deep Reinforcement Learning, check out this <a href="https://huggingface.co/learn/deep-rl-course/en/unit0/introduction" target="_blank" style="font-weight: bold;">AMAZING course</a> made by HuggingFace 🤗
</div>




In [5]:
print( f"""
ENVIRONMENT: Lunar Lander v2
│
├── OBSERVATION SPACE
│   ├── Shape: {env.observation_space.shape}
│   ├── Highest Values: {list(env.observation_space.high)}
│   └── Lowest Values: {list(env.observation_space.low)}
│
└── ACTION SPACE
    └── {env.action_space}
""")


ENVIRONMENT: Lunar Lander v2
│
├── OBSERVATION SPACE
│   ├── Shape: (8,)
│   ├── Highest Values: [90.0, 90.0, 5.0, 5.0, 3.1415927, 5.0, 1.0, 1.0]
│   └── Lowest Values: [-90.0, -90.0, -5.0, -5.0, -3.1415927, -5.0, -0.0, -0.0]
│
└── ACTION SPACE
    └── Discrete(4)



## Exploring Random Actions

Before we dive into sophisticated training algorithms, let's start with a simple experiment. We'll see how the lunar lander performs under completely random controls. This will give us a baseline understanding of the challenges involved in landing safely.

### Why Random Actions?

Using random actions allows us to observe the inherent difficulty of the task and the behavior of the lander in various uncontrolled scenarios. It's like throwing a novice into the pilot's seat—exciting, unpredictable, and a bit chaotic! By seeing how the lander behaves when actions are chosen without any strategy, we can better appreciate the complexity of the task and the importance of a well-trained model.

### Implementing Random Actions

We'll implement a function named `random_action` that picks an action uniformly at random from the action space at each step of the simulation. It takes the current environment and observation as inputs, but ignores the observation entirely. This unsophisticated controller gives us a clear picture of what not to do.

Let’s see how our lander fares with randomness at the helm!

In [6]:
def random_action(env: gym.Env, obs: np.ndarray) -> int:
    return env.action_space.sample()
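The video shows the random baseline, but a number is easier to compare across policies. Assuming the standard Gymnasium `reset`/`step` API, a minimal episode loop might look like this; `run_episode` is a hypothetical helper that accepts any policy with the same `(env, obs) -> action` signature as `random_action`.

```python
from typing import Any, Callable, Tuple

Policy = Callable[[Any, Any], Any]  # (env, obs) -> action, like random_action


def run_episode(env: Any, policy: Policy, seed: int = 0) -> Tuple[float, int]:
    """Play one episode and return (total reward, number of steps)."""
    obs, _info = env.reset(seed=seed)
    total_reward, steps = 0.0, 0
    terminated = truncated = False
    # Gymnasium ends an episode on termination (crash/landing) or truncation (time limit).
    while not (terminated or truncated):
        action = policy(env, obs)
        obs, reward, terminated, truncated, _info = env.step(action)
        total_reward += float(reward)
        steps += 1
    return total_reward, steps
```

For example, `run_episode(env, random_action, seed=42)` returns the total reward and length of one random episode; averaging over several seeds gives a steadier baseline.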

In [7]:
from IPython.display import Video

video_path = './videos/random.mp4'

record_simulation(random_action, video_path)
Video(video_path)

Video conversion successful: ./videos/random.mp4


## SageMaker Custom Containers: Training and Inference

As we embark on our quest to train a model for the lunar landing, we'll use Amazon SageMaker, which streamlines building, training, and deploying machine learning models at scale. To tailor this process to our needs, we'll bring our own containers.

### Custom Training Container

Within SageMaker, we have the capability to deploy **custom Docker containers for training jobs**. This allows us the freedom to specify our environment to the finest detail, incorporating all the necessary dependencies our model requires, from specific library versions to custom code snippets.

In the diagram below, you will see how a custom training image stored in Amazon Elastic Container Registry (ECR) is utilized by SageMaker. When a training job is invoked, SageMaker retrieves this image to run our training job, resulting in the generation of a model artifact.

![SageMaker Training Workflow](./img/byoc.png)

### Custom Inference Container

After training our model, the next step is deployment. For inference, we use another Docker container designed to serve our model. This container is optimized for handling incoming prediction requests efficiently rather than for training.

Deploying our model to a SageMaker Endpoint prompts SageMaker to fetch the inference image and establish a scalable, consistent service endpoint for real-time predictions.

**NOTE:** it is also possible to use a single image for both training and model deployment. SageMaker starts the container with the argument `train` for a training job and `serve` for an endpoint, so one image can handle both.
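With a single image, the entrypoint has to branch on the argument SageMaker passes to the container: `train` for a training job, `serve` for an endpoint. A minimal, hypothetical Python entrypoint could look like the sketch below; `run_training`, `run_server`, and `dispatch` are placeholder names, not part of this project.

```python
import sys
from typing import Callable, Dict, List


def run_training() -> str:
    # Placeholder: real code would read /opt/ml/input and write artifacts to /opt/ml/model.
    return "training"


def run_server() -> str:
    # Placeholder: real code would start an HTTP server answering /ping and /invocations on port 8080.
    return "serving"


COMMANDS: Dict[str, Callable[[], str]] = {"train": run_training, "serve": run_server}


def dispatch(argv: List[str]) -> str:
    """Run the handler that matches the first command-line argument."""
    if len(argv) < 2 or argv[1] not in COMMANDS:
        raise SystemExit("usage: entrypoint train|serve")
    return COMMANDS[argv[1]]()


if __name__ == "__main__":
    dispatch(sys.argv)
```

Our project keeps the two roles in separate images, so this pattern is only an alternative.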

## Building the Training Image

Within the `training_container/` directory, you'll find all the ingredients needed to cook up your custom training image. Let's sift through these components to understand their purpose and how they come together.

```plaintext
training_container/
│
├── train
├── Dockerfile
└── requirements.txt
```

### train

At the heart of the training container is the `train` script. When SageMaker starts a training job, it runs your container as `docker run <image> train`. You can change this default behavior by defining an `ENTRYPOINT` in the Dockerfile, but in our case simplicity is key.

Our script employs the [PPO (Proximal Policy Optimization)](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html) algorithm from the Stable Baselines3 library to train our model. Below is a snapshot of the script's structure:

```python
#!/usr/bin/env python3
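from typing import Optional

from pydantic import BaseModel
# ... remaining imports (torch, stable_baselines3, ...)

class ParamsParser(BaseModel):
    """
    Parses and validates parameters for the PPO model configuration from a JSON file.
    Provides default values and type enforcement.
    """
    seed: Optional[int] = None  # None means no fixed seed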
    n_steps: int = 1024
    learning_rate: float = 0.0003
    ...
```

The script uses Pydantic to validate the hyperparameters passed to the SageMaker `Estimator`. Inside the container, SageMaker provides them as a JSON file at `/opt/ml/input/config/hyperparameters.json`.
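One detail worth knowing: SageMaker's API stores hyperparameters as string-to-string pairs, so `hyperparameters.json` contains values such as `"1020"` rather than `1020`. Pydantic's default (lax) validation converts those strings to the declared `int` and `float` types. The stdlib-only sketch below makes that conversion explicit; `PPOParams`, `parse_params`, and `load_params` are hypothetical names rather than the project's code, and the `total_timesteps` default is invented for the example.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class PPOParams:
    """Stand-in for ParamsParser; the total_timesteps default is a hypothetical value."""
    seed: Optional[int] = None
    n_steps: int = 1024
    learning_rate: float = 0.0003
    total_timesteps: int = 1_000_000


# One converter per field, because every value in hyperparameters.json is a string.
_CASTS: Dict[str, Callable[[str], Any]] = {
    "seed": int,
    "n_steps": int,
    "learning_rate": float,
    "total_timesteps": int,
}


def parse_params(raw: Dict[str, str]) -> PPOParams:
    """Convert string values to typed fields; unknown keys are ignored, as Pydantic does by default."""
    typed = {key: _CASTS[key](value) for key, value in raw.items() if key in _CASTS}
    return PPOParams(**typed)


def load_params(path: str = "/opt/ml/input/config/hyperparameters.json") -> PPOParams:
    """Read the hyperparameters file SageMaker writes into the container."""
    with open(path) as fh:
        return parse_params(json.load(fh))
```

Fields missing from the file keep their defaults, which mirrors how `ParamsParser` fills in `n_steps` and `learning_rate` when the Estimator only sets `seed` and `total_timesteps`.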

```python
def train() -> None:
    """
    Initializes and executes the training process for a PPO model using parameters
    from a JSON configuration file. Designed for use in a BYOC SageMaker training job,
    handling environment setup, model training, and model saving.
    ...
    """

    model_path = '/opt/ml/model'  # 'SM_MODEL_DIR'
    params_path = '/opt/ml/input/config/hyperparameters.json'

    # Parameter parsing with Pydantic
    ...

    # Training Environment
    env = make_vec_env('LunarLander-v2', n_envs=16)
    
    # Set device
    if torch.cuda.is_available():
        ...
    
    # Learning Algorithm
    model = PPO(
        policy='MlpPolicy',
        env=env,
        n_steps=params.n_steps,
        learning_rate=params.learning_rate,
        ...
    )

    # Training and saving artifact in '/opt/ml/model'
    ...

if __name__ == "__main__":
    train()

```
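With `n_envs=16` parallel environments and the default `n_steps=1024` from `ParamsParser`, each PPO iteration collects 16 × 1024 = 16,384 transitions before updating the policy. That matches the training log shown later in this notebook (19 iterations, 311,296 total timesteps = 19 × 16,384). The snippet below works out how many iterations a `total_timesteps` budget implies, assuming Stable Baselines3 keeps collecting full rollouts until the step count reaches the budget, so the final count can slightly overshoot. The helper names are hypothetical.

```python
import math


def rollout_size(n_envs: int, n_steps: int) -> int:
    """Transitions collected per PPO iteration: every env runs n_steps steps."""
    return n_envs * n_steps


def iterations_for(total_timesteps: int, n_envs: int, n_steps: int) -> int:
    """Full rollouts needed to reach total_timesteps (the last one may overshoot)."""
    return math.ceil(total_timesteps / rollout_size(n_envs, n_steps))


per_iter = rollout_size(16, 1024)            # 16384
iters = iterations_for(1_500_000, 16, 1024)  # 92
print(per_iter, iters, iters * per_iter)     # 16384 92 1507328
```

So the `total_timesteps=1500000` budget set on the Estimator corresponds to roughly 92 policy updates.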

After training, SageMaker will compress the contents of `/opt/ml/model` into a `tar.gz` file and save it to the output path (i.e., an S3 URI) defined in the `Estimator`.

<div style="border:2px solid #42A891; padding: 10px; background-color: #E0F2F1; border-radius: 5px;">
    <b>LEARN MORE:</b> For detailed insights on how Amazon SageMaker processes training output, explore the corresponding <a href="https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo-output.html" target="_blank" style="font-weight: bold;">SageMaker docs</a>.
</div>


### Dockerfile and requirements.txt

The `requirements.txt` file lists the Python libraries our environment needs. A special note on `gymnasium`: its `box2d` extra needs `swig` at install time, so the Dockerfile installs both in a separate step before `requirements.txt`.

The Dockerfile is simple: it starts from a base Python image, installs the dependencies, and makes the `train` script executable and available on the `PATH`. Because we don't define an `ENTRYPOINT`, we follow the default convention, so the script must be named `train`.

```Dockerfile
FROM --platform=linux/amd64 python:3.10

COPY requirements.txt .

# Install dependencies
RUN pip install -U pip
RUN pip install swig && pip install gymnasium[box2d]
RUN pip install --no-cache-dir -r requirements.txt

ENV SM_MODEL_DIR /opt/ml/model

COPY . /opt/program

WORKDIR /opt/program

# SageMaker runs your container with: docker run <image> train
RUN chmod +x /opt/program/train

ENV PATH="/opt/program:${PATH}"
```
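The Dockerfile exports `SM_MODEL_DIR`, while the `train` excerpt hardcodes `/opt/ml/model`. A small, hypothetical helper could read the variable and fall back to the standard path, which also makes the script easier to run outside SageMaker. The file name `model.zip` matches the artifact this notebook loads after training.

```python
import os


def model_output_path(filename: str = "model.zip") -> str:
    """Where to save the trained model: $SM_MODEL_DIR if set, else SageMaker's default."""
    model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    return os.path.join(model_dir, filename)
```

Inside the training script this would be used as `model.save(model_output_path())`.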

<div style="border:2px solid #42A891; padding: 10px; background-color: #E0F2F1; border-radius: 5px;">
    <b>LEARN MORE:</b> Discover how Amazon SageMaker runs your training image by checking out the corresponding <a href="https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo-dockerfile.html" target="_blank" style="font-weight: bold;">SageMaker docs</a>.
</div>

## Build and Push the Training Image

Build the image from the `training_container/` directory and push it to a private Amazon ECR repository that you create. If you haven't done this before, the ECR console shows the push commands for a repository once it exists (**View push commands**):

![ECR push commands](./img/ecr_push.png)

<div style="border:2px solid #42A891; padding: 10px; background-color: #E0F2F1; border-radius: 5px;">
    <b>LEARN MORE:</b> Read about pushing images to ECR <a href="https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-ecr-image.html" target="_blank" style="font-weight: bold;">here</a>.
</div>

## Enter: SageMaker

<div style="border:2px solid #FFA500; padding: 10px; background-color: #FFF9C4; border-radius: 5px;">
    <b>IMPORTANT:</b> Ensure that you update the following code chunk with your specific details (the cell reads them with <code>get_secret</code>): Replace <code>role</code> with the ARN of an execution role that has the required permissions, such as the <a href="https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonSageMakerFullAccess.html" target="_blank" style="font-weight: bold;">AmazonSageMakerFullAccess</a> managed policy. Update <code>image_uri</code> with the URI of the image in your own ECR repository. The URI format should be <code><b>account-id</b>.dkr.ecr.<b>aws-region</b>.amazonaws.com/<b>repository-name</b>:<b>image-tag</b></code>. Lastly, set <code>output_path</code> to the S3 URI of your own bucket (for example, <code>s3://<b>bucket-name</b></code>).
</div>
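SageMaker expects `image_uri` in the registry form `<account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`, which is different from the repository's ARN. The notebook reads the repository part from a secret and appends the tag; if you prefer to assemble it yourself, a hypothetical helper might be:

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build an ECR image URI: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>."""
    # Assumes a standard commercial AWS Region; China Regions use the amazonaws.com.cn domain.
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"
```

For example, `ecr_image_uri("123456789012", "us-east-1", "lunar-lander", "lunar_lander_training")` gives the value to pass as `image_uri` (the account ID and repository name here are placeholders).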

### Estimator

It's time to define our SageMaker Estimator, the driving force of our training job. The Estimator requires a few key configurations: the Docker image URI (`image_uri`), which points to our custom image in ECR; the instance specifications like *count* and *type*, determining our compute resources; and the `output_path`, directing where the trained model artifacts will be stored in S3.

Within the Estimator, we also set our training `hyperparameters`, aligning with those expected by our training script's `ParamsParser`. Additionally, we define `metric_definitions` to track key performance metrics from the training logs.

In the upcoming section, we'll explore how these metric configurations come into play as SageMaker orchestrates our model's training.

In [8]:
from sagemaker.estimator import Estimator
from utils.helpers import get_secret

import sagemaker

session = sagemaker.Session()

metric_definitions = [
    # Raw strings keep the backslashes intact for the regex engine
    {'Name': 'ep_len_mean', 'Regex': r'ep_len_mean\s*\|\s*([\d\.]+)'},
    {'Name': 'ep_rew_mean', 'Regex': r'ep_rew_mean\s*\|\s*(-?[\d\.]+)'}
]

ecr_repo = get_secret('ecr_repository')

estimator = Estimator(
    role=get_secret('training_execution_role'),
    image_uri=f'{ecr_repo}:lunar_lander_training',
    instance_count=1,
    instance_type="ml.g4dn.xlarge",
    output_path=get_secret('s3_bucket_uri'),
    sagemaker_session=session,
    hyperparameters={
        'seed': 1020,
        'total_timesteps': 1500000
    },
    metric_definitions=metric_definitions
)

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


### Metrics Definition

During training, Stable Baselines3 prints performance metrics for our PPO model to stdout. SageMaker streams the container's output to [Amazon CloudWatch](https://aws.amazon.com/es/cloudwatch/) Logs in near real time and, using the regular expressions in our metric definitions, extracts specific values from these logs and publishes them as metrics.

For instance, the PPO model **periodically** prints a summary table of metrics:

    ----------------------------------------
    | rollout/                |            |
    |    ep_len_mean          | 661        | <--- (our target metric)
    |    ep_rew_mean          | 57.1       | <--- (our target metric)
    | time/                   |            |
    |    fps                  | 1449       |
    |    iterations           | 19         |
    |    time_elapsed         | 214        |
    |    total_timesteps      | 311296     |
    | train/                  |            |
    |    approx_kl            | 0.00773607 |
    |    clip_fraction        | 0.0484     |
    |    clip_range           | 0.2        |
    |    entropy_loss         | -1.17      |
    |    explained_variance   | 0.748      |
    |    learning_rate        | 0.0003     |
    |    loss                 | 43.7       |
    |    n_updates            | 72         |
    |    policy_gradient_loss | -0.0025    |
    |    value_loss           | 183        |
    ----------------------------------------

To track the average episode length (`ep_len_mean`), we use the regex `'ep_len_mean\s*\|\s*([\d\.]+)'`, which isolates and captures the numerical value following the metric's name.

By setting up metric definitions in our SageMaker Estimator with these regular expressions, we enable the capture and tracking of specific metrics like `ep_len_mean` and `ep_rew_mean` during the training process. This data is then available in CloudWatch for real-time analysis and post-training evaluation.
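The two patterns can be checked locally against lines from the table above with Python's `re` module. The snippet also shows an edge case worth knowing about: Stable Baselines3's console table appears to format float values with three significant digits (consistent with the `57.1` above), so once the mean episode length reaches LunarLander's 1000-step limit it would print as `1e+03`, and the original pattern would capture only `1`. `EP_LEN_EXP` is a hypothetical refinement that also accepts an exponent.

```python
import re

# Patterns from metric_definitions, written as raw strings.
EP_LEN = re.compile(r"ep_len_mean\s*\|\s*([\d\.]+)")
EP_REW = re.compile(r"ep_rew_mean\s*\|\s*(-?[\d\.]+)")

log = """
|    ep_len_mean          | 661        |
|    ep_rew_mean          | 57.1       |
"""

print(EP_LEN.search(log).group(1))  # 661
print(EP_REW.search(log).group(1))  # 57.1

# A mean length of 1000 printed with 3 significant digits becomes 1e+03.
saturated = "|    ep_len_mean          | 1e+03      |"
print(EP_LEN.search(saturated).group(1))  # 1  (the exponent is lost)

# Hypothetical refinement: allow an optional exponent after the digits.
EP_LEN_EXP = re.compile(r"ep_len_mean\s*\|\s*([\d\.]+(?:e[+-]?\d+)?)")
print(float(EP_LEN_EXP.search(saturated).group(1)))  # 1000.0
```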


In [None]:
# Training job names must be unique within your account and Region; change this before re-running.
training_job_name = 'lunar-lander-PPO-training'

estimator.fit(job_name=training_job_name)

### Monitoring Training

Once you've initiated the training job in SageMaker, you can track its progress and performance through the AWS Console. Navigate to SageMaker > Training > Training Jobs to find your active job.

![Training Job Overview](./img/training_job.png)

On the job's dashboard, you'll see both instance and algorithm metrics, which provide insights into the resource usage and the learning progress respectively.

![Training Job Metrics](./img/training_job_metrics.png)

For an in-depth look at your model's performance metrics, such as average episode length and reward, head over to Amazon CloudWatch. Here, you can interact with the metrics over time, drill down into specifics, and even export the data for further analysis.

![CloudWatch Metrics Detail](./img/cloudwatch_metrics.png)

In the algorithm metrics, the average episode length (`ep_len_mean`) first increases. A plausible reading is that early episodes end quickly because the untrained agent crashes, and episodes grow longer as it learns to stay airborne without crashing, taking its time while it explores different strategies for reaching the lunar surface safely. This period of exploration is important, as the agent needs to understand the consequences of its actions within the environment.

Subsequently, there's a noteworthy shift in the pattern. The average reward starts to rise consistently, implying that the model has begun to optimize its movements. It discovers more efficient ways to land the spacecraft, achieving higher rewards (`ep_rew_mean`) in shorter amounts of time. The decline in episode length paired with the increase in rewards is a strong indicator of the model learning and improving its landing strategy.

## Fetching the artifact

<div style="border:2px solid #FFA500; padding: 10px; background-color: #FFF9C4; border-radius: 5px;">
    <b>IMPORTANT:</b> Replace the function call <code>get_secret('s3_bucket_name')</code> with the actual name of your own S3 bucket.
</div>

<br>

Great job on completing the training! SageMaker has now bundled the training results from `/opt/ml/model/` into a single `tar.gz` file, which can be found in your S3 bucket. This file is your model's learned wisdom, all set for deployment or further evaluation.

To download the model artifact, we'll use `boto3`, the AWS SDK for Python.
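SageMaker stores the artifact under `<output_path>/<job-name>/output/model.tar.gz`. The download cell below assumes `output_path` is the bucket root; if it contains a prefix, the object key must include that prefix too. A hypothetical helper that derives the bucket and key is sketched here (after `fit()`, the SageMaker SDK also exposes the full artifact URI as `estimator.model_data`).

```python
from typing import Tuple
from urllib.parse import urlparse


def artifact_location(output_path: str, job_name: str) -> Tuple[str, str]:
    """Return (bucket, key) of model.tar.gz for a job written to output_path."""
    parsed = urlparse(output_path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {output_path!r}")
    key = f"{job_name}/output/model.tar.gz"
    prefix = parsed.path.strip("/")  # e.g. "runs" for s3://bucket/runs/
    if prefix:
        key = f"{prefix}/{key}"
    return parsed.netloc, key
```

The returned pair can be passed directly to `s3.download_file(bucket, key, './artifacts/model.tar.gz')`.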

In [10]:
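import os

import boto3

# download_file needs the local target directory to exist.
os.makedirs('./artifacts', exist_ok=True)

s3 = boto3.client('s3')
s3.download_file(get_secret('s3_bucket_name'),
                 f'{training_job_name}/output/model.tar.gz', './artifacts/model.tar.gz')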

INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole


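Let's put our model into action! First, we extract the trained model from `model.tar.gz` and load it. Next, we sample an observation from the observation space and ask the model to predict an action for the lander. Note that `observation_space.sample()` draws values uniformly within the declared bounds, so the result is not a physically realistic state (the leg-contact values, for example, are not 0 or 1); `env.reset()` returns a genuine starting observation. Also, `model.predict` samples from the policy by default; pass `deterministic=True` to get the most likely action.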

In [11]:
!tar -xzf ./artifacts/model.tar.gz -C ./artifacts

tar: Ignoring unknown extended header keyword `LIBARCHIVE.creationtime'
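The `LIBARCHIVE.creationtime` message is only a warning about an extended header that this `tar` doesn't recognize; the files are still extracted. If you prefer to stay in Python, the standard-library `tarfile` module can do the same job. The hypothetical helper below applies the `data` extraction filter when the running Python supports it, which rejects unsafe members such as absolute paths.

```python
import tarfile
from typing import List


def extract_artifact(archive: str, dest: str) -> List[str]:
    """Extract a model.tar.gz into dest and return the member names."""
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: refuse unsafe paths and links.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names
```

For this notebook, `extract_artifact('./artifacts/model.tar.gz', './artifacts')` is equivalent to the `tar -xzf` cell above.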


In [12]:
from stable_baselines3 import PPO

model = PPO.load('./artifacts/model.zip')

In [13]:
observation_sample = env.observation_space.sample()
print(observation_sample)

[-87.45102    -33.222546    -4.865471     3.340176    -1.3296702
  -0.22886698   0.9492815    0.87525284]


In [14]:
action, _ = model.predict(observation_sample)

print(action)

3


The moment has arrived to put our trained model to the test. We'll load it into the environment and observe how it handles the task it was trained for 🌙🚀.

In [15]:
from utils.video_render import make_model_action

video_path = './videos/model.mp4'

model_action = make_model_action(model)
record_simulation(model_action, video_path)
Video(video_path)

Video conversion successful: ./videos/model.mp4
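`make_model_action` lives in this project's `utils.video_render`, and its code isn't shown here. A plausible shape, sketched under that assumption with the hypothetical name `make_policy`, is a closure that adapts `model.predict` to the same `(env, obs) -> action` signature as `random_action`. Passing `deterministic=True` makes the model take its most likely action instead of sampling from the policy distribution.

```python
from typing import Any, Callable


def make_policy(model: Any, deterministic: bool = True) -> Callable[[Any, Any], int]:
    """Wrap a trained model as an (env, obs) -> action function, like random_action."""
    def policy(env: Any, obs: Any) -> int:
        # SB3's predict returns (action, state); the state is only used by recurrent policies.
        action, _state = model.predict(obs, deterministic=deterministic)
        return int(action)
    return policy
```

Because the signature matches `random_action`, the same recording or evaluation code can drive either controller.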


Congratulations on a successful mission! You've built, trained, and evaluated a reinforcement learning model that has learned the delicate art of lunar landing. Along the way, we've seen how Amazon SageMaker runs custom training containers and how to monitor model training. Ready for the next adventure? In Part 2, we'll deploy our trained model as a SageMaker endpoint, bringing our lunar expertise into the realm of real-time predictions.