# Modify DeepRacer track images using Stable Diffusion and analyze learning pattern


In this lab we modify simulated DeepRacer track images to add more real-world features, then validate what the car is looking at by passing these images to a pre-trained DeepRacer model and observing the resulting heatmaps. To do this, we deploy Stable Diffusion models for image-to-image generation. We first use the Stable Diffusion upscaler to improve the resolution of the simulated track images, then add real-world features with the Stable Diffusion depth model. Along the way we look at the prompt engineering needed to get the expected output. The models run on SageMaker inference endpoints powered by Triton Inference Server, using a custom execution environment packaged with conda-pack.



We will deploy multiple variations of Stable Diffusion on a single SageMaker Multi-Model GPU Endpoint (MME GPU) powered by NVIDIA Triton Inference Server.
> ⚠ **Warning**: This notebook requires a minimum of an `ml.m5.large` instance to build the conda environment required for hosting the Stable Diffusion models.  

Skip to:
1. [Installs and imports](#installs)
2. [Download pretrained model](#modelartifact)
3. [Packaging a conda environment](#condaenv)
4. [Deploy to SageMaker Real-Time Endpoint](#deploy)
5. [Query Models to generate real world track images](#query)
6. [Test new images with pre-trained DeepRacer model](#query)
7. [Clean up](#cleanup)


## Architecture

![](./images/lab2_arch.png)

### Part 1 - Installs and imports <a name="installs"></a>

Let us begin by installing the Python dependencies.

In [None]:
%pip install -U sagemaker pillow huggingface-hub conda-pack

We will then initialize the SageMaker clients, the execution role, and the S3 bucket used in the rest of the session.

In [None]:
import boto3
import sagemaker
from sagemaker import get_execution_role

import time
import json
from PIL import Image
import base64
from io import BytesIO
import numpy as np

from utils import download_model

from IPython.display import display

# variables
s3_client = boto3.client("s3")
ts = time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

# sagemaker variables
role = get_execution_role()
sm_client = boto3.client(service_name="sagemaker")
runtime_sm_client = boto3.client("sagemaker-runtime")
sagemaker_session = sagemaker.Session(boto_session=boto3.Session())
bucket = sagemaker_session.default_bucket()
prefix = "stable-diffusion-mme"

### Part 2 - Save pretrained Stable Diffusion models <a name="modelartifact"></a>

We will use the Stable Diffusion Depth and Upscaler models to modify the simulated DeepRacer images. The `models` directory contains the inference code and the Triton configuration file for each of the Stable Diffusion models. In addition to these, we need to download the pretrained model weights and save them to their respective subdirectories within the `models` directory. Once these are downloaded, we can package the inference code and the model weights into a tarball and upload it to S3.

We start by downloading the two Stable Diffusion models needed for this lab

In [None]:
models_local_path = {
    "stabilityai/stable-diffusion-2-depth": "models/sd_depth/1/checkpoint",
    "stabilityai/stable-diffusion-x4-upscaler": "models/sd_upscale/1/checkpoint",
}

for model_name, model_local_path in models_local_path.items():
    download_model(model_name, model_local_path)
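
Before packaging, it can help to confirm that every model folder has the layout described above: a `config.pbtxt` next to a numbered version folder that holds `model.py`. The following is a minimal sketch; `check_model_repo` is a hypothetical helper (not part of `utils`), and it assumes every model uses version folder `1`, as the paths in this lab do.

```python
from pathlib import Path


def check_model_repo(model_root):
    """Return a dict mapping each model directory name to a list of missing items."""
    problems = {}
    for model_dir in sorted(Path(model_root).iterdir()):
        if not model_dir.is_dir():
            continue  # ignore stray files next to the model folders
        missing = []
        if not (model_dir / "config.pbtxt").is_file():
            missing.append("config.pbtxt")
        # Assumption: the inference code lives in version folder "1", as in this lab.
        if not (model_dir / "1" / "model.py").is_file():
            missing.append("1/model.py")
        problems[model_dir.name] = missing
    return problems
```

Calling `check_model_repo("models")` should return an empty list for each model if the layout is complete.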

### Part 3 - Packaging a conda environment, extending Sagemaker Triton container <a name="condaenv"></a>

When using the Triton Python backend (which our Stable Diffusion model will run on), you can include your own environment and dependencies. The recommended way to do this is to use [conda pack](https://conda.github.io/conda-pack/) to generate a conda environment archive in `tar.gz` format, and point to it in the `config.pbtxt` file of the models that should use it, adding the snippet: 

```
parameters: {
  key: "EXECUTION_ENV_PATH",
  value: {string_value: "path_to_your_env.tar.gz"}
}

```
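
As a quick sanity check, the configured environment path can be read back from a model's `config.pbtxt`. This is a rough sketch: `find_execution_env_path` is a hypothetical helper, and it uses a regular expression tuned to the snippet above, not a full protobuf text-format parser.

```python
import re

# Matches the parameters entry shown above; whitespace and the comma are optional.
_ENV_PATTERN = re.compile(
    r'key:\s*"EXECUTION_ENV_PATH"\s*,?\s*value:\s*\{\s*string_value:\s*"([^"]*)"'
)


def find_execution_env_path(config_text):
    """Return the EXECUTION_ENV_PATH string_value from config.pbtxt text, or None."""
    match = _ENV_PATTERN.search(config_text)
    return match.group(1) if match else None
```

For example, `find_execution_env_path(open("models/sd_depth/config.pbtxt").read())` returns `None` if a model has no custom environment configured.
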
Let's start by creating the conda environment with the necessary dependencies; running these cells will output a `sd_env.tar.gz` file.

In [None]:
%%writefile environment.yml
name: mme_env
dependencies:
  - python=3.8
  - pip
  - pip:
      - numpy
      - torch
      - accelerate
      - transformers
      - diffusers
      - xformers
      - conda-pack

Now we can create the environment using the above environment yaml spec

🛈 It could take up to 5 min to create the conda environment. Make sure you are running this notebook in an `ml.m5.large` instance or above

In [None]:
%%capture captured_output
!conda env create -f environment.yml


In [None]:
# Check if the environment creation was successful
if "Solving environment" in captured_output.stdout:
    print("Conda environment created successfully!")
else:
    print("Conda environment creation failed.")

Next, let us package the conda environment into a tar file

In [None]:
!conda pack -n mme_env -o models/setup_conda/sd_env.tar.gz

### Part 4 - Deploy endpoint <a name="deploy"></a>

Now, we get the correct URI for the SageMaker Triton container image. Check out all the available Deep Learning Container images that AWS maintains [here](https://github.com/aws/deep-learning-containers/blob/master/available_images.md). 

In [None]:
# account mapping for SageMaker Triton Image
account_id_map = {
    "us-east-1": "785573368785",
    "us-east-2": "007439368137",
    "us-west-1": "710691900526",
    "us-west-2": "301217895009",
    "eu-west-1": "802834080501",
    "eu-west-2": "205493899709",
    "eu-west-3": "254080097072",
    "eu-north-1": "601324751636",
    "eu-south-1": "966458181534",
    "eu-central-1": "746233611703",
    "ap-east-1": "110948597952",
    "ap-south-1": "763008648453",
    "ap-northeast-1": "941853720454",
    "ap-northeast-2": "151534178276",
    "ap-southeast-1": "324986816169",
    "ap-southeast-2": "355873309152",
    "cn-northwest-1": "474822919863",
    "cn-north-1": "472730292857",
    "sa-east-1": "756306329178",
    "ca-central-1": "464438896020",
    "me-south-1": "836785723513",
    "af-south-1": "774647643957",
}


region = boto3.Session().region_name
if region not in account_id_map.keys():
    raise ValueError(f"Unsupported region: {region}")

base = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
mme_triton_image_uri = (
    "{account_id}.dkr.ecr.{region}.{base}/sagemaker-tritonserver:23.03-py3".format(
        account_id=account_id_map[region], region=region, base=base
    )
)

The next step is to package the model subdirectories and weights into individual tarballs and upload them to S3. This process can take about 5 minutes.

In [None]:
from pathlib import Path

model_root_path = Path("./models")
model_dirs = list(model_root_path.glob("*"))

In [None]:
model_upload_paths = {}
for model_path in model_dirs:
    model_name = model_path.name
    tar_name = model_path.name + ".tar.gz"
    !tar -C $model_root_path -czvf $tar_name $model_name
    model_upload_paths[model_name] = sagemaker_session.upload_data(path=tar_name, bucket=bucket, key_prefix=prefix)
    !rm $tar_name
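
If the `!tar` shell magic is not available (for example when running this as a plain Python script), the same archive can be built with the standard library. This is a sketch of an equivalent approach; `package_model` is a hypothetical helper name.

```python
import tarfile
from pathlib import Path


def package_model(model_dir, out_dir="."):
    """Create <out_dir>/<model_name>.tar.gz with the model folder at the archive root."""
    model_dir = Path(model_dir)
    tar_path = Path(out_dir) / f"{model_dir.name}.tar.gz"
    with tarfile.open(tar_path, "w:gz") as tar:
        # arcname keeps only the model folder name, mirroring `tar -C models ... <name>`.
        tar.add(model_dir, arcname=model_dir.name)
    return tar_path
```

The returned path can be passed to `sagemaker_session.upload_data` in the same way as `tar_name` above.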

We are now ready to configure and deploy the multi-model endpoint

In [None]:
model_data_url = f"s3://{bucket}/{prefix}/"  # s3 location where models are stored
ts = time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

container = {
    "Image": mme_triton_image_uri,
    "ModelDataUrl": model_data_url,
    "Mode": "MultiModel",
}

In [None]:
sm_model_name = f"{prefix}-mdl-{ts}"

create_model_response = sm_client.create_model(
    ModelName=sm_model_name, ExecutionRoleArn=role, PrimaryContainer=container
)

print("Model Arn: " + create_model_response["ModelArn"])

Create a SageMaker endpoint configuration.

In [None]:
endpoint_config_name = f"{prefix}-epc-{ts}"
instance_type = "ml.g5.2xlarge"

create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "InstanceType": instance_type,
            "InitialVariantWeight": 1,
            "InitialInstanceCount": 1,
            "ModelName": sm_model_name,
            "VariantName": "AllTraffic",
        }
    ],
)

print("Endpoint Config Arn: " + create_endpoint_config_response["EndpointConfigArn"])

Create the endpoint and wait for it to transition to the `InService` state. Run the following cells only once: after the endpoint has been created, it should not be created again.

In [None]:
endpoint_name = f"{prefix}-ep-{ts}"


create_endpoint_response = sm_client.create_endpoint(
    EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
)

print("Endpoint Arn: " + create_endpoint_response["EndpointArn"])

In [None]:

resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp["EndpointStatus"]
print("Status: " + status)

while status == "Creating":
    time.sleep(30)
    resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
    status = resp["EndpointStatus"]
    print("Status: " + status)

print("Arn: " + resp["EndpointArn"])
print("Status: " + status)
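
The loop above stops as soon as the status leaves `Creating`, whether or not the endpoint actually came up. A slightly stricter variant, sketched below, raises on any status other than `InService` and gives up after a timeout. `wait_for_endpoint` and its parameters are hypothetical names; the `describe` argument is expected to behave like `sm_client.describe_endpoint`.

```python
import time


def wait_for_endpoint(describe, endpoint_name, poll_seconds=30, timeout_seconds=1800):
    """Poll until the endpoint is InService; raise on failure or timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        resp = describe(EndpointName=endpoint_name)
        status = resp["EndpointStatus"]
        if status == "InService":
            return resp
        if status != "Creating":
            # For example "Failed"; FailureReason is included in that case.
            reason = resp.get("FailureReason", "no reason given")
            raise RuntimeError(f"Endpoint ended in status {status}: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Endpoint {endpoint_name} still Creating after timeout")
        time.sleep(poll_seconds)
```

Usage would be `wait_for_endpoint(sm_client.describe_endpoint, endpoint_name)`. boto3 also provides a built-in waiter, `sm_client.get_waiter("endpoint_in_service")`, which covers the same need.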

### Part 5 - Query models <a name="query"></a>
The endpoint is now deployed, and we can query the individual models.

Before invoking any of the Stable Diffusion models, we first invoke the `setup_conda` model, which copies the conda environment into a directory that is shared with all the other models. Refer to the [model.py](./models/setup_conda/1/model.py) file in the `models/setup_conda/1` directory for details on the implementation. After that, we invoke the endpoint to modify the images.

In [None]:
# invoke the setup_conda model to create the shared conda environment

payload = {
    "inputs": [
        {
            "name": "TEXT",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["hello"],  # dummy data not used by the model
        }
    ]
}

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/octet-stream",
    Body=json.dumps(payload),
    TargetModel="setup_conda.tar.gz",
)
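
Every request in this part uses the same JSON shape: a list of input tensors, each with a `name`, `shape`, `datatype`, and `data` field. A small helper can build that body from a plain dictionary. This is a sketch; `build_payload` is a hypothetical name, and the default shape `(1, 1)` is taken from the image requests later in this part.

```python
import json


def build_payload(inputs, shape=(1, 1)):
    """Wrap {tensor_name: string_value} into the JSON request body used in this notebook."""
    return json.dumps(
        {
            "inputs": [
                # Each value is sent as a single BYTES element.
                {"name": name, "shape": list(shape), "datatype": "BYTES", "data": [data]}
                for name, data in inputs.items()
            ]
        }
    )
```

With this helper, the `setup_conda` request body above would be `build_payload({"TEXT": "hello"}, shape=(1,))`.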

In [None]:
# Sample original image
# Choose a sample image from the sample_images folder, or upload your own image to this folder and set as the SOURCE_IMAGE variable
SOURCE_IMAGE = "sim-image-3-00157.png"

original_image = Image.open("sample_images/" + SOURCE_IMAGE)
original_image


If the above step ran successfully, you should see a low-resolution track image as captured by RoboMaker.

Next, we will run two Stable Diffusion models in sequence, SD Upscale followed by SD Depth, to modify the track images and bring them closer to a real-world track.

In [None]:
from utils import encode_image
from utils import decode_image
from PIL import Image
import glob
import os
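img_path = "sample_images/"
output_path = "output_images/"
os.makedirs(output_path, exist_ok=True)  # make sure the output folder exists before saving
all_files = sorted(glob.glob(os.path.join(img_path, "*.png")))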
for f in all_files[:]:
        img = Image.open(f)
        basefile = os.path.basename(f)
        save_output_file = output_path+"SD_"+basefile
        print(save_output_file)
        low_res_image = img.resize((128, 128))
        inputs = dict(
            prompt="Image of a racing track with border of the track as white, center line of the track as yellow, the region out of the track in green color, and outside walls are black",
            image=encode_image(low_res_image).decode("utf8"),
        )

        payload = {
            "inputs": [
                {"name": name, "shape": [1, 1], "datatype": "BYTES", "data": [data]}
                for name, data in inputs.items()
            ]
        }

        response = runtime_sm_client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="application/octet-stream",
            Body=json.dumps(payload),
            TargetModel="sd_upscale.tar.gz",
        )
        output = json.loads(response["Body"].read().decode("utf8"))["outputs"]
        upscaled_image = decode_image(output[0]["data"][0])

        #### work on upscaled image
        input_image = encode_image(upscaled_image).decode("utf8")

        inputs = dict(
            prompt="Real world racing track with flood lights, the track should have dashed yellow center line, white track borders. White light reflections visible on the track ",
            image=input_image,
            gen_args=json.dumps(dict(num_inference_steps=100, strength=0.70)),
        )


        payload = {
            "inputs": [
                {"name": name, "shape": [1, 1], "datatype": "BYTES", "data": [data]}
                for name, data in inputs.items()
            ]
        }

        response = runtime_sm_client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="application/octet-stream",
            Body=json.dumps(payload),
            TargetModel="sd_depth.tar.gz",
        )
        output = json.loads(response["Body"].read().decode("utf8"))["outputs"]
        modified_track = decode_image(output[0]["data"][0])
        display(modified_track)

        # Resize the image back to the 160x120 size supported by the DeepRacer framework, then save it
        modified_track = modified_track.resize((160, 120))
        modified_track.save(save_output_file)


### Part 6 - Validation with Pre-trained DeepRacer model <a name="query"></a>
Now let us load a pre-trained model, using the helpers from the log analysis notebook. For the remainder of this notebook, our aim is to pass the modified track images to that model and understand where it is looking.

Let us start with the imports and installations, in particular the OpenCV package, which provides computer vision functions for handling images.

In [None]:
import re
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
import boto3
import logging
import shutil
import os
import glob
import math
import tarfile
import requests
import json
%matplotlib inline

In [None]:
!pip install shapely
!pip install opencv-python-headless

In [None]:
#Shapely Library
from shapely.geometry import Point, Polygon
from shapely.geometry.polygon import LinearRing, LineString

In [None]:
from log_analysis import *
from os import listdir
from os.path import isfile, join

We'll list the models you have access to in the bucket and prefix you provide.

In [None]:
# To use your own model, it must first be available in S3: either upload your DeepRacer for Cloud / DeepRacer on the Spot logs to S3,
# or export the logs from the AWS DeepRacer console to S3 before this step
BUCKET_NAME="" # Enter the name of the S3 bucket where your models are stored
PREFIX_NAME="" # Enter the S3 prefix where your model is stored; do not add a / at the end. For console-trained models choose the model name, not the time/date folder created when the model was downloaded

#list all folders in bucket but exclude files
try: 
    MODELS = [obj['Prefix'][:-1] for obj in s3_client.list_objects(Bucket=BUCKET_NAME, Prefix=PREFIX_NAME+str('/'), Delimiter='/')['CommonPrefixes']]
    i = 0
    print("Models in this S3 prefix are:")
    while i < len(MODELS):
        MODEL = ((MODELS[i]).split('/'))
        print(MODEL[len(MODEL)-1])
        i += 1
except Exception as error:
    print(error)
    print("The specified S3 bucket and prefix do not exist. Only continue if you intend to use one of the pre-provided models")

Choose a model from the list above or use one of the sample models, AtoZ-CCW-Centerline or AtoZ-CCW-Steering-Penalty. Note: for console-trained models you'll need to choose the folder named with the time and date it was downloaded to S3 (e.g. 'Fri, 23 Feb 2024 09:01:32 GMT').

In [None]:
model_name="AtoZ-CCW-Centerline" # Change to your own model or use one of the provided examples AtoZ-CCW-Centerline or AtoZ-CCW-Steering-Penalty

In [None]:
!rm -rf ./intermediate_checkpoint

Let us copy the necessary model files into the lab02 path

In [None]:
!mkdir -p intermediate_checkpoint/model-artifacts/

if model_name == "AtoZ-CCW-Centerline":
    print("Using AtoZ-CCW-Centerline demo model")
    !cp -R ../deepracer_models/AtoZ-CCW-Centerline/ intermediate_checkpoint/model-artifacts/
elif model_name == "AtoZ-CCW-Steering-Penalty":
    print("Using AtoZ-CCW-Steering-Penalty demo model")
    !cp -R ../deepracer_models/AtoZ-CCW-Steering-Penalty/ intermediate_checkpoint/model-artifacts/
else:
    print("Using your own " + model_name + " model")
    try:
        !aws s3 cp 's3://{BUCKET_NAME}/{PREFIX_NAME}/{model_name}' 'intermediate_checkpoint/model-artifacts/{model_name}' --recursive --quiet
        !cp 'intermediate_checkpoint/model-artifacts/{model_name}/model/model_metadata.json' 'intermediate_checkpoint/model-artifacts/{model_name}/model_metadata.json'
    except Exception as error:
        print("An error has occurred, check your bucket and prefix names are correct")

Next, let us read the model metadata and action space variables. These will be used in future steps while rendering the heatmap for the images

In [None]:
with open("intermediate_checkpoint/model-artifacts/{}/model_metadata.json".format(model_name),"r") as jsonin:
    model_metadata=json.load(jsonin)
sensor = [sensor for sensor in model_metadata['sensor'] if sensor != "LIDAR"][0]
model_metadata

In [None]:
# Track Segment Labels
action_names = []
for action in model_metadata['action_space']:
    action_names.append("ST"+str(action['steering_angle'])+" SP"+"%.2f"%action["speed"])
action_names

In the next step, we collect the paths of the images created in this lab into a sorted list.

In [None]:
import glob
img_path = "output_images"
all_files = sorted(glob.glob(img_path + '/*.png'))

We will use TensorFlow to run the model against the newly created images, so let us install it.

In [None]:
!pip uninstall numpy -y
!pip install tensorflow

In the next two steps, we will import the model graph definition which is stored in protobuf format and also feed the new images to the model

In [None]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

In [None]:
%%capture tensor_setup_output
import logging
import tensorflow.compat.v1 as tf
from tensorflow.python.platform import gfile
from PIL import Image
tf.disable_v2_behavior()

logger = tf.get_logger()
logger.setLevel(logging.ERROR)


GRAPH_PB_PATH = 'intermediate_checkpoint/'

def load_session(pb_path):
    sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                    log_device_placement=True))
    print("load graph:", pb_path)
    with gfile.FastGFile(pb_path,'rb') as f:
        graph_def = tf.GraphDef()
        # Parse the serialized graph while the file is still open
        graph_def.ParseFromString(f.read())
    sess.graph.as_default()
    tf.import_graph_def(graph_def, name='')
    graph_nodes=[n for n in graph_def.node]
    names = []
    for t in graph_nodes:
        names.append(t.name)

    # For front cameras/stereo camera use the below
    x = sess.graph.get_tensor_by_name('main_level/agent/main/online/network_0/{}/{}:0'.format(sensor, sensor))
    y = sess.graph.get_tensor_by_name('main_level/agent/main/online/network_1/ppo_head_0/policy:0')

    return sess, x, y

def rgb2gray(rgb):
    return np.dot(rgb[...,:3], [0.299, 0.587, 0.114])


In [None]:
print(GRAPH_PB_PATH)

In [None]:
%%capture tensorflow_logs
import logging
model_inference = []
iterations = [7,8,9]
models_file_path = glob.glob("{}model-artifacts/{}/model/model*.pb".format(GRAPH_PB_PATH, model_name))
for model_file in models_file_path:

    model, obs, model_out = load_session(model_file)
    arr = []
    for f in all_files[:]:
        img = Image.open(f)
        img_arr = np.array(img)
        img_arr = rgb2gray(img_arr)
        img_arr = np.expand_dims(img_arr, axis=2)
        current_state = {"observation": img_arr} #(1, 120, 160, 1)
        y_output = model.run(model_out, feed_dict={obs:[img_arr]})[0]
        arr.append (y_output)
    model_inference.append(arr)
    model.close()
    tf.reset_default_graph()
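
For a discrete action space, each entry collected in `model_inference` holds one score per action, so it can be mapped back to the labels in `action_names`. The sketch below picks the highest-scoring action; `top_action` is a hypothetical helper, and it does not apply to continuous action spaces, where the policy output has a different meaning.

```python
def top_action(policy_output, action_names):
    """Return (action_name, score) for the highest-scoring entry of one policy output."""
    scores = [float(s) for s in policy_output]  # accepts a list or a 1-D NumPy array
    best = max(range(len(scores)), key=scores.__getitem__)
    return action_names[best], scores[best]
```

For example, `top_action(model_inference[0][0], action_names)` reports the action the first model file prefers for the first generated image.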

We will use OpenCV to overlay a heatmap on each Stable Diffusion generated image fed to the pre-trained DeepRacer model. The heatmap is derived from the gradients of the model's policy output with respect to a convolutional layer (`Conv2d_4`), so it reflects the trained model weights.

In [None]:
import cv2

def visualize_gradcam_discrete_ppo(sess, rgb_img, category_index=0, num_of_actions=5):
    '''
    @inp: model session, RGB Image - np array, action_index, total number of actions
    @return: overlayed heatmap
    '''

    img_arr = np.array(rgb_img)  # use the function argument, not the notebook-level `img` variable
    img_arr = rgb2gray(img_arr)
    img_arr = np.expand_dims(img_arr, axis=2)

    x = sess.graph.get_tensor_by_name('main_level/agent/main/online/network_0/{}/{}:0'.format(sensor, sensor))
    y = sess.graph.get_tensor_by_name('main_level/agent/main/online/network_1/ppo_head_0/policy:0')
    feed_dict = {x:[img_arr]}

    #Get the policy head for clipped ppo in coach
    model_out_layer = sess.graph.get_tensor_by_name('main_level/agent/main/online/network_1/ppo_head_0/policy:0')
    loss = tf.multiply(model_out_layer, tf.one_hot([category_index], num_of_actions))
    reduced_loss = tf.reduce_sum(loss[0])

    # For front cameras use the below
    conv_output = sess.graph.get_tensor_by_name('main_level/agent/main/online/network_1/{}/Conv2d_4/Conv2D:0'.format(sensor))

    grads = tf.gradients(reduced_loss, conv_output)[0]
    output, grads_val = sess.run([conv_output, grads], feed_dict=feed_dict)
    weights = np.mean(grads_val, axis=(1, 2))
    cams = np.sum(weights * output, axis=3)

    ##im_h, im_w = 120, 160##
    im_h, im_w = rgb_img.shape[:2]

    cam = cams[0] #img 0
    image = np.uint8(rgb_img[:, :, ::-1] * 255.0) # RGB -> BGR
    cam = cv2.resize(cam, (im_w, im_h)) # zoom heatmap
    cam = np.maximum(cam, 0) # relu clip
    heatmap = cam / np.max(cam) # normalize
    cam = cv2.applyColorMap(np.uint8(255 * heatmap), cv2.COLORMAP_JET) # grayscale to color
    cam = np.float32(cam) + np.float32(image) # overlay heatmap
    cam = 255 * cam / (np.max(cam) + 1E-5) ##  Add epsilon for stability
    cam = np.uint8(cam)[:, :, ::-1] # to RGB

    return cam
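
The arithmetic in the function above follows the usual Grad-CAM recipe. As a sketch in our own notation (the symbols are not from the notebook): let $A^{k}$ be the $k$-th channel of `conv_output` with spatial size $H \times W$, and let $y^{c}$ be the policy output for the action selected by `category_index`. Then `weights` and `cams` correspond to:

$$
\alpha_k = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\frac{\partial y^{c}}{\partial A^{k}_{ij}},
\qquad
L^{c}_{ij} = \max\Big(0,\ \sum_{k}\alpha_k A^{k}_{ij}\Big)
$$

The code applies the $\max(0,\cdot)$ clip after resizing the map to the image size, then divides by the maximum value so the heatmap lies in $[0, 1]$ before the color map is applied.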

Now, let us loop over the Stable Diffusion generated images in the output folder and pass them to the heatmap visualization function defined above. The generated heatmaps are stored in a list called `heatmaps`.

In [None]:
%%capture heatmap_cell_logs
model_path = models_file_path[0] #Change this to your model 'pb' frozen graph file

model, obs, model_out = load_session(model_path)
heatmaps = []
print(all_files)
#Just need to match up the shape of the neural network
if 'action_space_type' in model_metadata and model_metadata['action_space_type']=='continuous':
    num_of_actions=2
else:
    num_of_actions=len(action_names)

for f in all_files[:6]:
    img = np.array(Image.open(f))
    heatmap = visualize_gradcam_discrete_ppo(model, img, category_index=0, num_of_actions=num_of_actions)
    heatmaps.append(heatmap)
tf.reset_default_graph()

Finally, let us render the heatmaps to see what the pre-trained DeepRacer model is looking at in our new, more realistic images.

In [None]:
for i in range(len(heatmaps)):
    plt.imshow(heatmaps[i])
    plt.show()

### This concludes lab-02. We have successfully:
  - Created track images with more real-world features using Stable Diffusion
  - Fed these images to a pre-trained DeepRacer model to understand what the model looks at, which helps racers decide how to modify their reward function or model.
  
This shows how generative AI can be used to augment and support model training use cases.

Let us proceed to clean up the SageMaker endpoint.

## Clean up <a name="cleanup"></a>

In [None]:
sm_client.delete_endpoint(EndpointName=endpoint_name)
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
sm_client.delete_model(ModelName=sm_model_name)

In [None]:
#delete models in respective paths
for model_name, model_local_path in models_local_path.items():
    !rm -rf $model_local_path