## FastAI with SageMaker: Bring Your Own Container

In this notebook, we cover how to bring our own container, built around either a framework or an algorithm, to train a model on SageMaker.

We use fastai in this case and build a container with our custom training code integrated into the image. The other option is script mode, which only requires changing the entry point.

The outline of this notebook is:

1. Build a Docker image for FastAI with the provided serving and training code.

2. Log into ECR, then tag and push the Docker image to ECR.

3. Use the FastAI container image in SageMaker to train our model.

4. Deploy the model to an endpoint using the container image.

5. Test inference using an image in a couple of possible ways.

#### Container Image
Let's start by building a container image locally and then pushing it to ECR (Elastic Container Registry).

In [1]:
%cd ~/SageMaker/pssummitwkshp/byoc/docker

/home/ec2-user/SageMaker/pssummitwkshp/byoc/docker


In [2]:
!docker build -t fastai .

Sending build context to Docker daemon  10.75kB
Step 1/8 : FROM fastdotai/fastai:latest
 ---> c4c23d349f61
Step 2/8 : LABEL maintainer="Raj Kadiyala"
 ---> Using cache
 ---> c1feda98b8da
Step 3/8 : WORKDIR /
 ---> Using cache
 ---> 04ea84665d52
Step 4/8 : RUN pip3 install --no-cache --upgrade requests
 ---> Using cache
 ---> 2f4283db96dc
Step 5/8 : ENV PYTHONDONTWRITEBYTECODE=1     PYTHONUNBUFFERED=1     LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/local/lib"     PYTHONIOENCODING=UTF-8     LANG=C.UTF-8     LC_ALL=C.UTF-8
 ---> Using cache
 ---> c97bce1a9957
Step 6/8 : RUN pip3 install --no-cache --upgrade     sagemaker-training
 ---> Using cache
 ---> d569b2835df4
Step 7/8 : COPY code/* /opt/ml/code/
 ---> Using cache
 ---> 64ef54f07f21
Step 8/8 : ENV SAGEMAKER_PROGRAM train.py
 ---> Using cache
 ---> 4efcda38ad55
Successfully built 4efcda38ad55
Successfully tagged fastai:latest


In [3]:
!docker images

REPOSITORY                                                                                                TAG                 IMAGE ID            CREATED             SIZE
fastai                                                                                                    latest              4efcda38ad55        15 hours ago        9.17GB
650687152614.dkr.ecr.us-east-1.amazonaws.com/sagemaker-training-containers/script-mode-container-fastai   latest              4efcda38ad55        15 hours ago        9.17GB
650687152614.dkr.ecr.us-east-1.amazonaws.com/sagemaker-training-containers/script-mode-container-fastai   <none>              385028ea5d16        20 hours ago        9.17GB
fastdotai/fastai                                                                                          latest              c4c23d349f61        29 hours ago        9.13GB


## Set the ECR details and tags
Let's set a few parameters here, such as the ECR namespace and the tag name.

In [4]:
from sagemaker import get_execution_role
import boto3
ecr_namespace = "sagemaker-training-containers/"
prefix = "script-mode-container-fastai"

ecr_repository_name = ecr_namespace + prefix
role = get_execution_role()
account_id = role.split(":")[4]
region = boto3.Session().region_name
tag_name=account_id+'.dkr.ecr.'+region+'.amazonaws.com/'+ecr_repository_name+':latest'

In [5]:
tag_name

'650687152614.dkr.ecr.us-east-1.amazonaws.com/sagemaker-training-containers/script-mode-container-fastai:latest'
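The account ID above is the fifth colon-separated field of the execution role ARN, and the tag follows the ECR pattern `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. As a minimal sketch, the same construction can be wrapped in two small helpers. The function names and the role name in the example are hypothetical, and the host name pattern assumes the standard `aws` partition.

```python
def account_id_from_arn(arn: str) -> str:
    """Return the account ID field of an ARN.

    ARNs have the form arn:partition:service:region:account-id:resource.
    """
    parts = arn.split(":", 5)
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    return parts[4]


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build the full ECR image URI used for `docker tag` and `docker push`."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Hypothetical role ARN, for illustration only.
example_role = "arn:aws:iam::650687152614:role/service-role/ExampleSageMakerRole"
example_uri = ecr_image_uri(
    account_id_from_arn(example_role),
    "us-east-1",
    "sagemaker-training-containers/script-mode-container-fastai",
)
print(example_uri)
```

Validating the ARN before indexing into it gives a clearer error than an `IndexError` if the role string is ever malformed.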

Now we tag our image with the tag name we generated above

In [6]:
!docker tag fastai $tag_name

### ECR Repository and push steps

All of these steps can be scripted, but they are laid out one at a time here so that each stage is visible and easy to follow.

First we obtain a temporary credential for ECR and log Docker in to the registry. This allows us to push images to ECR.

In [7]:
!$(aws ecr get-login --no-include-email)

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
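Note that `aws ecr get-login` is only available in version 1 of the AWS CLI; version 2 replaces it with `aws ecr get-login-password`. The same credential can also be fetched from Python through the ECR `GetAuthorizationToken` API, whose token is the base64 encoding of `<username>:<password>`. The sketch below shows one way to decode it. The helper name is hypothetical, and `ecr_client` is assumed to be a boto3 ECR client such as `boto3.client("ecr")`.

```python
import base64


def docker_credentials(ecr_client):
    """Return (username, password, registry) from the ECR GetAuthorizationToken API.

    `ecr_client` is assumed to be a boto3 ECR client, e.g. boto3.client("ecr").
    """
    auth = ecr_client.get_authorization_token()["authorizationData"][0]
    # The token decodes to "<username>:<password>"; split only on the first colon.
    decoded = base64.b64decode(auth["authorizationToken"]).decode("utf-8")
    username, password = decoded.split(":", 1)
    return username, password, auth["proxyEndpoint"]
```

The returned values could then be passed to `docker login --username <username> --password-stdin <registry>`, which avoids placing the password on the command line.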


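Here we create an ECR repository. If the repository already exists, as in the output below, the command reports a `RepositoryAlreadyExistsException`, and we can continue with the existing repository.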

In [8]:
!aws ecr create-repository --repository-name $ecr_repository_name


An error occurred (RepositoryAlreadyExistsException) when calling the CreateRepository operation: The repository with name 'sagemaker-training-containers/script-mode-container-fastai' already exists in the registry with id '650687152614'
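When this step is scripted, the "already exists" case is usually treated as success rather than as an error. A minimal sketch of an idempotent version follows. The helper name is hypothetical, and `ecr_client` is assumed to be a boto3 ECR client such as `boto3.client("ecr")`.

```python
def ensure_repository(ecr_client, repository_name: str) -> bool:
    """Create the repository if it does not exist yet.

    Returns True if the repository was created, False if it already existed.
    `ecr_client` is assumed to be a boto3 ECR client, e.g. boto3.client("ecr").
    """
    try:
        ecr_client.create_repository(repositoryName=repository_name)
        return True
    except ecr_client.exceptions.RepositoryAlreadyExistsException:
        # Same condition as the CLI error shown above; safe to ignore.
        return False
```

Calling it as `ensure_repository(boto3.client("ecr"), ecr_repository_name)` would make the cell safe to re-run.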


Now that our ECR repository exists, we can push our Docker image to it using the tag name we assigned.

In [9]:
!docker push $tag_name

The push refers to repository [650687152614.dkr.ecr.us-east-1.amazonaws.com/sagemaker-training-containers/script-mode-container-fastai]

151aa860: Preparing
d23a33d0: Preparing
a4ebc26c: Preparing
d277c3a5: Preparing
b8a4997f: Preparing
570c96fa: Preparing
9a10a0ec: Preparing
fdcbbf19: Preparing
bf18a086: Preparing
c4239216: Preparing
66e1e1d6: Preparing
e35a6476: Preparing
8fbc8492: Preparing
fa06e06c: Preparing
62e73fa9: Preparing
491659cb: Preparing
dc413928: Preparing
ad8f2cae: Preparing
581dbc3c: Preparing
ad8f2cae: Layer already exists
latest: digest: sha256:e4aab61bd7526510a3861b78a3efa015612d4c86aa00eb681b4c114b53f24370 size: 4711


This is how we get the URI of our uploaded docker image in ECR

In [10]:
container_image_uri = "{0}.dkr.ecr.{1}.amazonaws.com/{2}:latest".format(
    account_id, region, ecr_repository_name
)
print(container_image_uri)

650687152614.dkr.ecr.us-east-1.amazonaws.com/sagemaker-training-containers/script-mode-container-fastai:latest


#### Call your custom container to train the model

In the cell below, replace **"your-unique-bucket-name"** with the name of the bucket you created in the data-prep notebook.

In [11]:
%%time
import sagemaker
import json

bucket = "your-unique-bucket-name"

# JSON encode hyperparameters
def json_encode_hyperparameters(hyperparameters):
    return {str(k): json.dumps(v) for (k, v) in hyperparameters.items()}


hyperparameters = json_encode_hyperparameters({"lr":1e-03})

est = sagemaker.estimator.Estimator(
    container_image_uri,
    role,
    instance_count=1,
    # To run in local mode instead, use instance_type="local"
    instance_type='ml.m5.12xlarge',
    base_job_name=prefix,
    hyperparameters=hyperparameters,
)

train_config = sagemaker.session.TrainingInput(f's3://{bucket}/train')

est.fit({"train": train_config})

2021-09-23 11:38:37 Starting - Starting the training job...
2021-09-23 11:39:01 Starting - Launching requested ML instancesProfilerReport-1632397117: InProgress
......
2021-09-23 11:40:01 Starting - Preparing the instances for training......
2021-09-23 11:41:01 Downloading - Downloading input data...
2021-09-23 11:41:21 Training - Downloading the training image........................
2021-09-23 11:45:29,103 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2021-09-23 11:45:30,534 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2021-09-23 11:45:30,543 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2021-09-23 11:45:30,550 sagemaker-training-toolkit INFO     Invoking user script

Training Env:

{
    "additional_framework_parameters": {},
    "channel_input_dirs": {
        "train": "/opt/ml/input/data/train"
    },
    "current_host": "

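Inside the container, the SageMaker training toolkit runs the script named by `SAGEMAKER_PROGRAM`, passes each hyperparameter as a command-line argument, and exposes the data channel and model output locations through environment variables such as `SM_CHANNEL_TRAIN` and `SM_MODEL_DIR`. The provided `train.py` is not shown in this notebook, so the following is only a hypothetical sketch of how such a script might read its inputs. The function name and the argument names other than `lr` are assumptions.

```python
import argparse
import json
import os


def parse_training_args(argv=None):
    """Parse the inputs a SageMaker training toolkit container hands to the script."""
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as `--name value`; `lr` mirrors the notebook's single
    # hyperparameter, and json.loads reverses the JSON encoding applied above.
    parser.add_argument("--lr", type=json.loads, default=1e-3)
    # Channel and model directories come from environment variables set by the toolkit.
    parser.add_argument(
        "--train",
        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"),
    )
    parser.add_argument(
        "--model-dir",
        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"),
    )
    # Tolerate extra arguments the script does not know about.
    args, _unknown = parser.parse_known_args(argv)
    return args
```

Whatever the script saves under the model directory is what SageMaker packages into the `model.tar.gz` artifact once training finishes.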
## Inference

We are going to do inference using SageMaker script mode. We could create a new container, or extend the container we built for training our FastAI model, to implement serving for predictions. In this case, however, we are going to use the SageMaker pre-built PyTorch container and install or update the libraries needed to run predictions with FastAI. We have created a source folder named **inference** which contains two files: a requirements.txt file listing our library and version dependencies (including fastai v2.5.2), and serve.py, the code that performs predictions.

Let's start by importing some libraries we will be using.

In [35]:
from sagemaker.pytorch import PyTorchModel
from sagemaker import get_execution_role
import boto3
from PIL import Image
import io
from io import BytesIO

prefix = "script-mode-container-fastai"

role = get_execution_role()

Here we create boto3 clients for SageMaker and the SageMaker runtime.

In [13]:
sm = boto3.client('sagemaker')
client = boto3.client('sagemaker-runtime')

We use the SageMaker client to list our training jobs, then filter them by name prefix and completion status to retrieve our last completed FastAI training job. The code takes the first match, so it relies on the order in which `list_training_jobs` returns its results.

In [14]:
list_jobs = sm.list_training_jobs()['TrainingJobSummaries']
last_completed_job = [(i) for i in list_jobs if prefix in i['TrainingJobName'] and i['TrainingJobStatus'] == 'Completed']
print(last_completed_job[0]['TrainingJobName'])

script-mode-container-fastai-2021-09-23-11-38-37-072
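The filtering above is done client-side on a single page of results. `list_training_jobs` also accepts `NameContains`, `StatusEquals`, `SortBy`, and `SortOrder`, so the same lookup can be pushed to the service with an explicit ordering. A minimal sketch, in which the helper name is hypothetical and `sm_client` is assumed to be `boto3.client('sagemaker')`:

```python
def latest_completed_job(sm_client, name_contains: str):
    """Return the name of the most recently created completed training job, or None.

    `sm_client` is assumed to be a boto3 SageMaker client.
    """
    response = sm_client.list_training_jobs(
        NameContains=name_contains,
        StatusEquals="Completed",
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    summaries = response["TrainingJobSummaries"]
    return summaries[0]["TrainingJobName"] if summaries else None
```

Returning `None` when nothing matches makes the "no completed job yet" case explicit instead of raising an `IndexError`.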


Now that we know the training job name, we use it to get the S3 URI of the location where the trained model artifacts have been stored.

In [15]:
model_artifacts = sm.describe_training_job(TrainingJobName=last_completed_job[0]['TrainingJobName'])['ModelArtifacts']
model_uri = model_artifacts['S3ModelArtifacts']

In [26]:
%cd ..

/home/ec2-user/SageMaker/pssummitwkshp/byoc


Here we set up the PyTorch model with the information it needs to run:<br>
**model_data** - S3 URI location of the trained model artifacts<br>
**role** - IAM role of the SageMaker Notebook instance<br>
**entry_point** - the folder/filename of the code that implements the prediction piece<br>
**source_dir** - the folder which contains the prediction code and the requirements.txt file of the libraries that need to be installed<br>
**framework_version** - PyTorch framework version<br>
**py_version** - Python version<br><br><br>
**NOTE: we are using the PyTorchModel class here rather than the generic SageMaker Estimator.**

In [36]:
pytorch_model = PyTorchModel(model_data=model_uri, 
                             role=role, 
                             entry_point='inference/serve.py',
                             source_dir='inference',
                             framework_version='1.8',
                             py_version='py3')
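The pre-built PyTorch serving container looks in the entry point for the handler functions `model_fn`, `input_fn`, `predict_fn`, and `output_fn`. The notebook's `serve.py` is not reproduced here, so the following is only a hypothetical sketch of what such a file might contain. The `export.pkl` file name is an assumption, as is the idea that the learner accepts raw image bytes directly.

```python
import json
import os


def model_fn(model_dir):
    """Load the exported learner. `export.pkl` is a hypothetical file name."""
    # Imported lazily so the module can be inspected without fastai installed.
    from fastai.learner import load_learner

    return load_learner(os.path.join(model_dir, "export.pkl"))


def input_fn(request_body, content_type):
    """Accept raw image bytes sent with the `application/x-image` content type."""
    if content_type != "application/x-image":
        raise ValueError(f"Unsupported content type: {content_type}")
    return request_body


def predict_fn(input_data, model):
    """Run a fastai-style prediction; predict() returns (label, index, probabilities).

    Assumes the learner's input pipeline accepts raw bytes; otherwise the bytes
    would need to be decoded into an image first.
    """
    label, index, probabilities = model.predict(input_data)
    return {
        "Prediction": str(label),
        "Tensor": str(index),
        "Probabilities": str(probabilities),
    }


def output_fn(prediction, accept):
    """Serialize the prediction dictionary as a JSON string."""
    return json.dumps(prediction)
```

Converting each value with `str()` keeps the response JSON-serializable, at the cost of returning tensors as strings such as `tensor(2)`.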

Now we deploy the container. SageMaker will pull the inference code and requirements.txt to create the container environment to deploy for inference. Note that unlike what we did for training, we did not have to do a docker build and push to ECR. 

In [28]:
predictor = pytorch_model.deploy(instance_type='ml.m5.4xlarge', initial_instance_count=1)

-----!

Now that it's deployed, let's test inference using a test image.

In [33]:
%%time

im_name = "../data/test/Signal/S2.png"

# Read the test image as raw bytes and send it to the endpoint
with open(im_name, "rb") as f:
    payload = f.read()

response = client.invoke_endpoint(
    EndpointName=predictor.endpoint_name,
    ContentType="application/x-image",
    Body=payload,
)

CPU times: user 5.45 ms, sys: 286 µs, total: 5.74 ms
Wall time: 565 ms


Now let's decode and print out the JSON response.

In [34]:
import json
json.loads(response['Body'].read().decode("utf-8"))

{'Prediction': 'Signal',
 'Tensor': 'tensor(2)',
 'Probabilities': 'tensor([0.0193, 0.0303, 0.9504])'}
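The response carries the class index and probabilities as the string form of tensors, so a client that needs numbers has to parse them. A minimal sketch of one way to do that, using the response shown above; the helper name is hypothetical, and the approach assumes a one-dimensional tensor printed with square brackets.

```python
import re


def parse_probabilities(text: str) -> list:
    """Extract the values from a string such as 'tensor([0.0193, 0.0303, 0.9504])'."""
    match = re.search(r"\[([^\]]*)\]", text)
    if match is None:
        raise ValueError(f"no bracketed values found in {text!r}")
    return [float(item) for item in match.group(1).split(",") if item.strip()]


result = {
    "Prediction": "Signal",
    "Tensor": "tensor(2)",
    "Probabilities": "tensor([0.0193, 0.0303, 0.9504])",
}
probabilities = parse_probabilities(result["Probabilities"])
# Index of the highest probability; it matches the index reported in 'Tensor'.
best = max(range(len(probabilities)), key=probabilities.__getitem__)
print(best, probabilities[best])  # prints: 2 0.9504
```

Returning plain lists from the endpoint instead of tensor strings would remove the need for this parsing step, but that would require changing the serving code.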