In [1]:
!pip install -U sagemaker
# Restart your kernel after the upgrade so the new SDK version is loaded



### Bring your own Container

In this notebook, we cover how to bring our own container, packaging either a framework or an algorithm, to train a model on SageMaker.

We use fastai here and build a container that has our custom training code built into the image. The other option is script mode, which is easy to switch to by changing the entry point.

The outline of this notebook is:

1. Build a Docker image for FastAI that includes the (provided) training and serving code.

2. Log in to ECR, then tag and push the Docker image to ECR.

3. Use the FastAI container image in SageMaker to train our model.

4. Deploy the model to an endpoint using the container image.

5. Test inference on an image in a couple of different ways.

#### Container Image
Let's start by building a container image locally and then pushing it to Amazon ECR (Elastic Container Registry).

In [2]:
%cd ~/SageMaker/pssummitwkshp/byoc/docker

/home/ec2-user/SageMaker/pssummitwkshp/byoc/docker


In [3]:
!docker build -t fastai .

Sending build context to Docker daemon  6.656kB
Step 1/8 : FROM fastdotai/fastai:2021-02-11
2021-02-11: Pulling from fastdotai/fastai

Digest: sha256:c36b43104474006d8f8cd2a65f740bfd505693c670644c1d2dbedb5a6fb2de8a

In [4]:
!docker images

REPOSITORY         TAG          IMAGE ID       CREATED         SIZE
fastai             latest       bd0e893e4613   1 second ago    7.53GB
fastdotai/fastai   2021-02-11   c15a6ed2e7f0   15 months ago   7.43GB


## Set the ECR details and tags
Let's set a few parameters here, such as the ECR namespace and the tag name.

In [5]:
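from sagemaker import get_execution_role
import boto3

# ECR repository name = namespace + image-specific prefix
ecr_namespace = "sagemaker-training-containers/"
prefix = "script-mode-container-fastai"
ecr_repository_name = ecr_namespace + prefix

# The account ID is the fifth field of the role ARN (arn:aws:iam::<account-id>:role/...)
role = get_execution_role()
account_id = role.split(":")[4]
region = boto3.Session().region_name

# Full image tag: <account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:latest
tag_name = f"{account_id}.dkr.ecr.{region}.amazonaws.com/{ecr_repository_name}:latest"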

In [6]:
tag_name

'152281701141.dkr.ecr.us-east-1.amazonaws.com/sagemaker-training-containers/script-mode-container-fastai:latest'
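The ARN parsing and string building above can be wrapped in small helpers so that the image URI is built the same way wherever it is needed (the tag, the push, and the estimator all use it). This is a minimal sketch; the helper names and the example role ARN are hypothetical, and it assumes the standard `aws` partition (other partitions use a different domain suffix).

```python
def account_id_from_role_arn(role_arn: str) -> str:
    """Return the account ID from an IAM ARN such as arn:aws:iam::<account-id>:role/<name>."""
    # ARN fields: arn, partition, service, region (empty for IAM), account-id, resource
    parts = role_arn.split(":")
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {role_arn!r}")
    return parts[4]


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build a private ECR image URI (assumes the standard amazonaws.com domain)."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Hypothetical role ARN, used only for illustration
example_role = "arn:aws:iam::152281701141:role/service-role/ExampleSageMakerRole"
print(ecr_image_uri(account_id_from_role_arn(example_role), "us-east-1",
                    "sagemaker-training-containers/script-mode-container-fastai"))
```

With these inputs the printed URI matches the `tag_name` value shown above.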

Now we tag our local `fastai` image with the tag name generated above.

In [7]:
!docker tag fastai $tag_name

### ECR Repository and push steps

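All of these steps could be scripted, but they are laid out individually here so that each step is transparent and easy to follow.

First, we get an authentication token for ECR and log Docker in with it, which allows us to perform ECR operations such as pushing images. Note that `aws ecr get-login` exists only in AWS CLI v1; with AWS CLI v2, use `aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <account-id>.dkr.ecr.<region>.amazonaws.com` instead.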

In [8]:
!$(aws ecr get-login --no-include-email)

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
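The same login can also be done from Python with boto3, independent of the installed AWS CLI version. This is a minimal sketch, assuming boto3 and the Docker CLI are available and that the notebook role is allowed to call `ecr:GetAuthorizationToken`; the helper names are hypothetical. The ECR token is base64-encoded `AWS:<password>`.

```python
import base64
import subprocess


def decode_ecr_token(authorization_token: str) -> tuple:
    """Split a base64-encoded ECR authorization token into (username, password)."""
    # The decoded token has the form "AWS:<password>"; split only on the first colon
    username, password = base64.b64decode(authorization_token).decode("utf-8").split(":", 1)
    return username, password


def docker_login_to_ecr(region: str) -> None:
    """Log the local Docker client in to this account's ECR registry."""
    import boto3  # imported here so decode_ecr_token can be used without AWS access

    ecr = boto3.client("ecr", region_name=region)
    auth = ecr.get_authorization_token()["authorizationData"][0]
    username, password = decode_ecr_token(auth["authorizationToken"])
    # Send the password on stdin rather than as a visible command-line argument
    subprocess.run(
        ["docker", "login", "--username", username, "--password-stdin", auth["proxyEndpoint"]],
        input=password.encode("utf-8"),
        check=True,
    )
```

In the notebook this would be called as `docker_login_to_ecr(region)`.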


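Here we create the ECR repository. This command fails if a repository with the same name already exists, for example when you rerun the notebook.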

In [9]:
!aws ecr create-repository --repository-name $ecr_repository_name

{
    "repository": {
        "repositoryArn": "arn:aws:ecr:us-east-1:152281701141:repository/sagemaker-training-containers/script-mode-container-fastai",
        "registryId": "152281701141",
        "repositoryName": "sagemaker-training-containers/script-mode-container-fastai",
        "repositoryUri": "152281701141.dkr.ecr.us-east-1.amazonaws.com/sagemaker-training-containers/script-mode-container-fastai",
        "createdAt": 1652288224.0,
        "imageTagMutability": "MUTABLE",
        "imageScanningConfiguration": {
            "scanOnPush": false
        },
        "encryptionConfiguration": {
            "encryptionType": "AES256"
        }
    }
}
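Because `create-repository` fails on a rerun, a rerunnable version can fall back to looking up the existing repository. This is a minimal sketch; `ensure_ecr_repository` is a hypothetical helper that takes a boto3 ECR client (for example `boto3.client("ecr", region_name=region)`) and relies on the client's modeled `RepositoryAlreadyExistsException`.

```python
def ensure_ecr_repository(ecr_client, repository_name: str) -> str:
    """Create the ECR repository if needed and return its URI (safe to rerun)."""
    try:
        response = ecr_client.create_repository(repositoryName=repository_name)
        return response["repository"]["repositoryUri"]
    except ecr_client.exceptions.RepositoryAlreadyExistsException:
        # On a rerun the repository already exists, so describe it instead
        response = ecr_client.describe_repositories(repositoryNames=[repository_name])
        return response["repositories"][0]["repositoryUri"]
```

Either branch returns the `repositoryUri`, which is the same value shown in the JSON output above.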


Now that our ECR repository has been created, we can push our Docker image to it using the tag name we assigned.

In [10]:
!docker push $tag_name

The push refers to repository [152281701141.dkr.ecr.us-east-1.amazonaws.com/sagemaker-training-containers/script-mode-container-fastai]

6268583e: Pushed   5.531GB/5.5GB

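This is how we get the URI of the Docker image we uploaded to ECR. It is the same value as `tag_name` above, and SageMaker uses it to pull the image for training.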

In [11]:
container_image_uri = "{0}.dkr.ecr.{1}.amazonaws.com/{2}:latest".format(
    account_id, region, ecr_repository_name
)
print(container_image_uri)

152281701141.dkr.ecr.us-east-1.amazonaws.com/sagemaker-training-containers/script-mode-container-fastai:latest


#### Call your custom container to train the model

In the cell below, replace **"your-unique-bucket-name"** with the name of the bucket you created in the data-prep notebook.

In [None]:
%%time
import json

import sagemaker
from sagemaker.inputs import TrainingInput

# Replace with the bucket you created in the data-prep notebook
bucket = "your-unique-bucket-name"


# SageMaker hyperparameter values are strings, so JSON-encode each value
def json_encode_hyperparameters(hyperparameters):
    return {str(k): json.dumps(v) for (k, v) in hyperparameters.items()}


hyperparameters = json_encode_hyperparameters({"lr": 1e-03})

est = sagemaker.estimator.Estimator(
    container_image_uri,  # the image we pushed to ECR above
    role,
    instance_count=1,
    # instance_type="local",  # use this instead to run in local mode
    instance_type="ml.m5.12xlarge",
    base_job_name=prefix,
    hyperparameters=hyperparameters,
)

# The "train" channel is mounted in the container at /opt/ml/input/data/train
train_config = TrainingInput(f"s3://{bucket}/train")

est.fit({"train": train_config})

2022-05-11 17:02:39 Starting - Starting the training job...
2022-05-11 17:02:56 Starting - Preparing the instances for training
ProfilerReport-1652288559: InProgress
.........
2022-05-11 17:04:21 Downloading - Downloading input data...
2022-05-11 17:05:01 Training - Downloading the training image...............
2022-05-11 17:07:29 Training - Training image download completed. Training in progress.
2022-05-11 17:07:31,569 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-05-11 17:07:31,592 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-05-11 17:07:31,615 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-05-11 17:07:31,623 sagemaker-training-toolkit INFO     Invoking user script
Training Env:
{
    "additional_framework_parameters": {},
    "channel_input_dirs": {
        "train": "/opt/ml/input/data/train"
    },
    "curren
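The log shows the SageMaker training toolkit invoking the user script with the `train` channel at `/opt/ml/input/data/train`. The training code inside the image is not shown in this notebook, so the following is only a hypothetical sketch of how such a script commonly reads its inputs. It assumes the toolkit's usual behaviour of passing hyperparameters as command-line flags (e.g. `--lr 0.001`) and exposing paths through the `SM_CHANNEL_TRAIN` and `SM_MODEL_DIR` environment variables, falling back to the standard `/opt/ml` locations. `decode_hyperparameters` is a hypothetical inverse of `json_encode_hyperparameters` for code that reads the raw JSON strings (for example from `/opt/ml/input/config/hyperparameters.json`).

```python
import argparse
import json
import os


def decode_hyperparameters(encoded: dict) -> dict:
    """Invert json_encode_hyperparameters: turn JSON strings back into Python values."""
    return {key: json.loads(value) for key, value in encoded.items()}


def parse_args(argv=None):
    """Parse hyperparameters and data/model paths for a hypothetical train.py."""
    parser = argparse.ArgumentParser()
    # Hyperparameter set on the estimator, e.g. --lr 0.001
    parser.add_argument("--lr", type=float, default=1e-3)
    # Paths from toolkit environment variables, with the standard /opt/ml fallbacks
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    return parser.parse_args(argv)
```

Inside the container, `args = parse_args()` would read the real command line. Whatever the script saves under the model directory is what SageMaker packages as `model.tar.gz` and uploads to S3.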

Finally, let's print the S3 location of the trained FastAI model. You will need this information for the inference step.

In [7]:
# model_data is the public estimator property for the S3 URI of the model artifacts
print(f"FastAI Model located at \n{est.model_data}")

FastAI Model located at 
s3://sagemaker-us-east-1-152281701141/script-mode-container-fastai-2022-05-11-17-02-39-257/output/model.tar.gz
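For the inference step it is often handy to have the bucket and key separately, for example to download the artifact with boto3's `download_file(Bucket, Key, Filename)`. This is a minimal sketch with a hypothetical helper name; it assumes the key contains no `?` or `#` characters, which `urlparse` would treat as a query or fragment.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri: str) -> tuple:
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {s3_uri!r}")
    # netloc is the bucket; the path (minus its leading slash) is the object key
    return parsed.netloc, parsed.path.lstrip("/")
```

Applied to the location printed above, this yields the bucket `sagemaker-us-east-1-152281701141` and a key ending in `/output/model.tar.gz`.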


### Attach to a training job that has been left to run 

If your kernel becomes disconnected after training has started, you can reattach to the training job.<br>
Look up the training job name (for example, under **Training jobs** in the SageMaker console), replace **your-training-job-name** with it, and run the cell below.<br>
Once the training job has finished, you can continue with the cells after the training cell.

In [4]:
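import sagemaker

sess = sagemaker.Session()

# Replace with the name of your running or completed training job
training_job_name = 'your-training-job-name'

# Reattach to the existing job; the returned estimator can be used as if fit() had run in this kernel
est = sagemaker.estimator.Estimator.attach(training_job_name=training_job_name, sagemaker_session=sess)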


2022-05-11 18:08:39 Starting - Preparing the instances for training
2022-05-11 18:08:39 Downloading - Downloading input data
2022-05-11 18:08:39 Training - Training image download completed. Training in progress.
2022-05-11 18:08:39 Uploading - Uploading generated training model
2022-05-11 18:08:39 Completed - Training job completed
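Instead of copying the job name by hand, you can look up the most recent job created with this notebook's `base_job_name`, since SageMaker appends a timestamp to that prefix. This is a minimal sketch with a hypothetical helper name; it takes a boto3 SageMaker client (for example `boto3.client("sagemaker")`) and uses `list_training_jobs` sorted by creation time.

```python
def latest_training_job_name(sm_client, name_contains: str):
    """Return the newest training job whose name contains name_contains, or None."""
    response = sm_client.list_training_jobs(
        NameContains=name_contains,
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    summaries = response["TrainingJobSummaries"]
    return summaries[0]["TrainingJobName"] if summaries else None
```

For example, `latest_training_job_name(boto3.client("sagemaker"), "script-mode-container-fastai")` would return a name suitable for `training_job_name` in the cell above.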
