### Bring your own Container

In this notebook, we cover how to bring our own container, packaging either a framework or an algorithm, to train a model on SageMaker.

We use fastai in this case and build a container with the custom training code integrated into the image. The other option is script mode, which can be set up by changing the entry point.
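To make "custom training code integrated into the container" more concrete, here is a minimal sketch of how an entry-point script could read its inputs. It is not the `train.py` from this workshop: the `parse_args` helper, the `--train` and `--model-dir` flags, and the default values are assumptions for illustration. Only the `--lr` flag appears in this notebook. The `SM_CHANNEL_TRAIN` and `SM_MODEL_DIR` environment variables are set by the sagemaker-training-toolkit inside the container.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker-provided paths (illustrative sketch)."""
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as command-line flags, e.g. `--lr 0.001`.
    parser.add_argument("--lr", type=float, default=1e-3)
    # Paths default to the toolkit's environment variables, with the
    # conventional /opt/ml locations as fallbacks (assumed flag names).
    parser.add_argument(
        "--train",
        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"),
    )
    parser.add_argument(
        "--model-dir",
        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"),
    )
    return parser.parse_args(argv)


# Simulate the arguments the container would pass to the script.
args = parse_args(["--lr", "0.01", "--train", "/tmp/data"])
print(f"lr={args.lr} train={args.train} model_dir={args.model_dir}")
# The actual fastai training and model export would follow here.
```

Passing an explicit list to `parse_args` keeps the sketch easy to try in a notebook cell; inside the container the script would call `parse_args()` with no arguments and read `sys.argv`.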


#### Container Image
Let's start by building a container image locally and then pushing it to ECR (Elastic Container Registry).

In [1]:
%cd docker

/home/ec2-user/SageMaker/sagemaker-training-wkshp/byoc/docker


In [2]:
!docker build -t fastai .

Sending build context to Docker daemon  11.78kB
Step 1/8 : FROM fastdotai/fastai:latest
latest: Pulling from fastdotai/fastai

2a97ff99: Pulling fs layer
a9d27eb8: Pulling fs layer
9583700a: Pulling fs layer
0b9c7100: Pulling fs layer
6e51b6e4: Pulling fs layer
04e75c6c: Pulling fs layer
4a4dbdd7: Pulling fs layer
e79ecfbe: Pulling fs layer
e5820b5a: Pulling fs layer
f41d96db: Pulling fs layer
595a9955: Pulling fs layer
b700ef54: Pulling fs layer
d144ba08: Pulling fs layer
95eb47d3: Pulling fs layer
72e2ab3d: Pulling fs layer
26627b53: Pulling fs layer
Digest: sha256:cf49b16e87576c75c590e4eaef54e5f2ec544fa7e476259b7ab6f76267b56e54

In [3]:
!docker images

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
fastai              latest              7fd6292940fe        7 minutes ago       9.17GB
fastdotai/fastai    latest              3d37e90c53a5        16 hours ago        9.08GB


#### Set the ECR details and tags
Let's set a few parameters here, such as the ECR namespace and the tag name.

In [4]:
from sagemaker import get_execution_role
import boto3

# ECR repository name: <namespace>/<prefix>
ecr_namespace = "sagemaker-training-containers/"
prefix = "script-mode-container-fastai"
ecr_repository_name = ecr_namespace + prefix

# The account ID is the fifth colon-separated field of the role ARN
role = get_execution_role()
account_id = role.split(":")[4]
region = boto3.Session().region_name

# Full image name used to tag and push the local image
tag_name = f"{account_id}.dkr.ecr.{region}.amazonaws.com/{ecr_repository_name}:latest"

In [5]:
tag_name

'463759052894.dkr.ecr.us-east-1.amazonaws.com/sagemaker-training-containers/script-mode-container-fastai:latest'
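The string concatenation above relies on the role ARN having the usual `arn:aws:iam::<account-id>:role/<name>` shape. As a sketch, the same logic can be wrapped in a small helper that checks this assumption before building the URI. The `ecr_image_uri` name and the example ARN are hypothetical and used only for illustration.

```python
def ecr_image_uri(role_arn, region, repository, tag="latest"):
    """Build an ECR image URI from an IAM role ARN (illustrative helper)."""
    parts = role_arn.split(":")
    # An IAM ARN looks like arn:aws:iam::<account-id>:role/<name>,
    # so the account ID is the fifth colon-separated field.
    if len(parts) < 6 or not parts[4].isdigit():
        raise ValueError(f"Unexpected role ARN format: {role_arn!r}")
    account_id = parts[4]
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Hypothetical ARN, used only to show the shape of the result.
uri = ecr_image_uri(
    "arn:aws:iam::123456789012:role/ExampleRole",
    "us-east-1",
    "sagemaker-training-containers/script-mode-container-fastai",
)
print(uri)
```

Failing early on a malformed ARN avoids tagging and pushing an image under a registry name that does not exist.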

In [6]:
!docker tag fastai $tag_name

### ECR Repository and push steps

All of these steps can be scripted, but they are laid out one at a time here so that each step is visible and easy to follow.

In [7]:
!$(aws ecr get-login --no-include-email)

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
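Note that `aws ecr get-login` exists only in AWS CLI v1; AWS CLI v2 replaced it with `aws ecr get-login-password`. As an alternative that does not depend on the CLI version, the credentials can be fetched with boto3. The sketch below uses the ECR client's `get_authorization_token` call; the helper names `decode_ecr_token` and `get_ecr_credentials` are assumptions for illustration, and the token in the demonstration is fake.

```python
import base64


def decode_ecr_token(authorization_token):
    """Split a base64 ECR authorization token into (username, password)."""
    decoded = base64.b64decode(authorization_token).decode("utf-8")
    username, password = decoded.split(":", 1)
    return username, password


def get_ecr_credentials(region_name=None):
    """Fetch registry credentials with boto3 (requires AWS credentials)."""
    import boto3  # imported here so the decoding helper works without boto3

    ecr = boto3.client("ecr", region_name=region_name)
    auth = ecr.get_authorization_token()["authorizationData"][0]
    username, password = decode_ecr_token(auth["authorizationToken"])
    return username, password, auth["proxyEndpoint"]


# Offline demonstration with a fake token; real tokens come from ECR.
fake_token = base64.b64encode(b"AWS:not-a-real-password").decode("ascii")
print(decode_ecr_token(fake_token))
```

The returned password can then be piped to `docker login --username AWS --password-stdin` together with the registry endpoint, which avoids placing the password on the command line.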


In [8]:
!aws ecr create-repository --repository-name $ecr_repository_name

{
    "repository": {
        "repositoryArn": "arn:aws:ecr:us-east-1:463759052894:repository/sagemaker-training-containers/script-mode-container-fastai",
        "registryId": "463759052894",
        "repositoryName": "sagemaker-training-containers/script-mode-container-fastai",
        "repositoryUri": "463759052894.dkr.ecr.us-east-1.amazonaws.com/sagemaker-training-containers/script-mode-container-fastai",
        "createdAt": 1628633147.0,
        "imageTagMutability": "MUTABLE",
        "imageScanningConfiguration": {
            "scanOnPush": false
        },
        "encryptionConfiguration": {
            "encryptionType": "AES256"
        }
    }
}
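Running `create-repository` a second time fails because the repository already exists, which is easy to hit when re-running the notebook. A minimal sketch of an idempotent version with boto3 follows; the `ensure_repository` helper name is an assumption, while `create_repository`, `describe_repositories`, and `RepositoryAlreadyExistsException` are part of the boto3 ECR client.

```python
def ensure_repository(ecr_client, repository_name):
    """Create the ECR repository if needed and return its URI."""
    try:
        response = ecr_client.create_repository(repositoryName=repository_name)
        return response["repository"]["repositoryUri"]
    except ecr_client.exceptions.RepositoryAlreadyExistsException:
        # The repository is already there, so look up its URI instead.
        response = ecr_client.describe_repositories(
            repositoryNames=[repository_name]
        )
        return response["repositories"][0]["repositoryUri"]


# Usage in this notebook (requires AWS credentials):
# import boto3
# repository_uri = ensure_repository(boto3.client("ecr"), ecr_repository_name)
```

Passing the client in as an argument keeps the helper easy to exercise with a stub when no AWS account is available.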


In [9]:
!docker push $tag_name

The push refers to repository [463759052894.dkr.ecr.us-east-1.amazonaws.com/sagemaker-training-containers/script-mode-container-fastai]

429c48f4: Preparing
80e8aba9: Preparing
81373199: Preparing
daf5c999: Preparing
1becab14: Preparing
8e6c71b0: Preparing
84391bf1: Preparing
31109c52: Preparing
bf18a086: Preparing
d6a6ebd3: Preparing
0f582258: Preparing
9b3ad234: Preparing
0593cfdd: Preparing
742c2604: Preparing
62e73fa9: Preparing
8e6c71b0: Waiting
84391bf1: Waiting
ad8f2cae: Preparing
0593cfdd: Waiting
491659cb: Pushed

In [10]:
container_image_uri = "{0}.dkr.ecr.{1}.amazonaws.com/{2}:latest".format(
    account_id, region, ecr_repository_name
)
print(container_image_uri)

463759052894.dkr.ecr.us-east-1.amazonaws.com/sagemaker-training-containers/script-mode-container-fastai:latest


#### Call your custom container to train the model

In [12]:
import sagemaker
import json

# JSON encode hyperparameters
def json_encode_hyperparameters(hyperparameters):
    return {str(k): json.dumps(v) for (k, v) in hyperparameters.items()}


hyperparameters = json_encode_hyperparameters({"lr":1e-03})

est = sagemaker.estimator.Estimator(
    container_image_uri,
    role,
    instance_count=1,
    # instance_type="local",  # use this instead of the line below to train in local mode
    instance_type='ml.m5.4xlarge',
    base_job_name=prefix,
    hyperparameters=hyperparameters,
)

train_config = sagemaker.inputs.TrainingInput('s3://rkadiy-data-bucket/train')

est.fit({"train": train_config})

2021-08-10 22:48:49 Starting - Starting the training job...
2021-08-10 22:49:11 Starting - Launching requested ML instancesProfilerReport-1628635728: InProgress
...
2021-08-10 22:49:44 Starting - Preparing the instances for training......
2021-08-10 22:50:49 Downloading - Downloading input data...
2021-08-10 22:51:12 Training - Downloading the training image..........................
2021-08-10 22:55:29,484 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2021-08-10 22:55:29,503 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2021-08-10 22:55:29,513 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2021-08-10 22:55:29,521 sagemaker-training-toolkit INFO     Invoking user script

Training Env:

{
    "additional_framework_parameters": {},
    "channel_input_dirs": {
        "train": "/opt/ml/input/data/train"
    },
    "current_host": "a

UnexpectedStatusException: Error for Training job script-mode-container-fastai-2021-08-10-22-48-48-719: Failed. Reason: AlgorithmError: ExecuteUserScriptError:
Command "/opt/conda/bin/python train.py --lr 0.001"
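The failing command above shows how the JSON-encoded hyperparameter reached the script: `{"lr": 1e-03}` became `--lr 0.001`. The round trip can be sketched as follows. `json_encode_hyperparameters` is the helper from the training cell; `to_cli_args` is a hypothetical approximation of the conversion and is not the sagemaker-training-toolkit's actual code.

```python
import json


def json_encode_hyperparameters(hyperparameters):
    """Same encoding as in the training cell: every value becomes a JSON string."""
    return {str(k): json.dumps(v) for k, v in hyperparameters.items()}


def to_cli_args(encoded):
    """Approximate how encoded hyperparameters turn into script arguments."""
    args = []
    for key, value in encoded.items():
        # Decode the JSON string back to a Python value, then render it
        # as the text that follows the flag on the command line.
        args.extend([f"--{key}", str(json.loads(value))])
    return args


encoded = json_encode_hyperparameters({"lr": 1e-03})
print(encoded)
print(to_cli_args(encoded))
```

Reproducing the argument list this way makes it possible to run the training script locally with the same flags when tracking down an `ExecuteUserScriptError`.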