## Building and pushing the image to an ECR repository

In [58]:
!cd .. && scripts/build_and_push.sh

scripts/build_and_push.sh: line 1: ·!/bin/bash: No such file or directory
ECR Login
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
Building image
Sending build context to Docker daemon  2.457GB
Step 1/10 : FROM nvcr.io/nvidia/pytorch:20.08-py3
 ---> c710aa2340b4
Step 2/10 : ENV PYTHONDONTWRITEBYTECODE=1
 ---> Using cache
 ---> d2f5f51c1af0
Step 3/10 : ENV PYTHONUNBUFFERED=1
 ---> Using cache
 ---> 3cc2d1502556
Step 4/10 : RUN apt-get update && apt-get install -y --no-install-recommends nginx curl
 ---> Using cache
 ---> c0cc25b67199
Step 5/10 : WORKDIR /opt/ml/
 ---> Using cache
 ---> dcd4566bb2a9
Step 6/10 : RUN pip install sagemaker-training
 ---> Using cache
 ---> 9de135ef4fbd
Step 7/10 : COPY src/ /opt/ml/code
 ---> 660895e48d9f
Step 8/10 : RUN mkdir /opt/ml/checkpoints
 ---> Running in f21a2cd3c392
Removing intermediate container f21a2cd3c392
 ---> cbaed926c46a
Step 9/10 : ENV SAGEMAKER_PROGRAM train
 ---> Running in aa89a3176b8a
Rem
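
The first line of the output above (`line 1: ·!/bin/bash: No such file or directory`) suggests that the shell ran the script's first line as an ordinary command, which typically happens when the `#!` marker is damaged. The build still proceeds because the remaining lines are executed by the calling shell. The sketch below is not part of the original notebook; `shebang_problem` is a hypothetical helper that inspects the first line of a script for common causes, and the exact cause in this repository is an assumption.

```python
from pathlib import Path


def shebang_problem(script_path):
    """Return a short description of what is wrong with the first line, or None.

    Hypothetical helper: it only covers a few common causes of a broken shebang.
    """
    first_line = Path(script_path).read_bytes().split(b"\n", 1)[0]
    if first_line.startswith(b"\xef\xbb\xbf"):
        return "file starts with a UTF-8 byte order mark"
    if not first_line.startswith(b"#!"):
        return "first line does not start with '#!'"
    if first_line.endswith(b"\r"):
        return "first line ends with a carriage return (CRLF line endings)"
    return None


# Example usage from the notebook directory (path taken from the cell above):
# print(shebang_problem("../scripts/build_and_push.sh"))
```

If the helper reports a problem, rewriting the first line as `#!/bin/bash` in a plain-text editor is usually enough.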

## SageMaker config

In [59]:
from sagemaker.session import get_execution_role, Session
import os

sagemaker_role = get_execution_role()
sagemaker_session = Session()

In [60]:
from time import gmtime, strftime

bucket_name = sagemaker_session.default_bucket()
key_name = "TiendaApp"
s3_uri_data = "s3://{}/{}/{}/".format(bucket_name, key_name, "data")
s3_uri_output = "s3://{}/{}/{}/".format(bucket_name, key_name, "model")
account = sagemaker_session.boto_session.client('sts').get_caller_identity()['Account']
region = sagemaker_session.boto_session.region_name
image_name = "yolo_train"
image_uri = "{0}.dkr.ecr.{1}.amazonaws.com/{2}".format(account, region, image_name)
#image_uri = "registry.hub.docker.com/repository/docker/cvillad/{0}".format(image_name)
base_job_name = "test-training-job-{}".format(strftime("%Y-%m-%d-%H-%M-%S", gmtime()))
os.environ["account"] = account
os.environ["s3_uri"] = s3_uri_data
print("Training Job name : {}".format(base_job_name))
print("S3 uri input: {}".format(s3_uri_data))
print("S3 uri output: {}".format(s3_uri_output))
print("image uri: {}".format(image_uri))
!aws s3 ls $s3_uri
!aws ecr describe-repositories --registry-id $account

Training Job name : test-training-job-2020-09-11-22-33-47
S3 uri input: s3://sagemaker-us-west-2-430127992102/TiendaApp/data/
S3 uri output: s3://sagemaker-us-west-2-430127992102/TiendaApp/model/
image uri: 430127992102.dkr.ecr.us-west-2.amazonaws.com/yolo_train
                           PRE training/
                           PRE validating/
2020-09-10 17:34:47         92 obj.data
2020-09-10 17:34:47        532 obj.names
2020-09-10 17:34:47      58638 train.txt
2020-09-10 17:34:47      26908 val.txt
{
    "repositories": [
        {
            "repositoryArn": "arn:aws:ecr:us-west-2:430127992102:repository/yolo_train",
            "registryId": "430127992102",
            "repositoryName": "yolo_train",
            "repositoryUri": "430127992102.dkr.ecr.us-west-2.amazonaws.com/yolo_train",
            "createdAt": 1599690799.0,
            "imageTagMutability": "MUTABLE",
            "imageScanningConfiguration": {
                "scanOnPush": false
            },
            "enc
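
The cell above assembles the ECR image URI and the S3 prefixes with inline `format` calls. As a minimal sketch, the same construction can be wrapped in two small functions so the pattern is explicit and easy to check without AWS credentials. The names `ecr_image_uri` and `s3_prefix_uri` and the optional `tag` parameter are assumptions for illustration, not part of the original notebook.

```python
def ecr_image_uri(account, region, repository, tag=None):
    """Build an ECR image URI; `tag` is optional (hypothetical parameter)."""
    uri = "{0}.dkr.ecr.{1}.amazonaws.com/{2}".format(account, region, repository)
    # Without an explicit tag, Docker resolves the reference to the `latest` tag.
    return "{}:{}".format(uri, tag) if tag else uri


def s3_prefix_uri(bucket, *parts):
    """Join a bucket and key parts into an s3:// prefix ending with a slash."""
    key = "/".join(part.strip("/") for part in parts)
    return "s3://{}/{}/".format(bucket, key)


# Values copied from the cell output above.
print(ecr_image_uri("430127992102", "us-west-2", "yolo_train"))
print(s3_prefix_uri("sagemaker-us-west-2-430127992102", "TiendaApp", "data"))
```

Pinning an explicit tag instead of relying on `latest` makes it easier to tell which image a given training job actually used.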

## Starting a SageMaker training job

In [61]:
from sagemaker.estimator import Estimator

yolov3_estimator = Estimator(image_name=image_uri,
                             role=sagemaker_role,
                             train_instance_count=1,
                             train_instance_type="ml.g4dn.xlarge",
                             train_volume_size=35,
                             sagemaker_session=sagemaker_session,
                             base_job_name=base_job_name,
                             hyperparameters={"test": "this is a test", "batch": 32},
                             tags=[{"Key": "Name", "Value": "test-job"},
                                   {"Key": "Description", "Value": "Test training job"}])


yolov3_estimator.fit(inputs={"training": s3_uri_data}, job_name=base_job_name, wait=True)

Parameter image_name will be renamed to image_uri in SageMaker Python SDK v2.
's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.
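
The warnings above indicate that this notebook runs SageMaker Python SDK v1 and that some names change in v2. As a hedged sketch, the mapping below translates the v1 keyword arguments used in this cell to their v2 names. `V1_TO_V2_ESTIMATOR_ARGS` and `to_v2_estimator_kwargs` are hypothetical names introduced here, and the mapping only covers the arguments that appear in this notebook.

```python
# v1 -> v2 renames for the Estimator arguments used in this notebook.
V1_TO_V2_ESTIMATOR_ARGS = {
    "image_name": "image_uri",
    "train_instance_count": "instance_count",
    "train_instance_type": "instance_type",
    "train_volume_size": "volume_size",
}


def to_v2_estimator_kwargs(v1_kwargs):
    """Return a copy of `v1_kwargs` with v1 argument names replaced by v2 names."""
    return {V1_TO_V2_ESTIMATOR_ARGS.get(name, name): value
            for name, value in v1_kwargs.items()}


v1_kwargs = {
    "image_name": "430127992102.dkr.ecr.us-west-2.amazonaws.com/yolo_train",
    "train_instance_count": 1,
    "train_instance_type": "ml.g4dn.xlarge",
    "train_volume_size": 35,
    "hyperparameters": {"test": "this is a test", "batch": 32},
}
print(to_v2_estimator_kwargs(v1_kwargs))

# With SDK v2 installed, the result could be passed on together with the
# role and session from the earlier cells (not executed here):
# from sagemaker.estimator import Estimator
# estimator = Estimator(role=sagemaker_role, sagemaker_session=sagemaker_session,
#                       **to_v2_estimator_kwargs(v1_kwargs))
```

Arguments that keep their name in v2, such as `hyperparameters`, pass through unchanged.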


2020-09-11 22:33:48 Starting - Starting the training job...
2020-09-11 22:33:50 Starting - Launching requested ML instances......
2020-09-11 22:34:59 Starting - Preparing the instances for training...
2020-09-11 22:35:48 Downloading - Downloading input data.........
== PyTorch ==

NVIDIA Release 20.08 (build 15516749)
PyTorch Version 1.7.0a0+8deb4fe

Container image Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.

Copyright (c) 2014-2020 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies    (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU                      (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006      Idiap Research Institute (Sa