# Training an external algorithm in SageMaker

In this lab we take an external training script, such as one handed over by a client, and dispatch a SageMaker Processing job to train it.
SageMaker is particular about how these jobs are triggered, so take your time checking the details!

In [12]:
import boto3
import sagemaker
from sagemaker import get_execution_role

region = boto3.session.Session().region_name
session = sagemaker.Session()
role = get_execution_role()
bucket = "sagemaker-course-di"
prefix = "datasets"
data_dir = "forecasting"
filename = "energy-train.csv"
data_s3_location = "s3://{}/{}/{}".format(bucket, prefix, filename)  # S3 URL

In [2]:
# Create a docker/ directory to hold the Dockerfile for our training image.
# -p makes the command succeed even if the directory already exists.
!mkdir -p docker


In [2]:
%%writefile docker/Dockerfile
# This image installs Prophet, Facebook's forecasting library.

FROM python:3.7
    
RUN apt-get -y update  && apt-get install -y \
  python3-dev \
  apt-utils \
  python-dev \
  build-essential \
&& rm -rf /var/lib/apt/lists/*

RUN pip install --upgrade setuptools
RUN pip install cython
RUN pip install numpy
RUN pip install matplotlib
RUN pip install pystan==2.19.1.1
RUN pip install pandas
# Quote the version specifiers so the shell does not treat ">" as a redirection.
RUN pip install "LunarCalendar>=0.0.9" "convertdate>=2.1.2" "holidays>=0.10.2" "python-dateutil>=2.8.0" "tqdm>=4.36.1"
RUN pip install fbprophet

ENV PYTHONUNBUFFERED=TRUE

ENTRYPOINT ["python3"]



Overwriting docker/Dockerfile


In [3]:
import boto3

account_id = boto3.client("sts").get_caller_identity().get("Account")
ecr_repository = "sagemaker-processing-container"
tag = ":latest"

uri_suffix = "amazonaws.com"
processing_repository_uri = "{}.dkr.ecr.{}.{}/{}".format(
    account_id, region, uri_suffix, ecr_repository + tag
)

# Build the image, log in to ECR, create the repository, then tag and push the image.
!docker build -t $ecr_repository docker
# `aws ecr get-login` exists only in AWS CLI v1; AWS CLI v2 replaces it with `aws ecr get-login-password`.
!$(aws ecr get-login --region $region --registry-ids $account_id --no-include-email)
!aws ecr create-repository --repository-name $ecr_repository
!docker tag {ecr_repository + tag} $processing_repository_uri
!docker push $processing_repository_uri

Sending build context to Docker daemon   2.56kB
Step 1/12 : FROM python:3.7
3.7: Pulling from library/python

15a668ce: Pulling fs layer
ef5f69a5: Pulling fs layer
a9f2bd51: Pulling fs layer
a22ee906: Pulling fs layer
d51a9262: Pulling fs layer
74b7d363: Pulling fs layer
bca59c25: Pulling fs layer
7ee0e52d: Pulling fs layer
753bc742: Pull complete

So far we have created the Dockerfile, built the image, and pushed it to ECR.
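The image reference used by the tag and push commands above follows ECR's `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>` pattern. As a small sketch, the helper below (the name `ecr_image_uri` is hypothetical, not part of boto3 or the SageMaker SDK) composes the same string as `processing_repository_uri`, which can be handy for checking the URI before pushing.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Compose an ECR image URI from its parts (hypothetical helper)."""
    # Same layout as processing_repository_uri in the cell above.
    registry = "{}.dkr.ecr.{}.amazonaws.com".format(account_id, region)
    return "{}/{}:{}".format(registry, repository, tag)


# Example with a placeholder account ID (not a real account).
example_uri = ecr_image_uri("123456789012", "us-east-1", "sagemaker-processing-container")
print(example_uri)
```

In the notebook you would pass the real `account_id` and `region` variables instead of the placeholders.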

In [4]:
from sagemaker.processing import ScriptProcessor

# Create the ScriptProcessor configuration
script_processor = None  # Fillme
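One possible way to fill in the placeholder above is sketched here. It is not the lab's official solution. The instance type `ml.m5.xlarge` and the instance count of 1 are assumptions to adjust for your account, and `command=["python3"]` simply mirrors the `ENTRYPOINT` in the Dockerfile. The helper names `script_processor_kwargs` and `build_script_processor` are hypothetical; the SageMaker SDK is imported lazily so the argument-building part can be inspected without it.

```python
def script_processor_kwargs(image_uri, role, instance_type="ml.m5.xlarge"):
    """Collect the ScriptProcessor arguments in one place (hypothetical helper)."""
    return {
        "command": ["python3"],          # interpreter used to run the script
        "image_uri": image_uri,          # the image pushed to ECR
        "role": role,                    # execution role for the job
        "instance_count": 1,             # assumption: a single instance is enough
        "instance_type": instance_type,  # assumption: adjust to your quota
    }


def build_script_processor(image_uri, role):
    """Create the ScriptProcessor; requires the sagemaker package."""
    from sagemaker.processing import ScriptProcessor  # imported lazily

    return ScriptProcessor(**script_processor_kwargs(image_uri, role))


# In the notebook:
# script_processor = build_script_processor(processing_repository_uri, role)
```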

In [16]:
data_s3_location

's3://sagemaker-course-di/datasets/energy-train.csv'
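The S3 URL above is the data the job has to mount. As a hedged sketch of how the inputs and outputs might be described, the code below uses the container paths and the output name that appear in the job description printed at the end of this lab (`/opt/ml/processing/input`, `/opt/ml/processing/output/model`, `Prophet.pkl`). The helpers `io_spec` and `to_sdk_objects` are hypothetical names; only `ProcessingInput` and `ProcessingOutput` come from the SageMaker SDK.

```python
INPUT_DIR = "/opt/ml/processing/input"         # where the CSV is mounted
MODEL_DIR = "/opt/ml/processing/output/model"  # where the script writes the model


def io_spec(data_s3_uri):
    """Describe the mounts as plain dicts, independent of the SDK."""
    return {
        "inputs": [{"source": data_s3_uri, "destination": INPUT_DIR}],
        "outputs": [{"output_name": "Prophet.pkl", "source": MODEL_DIR}],
    }


def to_sdk_objects(spec):
    """Turn the plain description into SDK objects; requires sagemaker."""
    from sagemaker.processing import ProcessingInput, ProcessingOutput

    inputs = [ProcessingInput(**item) for item in spec["inputs"]]
    outputs = [ProcessingOutput(**item) for item in spec["outputs"]]
    return inputs, outputs


# In the notebook:
# inputs, outputs = to_sdk_objects(io_spec(data_s3_location))
```

Everything under the output's `source` directory is uploaded to S3 when the job ends, so the training script must save its model there.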

In [21]:
from sagemaker.processing import ProcessingInput, ProcessingOutput

# Configure the inputs and outputs so the S3 files are mounted into the container
script_processor.run(
    code="train_prophet.py",
    inputs=None,  # Fillme
    outputs=None  # Fillme
)
script_processor_job_description = script_processor.jobs[-1].describe()
print(script_processor_job_description)


Job Name:  sagemaker-processing-container-2021-09-07-17-54-05-689
Inputs:  [{'InputName': 'input-1', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-course-di/datasets/energy-train.csv', 'LocalPath': '/opt/ml/processing/input', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}, {'InputName': 'code', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-us-east-1-253957294717/sagemaker-processing-container-2021-09-07-17-54-05-689/input/code/train_prophet.py', 'LocalPath': '/opt/ml/processing/input/code', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}]
Outputs:  [{'OutputName': 'Prophet.pkl', 'AppManaged': False, 'S3Output': {'S3Uri': 's3://sagemaker-us-east-1-253957294717/sagemaker-processing-container-2021-09-07-17-54-05-689/output/Prophet.pkl', 'LocalPath': '/opt/ml/processing/output/model', 'S3UploadMode': 'EndOfJob'}}, 