# Building your own algorithm container


In [1]:
!cat container/Dockerfile

# Build an image that can do training and inference in SageMaker
# This is a Python 3 image that uses the nginx, gunicorn, flask stack
# for serving inferences in a stable way.

FROM python:3.10-slim-buster

MAINTAINER Amazon AI <sage-learner@amazon.com>


RUN apt-get -y update && apt-get install -y --no-install-recommends \
         wget \
         python3-pip \
         python3-setuptools \
         nginx \
         ca-certificates \
    && rm -rf /var/lib/apt/lists/*

RUN ln -s /usr/bin/python3 /usr/bin/python
RUN ln -s /usr/bin/pip3 /usr/bin/pip

# Here we get all python packages.
# There's substantial overlap between scipy and numpy that we eliminate by
# linking them together. Likewise, pip leaves the install caches populated which uses
# a significant amount of space. These optimizations save a fair amount of space in the
# image, which reduces start up time.
RUN pip --no-cache-dir install numpy==1.26.4 scipy==1.12.0 scikit-learn==1.4.1.post1 pandas==1.5

In [2]:
%%sh

# The name of our algorithm
algorithm_name=sagemaker-custom-spacy-new-training-job

cd container

chmod +x decision_trees/train
chmod +x decision_trees/serve

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get the login command from ECR and execute it directly
aws ecr get-login-password --region ${region}|docker login --username AWS --password-stdin ${fullname}

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

Login Succeeded
Sending build context to Docker daemon  57.34kB
Step 1/12 : FROM python:3.10-slim-buster
 ---> 93b9055430ce
Step 2/12 : MAINTAINER Amazon AI <sage-learner@amazon.com>
 ---> Using cache
 ---> 5a15fbf1c972
Step 3/12 : RUN apt-get -y update && apt-get install -y --no-install-recommends          wget          python3-pip          python3-setuptools          nginx          ca-certificates     && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> bca9c2d558f4
Step 4/12 : RUN ln -s /usr/bin/python3 /usr/bin/python
 ---> Using cache
 ---> c3d8a1f7953b
Step 5/12 : RUN ln -s /usr/bin/pip3 /usr/bin/pip
 ---> Using cache
 ---> 8f9d358ac77d
Step 6/12 : RUN pip --no-cache-dir install numpy==1.26.4 scipy==1.12.0 scikit-learn==1.4.1.post1 pandas==1.5.3 xlsxwriter==3.2.0 openpyxl==3.1.2 boto3==1.34.69 flask gunicorn
 ---> Using cache
 ---> f436a4bf1a76
Step 7/12 : RUN pip install spacy==3.7.4
 ---> Using cache
 ---> 641b8738a939
Step 8/12 : ENV PYTHONUNBUFFERED=TRUE
 ---> Using cache
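The shell cell above checks for the ECR repository, creates it if it is missing, and then builds the full image name. The same steps can be done from Python with boto3's ECR client. The sketch below is not part of the original notebook: the helper names `ecr_image_uri` and `ensure_ecr_repository` are hypothetical, and the client is passed in as a parameter so the logic can be tried without AWS credentials.

```python
def ecr_image_uri(account: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build the full ECR image name, mirroring `fullname` in the shell cell."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def ensure_ecr_repository(ecr_client, repository: str) -> bool:
    """Create the ECR repository if it does not exist yet.

    `ecr_client` is expected to be a boto3 ECR client, e.g. boto3.client("ecr").
    Returns True if the repository was created, False if it already existed.
    """
    try:
        ecr_client.describe_repositories(repositoryNames=[repository])
        return False
    except ecr_client.exceptions.RepositoryNotFoundException:
        # Only a genuine "not found" triggers creation; other errors propagate
        ecr_client.create_repository(repositoryName=repository)
        return True
```

In a notebook this would be called as `ensure_ecr_repository(boto3.client("ecr", region_name=region), algorithm_name)`; building and pushing the image itself still needs the Docker CLI. Unlike the shell version, which treats any failure of `describe-repositories` (including missing credentials) as "repository not found", this sketch only creates the repository on a `RepositoryNotFoundException`.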
 

## Part 2: Using your Algorithm in Amazon SageMaker

Once you have your container packaged, you can use it to train models and use the model for hosting or batch transforms. Let's do that with the algorithm we made above.

## Set up the environment

Here we specify the S3 prefix for our data and the IAM role that SageMaker will assume when it runs jobs on our behalf.

In [3]:
import os
import re

import boto3
import numpy as np
import pandas as pd
from sagemaker import get_execution_role

# S3 prefix under which the training data will be stored
prefix = "DEMO-Spacy-byoc"

# IAM role that SageMaker assumes to access S3, ECR, etc. on our behalf
role = get_execution_role()

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml
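`get_execution_role()` only succeeds when the current AWS identity is an IAM role, as on a SageMaker notebook instance. With IAM user credentials, for example on a laptop, the SDK raises a `ValueError`. A common workaround, sketched here under the assumption that a suitable SageMaker execution role already exists in the account, is to fall back to looking that role up by name through IAM. The helper `resolve_role` and the role name `MySageMakerExecutionRole` are hypothetical.

```python
def resolve_role(get_execution_role, iam_client, fallback_role_name: str) -> str:
    """Return a SageMaker execution role ARN.

    Tries `get_execution_role()` first. If the SDK raises ValueError
    (the caller is not an IAM role), look up `fallback_role_name`
    through the IAM client instead.
    """
    try:
        return get_execution_role()
    except ValueError:
        response = iam_client.get_role(RoleName=fallback_role_name)
        return response["Role"]["Arn"]
```

Usage in the notebook would look like `role = resolve_role(get_execution_role, boto3.client("iam"), "MySageMakerExecutionRole")`, with the role name replaced by one that exists in your account.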


## Create the session

The session remembers our connection parameters to SageMaker. We'll use it to perform all of our SageMaker operations.

In [4]:
import sagemaker as sage
from time import gmtime, strftime

sess = sage.Session()

## Upload the data for training

When training large models with huge amounts of data, you'll typically use big data tools such as Amazon Athena, AWS Glue, or Amazon EMR to prepare your data in S3. For the purposes of this example, we use a small training dataset that we have included in the local `data` directory.

We can use the tools provided by the SageMaker Python SDK to upload the data to the session's default bucket.
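When `upload_data` is given a directory, the SDK uploads every file under it to keys beneath `key_prefix`, keeping relative paths, and returns the S3 URI of the prefix itself rather than of any single file. The default bucket is named `sagemaker-{region}-{account_id}`, which matches the `data_location` printed below. The sketch below only computes that layout locally and uploads nothing; `default_bucket_name` and `planned_s3_uploads` are hypothetical helpers that mirror, but are not, the SDK's implementation.

```python
import os


def default_bucket_name(region: str, account_id: str) -> str:
    """Name of the SageMaker default bucket, e.g. sagemaker-eu-west-2-<account>."""
    return f"sagemaker-{region}-{account_id}"


def planned_s3_uploads(local_dir: str, bucket: str, key_prefix: str):
    """Return (prefix_uri, [(local_path, s3_uri), ...]) for every file under local_dir.

    Sub-directory structure is preserved beneath key_prefix.
    """
    uploads = []
    for dirpath, _dirnames, filenames in os.walk(local_dir):
        rel_dir = os.path.relpath(dirpath, start=local_dir)
        for name in sorted(filenames):
            rel_path = name if rel_dir == "." else os.path.join(rel_dir, name)
            # S3 keys always use forward slashes, regardless of the local OS
            rel_key = rel_path.replace(os.sep, "/")
            uploads.append((os.path.join(dirpath, name), f"s3://{bucket}/{key_prefix}/{rel_key}"))
    return f"s3://{bucket}/{key_prefix}", uploads
```

Because the returned URI points at the prefix, SageMaker later copies every object under it into the training container, not just one file.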

In [5]:
WORK_DIRECTORY = "data"

data_location = sess.upload_data(WORK_DIRECTORY, key_prefix=prefix)

In [6]:
data_location

's3://sagemaker-eu-west-2-661082688832/DEMO-Spacy-byoc'

## Create an estimator and fit the model

In order to use SageMaker to fit our algorithm, we'll create an `Estimator` that defines how to use the container to train. This includes the configuration we need to invoke SageMaker training:

* The __container name__, constructed the same way as in the shell commands above.
* The __role__, as defined above.
* The __instance count__, the number of machines to use for training.
* The __instance type__, the type of machine to use for training.
* The __output path__, which determines where the model artifact will be written.
* The __session__, the SageMaker session object that we defined above.

Then we call `fit()` on the estimator to train against the data that we uploaded above.
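Inside the training container, SageMaker exposes these settings as files under `/opt/ml`. A single S3 URI passed to `fit()` becomes the `training` channel, copied to `/opt/ml/input/data/training`; hyperparameters are written to `/opt/ml/input/config/hyperparameters.json`, with every value stored as a string (none are set here, which is presumably why the training log below prints `{}`); and whatever the script saves to `/opt/ml/model` is archived as `model.tar.gz` under `<output path>/<training job name>/output/`. The notebook's `train` script is not shown, so the helpers below are a hypothetical sketch of how such a script might locate these paths; the `base` argument exists only so the sketch can be tried outside a container.

```python
import json
import os

# Root directory SageMaker uses inside training containers
OPT_ML = "/opt/ml"


def channel_dir(channel: str = "training", base: str = OPT_ML) -> str:
    """Directory where SageMaker copies the data for an input channel."""
    return os.path.join(base, "input", "data", channel)


def model_dir(base: str = OPT_ML) -> str:
    """Directory whose contents SageMaker packages into model.tar.gz."""
    return os.path.join(base, "model")


def load_hyperparameters(base: str = OPT_ML) -> dict:
    """Read hyperparameters.json, returning {} if the file is absent."""
    path = os.path.join(base, "input", "config", "hyperparameters.json")
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def model_artifact_uri(output_path: str, job_name: str) -> str:
    """S3 location of the packaged model for a finished training job."""
    return f"{output_path.rstrip('/')}/{job_name}/output/model.tar.gz"
```

After `fit()` returns, the SDK also exposes the artifact location directly as the estimator's `model_data` attribute, so `model_artifact_uri` is mainly useful for reasoning about where outputs land.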

In [7]:
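# Build the URI of the image pushed to ECR by the shell cell above
account = sess.boto_session.client("sts").get_caller_identity()["Account"]
region = sess.boto_session.region_name
image = "{}.dkr.ecr.{}.amazonaws.com/sagemaker-custom-spacy-new-training-job:latest".format(account, region)

tree = sage.estimator.Estimator(
    image_uri=image,
    role=role,
    instance_count=1,
    instance_type="ml.c4.2xlarge",
    # Model artifacts are written under this S3 path
    output_path="s3://{}/output".format(sess.default_bucket()),
    sagemaker_session=sess,
)

# A single S3 URI is passed to the container as the "training" channel
tree.fit(data_location)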

INFO:sagemaker:Creating training-job with name: sagemaker-custom-spacy-new-training-job-2024-04-05-13-13-31-316


2024-04-05 13:13:31 Starting - Starting the training job...
2024-04-05 13:13:51 Starting - Preparing the instances for training......
2024-04-05 13:15:01 Downloading - Downloading the training image
2024-04-05 13:15:01 Training - Training image download completed. Training in progress...
Starting the training.
trainning Parameeter:------ {}
100%|██████████| 608/608 [00:00<00:00, 18766.05it/s]
*********** Starting*****************
22nd april 2020
597
611
IncepDate
*********** Ending*****************
*********** Starting*****************
1 december 2021
415
429
IncepDate
*********** Ending*****************
*********** Starting*****************
1 january 2021
381
394
IncepDate
*********** Ending*****************
*********** Starting*****************

  2     200        918.79   1966.97   89.52   85.45   94.00    0.90
  4     400         13.25     27.90   92.16   90.38   94.00    0.92
  6     600         85.12     30.17   93.33   89.09   98.00    0.93
  8     800         14.61     19.09   89.91   83.05   98.00    0.90
 10    1000         22.47     12.56   89.72   84.21   96.00    0.90
 12    1200          8.86      7.82   90.57   85.71   96.00    0.91
 14    1400         10.20      7.74   91.43   87.27   96.00    0.91
 16    1600          0.13      0.17   92.45   87.50   98.00    0.92
 18    1800          0.00      0.00   91.43   87.27   96.00    0.91
 20    2000          0.00      0.00   91.43   87.27   96.00    0.91
 22    2200          0.00      0.00   91.43   87.27   96.00    0.91
✔ Saved pipeline to output directory
output/model-last
Training complete.

2024-04-05 13:17:33 Uploading - 
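The header row of the spaCy training table is missing from the log above. Assuming the columns follow spaCy's default NER layout (`E`, `#`, `LOSS TOK2VEC`, `LOSS NER`, `ENTS_F`, `ENTS_P`, `ENTS_R`, `SCORE`), the `ENTS_F` column should be the harmonic mean of entity precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from a few rows copied from the log; a difference in the last digit is expected because the logged P and R are themselves rounded.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (ENTS_F, ENTS_P, ENTS_R) copied from the training log, steps 200 to 1600
LOGGED_ROWS = [
    (89.52, 85.45, 94.00),
    (92.16, 90.38, 94.00),
    (93.33, 89.09, 98.00),
    (89.91, 83.05, 98.00),
    (89.72, 84.21, 96.00),
    (90.57, 85.71, 96.00),
    (91.43, 87.27, 96.00),
    (92.45, 87.50, 98.00),
]

for logged_f, p, r in LOGGED_ROWS:
    print(f"P={p:6.2f} R={r:6.2f} logged F={logged_f:6.2f} recomputed F={f1(p, r):6.2f}")
```

Read this way, the rows show recall staying at 94 to 98 while precision fluctuates, so most of the movement in F1 comes from false-positive entities.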
