### The Dockerfile

The Dockerfile describes the image that we want to build. You can think of it as describing the complete operating system installation of the system that you want to run. A running Docker container is quite a bit lighter than a full operating system, however, because it relies on the Linux kernel of the host machine for basic operations.

For the Python science stack, we will start from a standard Ubuntu installation and run the normal tools to install the things needed by scikit-learn. Finally, we add the code that implements our specific algorithm to the container and set up the right environment to run under.

Along the way, we clean up extra space. This makes the container smaller and faster to start.

Let's look at the Dockerfile:

In [1]:
!cat container/Dockerfile

# Build an image that can do training and inference in SageMaker
# This is a Python 3.6 image that uses the nginx, gunicorn, flask stack
# for serving inferences in a stable way.

FROM ubuntu:16.04

#RUN apt purge python2.7
RUN apt-get update
RUN apt-get install -y software-properties-common vim
RUN add-apt-repository ppa:jonathonf/python-3.6
RUN apt-get update

RUN apt-get install -y build-essential python3.6 python3.6-dev python3-pip python3.6-venv
RUN apt-get install -y git

# update pip
RUN python3.6 -m pip install pip --upgrade
RUN python3.6 -m pip install wheel

MAINTAINER Amazon AI <sage-learner@amazon.com>


RUN apt-get -y update && apt-get install -y --no-install-recommends \
         wget \
         python \
         nginx \
         ca-certificates \
    && rm -rf /var/lib/apt/lists/*

RUN touch ~/.bashrc
RUN echo "alias python=python3.6" >> ~/.bashrc

# Here we get all python packages.
# There's substantial overlap between scipy and numpy that

### Building and registering the container

The following shell code shows how to build the container image using `docker build` and push the container image to ECR using `docker push`.

This code looks for an ECR repository in the account you're using and the current default region (if you're using a SageMaker notebook instance, this will be the region where the notebook instance was created). If the repository doesn't exist, the script will create it.
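
The same check-then-create logic can be sketched in Python. This is a minimal illustration, not the script's actual implementation: `ecr_image_uri` and `ensure_repository` are hypothetical helper names, and `ecr_client` is assumed to be a `boto3.client("ecr")` object (or anything exposing the same `describe_repositories` and `create_repository` methods).

```python
def ecr_image_uri(account, region, image, tag="latest"):
    """Build the full ECR image name from account, region and image name."""
    return "{}.dkr.ecr.{}.amazonaws.com/{}:{}".format(account, region, image, tag)


def ensure_repository(ecr_client, name):
    """Create the ECR repository `name` if it does not exist yet.

    `ecr_client` is assumed to behave like boto3.client("ecr").
    Returns True if the repository was created, False if it already existed.
    """
    try:
        # Raises RepositoryNotFoundException when the repository is missing.
        ecr_client.describe_repositories(repositoryNames=[name])
        return False
    except ecr_client.exceptions.RepositoryNotFoundException:
        ecr_client.create_repository(repositoryName=name)
        return True
```

With valid AWS credentials, this could be called as `ensure_repository(boto3.client("ecr"), "scikit-byo")` before pushing the image returned by `ecr_image_uri(...)`.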

Let's look at the shell script:

In [2]:
!cat container/build_and_push.sh

#!/usr/bin/env bash

# This script shows how to build the Docker image and push it to ECR to be ready for use
# by SageMaker.

# The argument to this script is the image name. This will be used as the image on the local
# machine and combined with the account and region to form the repository name for ECR.
image=$1

if [ "$image" == "" ]
then
    echo "Usage: $0 <image-name>"
    exit 1
fi

chmod +x decision_trees/train
chmod +x decision_trees/serve

# Get the account number associated with the current IAM credentials
account=$(aws sts get-caller-identity --query Account --output text)

if [ $? -ne 0 ]
then
    exit 255
fi


# Get the region defined in the current configuration (default to eu-west-1 if none defined)
region=$(aws configure get region)
region=${region:-eu-west-1}


fullname="${account}.dkr.ecr.${region}.amazonaws.com/${image}:latest"

# If the repository doesn't exist in ECR, create it.

aws ecr describe-repositories --repository-names

In [35]:
!cd container && sh build_and_push.sh scikit-byo

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
Sending build context to Docker daemon  46.59kB
Step 1/19 : FROM ubuntu:16.04
 ---> a3551444fc85
Step 2/19 : RUN apt-get update
 ---> Using cache
 ---> 0725f9d10fdc
Step 3/19 : RUN apt-get install -y software-properties-common vim
 ---> Using cache
 ---> 86aa5657868c
Step 4/19 : RUN add-apt-repository ppa:jonathonf/python-3.6
 ---> Using cache
 ---> 0bc0410820c0
Step 5/19 : RUN apt-get update
 ---> Using cache
 ---> 663ba85054c4
Step 6/19 : RUN apt-get install -y build-essential python3.6 python3.6-dev python3-pip python3.6-venv
 ---> Using cache
 ---> 6e75518194dc
Step 7/19 : RUN apt-get install -y git
 ---> Using cache
 ---> 7a1fe7c33b20
Step 8/19 : RUN python3.6 -m pip install pip --upgrade
 ---> Using cache
 ---> 4d91e62d1cf5
Step 9/19 : RUN python3.6 -m pip install wheel
 ---> Using cache
 ---> a228ff5b0e81
Step 10/19 : MAINTAINER Amazon AI <sage-learner@amazon.com>
 ---> Using cache
 --

In [36]:
!docker image ls

REPOSITORY                                                TAG                 IMAGE ID            CREATED             SIZE
scikit-byo                                                latest              07dc1ac932a6        About an hour ago   871MB
scikit_byo                                                latest              07dc1ac932a6        About an hour ago   871MB
326583654609.dkr.ecr.eu-west-1.amazonaws.com/scikit-byo   latest              07dc1ac932a6        About an hour ago   871MB
326583654609.dkr.ecr.eu-west-1.amazonaws.com/scikit_byo   latest              07dc1ac932a6        About an hour ago   871MB
326583654609.dkr.ecr.eu-west-1.amazonaws.com/scikit_byo   <none>              01938b5ba18f        About an hour ago   871MB
326583654609.dkr.ecr.eu-west-1.amazonaws.com/scikit_byo   <none>              ff85d764ca4c        About an hour ago   871MB
326583654609.dkr.ecr.eu-west-1.amazonaws.com/scikit_byo   <none>              d1146c9eb94f        About an hour ago   871MB
3

In [37]:
!cd container/local_test && sh train_local.sh scikit-byo

Starting the training.
Training complete.


In [38]:
!cd container/local_test && sh predict.sh payload.csv text/csv

*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> POST /invocations HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.60.0
> Accept: */*
> Content-Type: text/csv
> Content-Length: 2900
> Expect: 100-continue
> 
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< Server: nginx/1.10.3 (Ubuntu)
< Date: Wed, 15 May 2019 09:56:31 GMT
< Content-Type: text/csv; charset=utf-8
< Content-Length: 270
< Connection: keep-alive
< 
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
* Connection #0 to host localhost left intact
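
Judging from the curl trace above, the local test sends a `POST` to `/invocations` on port 8080 with a `text/csv` body. A rough Python equivalent, assuming the serving container is listening on `localhost:8080`, might look like the sketch below. `build_invocation_request` and `invoke` are hypothetical names introduced for illustration.

```python
import urllib.request


def build_invocation_request(payload, content_type="text/csv",
                             url="http://localhost:8080/invocations"):
    """Build (but do not send) a POST request to the /invocations route."""
    if isinstance(payload, str):
        payload = payload.encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": content_type},
        method="POST",
    )


def invoke(payload, **kwargs):
    """Send the request and return the response body as text.

    Requires the serving container to be running locally.
    """
    request = build_invocation_request(payload, **kwargs)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

For example, `invoke(open("payload.csv").read())` should return one predicted label per input row, provided the container is up.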


In [39]:
# S3 configurations
bucket = 'sagemakerimmersionday-ajitsr'
prefix = 'lab4-scikit-byo-iris'

# Define IAM role
import boto3
import re

import os
import numpy as np
import pandas as pd
from sagemaker import get_execution_role

role = get_execution_role()

In [40]:
import sagemaker as sage
from time import gmtime, strftime

sess = sage.Session()

WORK_DIRECTORY = 'data'
data_location = sess.upload_data(WORK_DIRECTORY, bucket=bucket, key_prefix=prefix)

In [42]:
account = sess.boto_session.client('sts').get_caller_identity()['Account']
region = sess.boto_session.region_name
image = '{}.dkr.ecr.{}.amazonaws.com/scikit-byo:latest'.format(account, region)
print(image)

tree = sage.estimator.Estimator(image,
                                role, 1, 'ml.c4.2xlarge',
                                output_path='s3://{}/{}/output'.format(bucket, prefix),
                                sagemaker_session=sess)

tree.fit(data_location)

326583654609.dkr.ecr.eu-west-1.amazonaws.com/scikit-byo:latest
2019-05-15 11:18:37 Starting - Starting the training job...
2019-05-15 11:18:37 Starting - Launching requested ML instances......
2019-05-15 11:19:38 Starting - Preparing the instances for training......
2019-05-15 11:20:59 Downloading - Downloading input data
2019-05-15 11:20:59 Training - Downloading the training image...
2019-05-15 11:21:36 Uploading - Uploading generated training model
Starting the training.
Training complete.

2019-05-15 11:21:42 Completed - Training job completed
Billable seconds: 45


In [43]:
from sagemaker.predictor import csv_serializer
predictor = tree.deploy(1, 'ml.t2.large', serializer=csv_serializer)

---------------------------------------------------------------------------!

In [44]:
shape=pd.read_csv("data/iris.csv", header=None)

import itertools

a = [50*i for i in range(3)]
b = [40+i for i in range(10)]
indices = [i+j for i,j in itertools.product(a,b)]

test_data=shape.iloc[indices[:-1]]
test_X=test_data.iloc[:,1:]
test_y=test_data.iloc[:,0]
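
To see which rows this selects, the index construction can be run on its own. The sketch below repeats the same list comprehensions without pandas, so it only shows the positions, not the data.

```python
import itertools

# Same construction as in the cell above: block offsets 0, 50, 100 combined
# with positions 40..49 inside each block.
a = [50 * i for i in range(3)]
b = [40 + i for i in range(10)]
indices = [i + j for i, j in itertools.product(a, b)]

# The notebook drops the final index before calling iloc.
selected = indices[:-1]
print(selected[0], selected[-1], len(selected))
```

The selection therefore covers positions 40 to 49, 90 to 99 and 140 to 148, which is 29 rows in total because `indices[:-1]` drops index 149.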

In [45]:
for i in range(len(test_X)):
	print('Expected:%s' % test_y.iloc[i])
	print('Predicted:%s' % predictor.predict(test_X.iloc[i].values).decode('utf-8'))

Expected:setosa
Predicted:setosa

Expected:setosa
Predicted:setosa

Expected:setosa
Predicted:setosa

Expected:setosa
Predicted:setosa

Expected:setosa
Predicted:setosa

Expected:setosa
Predicted:setosa

Expected:setosa
Predicted:setosa

Expected:setosa
Predicted:setosa

Expected:setosa
Predicted:setosa

Expected:setosa
Predicted:setosa

Expected:versicolor
Predicted:versicolor

Expected:versicolor
Predicted:versicolor

Expected:versicolor
Predicted:versicolor

Expected:versicolor
Predicted:versicolor

Expected:versicolor
Predicted:versicolor

Expected:versicolor
Predicted:versicolor

Expected:versicolor
Predicted:versicolor

Expected:versicolor
Predicted:versicolor

Expected:versicolor
Predicted:versicolor

Expected:versicolor
Predicted:versicolor

Expected:virginica
Predicted:virginica

Expected:virginica
Predicted:virginica

Expected:virginica
Predicted:virginica

Expected:virginica
Predicted:virginica

Expected:virginica
Predicted:virginica

Expected:virginica
Predicted:virginica



In [46]:
sess.delete_endpoint(predictor.endpoint)
