# Building your own algorithm container

With Amazon SageMaker, you can package your own algorithms so that they can then be trained and deployed in the SageMaker environment. This notebook guides you through an example that shows how to build a Docker container for SageMaker and use it for training and inference.

_TODO: Insert TOC here_

## When should I build my own algorithm container?

## The example

## The presentation

This presentation is divided into two parts: building the container and using the container.

# Part 1: Packaging and Uploading your Algorithm for use with Amazon SageMaker 

### An overview of Docker

If you're familiar with Docker already, you can skip ahead to the next section.

### How Amazon SageMaker runs your Docker container during training

The container is run with the argument "train".

The container gets some special files:

TODO: Insert overview of file system here
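
As a rough illustration of those special files, here is a minimal sketch of how a `train` program might locate its inputs and outputs. It assumes the `/opt/ml` layout described in the Amazon SageMaker documentation: hyperparameters in `input/config/hyperparameters.json`, one directory per data channel under `input/data/`, model artifacts written to `model/`, and a failure message written to `output/failure`. The helper names and the `training` channel name are hypothetical and are not taken from the sample container.

```python
import json
import os


def training_paths(prefix="/opt/ml", channel="training"):
    """Return the locations a training program would typically read and write.

    Assumption: the /opt/ml layout from the SageMaker documentation.
    The channel name "training" is a hypothetical default.
    """
    return {
        "hyperparameters": os.path.join(prefix, "input", "config", "hyperparameters.json"),
        "data": os.path.join(prefix, "input", "data", channel),
        "model": os.path.join(prefix, "model"),
        "failure": os.path.join(prefix, "output", "failure"),
    }


def load_hyperparameters(paths):
    """Read the hyperparameters file, returning an empty dict if it is absent.

    Values arrive as strings, so the caller converts them as needed.
    """
    try:
        with open(paths["hyperparameters"]) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def list_input_files(paths):
    """List the files in the channel directory, sorted by name."""
    data_dir = paths["data"]
    if not os.path.isdir(data_dir):
        return []
    return sorted(os.path.join(data_dir, name) for name in os.listdir(data_dir))
```

Passing a different `prefix` makes it possible to exercise the same code against a scratch directory on a local machine before running it in the container.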

### How Amazon SageMaker runs your Docker container during hosting

The container is run with the argument "serve". 
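
In hosting mode the container has to answer HTTP requests. The SageMaker documentation describes a `/ping` health check and an `/invocations` prediction route served on port 8080. The sketch below assumes that contract and uses a plain standard-library WSGI callable rather than the Flask app in `predictor.py`. The `predict` function is a hypothetical stand-in for a real model.

```python
def predict(rows):
    """Hypothetical stand-in for a model: label each CSV row by its column count."""
    return [str(len(row.split(","))) for row in rows]


def app(environ, start_response):
    """Minimal WSGI application answering /ping and /invocations."""
    path = environ.get("PATH_INFO", "")
    method = environ.get("REQUEST_METHOD", "GET")

    if path == "/ping" and method == "GET":
        # Health check: a 200 response signals that the container is ready.
        start_response("200 OK", [("Content-Type", "application/json")])
        return [b"\n"]

    if path == "/invocations" and method == "POST":
        # Read the CSV request body and return one prediction per input row.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length).decode("utf-8")
        rows = [line for line in body.splitlines() if line.strip()]
        result = "\n".join(predict(rows)) + "\n"
        start_response("200 OK", [("Content-Type", "text/csv")])
        return [result.encode("utf-8")]

    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found\n"]
```

For a quick local check, the callable can be served with `wsgiref.simple_server.make_server("", 8080, app).serve_forever()`; the sample container instead puts nginx and gunicorn in front of the Flask app.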



### The parts of the sample container

In the `container` directory are all the components you need to package the sample algorithm for Amazon SageMaker:

    .
    ├── Dockerfile
    ├── build_and_push.sh
    └── decision_trees
        ├── nginx.conf
        ├── predictor.py
        ├── serve
        ├── train
        └── wsgi.py

Let's discuss each of these in turn:

* __`Dockerfile`__ describes how to build your Docker container image. More details below.
* __`build_and_push.sh`__ is a script that uses the Dockerfile to build your container image and then pushes it to ECR. We'll invoke the commands directly later in this notebook, but you can just copy and run the script for your own algorithms.
* __`decision_trees`__ is the directory which contains the files that will be installed in the container.
* __`local_test`__ is a directory that shows how to test your new container on any computer that can run Docker, including an Amazon SageMaker notebook instance. Using this method, you can quickly iterate using small datasets to eliminate any structural bugs before you use the container with Amazon SageMaker. We'll walk through local testing later in this notebook.

In this simple application, we only install five files in the container. You may only need that many or, if you have many supporting routines, you may wish to install more. These five show the standard structure of our Python containers, although you are free to choose a different toolset and therefore could have a different layout. If you're writing in a different programming language, you'll certainly have a different layout depending on the frameworks and tools you choose.

The files that we'll put in the container are:

* __`nginx.conf`__ is the configuration file for the nginx front-end. Generally, you should be able to take this file as-is.
* __`predictor.py`__ is the program that actually implements the Flask web server and the decision tree predictions for this app. You'll want to customize the actual prediction parts to your application. Since this algorithm is simple, we do all the processing here in this file, but you may choose to have separate files for implementing your custom logic.
* __`serve`__ is the program started when the container is started for hosting. It simply launches the gunicorn server which runs multiple instances of the Flask app defined in `predictor.py`. You should be able to take this file as-is.
* __`train`__ is the program that is invoked when the container is run for training. You will modify this program to implement your training algorithm.
* __`wsgi.py`__ is a small wrapper used to invoke the Flask app. You should be able to take this file as-is.

In summary, the two files you will probably want to change for your application are `train` and `predictor.py`.




In [None]:
!cat container/Dockerfile

In [None]:
%%sh

# The name of our algorithm
algorithm_name=decision_trees_sample

cd container

#set -e # stop if anything fails

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.

aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get the login command from ECR and execute it directly
$(aws ecr get-login --region ${region} --no-include-email)

# Build the docker image locally with the image name and then push it to ECR
# with the full name.
docker build  -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

## Testing your algorithm on your local machine or on an Amazon SageMaker notebook instance

# Part 2: Training and Hosting your Algorithm in Amazon SageMaker

In [None]:
%%time

import os
import numpy as np
import pandas as pd

import sagemaker as sage
from time import gmtime, strftime

role='IMRole'

bucket='mead-testing' # customize to your bucket

## Upload the data for training

In [None]:
import boto3
WORK_DIRECTORY = "leaves"

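def upload_file(bucket, prefix, filename):
    """Upload a file from WORK_DIRECTORY to s3://bucket/prefix/filename."""
    s3 = boto3.resource('s3')
    key='{}/{}'.format(prefix, filename)
    # Open the file in a context manager so the handle is closed after the upload
    with open(os.path.join(WORK_DIRECTORY, filename), 'rb') as f:
        s3.Object(bucket, key).put(Body=f)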

prefix = 'scikit-leaves'

upload_file(bucket, prefix, 'data_Tex_64.txt')

data_location = 's3://{}/{}'.format(bucket, prefix)

In [None]:
sess = sage.Session()

In [None]:
tree = sage.estimator.Estimator('890154581112.dkr.ecr.us-west-2.amazonaws.com/decision_trees_sample:latest',
                       role, 1, 'ml.c4.2xlarge',
                       output_path="s3://{}/output".format(bucket),
                       sagemaker_session=sess)
tree.fit(data_location)
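
The image name passed to `Estimator` above has the same form as `fullname` in the build script. Rather than hard-coding the account ID and region, the name can be assembled from its parts. The helper below is a hypothetical convenience function, not part of the SageMaker SDK.

```python
def ecr_image_uri(account, region, algorithm_name, tag="latest"):
    """Build the ECR image name in the same form as `fullname` in the build script."""
    if not str(account).isdigit():
        raise ValueError("account should be a numeric AWS account ID")
    return "{}.dkr.ecr.{}.amazonaws.com/{}:{}".format(account, region, algorithm_name, tag)


# Reproduces the image name used in the Estimator call above.
image = ecr_image_uri("890154581112", "us-west-2", "decision_trees_sample")
print(image)
```

In a notebook, the account and region could come from `boto3.client('sts').get_caller_identity()['Account']` and `boto3.session.Session().region_name`, so that the same cell works in any account.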

In [None]:
from sagemaker.predictor import csv_serializer
predictor = tree.deploy(1, 'ml.c4.xlarge', serializer=csv_serializer)

In [None]:
shape=pd.read_csv("leaves/data_Tex_64.txt", header=None)

import itertools

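# Use rows 12-15 of each consecutive block of 16 rows as test samples
a = [16*i for i in range(100)]
b = [12+i for i in range(4)]
indices = [i+j for i,j in itertools.product(a,b)]

# Select rows and columns by position; column 0 is the label, the rest are features
test_data=shape.iloc[indices[:-1]]
test_X=test_data.iloc[:,1:]
test_y=test_data.iloc[:,0]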

In [None]:
print(predictor.predict(test_X.values).decode('utf-8'))
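
`test_y` is computed above but never compared with the predictions. Assuming the endpoint returns one predicted label per line (an assumption about this container's response format, not something shown above), a hypothetical helper along these lines could score the result:

```python
def accuracy(response_text, expected_labels):
    """Fraction of predictions that match the expected labels.

    Assumption: response_text holds one predicted label per line.
    """
    predicted = [line.strip() for line in response_text.splitlines() if line.strip()]
    expected = [str(label).strip() for label in expected_labels]
    if len(predicted) != len(expected):
        raise ValueError(
            "got {} predictions for {} labels".format(len(predicted), len(expected))
        )
    if not expected:
        return 0.0
    matches = sum(p == e for p, e in zip(predicted, expected))
    return float(matches) / len(expected)


# Example with made-up labels; in the notebook the arguments would be
# predictor.predict(test_X.values).decode('utf-8') and test_y.
print(accuracy("a\nb\na\n", ["a", "b", "c"]))
```

The length check is deliberate: a mismatch usually means the response format differs from the one assumed here, and silently truncating would hide that.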