# Part 1: Packaging your Algorithm 

Amazon SageMaker lets you package your own algorithm in a Docker container and then train and host it on SageMaker. Here we package a scikit-learn implementation of decision trees for use with SageMaker.

### Contents of the container
* **Dockerfile** describes how to build your Docker container image.
* **build_and_push.sh** is a script that uses the Dockerfile to build your container image and then pushes it to ECR.
* **decision_trees** is the directory containing the files that will be installed in the container.

### The files in the container are:

* **nginx.conf**, the configuration file for the nginx front end.
* **predictor.py**, the program that implements the Flask web server and the decision tree predictions for this app.
* **serve**, the program started when the container is started for hosting. It launches the gunicorn server, which runs multiple instances of the Flask app defined in predictor.py.
* **train**, the program that is invoked when the container is run for training.
* **wsgi.py**, a small wrapper used to invoke the Flask app.
* SageMaker looks for an executable program named "train" for training and "serve" for hosting.
* Alternatively, you can specify an ENTRYPOINT in your Dockerfile that points to a single program with train() and serve() functions defined within it.
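
The single-entrypoint alternative can be sketched as a small dispatcher. This is a minimal illustration, assuming the container receives the word `train` or `serve` as its first command-line argument; the `dispatch` helper and the placeholder function bodies are hypothetical and are not part of the sample container.

```python
def train():
    """Placeholder for the training routine (hypothetical body)."""
    return "training"


def serve():
    """Placeholder for the hosting routine (hypothetical body)."""
    return "serving"


def dispatch(argv):
    """Run the routine named by the first command-line argument."""
    commands = {"train": train, "serve": serve}
    if not argv or argv[0] not in commands:
        raise SystemExit("usage: entrypoint.py [train|serve]")
    return commands[argv[0]]()
```

An entrypoint script built this way would end with `dispatch(sys.argv[1:])` after importing `sys`.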

In [None]:
!cat container/Dockerfile

## Building and registering the container
* Build the container image using docker build.
* Push the container image to ECR using docker push.
* Get the region defined in the current configuration (defaulting to us-west-2 if none is defined).
* Look for an ECR repository in the current account and current default region. If the repository doesn't exist, the script creates it.
* Get the login command from ECR and execute it directly.
* Build the Docker image locally with the image name and tag it with the full name.
* Push it to ECR with the full name.
* On a SageMaker Notebook Instance, the Docker daemon may need to be restarted in order to detect your network configuration correctly. (This is a known issue.)
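
The naming and region-fallback steps in the list above can be sketched in Python. This is only an illustration of how the full image name is composed. The function name `ecr_image_uri` and the account ID in the usage note are made up for the example.

```python
def ecr_image_uri(account, region, algorithm_name, tag="latest"):
    """Compose the full ECR image name used for tagging and pushing."""
    # Fall back to us-west-2 when no region is configured.
    region = region or "us-west-2"
    return "{}.dkr.ecr.{}.amazonaws.com/{}:{}".format(
        account, region, algorithm_name, tag
    )
```

For example, `ecr_image_uri("123456789012", None, "decision-trees-sample")` yields a name in the us-west-2 registry with the `latest` tag.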

In [1]:
%%sh

algorithm_name=decision-trees-sample

cd container

chmod +x decision_trees/train
chmod +x decision_trees/serve

account=$(aws sts get-caller-identity --query Account --output text)

region=$(aws configure get region)
region=${region:-us-west-2}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

$(aws ecr get-login --region ${region} --no-include-email)

if [ -d "/home/ec2-user/SageMaker" ]; then
  sudo service docker restart
fi

docker build  -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

Login Succeeded
Stopping docker: [  OK  ]
Starting docker:	.[  OK  ]
Sending build context to Docker daemon  19.46kB
Step 1/9 : FROM ubuntu:16.04
 ---> 0458a4468cbc
Step 2/9 : MAINTAINER Amazon AI <sage-learner@amazon.com>
 ---> Using cache
 ---> 58e140ab7e46
Step 3/9 : RUN apt-get -y update && apt-get install -y --no-install-recommends          wget          python          nginx          ca-certificates     && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> b4f7f7e8b16f
Step 4/9 : RUN wget https://bootstrap.pypa.io/get-pip.py && python get-pip.py &&     pip install numpy scipy scikit-learn pandas flask gevent gunicorn &&         (cd /usr/local/lib/python2.7/dist-packages/scipy/.libs; rm *; ln ../../numpy/.libs/* .) &&         rm -rf /root/.cache
 ---> Using cache
 ---> 7fea61c87b2d
Step 5/9 : ENV PYTHONUNBUFFERED TRUE
 ---> Using cache
 ---> fa6656aadb98
Step 6/9 : ENV PYTHONDONTWRITEBYTECODE TRUE
 ---> Using cache
 ---> b25daae16532
Step 7/9 : ENV PATH "/opt/program:${PATH}"

# Part 2: Training and Hosting the Algorithm

Once you have your container packaged, you can use it to train and serve models. 

### Import packages 
* os - provides a portable way of using operating-system-dependent functionality.
* gmtime - converts a time expressed in seconds since the epoch to a struct_time in UTC.
* strftime - converts a tuple or struct_time, as returned by gmtime(), to a string in a given format.

In [2]:
import os
from time import gmtime, strftime
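
A common use of `gmtime` and `strftime` together is building a UTC timestamp suffix for resource names. The sketch below shows one way to do it. The helper name `timestamp_suffix` and the format string are assumptions for illustration, not something this notebook defines.

```python
from time import gmtime, strftime


def timestamp_suffix(seconds=None):
    """Format a UTC time as YYYY-MM-DD-HH-MM-SS (current time if seconds is None)."""
    return strftime("%Y-%m-%d-%H-%M-%S", gmtime(seconds))
```

Calling `timestamp_suffix()` with no argument uses the current time, which is handy for generating names that do not collide between runs.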

### Importing some standard Python packages
* csv - module providing reader and writer objects for sequences of rows in CSV format.
* itertools - module implementing a number of iterator building blocks.
* numpy - package for scientific computing with Python.
* pandas - library providing data structures and data analysis tools for Python.

In [3]:
import csv
import itertools
import numpy as np
import pandas as pd

### Importing Amazon and database packages
* boto3 - the AWS SDK for Python, used to write software that uses Amazon services like S3 and EC2.
* psycopg2 - a popular PostgreSQL database adapter for Python, used here to query Amazon Redshift.
* sagemaker - Python SDK for training and deploying machine learning models on Amazon SageMaker.
* get_execution_role - returns the role ARN whose credentials are used to call the API.
* csv_serializer - serializes input data as CSV.

In [4]:
import boto3
import psycopg2
import sagemaker as sage
from sagemaker.predictor import csv_serializer
from sagemaker import get_execution_role

In [5]:
# Connect to the Redshift cluster (credentials are hard-coded for this demo only).
con = psycopg2.connect(dbname='loonydb1',
                       host='myloony-db.cwwbfulhovv2.us-east-2.redshift.amazonaws.com',
                       port='5439', user='masteruser', password='Password123')
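
Hard-coded credentials are easy to leak when a notebook is shared. One possible alternative, sketched below, reads the connection settings from environment variables. The variable names (`REDSHIFT_DBNAME`, `REDSHIFT_HOST`, and so on) and the helper `redshift_connection_params` are hypothetical and chosen only for this example.

```python
import os


def redshift_connection_params(env=None):
    """Collect connection settings from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    return {
        "dbname": env["REDSHIFT_DBNAME"],
        "host": env["REDSHIFT_HOST"],
        # 5439 is the port used in the connection cell above.
        "port": env.get("REDSHIFT_PORT", "5439"),
        "user": env["REDSHIFT_USER"],
        "password": env["REDSHIFT_PASSWORD"],
    }
```

With psycopg2 imported and the variables set, the connection could then be opened with `psycopg2.connect(**redshift_connection_params())`.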

In [6]:
cur = con.cursor()

In [7]:
query = "select * from public.irisdata;"

In [8]:
cur.execute(query)

In [9]:
results = cur.fetchall()

In [10]:
fp = open('iris.csv','w')

In [11]:
c = csv.writer(fp, lineterminator='\n')

In [12]:
for row in results:
    print(row)
    c.writerow(row)

('setosa', 5.1, 3.5, 1.4, 0.2)
('setosa', 4.7, 3.2, 1.3, 0.2)
('setosa', 5.0, 3.6, 1.4, 0.2)
('setosa', 4.6, 3.4, 1.4, 0.3)
('setosa', 4.4, 2.9, 1.4, 0.2)
('setosa', 5.4, 3.7, 1.5, 0.2)
('setosa', 4.8, 3.0, 1.4, 0.1)
('setosa', 5.8, 4.0, 1.2, 0.2)
('setosa', 5.4, 3.9, 1.3, 0.4)
('setosa', 5.7, 3.8, 1.7, 0.3)
('setosa', 5.4, 3.4, 1.7, 0.2)
('setosa', 4.6, 3.6, 1.0, 0.2)
('setosa', 4.8, 3.4, 1.9, 0.2)
('setosa', 5.0, 3.4, 1.6, 0.4)
('setosa', 5.2, 3.4, 1.4, 0.2)
('setosa', 4.8, 3.1, 1.6, 0.2)
('setosa', 5.2, 4.1, 1.5, 0.1)
('setosa', 4.9, 3.1, 1.5, 0.2)
('setosa', 5.5, 3.5, 1.3, 0.2)
('setosa', 4.4, 3.0, 1.3, 0.2)
('setosa', 5.0, 3.5, 1.3, 0.3)
('setosa', 4.4, 3.2, 1.3, 0.2)
('setosa', 5.1, 3.8, 1.9, 0.4)
('setosa', 5.1, 3.8, 1.6, 0.2)
('setosa', 5.3, 3.7, 1.5, 0.2)
('versicolor', 7.0, 3.2, 4.7, 1.4)
('versicolor', 6.9, 3.1, 4.9, 1.5)
('versicolor', 6.5, 2.8, 4.6, 1.5)
('versicolor', 6.3, 3.3, 4.7, 1.6)
('versicolor', 6.6, 2.9, 4.6, 1.3)
('versicolor', 5.0, 2.0, 3.5, 1.0)
('versicolor', 

In [None]:
fp.close()
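
The open, write, and close steps above can also be written with a `with` block, which closes the file even if a write fails. The following is a minimal sketch using made-up sample rows in place of the query results. The helper name `write_rows` and the temporary path are assumptions for the demo.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    """Write an iterable of row tuples to a CSV file, closing it automatically."""
    with open(path, "w", newline="") as fp:
        csv.writer(fp, lineterminator="\n").writerows(rows)


# Demo with two sample rows written to a temporary directory.
rows = [("setosa", 5.1, 3.5, 1.4, 0.2), ("versicolor", 7.0, 3.2, 4.7, 1.4)]
demo_path = os.path.join(tempfile.mkdtemp(), "iris.csv")
write_rows(demo_path, rows)
with open(demo_path) as fp:
    content = fp.read()
```

In the notebook, `write_rows('iris.csv', results)` would replace the separate open, loop, and close cells.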

## Upload the data for training
* Set the S3 key prefix.
* Create a SageMaker session.
* Upload the training data to the session's default bucket (the bucket is created if it does not already exist).

In [23]:
prefix = 'scikit-byoc'

In [24]:
sess = sage.Session()

In [25]:
data_location = sess.upload_data('iris.csv', key_prefix=prefix)
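
The value stored in `data_location` is an S3 URI. As a rough sketch, assuming a single file is uploaded under the key prefix using its base name, the URI can be predicted as follows. The helper `expected_s3_uri` and the bucket name in the usage note are hypothetical.

```python
import os


def expected_s3_uri(bucket, key_prefix, local_path):
    """Predict the S3 URI for a single file uploaded under a key prefix."""
    return "s3://{}/{}/{}".format(bucket, key_prefix, os.path.basename(local_path))
```

For example, `expected_s3_uri("my-bucket", "scikit-byoc", "iris.csv")` gives `s3://my-bucket/scikit-byoc/iris.csv`.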

## Training and hosting the model
* Get the account and region information
* Get the container image
* Get the IAM role credentials
* Instantiate an estimator
* Invoke the fit method to train the model
* Deploy the model

In [26]:
account = sess.boto_session.client('sts').get_caller_identity()['Account']
account

'324118574079'

In [27]:
region = sess.boto_session.region_name
region

'us-east-2'

In [28]:
image = '{}.dkr.ecr.{}.amazonaws.com/decision-trees-sample'.format(account, region)

In [29]:
role = get_execution_role()
role

'arn:aws:iam::324118574079:role/service-role/AmazonSageMaker-ExecutionRole-20180209T192191'

In [30]:
tree = sage.estimator.Estimator(image,
                       role, 1, 'ml.c4.2xlarge',
                       output_path="s3://{}/output".format(sess.default_bucket()),
                       sagemaker_session=sess)

In [31]:
tree.fit(data_location)

INFO:sagemaker:Creating training-job with name: decision-trees-sample-2018-03-10-09-49-15-351


.......................................................
Starting the training.
Training complete.
===== Job Complete =====


In [None]:
predictor = tree.deploy(1, 'ml.m4.xlarge', serializer=csv_serializer)
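
Passing `csv_serializer` means each prediction request is sent to the endpoint as CSV text. The sketch below approximates that conversion with the standard library. It is an illustration only: the SDK's own serializer may differ in detail, and the helper name `rows_to_csv` is made up for the example.

```python
import csv
import io


def rows_to_csv(rows):
    """Render a list of feature rows as CSV text, one row per line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    # Drop the trailing newline so the payload ends with the last value.
    return buf.getvalue().rstrip("\n")
```

Two feature rows therefore become two comma-separated lines in the request body.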

## Validate the model
* Extract some of the data we used for training.
* Pass the data to the predictor object.

In [None]:
shape = pd.read_csv("iris.csv", header=None)

In [None]:
df = shape[50:110]

In [None]:
names = df[0].tolist()
names

In [None]:
test_X = df.drop(df.columns[0], axis=1)

In [None]:
test_X

In [None]:
results = predictor.predict(test_X.values).decode('utf-8')
results = results.split()

In [None]:
print(results)

In [None]:
print(np.array(names) == np.array(results))
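
The element-wise comparison above prints one boolean per row. To summarize it as a single number, a small accuracy helper can be used. This is a minimal sketch in plain Python. The function name `accuracy` is an assumption, and the labels in the usage note are made-up sample values, not output from the endpoint.

```python
def accuracy(expected, predicted):
    """Return the fraction of positions where the two label lists agree."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        return 0.0
    matches = sum(1 for e, p in zip(expected, predicted) if e == p)
    return matches / float(len(expected))
```

For instance, `accuracy(["versicolor", "virginica"], ["versicolor", "versicolor"])` returns 0.5. In the notebook it would be called as `accuracy(names, results)`.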

## Delete endpoint

In [None]:
sess.delete_endpoint(predictor.endpoint)