# SageBuild Tutorial

This notebook walks you through using SageBuild to build and deploy custom models on demand or in response to events. We will reuse the code from the "scikit_bring_your_own" example notebook.

## Helpful Links
* [Blog Post]() to see the details of how SageBuild works. 
* [See here](/notebooks/sample-notebooks/advanced_functionality/scikit_bring_your_own/scikit_bring_your_own.ipynb) for details of how to write Dockerfiles for your own algorithms.

## Table of Contents
1. [Setup](#SetUp)
2. [Deploy](#Deploy)
3. [Wait](#Wait)
4. [Use](#Use)
5. [Conclusion](#Conclusion)

## SetUp <a name="SetUp"></a>
The following sets up the packages and variables we need. Note that the `region` and `StackName` variables have been filled in for you by the CloudFormation template.

In [None]:
import boto3
import json
from subprocess import check_output as run
from subprocess import STDOUT
from time import sleep
import numpy as np
import pandas as pd
from io import StringIO

cf = boto3.client('cloudformation')
sns = boto3.client('sns')
step = boto3.client('stepfunctions')
s3 = boto3.resource('s3')
ssm = boto3.client('ssm')
sagemaker = boto3.client('sagemaker-runtime')
Lambda=boto3.client('lambda')

region='${AWS::Region}'
StackName='${AWS::StackName}'
data='../../sample-notebooks/advanced_functionality/scikit_bring_your_own/data/iris.csv'

#Get outputs from build stack
result=cf.describe_stacks(
    StackName=StackName
)
#Put Outputs in a dict for easy use
outputs={}
for output in result['Stacks'][0]['Outputs']:
    outputs[output['OutputKey']]=output['OutputValue']
print("Stack Outputs")
print(json.dumps(outputs,indent=4))
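
The loop above can also be packaged as a small function, which makes it easy to reuse for other stacks. The sketch below is only an illustration: `outputs_to_dict` is a hypothetical helper, and the sample response uses made-up values that mirror the shape of the `describe_stacks` result used above.

```python
def outputs_to_dict(describe_stacks_response):
    """Flatten the Outputs list of the first stack into {OutputKey: OutputValue}."""
    stack = describe_stacks_response["Stacks"][0]
    # .get() means a stack that reports no outputs yields an empty dict, not a KeyError.
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}


# Hypothetical response fragment; the values are invented for illustration.
sample_response = {
    "Stacks": [
        {
            "StackName": "example-stack",
            "Outputs": [
                {"OutputKey": "DataBucket", "OutputValue": "example-data-bucket"},
                {"OutputKey": "LaunchTopic", "OutputValue": "arn:aws:sns:us-east-1:123456789012:example"},
            ],
        }
    ]
}

sample_outputs = outputs_to_dict(sample_response)
print(sample_outputs["DataBucket"])
```

In the notebook this would be called as `outputs = outputs_to_dict(cf.describe_stacks(StackName=StackName))`.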

The following shell commands configure git to access AWS CodeCommit and clone the example repo.

In [None]:
#configure git to access CodeCommit; uses the SageMaker instance's role for permissions
!git config --global credential.helper '!aws codecommit credential-helper $@'
!git config --global credential.UseHttpPath true

#clone down our example code
!git clone https://github.com/C24IO/aws-sagemaker-build.git


## Configuration

Both the training job and the endpoint have various configuration parameters. The build generates those parameters by calling two Lambda functions with the current build state. The CloudFormation template initializes these Lambdas with reasonable defaults, but if you want to change them (for example, to use different instance types, add more data channels, or set hyperparameters), you will need to update the function code.

- The Dockerfile path Lambdas output the path to the directory in the code repository that contains the Dockerfile for each image.

- The training Lambda must output an object that matches the input parameters of the `createTrainingJob` function in the AWS JavaScript SDK. See [here](https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/SageMaker.html#createTrainingJob-property).

- The endpoint Lambda must output an object that matches the input parameters of the `createEndpointConfig` function in the AWS JavaScript SDK. See [here](createEndpointConfig).

You can look at the code of the Lambda functions in the console to get an idea of where to start.
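
To make the expected shapes concrete, here is a minimal sketch of what the two config handlers might return. The parameter names follow the SageMaker `CreateTrainingJob` and `CreateEndpointConfig` APIs, but everything read from `event` (`name`, `image`, `role`, `data_uri`, `output_uri`, `model`) is a hypothetical key chosen for illustration; the real build state that SageBuild passes in may look different, so check the default Lambda code for the actual field names. The instance types and the hyperparameter are likewise just example values.

```python
def training_handler(event, context):
    """Hypothetical training-config Lambda: returns CreateTrainingJob parameters."""
    return {
        "TrainingJobName": event["name"],
        "AlgorithmSpecification": {
            "TrainingImage": event["image"],
            "TrainingInputMode": "File",
        },
        "RoleArn": event["role"],
        "InputDataConfig": [
            {
                "ChannelName": "training",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": event["data_uri"],
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": event["output_uri"]},
        "ResourceConfig": {
            "InstanceType": "ml.m5.large",  # example value
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        # Hyperparameter values are passed as strings.
        "HyperParameters": {"max_leaf_nodes": "5"},
    }


def endpoint_handler(event, context):
    """Hypothetical endpoint-config Lambda: returns CreateEndpointConfig parameters."""
    return {
        "EndpointConfigName": event["name"],
        "ProductionVariants": [
            {
                "VariantName": "primary",
                "ModelName": event["model"],
                "InitialInstanceCount": 1,
                "InstanceType": "ml.t2.medium",  # example value
                "InitialVariantWeight": 1.0,
            }
        ],
    }


# Invented build state, used only to show the handlers running locally.
sample_event = {
    "name": "example-build-1",
    "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/example:latest",
    "role": "arn:aws:iam::123456789012:role/example-role",
    "data_uri": "s3://example-data-bucket/train/",
    "output_uri": "s3://example-artifact-bucket/output/",
    "model": "example-model-1",
}
training_config = training_handler(sample_event, None)
endpoint_config = endpoint_handler(sample_event, None)
print(training_config["ResourceConfig"]["InstanceType"])
```

Adding a data channel would mean appending another entry to `InputDataConfig`; changing the hosting instance means editing `InstanceType` in the production variant.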


In [None]:
import zipfile
import io
#creates a lambda deployment zip of python script
def getZip(name):
    buffer=io.BytesIO()
    with zipfile.ZipFile(buffer, mode='w') as zf:
        zf.write(f"./aws-sagemaker-build/example/config/{name}.py",arcname="index.py") 
    return buffer.getvalue()

#creates a lambda deployment zip of a string
def zipString(text):
    buffer=io.BytesIO()
    info = zipfile.ZipInfo("index.py")
    info.external_attr=0o777 << 16 
    with zipfile.ZipFile(buffer, mode='w') as zf:
        zf.writestr(info,text) 
    return buffer.getvalue()

#Update lambda with code in a string
Lambda.update_function_code(
    FunctionName=outputs['TrainingDockerfilePathLambda'],
    ZipFile=zipString("""
def handler(event,context):
    return "example/train"    
""")
);
print("Training Dockerfile path Lambda Updated")

Lambda.update_function_code(
    FunctionName=outputs['InferenceDockerfilePathLambda'],
    ZipFile=zipString("""
def handler(event,context):
    return "example/inference"    
""")
);
print("Inference Dockerfile path Lambda Updated")

#Update lambda with code in a file
Lambda.update_function_code(
    FunctionName=outputs['TrainingConfigLambda'],
    ZipFile=getZip("training")
);
print("Training Config Lambda Updated")

Lambda.update_function_code(
    FunctionName=outputs['EndpointConfigLambda'],
    ZipFile=getZip("endpoint")
);
print("Endpoint Config Lambda Updated")
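
Before uploading, it can be useful to confirm what a deployment zip actually contains. The following sketch rebuilds the `zipString` idea under the name `zip_string` and adds a hypothetical `list_zip` helper that reads the archive back from memory; neither name is part of SageBuild.

```python
import io
import zipfile


def zip_string(text, arcname="index.py"):
    """Build an in-memory zip holding `text` as a single file."""
    buffer = io.BytesIO()
    info = zipfile.ZipInfo(arcname)
    # Set Unix permission bits in the high 16 bits so the file is readable once unpacked.
    info.external_attr = 0o777 << 16
    with zipfile.ZipFile(buffer, mode="w") as zf:
        zf.writestr(info, text)
    return buffer.getvalue()


def list_zip(payload):
    """Return {filename: decoded contents} for every entry in a zip given as bytes."""
    with zipfile.ZipFile(io.BytesIO(payload)) as zf:
        return {name: zf.read(name).decode("utf-8") for name in zf.namelist()}


handler_source = 'def handler(event, context):\n    return "example/train"\n'
contents = list_zip(zip_string(handler_source))
print(sorted(contents))
```

The same `list_zip` call works on the bytes returned by `getZip`, since both helpers produce a zip with a single `index.py` entry.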

## Deploy! <a name="Deploy"></a>
The following will:
- add the CodeCommit repo created by the CloudFormation template as a remote named `deploy`
- push the example code to the repo (this will trigger a build)
- upload our data to the DataBucket created by the CloudFormation template (this will trigger a build)

Once a build has started, no new build can be started until the first one finishes.

In [None]:
#push our Dockerfile code to the "deploy" CodeCommit repo
run("cd aws-sagemaker-build && git remote add deploy {0}; git push deploy master".format(outputs['RepoUrl']),
    stderr=STDOUT,
    shell=True) 
print("code Pushed")

#upload the data to the DataBucket
#named data_object to avoid shadowing the built-in `object`
data_object = s3.Object(outputs["DataBucket"],'train/data.csv')
data_object.upload_file(data) 
print("data uploaded")

You can also trigger a build by publishing to the launch topic directly.

In [None]:
result=sns.publish(
    TopicArn=outputs['LaunchTopic'],
    Message="start" #message is not important, just publishing to topic starts build
)
print("message published")
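
Because only one build runs at a time, it may help to check for a running execution before publishing. The sketch below assumes that a `RUNNING` execution of the build state machine is the right signal for "a build is in progress"; it uses the same `list_executions` call as the Wait section. `build_in_progress` and `FakeStepClient` are hypothetical names, and the fake client exists only so the example runs without AWS access.

```python
def build_in_progress(step_client, state_machine_arn):
    """Return True if the state machine has at least one RUNNING execution."""
    response = step_client.list_executions(
        stateMachineArn=state_machine_arn,
        statusFilter="RUNNING",
    )
    return len(response["executions"]) > 0


class FakeStepClient:
    """Offline stand-in for boto3.client('stepfunctions'); illustration only."""

    def __init__(self, executions):
        self._executions = executions

    def list_executions(self, stateMachineArn, statusFilter):
        matching = [e for e in self._executions if e["status"] == statusFilter]
        return {"executions": matching}


busy = FakeStepClient([{"executionArn": "arn:example:1", "status": "RUNNING"}])
idle = FakeStepClient([{"executionArn": "arn:example:2", "status": "SUCCEEDED"}])
print(build_in_progress(busy, "arn:example:state-machine"))
print(build_in_progress(idle, "arn:example:state-machine"))
```

In the notebook, the real client would be passed in: `build_in_progress(step, outputs['StateMachine'])`.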

## Wait <a name="Wait"></a>


You can use the following code to get an SMS notification from the training status topic.

In [None]:
result=sns.subscribe(
    TopicArn=outputs['TrainStatusTopic'],
    Protocol="SMS",
    Endpoint="x-xxx-xxx-xxxx" #put your phone number here
)
print("subscribed to topic")
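
SNS documentation describes SMS phone numbers in E.164 format (a leading `+`, then country code and number, with no separators). The helper below is a rough sketch for normalizing a number typed with dashes or spaces. It assumes the input already includes the country code; `to_e164` is a hypothetical name, and the sample number is a fictional one.

```python
import re


def to_e164(raw_number):
    """Strip separators from a phone number and prefix it with '+'.

    Assumes the country code is already present in raw_number.
    """
    digits = re.sub(r"\D", "", raw_number)
    if not digits:
        raise ValueError("phone number contains no digits")
    if len(digits) > 15:
        # E.164 numbers have at most 15 digits.
        raise ValueError("phone number has more than 15 digits")
    return "+" + digits


endpoint_number = to_e164("1-555-555-0100")
print(endpoint_number)
```

The result could then be passed as the `Endpoint` argument of `sns.subscribe`.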

We can get the status of the StateMachine as it builds and deploys our custom model, and then poll that status until the build completes.

In [None]:
%%time 
#list all executions for our StateMachine to get our current running one
result=step.list_executions(
    stateMachineArn=outputs['StateMachine'],
    statusFilter="RUNNING"
)['executions']

if len(result) > 0:
    response = step.describe_execution(
        executionArn=result[0]['executionArn']
    )
    status=response['status']
    print(status,response['name'])
    #poll status till execution finishes
    while status == "RUNNING":
        print('.',end="")
        sleep(5)
        status=step.describe_execution(executionArn=result[0]['executionArn'])['status']
    print()
    print(status)
else:
    print("no running tasks")
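
The polling loop above runs until the execution leaves `RUNNING`, however long that takes. One possible variation, sketched here, wraps the loop in a function with a timeout. `wait_for_execution` is a hypothetical helper; it takes the describe call as a parameter so it can be exercised with a fake, and the default interval and timeout are arbitrary choices.

```python
import time


def wait_for_execution(describe, execution_arn, poll_seconds=5,
                       timeout_seconds=3600, sleep=time.sleep):
    """Poll until the execution leaves RUNNING and return its final status."""
    waited = 0
    status = describe(executionArn=execution_arn)["status"]
    while status == "RUNNING":
        if waited >= timeout_seconds:
            raise TimeoutError(f"execution still RUNNING after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
        status = describe(executionArn=execution_arn)["status"]
    return status


# Fake describe call that reports RUNNING twice and then SUCCEEDED.
statuses = iter(["RUNNING", "RUNNING", "SUCCEEDED"])


def fake_describe(executionArn):
    return {"status": next(statuses)}


# The no-op sleep keeps the demonstration instant.
final_status = wait_for_execution(fake_describe, "arn:example:execution",
                                  sleep=lambda seconds: None)
print(final_status)
```

Against the real service this would be called as `wait_for_execution(step.describe_execution, result[0]['executionArn'])`.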


## Use <a name="Use"></a>
Next we get some data and send it to our newly deployed endpoint!

In [None]:
%%time 
test_data=pd.read_csv(data, header=None).sample(10)
test_X=test_data.iloc[:,1:]
test_y=test_data.iloc[:,0]

#convert test_X to csv
Body=str.encode(test_X.to_csv(header=False,index=False))

result=sagemaker.invoke_endpoint(
    EndpointName=outputs['SageMakerEndpoint'],
    Body=Body,    
    ContentType="text/csv",
    Accept="text/csv"
)

print(pd.read_csv(StringIO(result['Body'].read().decode('utf-8')),header=None))
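
The cell above keeps the true labels in `test_y` but only prints the predictions. A quick way to compare the two is sketched below using only the standard library. It assumes the endpoint returns one predicted label per CSV row, which is how the response is read above; `parse_predictions` and `accuracy` are hypothetical helpers and the sample strings are invented.

```python
import csv
import io


def parse_predictions(csv_text):
    """Read the first column of each non-empty CSV row as a predicted label."""
    return [row[0].strip() for row in csv.reader(io.StringIO(csv_text)) if row]


def accuracy(predictions, labels):
    """Fraction of predictions that equal the corresponding label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels differ in length")
    if not labels:
        raise ValueError("no labels to compare against")
    matches = sum(str(p) == str(l) for p, l in zip(predictions, labels))
    return matches / len(labels)


# Invented response body and labels, shaped like the iris example.
sample_body = "setosa\nversicolor\nvirginica\n"
sample_labels = ["setosa", "versicolor", "versicolor"]
score = accuracy(parse_predictions(sample_body), sample_labels)
print(f"accuracy: {score:.2f}")
```

In the notebook, the response body can only be read once, so store `result['Body'].read().decode('utf-8')` in a variable first and then call `accuracy(parse_predictions(body_text), test_y.tolist())`.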

## Conclusion <a name="Conclusion"></a>

Hopefully SageBuild can help you develop and deploy custom SageMaker models faster and more easily. If you have any problems, please let us know in our GitHub issues [here](). Feel free to send us a pull request!