# Simple First SageMaker Tutorial - Hello World

This tutorial is based on the AWS getting-started tutorial:
https://aws.amazon.com/getting-started/hands-on/build-train-deploy-machine-learning-model-sagemaker/


In this tutorial, you will:

1. Create a notebook instance
2. Prepare the data
3. Train the model to learn from the data
4. Deploy the model
5. Add Arize tracking
6. Evaluate your ML model's performance


<a href="http://aws.amazon.com/getting-started/hands-on/build-train-deploy-machine-learning-model-sagemaker/"> <img src="https://storage.googleapis.com/arize-assets/tutorials/sagemaker/hello_world/step1.png"> </a>

<a href="http://aws.amazon.com/getting-started/hands-on/build-train-deploy-machine-learning-model-sagemaker/"> <img src="https://storage.googleapis.com/arize-assets/tutorials/sagemaker/hello_world/step2.png"> </a>

<a href="http://aws.amazon.com/getting-started/hands-on/build-train-deploy-machine-learning-model-sagemaker/"> <img src="https://storage.googleapis.com/arize-assets/tutorials/sagemaker/hello_world/step2b.png"> </a>



<a href="http://aws.amazon.com/getting-started/hands-on/build-train-deploy-machine-learning-model-sagemaker/"> <img src="https://storage.googleapis.com/arize-assets/tutorials/sagemaker/hello_world/step2c.png"> </a>

<a href="http://aws.amazon.com/getting-started/hands-on/build-train-deploy-machine-learning-model-sagemaker/"> <img src="https://storage.googleapis.com/arize-assets/tutorials/sagemaker/hello_world/step2d.png"> </a>

<a href="http://aws.amazon.com/getting-started/hands-on/build-train-deploy-machine-learning-model-sagemaker/"> <img src="https://storage.googleapis.com/arize-assets/tutorials/sagemaker/hello_world/step2e.png"> </a>

<a href="http://aws.amazon.com/getting-started/hands-on/build-train-deploy-machine-learning-model-sagemaker/"> <img src="https://storage.googleapis.com/arize-assets/tutorials/sagemaker/hello_world/step3.png"> </a>

<a href="http://aws.amazon.com/getting-started/hands-on/build-train-deploy-machine-learning-model-sagemaker/"> <img src="https://storage.googleapis.com/arize-assets/tutorials/sagemaker/hello_world/step3b.png"> </a>

 <img src="https://storage.googleapis.com/arize-assets/tutorials/sagemaker/hello_world/step3c.png"> 

In [None]:
# import libraries
import boto3, re, sys, math, json, os, sagemaker, urllib.request
from sagemaker import get_execution_role
import numpy as np                                
import pandas as pd                               
import matplotlib.pyplot as plt                   
from IPython.display import Image                 
from IPython.display import display               
from time import gmtime, strftime                 
from sagemaker.predictor import csv_serializer   

# Define IAM role
role = get_execution_role()
prefix = 'sagemaker/DEMO-xgboost-dm'
containers = {'us-west-2': '433757028032.dkr.ecr.us-west-2.amazonaws.com/xgboost:latest',
              'us-east-1': '811284229777.dkr.ecr.us-east-1.amazonaws.com/xgboost:latest',
              'us-east-2': '825641698319.dkr.ecr.us-east-2.amazonaws.com/xgboost:latest',
              'eu-west-1': '685385470294.dkr.ecr.eu-west-1.amazonaws.com/xgboost:latest'} # each region has its XGBoost container
my_region = boto3.session.Session().region_name # set the region of the instance
print("Success - the MySageMakerInstance is in the " + my_region + " region. You will use the " + containers[my_region] + " container for your SageMaker endpoint.")
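
The lookup `containers[my_region]` in the cell above raises a bare `KeyError` if the notebook runs in a region that is not in the dictionary. A minimal sketch of a friendlier lookup follows; `CONTAINERS` and `container_for` are hypothetical names introduced here for illustration, and the two registry paths are copied from the cell above.

```python
# Hypothetical mapping for illustration; the registry paths mirror the cell above.
CONTAINERS = {
    'us-west-2': '433757028032.dkr.ecr.us-west-2.amazonaws.com/xgboost:latest',
    'us-east-1': '811284229777.dkr.ecr.us-east-1.amazonaws.com/xgboost:latest',
}


def container_for(region, containers=CONTAINERS):
    """Return the container image for `region`, or raise a descriptive error."""
    try:
        return containers[region]
    except KeyError:
        supported = ', '.join(sorted(containers))
        raise ValueError(
            f"No XGBoost container listed for region {region!r}; "
            f"supported regions: {supported}"
        ) from None


print(container_for('us-east-1'))
```

In the notebook, `container_for(my_region, containers)` could stand in for `containers[my_region]`.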



<img src="https://storage.googleapis.com/arize-assets/tutorials/sagemaker/hello_world/step3d.png">

In [None]:
bucket_name = 'jlopatec-s3-bucket-1' # <--- CHANGE THIS VARIABLE TO A UNIQUE NAME FOR YOUR BUCKET
s3 = boto3.resource('s3')
try:
    if my_region == 'us-east-1':
        s3.create_bucket(Bucket=bucket_name)
    else:
        s3.create_bucket(Bucket=bucket_name, CreateBucketConfiguration={'LocationConstraint': my_region})
    print('S3 bucket created successfully')
except Exception as e:
    print('S3 error: ', e)
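
The cell above branches on the region because a bucket in `us-east-1` is created without a `LocationConstraint`, while other regions need one. The same decision can be sketched as a small pure function that only builds the keyword arguments; `bucket_kwargs` and the example bucket name are hypothetical, and no AWS call is made here.

```python
def bucket_kwargs(bucket_name, region):
    """Build keyword arguments for an S3 create_bucket call."""
    kwargs = {'Bucket': bucket_name}
    if region != 'us-east-1':
        # Regions other than us-east-1 must state where the bucket lives.
        kwargs['CreateBucketConfiguration'] = {'LocationConstraint': region}
    return kwargs


print(bucket_kwargs('my-example-bucket', 'us-east-1'))
print(bucket_kwargs('my-example-bucket', 'eu-west-1'))
```

With the notebook's variables this would be used as `s3.create_bucket(**bucket_kwargs(bucket_name, my_region))`.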

 <img src="https://storage.googleapis.com/arize-assets/tutorials/sagemaker/hello_world/step3e.png"> 

In [None]:
try:
    urllib.request.urlretrieve("https://d1.awsstatic.com/tmt/build-train-deploy-machine-learning-model-sagemaker/bank_clean.27f01fbbdf43271788427f3682996ae29ceca05d.csv", "bank_clean.csv")
    print('Success: downloaded bank_clean.csv.')
except Exception as e:
    print('Data load error: ', e)

try:
    model_data = pd.read_csv('./bank_clean.csv', index_col=0)
    print('Success: Data loaded into dataframe.')
except Exception as e:
    print('Data load error: ', e)
    

<img src="https://storage.googleapis.com/arize-assets/tutorials/sagemaker/hello_world/step3f.png">

In [None]:
train_data, test_data = np.split(model_data.sample(frac=1, random_state=1729), [int(0.7 * len(model_data))])
print(train_data.shape, test_data.shape)
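
The cell above shuffles the rows with a fixed seed and cuts the result at the 70% mark. A minimal sketch of the same split using `iloc` slicing instead of `np.split` follows, which may be useful if `np.split` on a DataFrame emits warnings in your pandas version. The small DataFrame is a synthetic stand-in for `model_data`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for model_data (assumption: 10 rows, two columns).
df = pd.DataFrame({'x': np.arange(10), 'y_yes': [0, 1] * 5})

shuffled = df.sample(frac=1, random_state=1729)  # shuffle all rows reproducibly
cut = int(0.7 * len(shuffled))                   # position of the 70% boundary
train_df = shuffled.iloc[:cut]                   # first 70% of the shuffled rows
test_df = shuffled.iloc[cut:]                    # remaining 30%

print(train_df.shape, test_df.shape)
```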

<img src="https://storage.googleapis.com/arize-assets/tutorials/sagemaker/hello_world/step4.png">

In [None]:
pd.concat([train_data['y_yes'], train_data.drop(['y_no', 'y_yes'], axis=1)], axis=1).to_csv('train.csv', index=False, header=False)
boto3.Session().resource('s3').Bucket(bucket_name).Object(os.path.join(prefix, 'train/train.csv')).upload_file('train.csv')
s3_input_train = sagemaker.s3_input(s3_data='s3://{}/{}/train'.format(bucket_name, prefix), content_type='csv')
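
SageMaker's built-in XGBoost algorithm expects CSV training data with the label in the first column and no header row, which is why the cell above moves `y_yes` to the front and writes with `header=False`. A small sketch of the resulting layout on synthetic data (the `age` column and its values are made up for illustration):

```python
import io
import pandas as pd

# Synthetic stand-in for train_data (assumption: one feature plus the two one-hot target columns).
df = pd.DataFrame({'age': [30, 45], 'y_no': [1, 0], 'y_yes': [0, 1]})

# Put the label first and drop both target columns from the features.
ordered = pd.concat([df['y_yes'], df.drop(['y_no', 'y_yes'], axis=1)], axis=1)

# Write to an in-memory buffer instead of a file so the output can be inspected.
buffer = io.StringIO()
ordered.to_csv(buffer, index=False, header=False)
csv_text = buffer.getvalue()
print(csv_text)
```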

<img src="https://storage.googleapis.com/arize-assets/tutorials/sagemaker/hello_world/step4b.png">

In [None]:
sess = sagemaker.Session()
xgb = sagemaker.estimator.Estimator(containers[my_region],role, train_instance_count=1, train_instance_type='ml.m4.xlarge',output_path='s3://{}/{}/output'.format(bucket_name, prefix),sagemaker_session=sess)
xgb.set_hyperparameters(max_depth=5,eta=0.2,gamma=4,min_child_weight=6,subsample=0.8,silent=0,objective='binary:logistic',num_round=100)

In [None]:
p_dat = pd.concat([train_data['y_yes'], train_data.drop(['y_no', 'y_yes'], axis=1)], axis=1)

In [None]:
print(p_dat)

<img src="https://storage.googleapis.com/arize-assets/tutorials/sagemaker/hello_world/step4c.png">

In [None]:
xgb.fit({'train': s3_input_train})

<img src="https://storage.googleapis.com/arize-assets/tutorials/sagemaker/hello_world/step5.png">

<img src="https://storage.googleapis.com/arize-assets/tutorials/sagemaker/hello_world/step5a.png">

In [None]:
xgb_predictor = xgb.deploy(initial_instance_count=1,instance_type='ml.m4.xlarge')

<img src="https://storage.googleapis.com/arize-assets/tutorials/sagemaker/hello_world/step5b.png">

In [None]:
test_data_array = test_data.drop(['y_no', 'y_yes'], axis=1).values #load the data into an array
xgb_predictor.content_type = 'text/csv' # set the data type for an inference
xgb_predictor.serializer = csv_serializer # set the serializer type
predictions = xgb_predictor.predict(test_data_array).decode('utf-8') # predict!
predictions_array = np.fromstring(predictions[1:], sep=',') # and turn the prediction into an array
print(predictions_array.shape)
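
The endpoint returns scores between 0 and 1, so evaluating the model means turning them into class labels and comparing them with the observed outcomes. A minimal sketch on synthetic data follows; the response string, the true labels, and the 0.5 threshold are all assumptions made for illustration. With the notebook's variables, `probs` would be `predictions_array` and `actuals` would be `test_data['y_yes'].values`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins (assumptions): a raw response string and matching true labels.
raw = '0.91,0.12,0.40,0.77'
actuals = np.array([1, 0, 1, 1])

probs = np.array([float(p) for p in raw.split(',')])  # parse comma-separated scores
predicted = (probs >= 0.5).astype(int)                # threshold at 0.5 (an arbitrary choice)

# Cross-tabulate observed against predicted classes to get a confusion matrix.
confusion = pd.crosstab(index=actuals, columns=predicted,
                        rownames=['Observed'], colnames=['Predicted'])
accuracy = (predicted == actuals).mean()

print(confusion)
print(f'Accuracy: {accuracy:.2f}')
```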


In [None]:
print(predictions_array)


In [None]:
print(test_data_array)

In [None]:
print(test_data.columns)


Next, install the Arize Python client so that it can be used from the notebook:
https://github.com/Arize-ai/client_python

<img src="https://storage.googleapis.com/arize-assets/tutorials/sagemaker/hello_world/arize.png">

In [None]:
!pip3 install arize

In [None]:
from arize.api import Client


In [None]:
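# Replace this value with your own Arize API key
API_KEY_STRING = 'zhmXurtoizUt8jidhJMQ'

In [None]:
# Prefer a key from the ARIZE_API_KEY environment variable (an assumed name); fall back to the string above
API_KEY = os.environ.get('ARIZE_API_KEY', API_KEY_STRING)

In [None]:
arize = Client(space_key='space_key', api_key=API_KEY)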
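
Hard-coding credentials in a notebook makes them easy to leak when the notebook is shared. A minimal sketch of reading a key from an environment variable and failing early when it is missing follows; `require_env` and the variable name `ARIZE_API_KEY` are hypothetical choices for this illustration, not part of the Arize client.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f'Environment variable {name} is not set')
    return value


# Demonstration only: set a placeholder so the sketch runs end to end.
os.environ['ARIZE_API_KEY'] = 'placeholder-key'
api_key = require_env('ARIZE_API_KEY')
print('API key loaded:', bool(api_key))
```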


## Test log to Arize

In [None]:
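prediction_val = str(predictions_array[1])  # `labels` is built by the map_labels cells below; run them first
arize.log(model_id='sage-maker-hello-world-1', model_version='v0.1', prediction_ids='plED4eERDCasd9797ca3512', prediction_labels=prediction_val, features=labels, actual_labels=None)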

In [None]:
column_names = test_data.drop(['y_no', 'y_yes'], axis=1).columns

## Helper function for bulk updates to Arize

In [None]:
def map_labels(pred_num, column_names, data_array ):
    labels = {}
    for i,name in enumerate(column_names):
        labels[name] = str(data_array[pred_num][i])
    return labels
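
The helper above pairs each column name with the stringified value from one row. An equivalent formulation using `zip` and a dict comprehension is sketched below; the column names and rows are synthetic stand-ins for `column_names` and `test_data_array`.

```python
import numpy as np


def map_labels(pred_num, column_names, data_array):
    """Map each column name to the string value in row `pred_num`."""
    return {name: str(value) for name, value in zip(column_names, data_array[pred_num])}


# Synthetic stand-ins (assumptions) for column_names and test_data_array.
columns = ['age', 'campaign']
rows = np.array([[29, 3], [41, 1]])

print(map_labels(1, columns, rows))
```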

In [None]:
labels = map_labels(1, column_names, test_data_array)

The `map_labels` function builds a dictionary that maps each feature name to its value (as a string) for one row. For the row at index 1 it looks like this:

In [None]:
labels

## Log the full data set to Arize
The loop below logs every prediction in the test set, one call per row.

In [None]:
for pred_num in range(len(test_data_array)):
    prediction_id='testset_' + str(pred_num)
    prediction_value = str(predictions_array[pred_num])
    labels = map_labels(pred_num, column_names, test_data_array)
    arize.log(model_id='sage-maker-hello-world-1', model_version='v0.2', prediction_ids=prediction_id, prediction_labels=prediction_value, features=labels, actual_labels=None)
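
Before sending thousands of rows, it can help to build the payloads separately and inspect a few of them. The sketch below assembles the same three pieces the loop passes to `arize.log` (ID, prediction label, features) without calling Arize at all; `build_records` is a hypothetical helper and the inputs are synthetic stand-ins for `predictions_array`, `test_data_array`, and `column_names`.

```python
def build_records(predictions, rows, column_names, id_prefix='testset_'):
    """Yield one dict per prediction with an ID, a string label, and string features."""
    for pred_num, (prediction, row) in enumerate(zip(predictions, rows)):
        yield {
            'prediction_id': f'{id_prefix}{pred_num}',
            'prediction_label': str(prediction),
            'features': {name: str(value) for name, value in zip(column_names, row)},
        }


# Synthetic stand-ins (assumptions) for predictions_array, test_data_array, and column_names.
records = list(build_records([0.9, 0.1], [[29, 3], [41, 1]], ['age', 'campaign']))
for record in records:
    print(record)
```

Each record could then be passed to the logging call in the loop above once the payloads look right.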
    

### Next, log in to the platform to see analytics on the model

http://www.arize.com