## Invoke a SageMaker Endpoint from Outside the AWS Environment Using the SageMaker SDK

Model used: XGBoost bike rental prediction model trained in the XGBoost lectures  
  
This example uses the IAM user ml_user_predict, which was set up in the housekeeping lecture of the course.  

Refer to the lecture: Configure IAM Users, Setup Command Line Interface (CLI)

Ensure the xgboost-biketrain-v1 endpoint is deployed before running this example.  
  
To create an endpoint using the SageMaker console:  
1. Select "Models" under "Inference" in the navigation pane.
2. Search for the model using this prefix: xgboost-biketrain-v1
3. Select the latest model and choose "Create endpoint".
4. Specify the endpoint name as: xgboost-biketrain-v1
5. Create a new endpoint configuration.
6. Create the endpoint.
7. After this lab is completed, delete the endpoint to avoid unnecessary charges.
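
Steps 2 to 6 can also be scripted with the boto3 SageMaker client (`list_models`, `create_endpoint_config`, `create_endpoint`). The sketch below is a hedged illustration, not the course's procedure: the helper name `deploy_latest_model`, the instance type `ml.m5.large`, the variant name `AllTraffic`, and reusing the endpoint name as the configuration name are all assumptions. The client is passed in so the function can be exercised without AWS.

```python
def deploy_latest_model(sm_client, model_prefix, endpoint_name,
                        instance_type="ml.m5.large", instance_count=1):
    """Create an endpoint config and an endpoint for the newest model whose
    name contains model_prefix, and return the model name that was used.

    Hypothetical helper: instance_type, the 'AllTraffic' variant name and
    reusing endpoint_name as the config name are assumptions.
    sm_client is expected to be boto3.client("sagemaker").
    """
    # Steps 2-3: newest model whose name contains the prefix (substring match)
    response = sm_client.list_models(
        NameContains=model_prefix,
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    models = response["Models"]
    if not models:
        raise ValueError(f"No model found matching {model_prefix!r}")
    model_name = models[0]["ModelName"]

    # Step 5: new endpoint configuration (fails if the name already exists)
    sm_client.create_endpoint_config(
        EndpointConfigName=endpoint_name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InitialInstanceCount": instance_count,
            "InstanceType": instance_type,
        }],
    )

    # Step 6: create the endpoint from that configuration
    sm_client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=endpoint_name,
    )
    return model_name
```

With real credentials you would call it as `deploy_latest_model(boto3.client("sagemaker"), "xgboost-biketrain-v1", "xgboost-biketrain-v1")` and could then block until the endpoint is ready with `client.get_waiter("endpoint_in_service").wait(EndpointName="xgboost-biketrain-v1")`.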

In [1]:
# Install SageMaker 2.x version.
#!pip install --upgrade sagemaker

In [2]:
import boto3
import sagemaker
import math
import dateutil.parser  # import the submodule explicitly so dateutil.parser.parse is available
import re
import numpy as np

# SDK 2 serializers and deserializers
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import JSONDeserializer

In [3]:
# Establish a session with AWS.
# Specify the credentials profile and region to use for this session.
# To use the ml_user_predict credentials (limited privileges), uncomment the next line.

#boto_session = boto3.Session(profile_name='ml_user_predict',region_name='us-east-1')

# Otherwise, boto3 uses its default credential chain and configured region
boto_session = boto3.Session()
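
To switch between the restricted ml_user_predict profile and the default credential chain without commenting lines in and out, one option is to build the `boto3.Session` arguments conditionally. This is a sketch; `session_kwargs` and `make_session` are hypothetical helper names. When no profile or region is passed, boto3 resolves them through its usual default lookup.

```python
def session_kwargs(profile_name=None, region_name=None):
    """Return keyword arguments for boto3.Session, omitting unset values
    so boto3 falls back to its default credential and region resolution."""
    kwargs = {}
    if profile_name:
        kwargs["profile_name"] = profile_name
    if region_name:
        kwargs["region_name"] = region_name
    return kwargs


def make_session(profile_name=None, region_name=None):
    """Create a boto3 session from the optional profile and region."""
    import boto3  # imported lazily so session_kwargs works without boto3
    return boto3.Session(**session_kwargs(profile_name, region_name))
```

For example, `make_session("ml_user_predict", "us-east-1")` is equivalent to the commented-out line above, while `make_session()` matches `boto3.Session()`.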

In [4]:
type(boto_session)

boto3.session.Session

In [5]:
sess = sagemaker.Session(boto_session=boto_session)

In [6]:
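# Create a predictor that points to an existing endpoint, using the SageMaker SDK.
# Specify your endpoint name.
#endpoint_name = 'xgboost-biketrain-v1'

# NOTE: the active line below points to a linear-learner endpoint, and the
# prediction outputs shown later in this notebook come from that endpoint.
# Use 'xgboost-biketrain-v1' (above) to target the bike rental model.
endpoint_name = 'linear-learner-2022-11-12-18-34-08-772'

predictor = sagemaker.predictor.Predictor(endpoint_name=endpoint_name,
                                          sagemaker_session=sess)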

In [7]:
# Send inference requests as CSV (text/csv) and parse the responses as JSON
predictor.serializer = CSVSerializer()
predictor.deserializer = JSONDeserializer()
# SDK 1.x style, no longer available in SDK 2.x: predictor.deserializer = json_deserializer
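
`CSVSerializer` sends the request body with content type `text/csv`, and `JSONDeserializer` turns the JSON response into Python dictionaries and lists. To see roughly what the request body looks like for a 2D array such as the `X_test` array used later, here is a standard-library approximation that writes one comma-separated line per row. It is an illustrative sketch, not the SDK's own implementation, and the SDK's exact number formatting may differ.

```python
import csv
import io


def to_csv_payload(rows):
    """Approximate a text/csv request body: one comma-separated line per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow(row)  # non-string values are converted with str()
    # Drop the trailing newline after the last row
    return buffer.getvalue().rstrip("\n")
```

For instance, `to_csv_payload([[4.5], [3.0], [11.5]])` produces the three-line body `4.5`, `3.0`, `11.5`, which is how a single-feature model receives three records in one request.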

In [8]:
#datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed
# Actual=562
sample_one = '2012-12-19 17:00:00,4,0,1,1,16.4,20.455,50,26.0027'
# Actual=569
sample_two = '2012-12-19 18:00:00,4,0,1,1,15.58,19.695,50,23.9994'
# Actual=4
sample_three = '2012-12-10 01:00:00,4,0,1,2,14.76,18.94,100,0'

In [9]:
# Raw Data Structure: 
# datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,casual,registered,count

# Model expects data in this format (it was trained with these features):
# season,holiday,workingday,weather,temp,atemp,humidity,windspeed,year,month,day,dayofweek,hour

def transform_data(data):
    features = data.split(',')
    
    # Extract year, month, day, dayofweek, hour
    dt = dateutil.parser.parse(features[0])

    features.append(str(dt.year))
    features.append(str(dt.month))
    features.append(str(dt.day))
    features.append(str(dt.weekday()))  # Monday=0 ... Sunday=6
    features.append(str(dt.hour))
    
    # Return the transformed data, skipping the datetime field
    return ','.join(features[1:])

In [10]:
print('Raw Data:\n',sample_one)
print('Transformed Data:\n',transform_data(sample_one))

Raw Data:
 2012-12-19 17:00:00,4,0,1,1,16.4,20.455,50,26.0027
Transformed Data:
 4,0,1,1,16.4,20.455,50,26.0027,2012,12,19,2,17
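
The same transformation can be written with only the standard library, which removes the `dateutil` dependency when the timestamp format is fixed. This sketch assumes timestamps always look like `YYYY-MM-DD HH:MM:SS`; `transform_data_stdlib` and `to_batch` are hypothetical names. `to_batch` joins several transformed samples into one newline-separated CSV payload, one row per sample.

```python
from datetime import datetime

# Raw columns expected in each input line (as listed in the comment above)
RAW_COLUMNS = ["datetime", "season", "holiday", "workingday", "weather",
               "temp", "atemp", "humidity", "windspeed"]


def transform_data_stdlib(data):
    """Drop the datetime field and append year, month, day, dayofweek, hour."""
    features = data.split(",")
    if len(features) != len(RAW_COLUMNS):
        raise ValueError(f"Expected {len(RAW_COLUMNS)} fields, got {len(features)}")
    dt = datetime.strptime(features[0], "%Y-%m-%d %H:%M:%S")
    # weekday(): Monday=0 ... Sunday=6, same as the dateutil-based version
    derived = [dt.year, dt.month, dt.day, dt.weekday(), dt.hour]
    return ",".join(features[1:] + [str(v) for v in derived])


def to_batch(samples):
    """Transform several raw samples into a newline-separated CSV payload."""
    return "\n".join(transform_data_stdlib(s) for s in samples)
```

`transform_data_stdlib(sample_one)` returns the same string as the transformed output printed above. Passing `to_batch([sample_one, sample_two, sample_three])` to `predictor.predict` would send all three rows in one request, assuming the endpoint accepts multi-row CSV input.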


In [11]:
# Single-feature test inputs sent to the linear-learner endpoint selected above
X_test = np.array([[ 4.5],
                   [ 3. ],
                   [11.5],
                   [ 2.9],
                   [ 9.5],
                   [ 5.3],
                   [ 9. ]])

In [12]:
X_test

array([[ 4.5],
       [ 3. ],
       [11.5],
       [ 2.9],
       [ 9.5],
       [ 5.3],
       [ 9. ]])

In [13]:
# Invoke the endpoint.
# For the XGBoost bike rental endpoint, send a transformed sample instead:
#predictor.predict(transform_data(sample_one))

result = predictor.predict(X_test)
result

{'predictions': [{'score': 62765.6953125},
  {'score': 48423.0625},
  {'score': 129697.984375},
  {'score': 47466.88671875},
  {'score': 110574.46875},
  {'score': 70415.1015625},
  {'score': 105793.59375}]}
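
Under the hood, `predictor.predict` calls the SageMaker Runtime `InvokeEndpoint` API. If you prefer to call it directly with boto3, for example from a client that does not need the full SageMaker SDK, a minimal sketch follows. The helper names are hypothetical; the content types mirror the serializer and deserializer chosen above (`text/csv` in, `application/json` out), and `extract_scores` assumes the linear-learner response shape shown in the output above.

```python
import json


def invoke_csv_endpoint(runtime_client, endpoint_name, payload):
    """Send a CSV payload to an endpoint and return the parsed JSON response.

    runtime_client is expected to be boto3.client("sagemaker-runtime").
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Accept="application/json",
        Body=payload,
    )
    # Body is a streaming object: read it fully, then decode the JSON
    return json.loads(response["Body"].read().decode("utf-8"))


def extract_scores(result):
    """Collect the 'score' values from a {'predictions': [...]} response."""
    return [p["score"] for p in result["predictions"]]
```

Usage with real credentials: `runtime = boto_session.client("sagemaker-runtime")`, then `extract_scores(invoke_csv_endpoint(runtime, endpoint_name, "4.5\n3.0"))`.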

In [14]:
# The result is JSON: each entry in result['predictions'] holds a 'score', so collect the scores into an array

predictions = np.array([r['score'] for r in result['predictions']])

In [15]:
predictions

array([ 62765.6953125 ,  48423.0625    , 129697.984375  ,  47466.88671875,
       110574.46875   ,  70415.1015625 , 105793.59375   ])

In [18]:
# Don't forget to delete the endpoint.
# In the SageMaker console, select "Endpoints" under "Inference" and delete the endpoint.
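
Deletion can also be done from code. With the SageMaker SDK 2.x, `predictor.delete_endpoint()` removes the endpoint and, by default, its endpoint configuration. The boto3-only sketch below does the same with the SageMaker client; `delete_endpoint_and_config` is a hypothetical helper name.

```python
def delete_endpoint_and_config(sm_client, endpoint_name):
    """Delete an endpoint and the endpoint configuration it was created from.

    sm_client is expected to be boto3.client("sagemaker"). Returns the name
    of the deleted endpoint configuration.
    """
    # Look up the configuration name before the endpoint is gone
    description = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = description["EndpointConfigName"]

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    return config_name
```

For example, `delete_endpoint_and_config(boto_session.client("sagemaker"), endpoint_name)` cleans up whichever endpoint this notebook used.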