# Random predictions to SageMaker endpoint

**Note:** Please set kernel to `Python 3 (Data Science)`

---

🔥 Note: Before running this notebook, ensure that notebook 03 ran successfully and the updated files were committed to CodeCommit. Also, ensure the SageMaker Pipeline execution is successful and the staging endpoint shows `InService`. You can check the endpoint status in the SageMaker project page, under the Endpoints tab. 🔥

In the screenshot below, my project name is `BankDM` and the endpoint is `BankDM-staging`.

![endpoint](img/check-endpoint1.png)

---

## Introduction

Now that the model is trained and deployed, the next step is to use it to make predictions. You will first connect to Redshift and retrieve the data. For simplicity, this demo uses data that is already stored in Redshift, the same data that was used for training and testing. In a real-world setting, the data used for prediction should be data the model has not seen before. Before making predictions, you need to preprocess the data so that it has the same format as the training data.



### Variables
Names of the secret in Secrets Manager and of the SageMaker endpoint. The Redshift, Athena, and Glue connection details are stored in the secret.

In [None]:
secret_name='bankdm_redshift_login' 

# The endpoint name below is the default based on the SageMaker project name
# endpoint name is case sensitive
endpoint_name = 'BankDM-staging'

### Install and import libraries

In [None]:
!pip install -q SQLAlchemy==1.3.13
!pip install psycopg2-binary pyathena
!pip install -U pip
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from pyathena import connect
from botocore.exceptions import ClientError
import numpy as np
import pandas as pd
import time
import json
import boto3
import sagemaker
from sagemaker.serializers import CSVSerializer
import random
import math

### Create client session


In [None]:
# Get region 
session = boto3.session.Session()
region_name = session.region_name

# Get SageMaker session & default S3 bucket
sagemaker_session = sagemaker.Session()
bucket = sagemaker_session.default_bucket()

s3 = boto3.client('s3')
redshift = boto3.client('redshift')
secretsmanager = boto3.client('secretsmanager')

# Alias for the region retrieved above; no second boto3 session is needed
region = region_name

### Get credentials & connection information from Secrets Manager

In [None]:
import base64  # needed to decode a binary secret

try:
    get_secret_value_response = secretsmanager.get_secret_value(
        SecretId=secret_name
    )
    secret_arn = get_secret_value_response['ARN']

except ClientError as e:
    print("Error retrieving secret. Error: " + e.response['Error']['Message'])
    raise  # the cells below cannot run without the secret

else:
    # Depending on whether the secret is a string or binary, one of these fields will be populated.
    if 'SecretString' in get_secret_value_response:
        secret = get_secret_value_response['SecretString']
    else:
        secret = base64.b64decode(get_secret_value_response['SecretBinary'])

secret_json = json.loads(secret)
master_user_name = secret_json['username']
master_user_pw = secret_json['password']
redshift_port = secret_json['port']
redshift_cluster_identifier = secret_json['dbClusterIdentifier']
redshift_endpoint_address = secret_json['host']

database_name_redshift = secret_json['database_name_redshift']
database_name_glue = secret_json['database_name_glue']

schema_redshift = secret_json['schema_redshift']
schema_athena = secret_json['schema_athena']

table_name_glue = secret_json['table_name_glue']
table_name_redshift = secret_json['table_name_redshift']

# print(master_user_name)
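
The cell above assumes that every key is present in the secret. A minimal sketch of a fail-fast check is shown below. The helper name `parse_secret` and the `REQUIRED_KEYS` tuple are illustrative and not part of the original notebook; the key list simply mirrors the lookups above.

```python
import json

# Keys the notebook reads from the secret (mirrors the lookups above).
REQUIRED_KEYS = (
    "username", "password", "port", "dbClusterIdentifier", "host",
    "database_name_redshift", "database_name_glue",
    "schema_redshift", "schema_athena",
    "table_name_glue", "table_name_redshift",
)


def parse_secret(secret_string, required_keys=REQUIRED_KEYS):
    """Parse the secret JSON and raise a clear error if any key is missing."""
    secret_json = json.loads(secret_string)
    missing = [key for key in required_keys if key not in secret_json]
    if missing:
        raise KeyError(f"Secret is missing keys: {', '.join(missing)}")
    return secret_json
```

With this helper, `secret_json = parse_secret(secret)` could stand in for the plain `json.loads(secret)` call, so a misconfigured secret is reported before any connection is attempted.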

## Redshift

### Connect to Redshift

In [None]:
response = redshift.describe_clusters(ClusterIdentifier=redshift_cluster_identifier)
iam_role = response['Clusters'][0]['IamRoles'][0]['IamRoleArn']

engine = create_engine('postgresql://{}:{}@{}:{}/{}'.format(master_user_name, master_user_pw, redshift_endpoint_address, redshift_port, database_name_redshift))
session = sessionmaker()
session.configure(bind=engine)
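
One caveat when building the connection string with `str.format`: a password that contains characters such as `@`, `/`, or `:` would produce an invalid URL. A possible workaround, sketched here with a hypothetical helper `build_redshift_url`, is to percent-encode the credentials with the standard library before passing the URL to `create_engine`. The values in the example call are placeholders.

```python
from urllib.parse import quote_plus


def build_redshift_url(user, password, host, port, database):
    """Build a postgresql:// URL with percent-encoded credentials."""
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, database
    )


# Example values only; in the notebook these come from the secret.
print(build_redshift_url("admin", "p@ss/word", "example-host", 5439, "dev"))
```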


### Get data from Redshift and shuffle it
As mentioned earlier, for simplicity, this is the same data that was used for training and testing.

In [None]:
statement = f"""
SELECT *
    FROM {schema_redshift}.{table_name_redshift}
    ORDER BY random()
"""

# print(statement)

data = pd.read_sql_query(statement, engine)
data.head()

### Pre-processing of data

In [None]:
data['no_previous_contact'] = np.where(data['pdays'] == 999, 1, 0)                                 # Indicator variable to capture when pdays takes a value of 999
data['not_working'] = np.where(data['job'].isin(['student', 'retired', 'unemployed']), 1, 0)       # Indicator for individuals not actively employed (np.in1d is deprecated in NumPy 2.0)
model_data = pd.get_dummies(data)                                                                  # Convert categorical variables to sets of indicators

model_data = model_data.drop(['duration', 'emp_var_rate', 'cons_price_idx', 'cons_conf_idx', 'euribor3m'], axis=1)
df = pd.concat([model_data['y_yes'], model_data.drop(['y_no', 'y_yes'], axis=1)], axis=1)
df.columns
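
Because `pd.get_dummies` only creates columns for the categories present in the data it is given, a smaller prediction batch can end up with fewer (or differently ordered) columns than the training set. A minimal sketch of one way to guard against this follows. It assumes the list of training feature columns has been saved somewhere; `training_columns` and `align_to_training_columns` are hypothetical names, not part of this project.

```python
import pandas as pd


def align_to_training_columns(raw_df, training_columns):
    """One-hot encode raw_df and force it into the training column layout."""
    encoded = pd.get_dummies(raw_df, dtype=int)
    # Missing indicator columns are added as 0; unexpected columns are dropped.
    return encoded.reindex(columns=training_columns, fill_value=0)


# Toy example: 'job_retired' never appears in this batch.
training_columns = ["age", "job_admin", "job_retired", "job_student"]
batch = pd.DataFrame({"age": [30, 41], "job": ["student", "admin"]})
print(align_to_training_columns(batch, training_columns))
```

In this notebook the full table is encoded at once, so the columns already match the training layout; the guard matters mainly when scoring small batches or single records.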

Extract the target column into a separate Series and drop it from the feature dataframe.

In [None]:
df_y_yes = df['y_yes']
df = df.drop(['y_yes'], axis=1)
df.columns

Check that the target values look correct.

In [None]:
df_y_yes.head()

Convert the feature dataframe to a NumPy array and inspect it.

In [None]:
arr_test = df.values
arr_test

In [None]:
predictor = sagemaker.predictor.Predictor(endpoint_name=endpoint_name)
predictor.serializer = CSVSerializer()

Get a prediction

In [None]:
# Randomly select a row (one customer) and request a prediction for it
i = random.randrange(len(arr_test))  # valid indices are 0 .. len(arr_test) - 1
print(i)
pred = predictor.predict(arr_test[i])
pred

Check the actual result

In [None]:
# Actual value for the same row (positional lookup, matching arr_test[i])
df_y_yes.iloc[i]

Check whether the prediction is within 0.1 of the actual value.

In [None]:
math.isclose(float(pred), df_y_yes.iloc[i], abs_tol=0.1)
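
The comparison above treats the raw endpoint response as a number. As a sketch, and assuming the endpoint returns a single probability as CSV text (this notebook does not confirm the response format), the response can be decoded explicitly and turned into a 0/1 label with a threshold. The helper names and the 0.5 threshold are illustrative choices, not part of the original notebook.

```python
def parse_probability(raw_response):
    """Decode a response such as b'0.0731' into a float."""
    if isinstance(raw_response, bytes):
        raw_response = raw_response.decode("utf-8")
    return float(raw_response.strip())


def to_label(probability, threshold=0.5):
    """Map a probability to the predicted class (1 = 'yes', 0 = 'no')."""
    return int(probability >= threshold)


# Example with a made-up response value.
probability = parse_probability(b"0.0731\n")
print(probability, to_label(probability))
```

Comparing `to_label(parse_probability(pred))` with `df_y_yes.iloc[i]` gives a plain match/no-match answer, whereas the tolerance check only passes when the predicted probability is very close to 0 or 1.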

---

## Next steps

Now that prediction through a SageMaker endpoint works, the next step is to make predictions with Redshift ML as well.