# Advanced model for SageMaker inference

In this lab you will build a model with SageMaker that can be used for inference.
Your goal is to reuse the classifier from the previous lab (the **RandomForestClassifier**), package it as a model, and deploy it to an endpoint that can be called to make **inferences**.

## Creation of the Model

Create the classifier that can be used by SageMaker to make inferences.
1. Use **AM_Sagemaker_RF.py** as a template (this file should be provided).
2. Add arguments to the **parser**:  
    a. **'--output-data-dir'**.  
    b. **'--model-dir'**.  
    c. **'--train'**.  
3. Each argument must have the correct **type** and default to the matching SageMaker **environment variable**.
4. Read the **input files** from the **train path** (a folder) that you will take from the **parser**.
5. Split the **train dataset** into **X** and **y** as you did in the previous labs.
6. Import **RandomForestClassifier** and build the model as you did in the previous lab.
7. Set the **hyperparameters** of the classifier with the **best hyperparameters** found in the previous lab.
8. **Fit** the model.
9. Save the model with the **joblib** library (from **sklearn.externals**) to a file named **model.joblib** in the **model_dir** directory.
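
As a starting point for steps 2 and 3, the parser could look like the minimal sketch below. It assumes the standard SageMaker training variables `SM_OUTPUT_DATA_DIR`, `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN` (the last one exists because the training channel is named `train`); the helper name `build_parser` is only illustrative and is not part of the template.

```python
import argparse
import os


def build_parser():
    """Return a parser whose defaults come from SageMaker environment variables."""
    parser = argparse.ArgumentParser()
    # Assumption: the script runs inside a SageMaker training container, which
    # exports these variables. Outside SageMaker, os.environ.get returns None
    # and the paths must be passed explicitly on the command line.
    parser.add_argument('--output-data-dir', type=str,
                        default=os.environ.get('SM_OUTPUT_DATA_DIR'))
    parser.add_argument('--model-dir', type=str,
                        default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str,
                        default=os.environ.get('SM_CHANNEL_TRAIN'))
    return parser
```

Calling `build_parser().parse_args()` in the script then exposes the values as `args.output_data_dir`, `args.model_dir` and `args.train`.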

Finally, define the following four functions so that SageMaker can use the model:  

#### 1. def model_fn(model_dir):
This function is used to load the **model.joblib** created before.  
Load the model and return it.  

#### 2. def input_fn(request_body, request_content_type):
This function parses the body of the inference request.  
Each inference is on its own line, separated by **'\n' (LF)**.  
Within a line, feature values are separated by a **',' (comma)**.  
Each feature value must be a **float**.  
The function must return a **numpy array** with a shape like the following:  

        [[7.3,0.59,0.26,1.8,0.084,51.0,0.7,9.4,16.0,3.16,0.9969,0,1],
        [7.0,0.5,0.25,2.0,0.07,22.0,0.63,9.2,3.0,3.25,0.9963,0,1],
        [7.6,0.59,0.06,2.5,0.079,10.0,0.56,9.8,5.0,3.39,0.9967,0,0]]

That is, a "list" of inferences, where each inference is a list of values, one per feature.  
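
The parsing rules above can be sketched as follows. This is one possible implementation, not the reference solution; the content-type check and the handling of a `bytes` body are assumptions added for robustness.

```python
import numpy as np


def input_fn(request_body, request_content_type):
    """Parse a 'text/csv' request body into a 2-D array of floats."""
    if request_content_type != 'text/csv':
        raise ValueError('Unsupported content type: ' + str(request_content_type))
    # Assumption: the body may arrive as bytes, depending on the caller.
    if isinstance(request_body, bytes):
        request_body = request_body.decode('utf-8')
    # One inference per line, one float per comma-separated value;
    # empty lines are skipped.
    rows = [
        [float(value) for value in line.split(',')]
        for line in request_body.strip().split('\n')
        if line.strip()
    ]
    return np.array(rows)
```

For a body with three lines of 13 values each, the returned array has shape `(3, 13)`.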

#### 3. def predict_fn(input_data, model):
Simply return the model's prediction for **input_data**.  

#### 4. def output_fn(prediction, content_type):   
Get the results of the inferences.  
Map the values: 0 to bad and 1 to good.  
Return a single string with the results separated by **\n (LF)**.  
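
A minimal sketch of this mapping is shown below. It assumes the classifier returns only the classes 0 and 1; the `LABELS` dictionary is an illustrative name, and `content_type` is accepted but not used here.

```python
# Assumption: the model predicts only the classes 0 and 1.
LABELS = {0: 'bad', 1: 'good'}


def output_fn(prediction, content_type):
    """Turn an iterable of 0/1 predictions into one LF-separated string."""
    # int() also accepts numpy integer types; an unknown class raises KeyError.
    return '\n'.join(LABELS[int(value)] for value in prediction)
```

With this sketch, `output_fn([0, 1, 1], 'text/csv')` yields the three lines `bad`, `good`, `good`.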

If you need any suggestions, you can find many examples here:  
https://github.com/aws/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk

## SKLearn

Now you can model your classifier using **sagemaker.sklearn**:  
https://sagemaker.readthedocs.io/en/stable/frameworks/sklearn/sagemaker.sklearn.html

1. From **sagemaker** import **get_execution_role** and retrieve the **role**.
2. Prepare the train path: this must be the **S3 URI** of the **tmp_train** folder created before.
3. As **ENTRY_POINT** use **'AM_Sagemaker_RF.py'** (this is the python file in which you have created the classifier).
4. As **FRAMEWORK_VERSION** use **'0.20.0'**.
5. As **INSTANCE_TYPE** use **'ml.m4.xlarge'**.
6. Create a **SKLearn estimator** with these parameters and fit it by passing the path of the train set. This operation will take a few minutes.

In [1]:
import os
import boto3 
#import sagemaker

import pandas as pd

from sagemaker import get_execution_role
from sagemaker.sklearn.estimator import SKLearn

#Change this to your bucket name
bucket = 'sagemaker-diamonds-dataset'
#Path to the folder of the csv train file
input_path = 'diamond/input/data/train/'

#sess = sagemaker.Session()
role = get_execution_role()

#Full S3 URI of the folder of the csv train file
train_path = 's3://{}/{}'.format(bucket, input_path)

print('The train path is:')
print(train_path)

#Entrypoint of the SKLearn Model
ENTRY_POINT = './scripts/AM_Sagemaker_RF.py'
#Use this FRAMEWORK VERSION
FRAMEWORK_VERSION = '0.20.0'
#Use this instance type
INSTANCE_TYPE = 'ml.m4.xlarge'

sklearn_estimator = SKLearn(entry_point=ENTRY_POINT,
                            framework_version=FRAMEWORK_VERSION,
                            instance_type=INSTANCE_TYPE,
                            role=role)

#To the estimator you have to pass the path of the folder that contains train file(s)
sklearn_estimator.fit({'train': train_path})

The train path is:
s3://sagemaker-diamonds-dataset/diamond/input/data/train/
2021-07-07 06:07:43 Starting - Starting the training job...
2021-07-07 06:08:07 Starting - Launching requested ML instances
ProfilerReport-1625638062: InProgress
...
2021-07-07 06:08:36 Starting - Preparing the instances for training.........
2021-07-07 06:10:08 Downloading - Downloading input data...
2021-07-07 06:10:36 Training - Downloading the training image...
2021-07-07 06:11:08 Training - Training image download completed. Training in progress.
2021-07-07 06:10:58,057 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
2021-07-07 06:10:58,060 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2021-07-07 06:10:58,071 sagemaker_sklearn_container.training INFO     Invoking user training script.
2021-07-07 06:10:58,442 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)

## Deployment of the model

1. Deploy the SKLearn model created before. 
2. As **INSTANCE_TYPE** use **'ml.m4.xlarge'**.
3. This operation requires about 10 minutes.
4. Save the **predictor** returned by the **deploy** function.

In [2]:
import time

#Use this instance type
INSTANCE_TYPE = 'ml.m4.xlarge'

start = time.time()

#Deploy the model to the endpoint
#It requires about 10 minutes.
# this function is used to deploy our service!
predictor = sklearn_estimator.deploy(initial_instance_count=1,
                                     instance_type=INSTANCE_TYPE)

done = time.time()
elapsed = done - start
print('\nTime passed: ' + str(elapsed))

-------------!
Time passed: 391.9823639392853


## Invoke the endpoint

Now you can run inference against the endpoint!
To get started, try it here.
1. Get SageMaker client by using **boto3**.
2. Invoke the endpoint by passing **'text/csv'** as **content_type** and the **name of the endpoint** just created as **EndpointName** (you can retrieve it from SageMaker Console).
3. You can use **tmp_test.csv** rows to make the inference (but you have to format them in the right way) or build your own.
4. Print the **prediction**!

In [3]:
import boto3
import pandas as pd
import numpy as np

#Load test data
test = pd.read_csv('./datasets/TEST_TO_SHARE.CSV')
test['cut'] = test['cut'].astype('category')
test['color'] = test['color'].astype('category')
test['clarity'] = test['clarity'].astype('category')
test = pd.get_dummies(test, columns=['cut','color','clarity'])
test = test.drop(columns='carat_class')
test = test.fillna(0)

#Create request body to inference model
request_body = ""

#Build the request from the first 50 rows: one row per line, comma-separated values
for i in range(50):
    request_body += ",".join([str(n) for n in test.iloc[i]]) + "\n"
#Drop the trailing newline
request_body = request_body[:-1]
print(request_body)

#Create sagemaker client using boto3
client = boto3.client('sagemaker-runtime')

#Specify the endpoint name of the model that you have deployed
# you can find it in Amazon SageMaker -> Training jobs -> it is the name of the job
# sagemaker-scikit-learn-2021-05-25-10-13-27-769
ENDPOINT_NAME = 'sagemaker-scikit-learn-2021-07-06-17-55-43-693' # changes for each student!

#Specify content type
CONTENT_TYPE = 'text/csv'

#Call the endpoint to run inference on the model
response = client.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                  ContentType=CONTENT_TYPE,
                                  Body=request_body)

#Print the returned labels
print("Predicted value:")
print(response['Body'].read().decode())

57.86416954016789,60.87701497949929,312.0339837292398,5.12911162680965,4.053713659652848,2.357029681593428,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
64.60854967190642,54.823918402462596,321.45349594722285,5.225114298353891,4.298208827919106,2.687131344094826,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
63.5156100305875,56.243781881978684,329.00679741607865,4.946554719107646,3.988157308334861,2.388179521091394,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
60.5215777364182,61.48193099864292,322.0795633836433,4.89261127392537,3.802079226294389,2.2907111002555154,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
60.880892867195705,57.88615433027361,318.5356953194801,5.500656339638785,4.42500403890553,2.8258928152588285,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
64.74730753223605,56.417584445912276,337.2818841272482,5.2

ValidationError: An error occurred (ValidationError) when calling the InvokeEndpoint operation: Endpoint sagemaker-scikit-learn-2021-07-06-17-55-43-693 of account 786148629435 not found.

## Close the endpoint

Call the **delete_endpoint()** function when you no longer need the endpoint, to avoid wasting resources (and money).

In [None]:
#Clean up: delete the endpoint if you don't use it anymore
# you can find it in Amazon SageMaker -> Training jobs -> it is the name of the job
# sagemaker-scikit-learn-2021-05-25-10-13-27-769
predictor.delete_endpoint()