# Scikit-learn Script Mode + Bring Your Own Model

- [Documentation](https://sagemaker.readthedocs.io/en/stable/frameworks/sklearn/using_sklearn.html)
- Dataset: [Petrol Consumption](https://www.kaggle.com/harinir/petrol-consumption)

# Data Reading

```python
import pandas as pd
import numpy as np

# Load the Kaggle petrol consumption CSV from the current working directory
df = pd.read_csv("petrol_consumption.csv")
df.head()
```

|   | Petrol_tax | Average_income | Paved_Highways | Population_Driver_licence(%) | Petrol_Consumption |
|---|---|---|---|---|---|
| 0 | 9.0 | 3571 | 1976 | 0.525 | 541 |
| 1 | 9.0 | 4092 | 1250 | 0.572 | 524 |
| 2 | 9.0 | 3865 | 1586 | 0.58 | 561 |
| 3 | 7.5 | 4870 | 2351 | 0.529 | 414 |
| 4 | 8.0 | 4399 | 431 | 0.544 | 410 |


```python
# Split the data: the first 35 rows for training, the remaining rows held out for testing the endpoint later
train = df.iloc[:35, :]
test = df.iloc[35:, :]
```
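The cut at row 35 is hard-coded, so the actual split fraction depends on how many rows the CSV has. A minimal sketch that derives the cut from the row count instead (the helper `split_by_fraction` is hypothetical, and like the original it keeps row order rather than shuffling):

```python
import pandas as pd


def split_by_fraction(df: pd.DataFrame, train_frac: float = 0.8):
    """Split df into contiguous train/test parts without dropping or duplicating rows."""
    # Hypothetical helper: the cut index comes from the row count instead of being hard-coded
    cut = int(len(df) * train_frac)
    return df.iloc[:cut, :], df.iloc[cut:, :]


# Usage with a synthetic 48-row frame
demo = pd.DataFrame({"x": range(48)})
train_part, test_part = split_by_fraction(demo)
print(len(train_part), len(test_part))  # 38 10
```

Changing the cut changes which rows the model trains on, so predictions will differ from the ones shown in this notebook. If the CSV rows are sorted in a meaningful order, shuffle first, for example with `df.sample(frac=1, random_state=0)`.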

```python
# Write both splits to local CSV files (without the pandas index)
train.to_csv('train.csv', index=False)
test.to_csv('test.csv', index=False)
```

# Upload Data to S3

```python
# Create a SageMaker session to upload data to S3
import boto3
import sagemaker

sagemaker_session = sagemaker.Session()

# Upload train.csv to the session's default S3 bucket under the "sklearn-petrol-data/training" prefix
prefix = "sklearn-petrol-data"
training_input_path = sagemaker_session.upload_data('train.csv', key_prefix=prefix + '/training')
```

```python
# Verify the upload by reading the file back from S3 (pandas needs the s3fs package for s3:// paths)
training_data = pd.read_csv(training_input_path, sep=',')
training_data.head()
```

|   | Petrol_tax | Average_income | Paved_Highways | Population_Driver_licence(%) | Petrol_Consumption |
|---|---|---|---|---|---|
| 0 | 9.0 | 3571 | 1976 | 0.525 | 541 |
| 1 | 9.0 | 4092 | 1250 | 0.572 | 524 |
| 2 | 9.0 | 3865 | 1586 | 0.58 | 561 |
| 3 | 7.5 | 4870 | 2351 | 0.529 | 414 |
| 4 | 8.0 | 4399 | 431 | 0.544 | 410 |


# Create Estimator

```python
# SageMaker execution role; make sure it can read from and write to the S3 bucket used above
role = sagemaker.get_execution_role()
role
```

```
'arn:aws:iam::474422712127:role/sagemaker-role-BYOC'
```

```python
# Docs: https://sagemaker.readthedocs.io/en/stable/frameworks/sklearn/sagemaker.sklearn.html
from sagemaker.sklearn import SKLearn

# train.py receives the hyperparameters below as command-line flags (e.g. --estimators 20)
sk_estimator = SKLearn(entry_point='train.py',
                       role=role,
                       instance_count=1,
                       instance_type='ml.c5.18xlarge',
                       py_version='py3',
                       framework_version='0.23-1',
                       script_mode=True,
                       hyperparameters={
                           'estimators': 20
                       }
                      )

# Launch training; the 'train' channel is available to the script via the SM_CHANNEL_TRAIN directory
sk_estimator.fit({'train': training_input_path})
```

```
2021-08-02 03:58:08 Starting - Starting the training job...
2021-08-02 03:58:31 Starting - Launching requested ML instances
ProfilerReport-1627876688: InProgress
...
2021-08-02 03:59:01 Starting - Preparing the instances for training.........
2021-08-02 04:00:32 Downloading - Downloading input data...
2021-08-02 04:01:05 Training - Training image download completed. Training in progress.
2021-08-02 04:01:05 Uploading - Uploading generated training model.
2021-08-02 04:00:59,448 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
2021-08-02 04:00:59,451 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2021-08-02 04:00:59,458 sagemaker_sklearn_container.training INFO     Invoking user training script.
2021-08-02 04:00:59,721 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2021-08-02 04:01:02,770 sagemaker-training-toolkit INFO     No GPUs detected
```
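The notebook does not show `train.py`. In script mode, SageMaker passes estimator hyperparameters to the entry point as command-line flags and exposes the model output directory and each input channel through environment variables such as `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN`. A minimal sketch of the training half of such a script, assuming a random forest regressor (suggested by the `estimators` hyperparameter and the `sklearn-rf-model` endpoint name, but not confirmed), a `Petrol_Consumption` target column, and a hypothetical artifact name `model.joblib`:

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the hyperparameters and paths SageMaker passes to a script-mode entry point."""
    parser = argparse.ArgumentParser()
    # Hyperparameters from the estimator arrive as flags, e.g. --estimators 20 (default is an assumption)
    parser.add_argument("--estimators", type=int, default=100)
    # SageMaker exposes the model output directory and each input channel as environment variables
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
    return parser.parse_args(argv)


def train(train_dir, model_dir, estimators, target="Petrol_Consumption"):
    """Fit a random forest on train.csv and save it where SageMaker packages model artifacts."""
    # Imported inside the function so the argument parsing above works without these packages
    import joblib
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    data = pd.read_csv(os.path.join(train_dir, "train.csv"))
    X = data.drop(columns=[target])  # the four feature columns
    y = data[target]
    model = RandomForestRegressor(n_estimators=estimators)
    model.fit(X, y)
    # Hypothetical file name; the serving code must load the same name
    joblib.dump(model, os.path.join(model_dir, "model.joblib"))
```

In `train.py` these would be called from an `if __name__ == "__main__":` block (`args = parse_args()`, then `train(args.train, args.model_dir, args.estimators)`), because the serving container imports the same script and must not re-run training when it does.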

# Endpoint Creation

```python
# Deploy the trained model to a real-time endpoint with a timestamped name
import time

sk_endpoint_name = 'sklearn-rf-model' + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
sk_predictor = sk_estimator.deploy(initial_instance_count=1,
                                   instance_type='ml.m5.4xlarge',
                                   endpoint_name=sk_endpoint_name)
```

```
---------------!
```
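An `SKLearn` estimator's endpoint serves the model with the same `entry_point` script, and the scikit-learn serving container loads the model through a `model_fn` you must define there; `input_fn`, `predict_fn`, and `output_fn` optionally override how requests are parsed, scored, and serialized. A minimal sketch of hooks that would accept the `{"Input": ...}` request and return the `{"Output": ...}` response used in the test below, assuming the hypothetical artifact name `model.joblib`. The notebook prints a single number (`555`), so the original script presumably unwraps the single prediction; this sketch returns a list with one value per input row instead:

```python
import json
import os


def model_fn(model_dir):
    """Load the model saved by the training script (assumed file name: model.joblib)."""
    import joblib  # available in the SageMaker scikit-learn container

    return joblib.load(os.path.join(model_dir, "model.joblib"))


def input_fn(request_body, request_content_type):
    """Deserialize a JSON body of the form {"Input": [[f1, f2, f3, f4], ...]}."""
    if request_content_type != "application/json":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    return json.loads(request_body)["Input"]


def predict_fn(input_data, model):
    """Score the list of feature rows."""
    return model.predict(input_data)


def output_fn(prediction, content_type):
    """Serialize predictions as {"Output": [...]}, one float per input row."""
    return json.dumps({"Output": [float(p) for p in prediction]})
```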

# Test Endpoint
- You can call the endpoint with either the low-level [invoke endpoint](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-runtime.html) API or a [predictor](https://sagemaker.readthedocs.io/en/stable/frameworks/sklearn/sagemaker.sklearn.html#scikit-learn-predictor) object; this example uses `invoke_endpoint`.
- If you use a predictor, set a [serializer](https://sagemaker.readthedocs.io/en/stable/api/inference/serializers.html) that matches the content type your inference script expects.

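```python
import boto3

# Low-level SageMaker runtime client for calling the endpoint
client = boto3.client('sagemaker-runtime')
content_type = "application/json"
# One row of features: Petrol_tax, Average_income, Paved_Highways, Population_Driver_licence(%)
request_body = {"Input": [[9.0, 3571, 1976, 0.525]]}
endpoint_name = sk_endpoint_name  # e.g. "sklearn-rf-model2021-08-02-04-02-29"
print(request_body)
```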

```
{'Input': [[9.0, 3571, 1976, 0.525]]}
```


```python
import json

# Serialize the request as JSON and send it to the endpoint
payload = json.dumps(request_body)
response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType=content_type,
    Body=payload)
# The inference script returns a JSON object whose "Output" key holds the prediction
result = json.loads(response['Body'].read().decode())['Output']
result
```

```
555
```
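The held-out `test` split is never scored in the notebook. A minimal sketch of turning a feature frame into one request body for the same `invoke_endpoint` call; the helper name `build_request` is hypothetical, and scoring several rows in one request assumes the inference script returns one prediction per row, which the single-value output above does not confirm:

```python
import json

import pandas as pd


def build_request(frame: pd.DataFrame, target: str = "Petrol_Consumption") -> str:
    """Turn every row of a frame into one JSON request body of the form {"Input": [...]}."""
    # Drop the label column so each row carries only the four input features
    features = frame.drop(columns=[target])
    return json.dumps({"Input": features.values.tolist()})


# Usage with a two-row frame shaped like test.csv (values copied from df.head() above)
sample = pd.DataFrame(
    {
        "Petrol_tax": [9.0, 7.5],
        "Average_income": [3571, 4870],
        "Paved_Highways": [1976, 2351],
        "Population_Driver_licence(%)": [0.525, 0.529],
        "Petrol_Consumption": [541, 414],
    }
)
print(build_request(sample))
```

With a live endpoint, `Body=build_request(test)` would take the place of `payload` in the call above, and the returned predictions could then be compared against `test["Petrol_Consumption"]`.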