<div align="center" dir="auto">
<p dir="auto">

<a href="https://colab.research.google.com/github/write-with-neurl/modelbit-articles/blob/main/modelbit-02/code/SageMaker_Sample_Deployment_Vs_Modelbit.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

</p>

## 🚀 Comparative Analysis: Deploying a model with Modelbit vs with SageMaker

In [None]:
# Import the necessary libraries

import os
import sys
import json
import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker import image_uris
import pandas as pd
from sklearn.model_selection import train_test_split
import numpy as np

Create your **AWS SageMaker session** and initialize the **IAM execution role**:

In [None]:
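session = sagemaker.Session()  # manages interactions with SageMaker and S3
role = get_execution_role()    # ARN of the IAM role attached to this notebook instance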

Download the data from your S3 bucket with Boto3, the AWS SDK for Python:

In [None]:
s3 = boto3.client("s3")
# download_file(bucket, key, local_filename)
s3.download_file('sagemaker-poc-bucket-12345', 'diabetes_data.csv', 'diabetes_data.csv')

This snippet retrieves `diabetes_data.csv` from the `sagemaker-poc-bucket-12345` bucket and saves it locally under the same name. Replace the bucket name and key with your own.
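
`download_file` takes three separate arguments: the bucket name, the object key, and a local filename. The S3 console, however, displays objects as `s3://bucket/key` URIs. The `parse_s3_uri` helper below is hypothetical and is not part of Boto3. It is a minimal sketch that assumes a well-formed `s3://` URI and splits it into the first two arguments.

```python
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split an s3://bucket/key URI into (bucket, key).

    Hypothetical helper for illustration; it is not part of Boto3.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # netloc is the bucket; the path without its leading "/" is the object key
    return parsed.netloc, parsed.path[1:]


bucket_name, key = parse_s3_uri("s3://sagemaker-poc-bucket-12345/diabetes_data.csv")
print(bucket_name, key)  # sagemaker-poc-bucket-12345 diabetes_data.csv
```

You could then call `s3.download_file(bucket_name, key, "diabetes_data.csv")`.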

## 🛢️ Loading The Dataset

Load the dataset into your development environment with pandas:

In [None]:
diabetes_dataset=pd.read_csv("diabetes_data.csv")
diabetes_dataset.head()

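| Unnamed: 0 | Diabetes_binary | HighBP | HighChol | CholCheck | BMI | Smoker | Stroke | HeartDiseaseorAttack | PhysActivity | Fruits | Veggies | HvyAlcoholConsump | AnyHealthcare | NoDocbcCost | GenHlth | MentHlth | PhysHlth | DiffWalk | Sex | Age | Education | Income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 1.0 | 0.0 | 1.0 | 26.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 3.0 | 5.0 | 30.0 | 0.0 | 1.0 | 4.0 | 6.0 | 8.0 |
| 1 | 0.0 | 1.0 | 1.0 | 1.0 | 26.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 1.0 | 12.0 | 6.0 | 8.0 |
| 2 | 0.0 | 0.0 | 0.0 | 1.0 | 26.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 10.0 | 0.0 | 1.0 | 13.0 | 6.0 | 8.0 |
| 3 | 0.0 | 1.0 | 1.0 | 1.0 | 28.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 3.0 | 0.0 | 3.0 | 0.0 | 1.0 | 11.0 | 6.0 | 8.0 |
| 4 | 0.0 | 0.0 | 0.0 | 1.0 | 29.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.0 | 5.0 | 8.0 |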


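Select the relevant columns, with the target `Diabetes_binary` first, and drop any rows with missing values. SageMaker's built-in XGBoost algorithm treats the first column of headerless CSV input as the label, so the target must lead.

In [None]:
# Select the label (first), then the ten feature columns, and drop incomplete rows
diabetes_selected = diabetes_dataset[['Diabetes_binary','HighBP','HighChol','Smoker','Age','Sex','BMI','Fruits','HvyAlcoholConsump','HeartDiseaseorAttack','PhysActivity']].dropna()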
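
SageMaker's built-in XGBoost algorithm reads headerless CSV input and treats column 0 as the label. The target column therefore has to come first in the files you upload. The standard-library sketch below illustrates that layout on two made-up records (hypothetical values, not rows from the dataset). It puts the target in front, skips any record with a missing value (as `dropna()` does) and writes CSV without a header.

```python
import csv
import io

FEATURES = ["HighBP", "HighChol", "Smoker", "Age", "Sex", "BMI", "Fruits",
            "HvyAlcoholConsump", "HeartDiseaseorAttack", "PhysActivity"]
TARGET = "Diabetes_binary"


def to_label_first_csv(records):
    """Write records as headerless CSV with the target in column 0.

    Records with a missing key or a None value in any selected column are skipped.
    """
    columns = [TARGET] + FEATURES
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for record in records:
        row = [record.get(col) for col in columns]
        if any(value is None for value in row):
            continue  # incomplete record: drop it
        writer.writerow(row)
    return out.getvalue()


# Two made-up records; the second is incomplete and is dropped
sample = [
    {"Diabetes_binary": 1.0, "HighBP": 1.0, "HighChol": 1.0, "Smoker": 1.0,
     "Age": 9.0, "Sex": 0.0, "BMI": 30.0, "Fruits": 1.0,
     "HvyAlcoholConsump": 0.0, "HeartDiseaseorAttack": 1.0, "PhysActivity": 0.0},
    {"Diabetes_binary": 0.0, "HighBP": 0.0, "BMI": None},
]
print(to_label_first_csv(sample), end="")
# 1.0,1.0,1.0,1.0,9.0,0.0,30.0,1.0,0.0,1.0,0.0
```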


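Shuffle the data and partition it into training (70%), validation (20%), and test (10%) sets. Then save each subset as a separate headerless CSV file in your local environment:

In [None]:
# Shuffle once with a fixed seed, then cut at 70% and 90% of the selected rows
shuffled = diabetes_selected.sample(frac=1, random_state=52)
train_end, val_end = int(0.7 * len(shuffled)), int(0.9 * len(shuffled))
train, validation, test = shuffled.iloc[:train_end], shuffled.iloc[train_end:val_end], shuffled.iloc[val_end:]

# No header and no index column: the built-in XGBoost algorithm reads raw CSV rows
train.to_csv("train.csv", index=False, header=False)
validation.to_csv("validation.csv", index=False, header=False)
test.to_csv("test.csv", index=False, header=False)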
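
The cut points at 70% and 90% of the rows define the three subsets:

- rows before the first cut are training data;
- rows between the cuts are validation data;
- the remaining rows are test data.

The sketch below reproduces that arithmetic with the standard library. It shuffles with `random.Random`, so its row order will not match `DataFrame.sample(random_state=52)`; only the subset sizes are comparable. The 70,692-row count in the usage line is an assumption. It was chosen because it is consistent with the 49484-row training matrix reported in the training log later in this notebook.

```python
import random


def split_bounds(n_rows, cuts=(0.7, 0.9)):
    """Return the row indices where training ends and where validation ends."""
    return tuple(int(cut * n_rows) for cut in cuts)


def split_rows(rows, cuts=(0.7, 0.9), seed=52):
    """Shuffle a copy of rows and cut it into train/validation/test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    train_end, val_end = split_bounds(len(shuffled), cuts)
    return shuffled[:train_end], shuffled[train_end:val_end], shuffled[val_end:]


# Assumed row count (see lead-in)
print(split_bounds(70692))  # (49484, 63622)
train_rows, val_rows, test_rows = split_rows(range(10))
print(len(train_rows), len(val_rows), len(test_rows))  # 7 2 1
```

With 70,692 rows this yields 49,484 training, 14,138 validation, and 7,070 test rows.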

## 👟 Training the Model

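Retrieve the URI of the built-in XGBoost container image for your session's Region:

In [None]:
container = image_uris.retrieve("xgboost", region=session.boto_region_name, version="latest")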

Amazon SageMaker provides a default S3 bucket, available through `sagemaker.Session().default_bucket()`. Upload the three CSV files you created locally in your Jupyter instance to this bucket. SageMaker training jobs read their input data from S3, so the files must be there before training starts.

In [None]:
bucket=sagemaker.Session().default_bucket()
boto3.Session().resource("s3").Bucket(bucket).Object("train/train.csv").upload_file("train.csv")
boto3.Session().resource("s3").Bucket(bucket).Object("validation/validation.csv").upload_file("validation.csv")
boto3.Session().resource("s3").Bucket(bucket).Object("test/test.csv").upload_file("test.csv")

INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole
INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole
INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole


In [None]:
input_train=sagemaker.inputs.TrainingInput(s3_data=f"s3://{bucket}/train",content_type="csv")
input_validation=sagemaker.inputs.TrainingInput(s3_data=f"s3://{bucket}/validation",content_type="csv")
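
Each `TrainingInput` points at an S3 prefix rather than a single file. With the default `S3Prefix` data type, SageMaker feeds the channel every object whose key starts with that prefix. As a result, `s3://{bucket}/train` would also pick up keys such as `train_old/...` if they existed. The sketch below models that matching rule on a hypothetical list of keys and makes no AWS calls. It shows why a trailing slash (`train/`) is the safer prefix.

```python
def channel_objects(prefix_uri, keys):
    """Return the keys an S3Prefix channel would include (plain prefix match).

    Illustrative model of the S3Prefix rule; it does not call AWS.
    """
    if not prefix_uri.startswith("s3://"):
        raise ValueError(f"Not an S3 URI: {prefix_uri!r}")
    # Drop "s3://" and the bucket name, keeping only the key prefix
    _, _, prefix = prefix_uri[len("s3://"):].partition("/")
    return [key for key in keys if key.startswith(prefix)]


# Hypothetical bucket contents
keys = ["train/train.csv", "train_old/train.csv", "validation/validation.csv"]
print(channel_objects("s3://my-bucket/train", keys))
# ['train/train.csv', 'train_old/train.csv']
print(channel_objects("s3://my-bucket/train/", keys))
# ['train/train.csv']
```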

With the data uploaded to the default S3 bucket, the next step is to define the XGBoost estimator and set its hyperparameters.

In [None]:
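# One ml.m4.xlarge training instance; the model artifact is written to s3://{bucket}/output
xgb = sagemaker.estimator.Estimator(container, role, instance_count=1, instance_type="ml.m4.xlarge",
                                    output_path=f"s3://{bucket}/output", sagemaker_session=session)
xgb.set_hyperparameters(max_depth=3,
                        eta=0.2,
                        gamma=5,
                        min_child_weight=5,
                        subsample=0.8,  # XGBoost's parameter is "subsample", not "sub_sample"
                        silent=0,
                        objective="binary:logistic",
                        num_round=100)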
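
With `objective="binary:logistic"`, XGBoost sums the tree outputs (plus a base score) into a raw score called the margin. It then returns the logistic transform of that margin, so every prediction is a probability between 0 and 1. The sketch below illustrates only that final transform. The margin values are made up and do not come from this model.

```python
import math


def logistic(margin):
    """Map a raw XGBoost margin to a probability, as binary:logistic does."""
    # Branch on the sign so math.exp never receives a large positive argument
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    z = math.exp(margin)
    return z / (1.0 + z)


# Made-up margins: negative gives p < 0.5, zero gives p = 0.5, positive gives p > 0.5
for margin in (-2.0, 0.0, 2.0):
    print(f"margin={margin:+.1f} -> probability={logistic(margin):.3f}")
```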



In [None]:
xgb.fit({"train":input_train,"validation":input_validation})

INFO:sagemaker:Creating training-job with name: xgboost-2023-10-05-13-26-14-627


2023-10-05 13:26:14 Starting - Starting the training job...
2023-10-05 13:26:39 Starting - Preparing the instances for training.........
2023-10-05 13:27:52 Downloading - Downloading input data...
2023-10-05 13:28:22 Training - Downloading the training image...
2023-10-05 13:29:13 Training - Training image download completed. Training in progress...
Arguments: train
[2023-10-05:13:29:24:INFO] Running standalone xgboost training.
[2023-10-05:13:29:24:INFO] File size need to be processed in the node: 2.76mb. Available memory size in the node: 8536.14mb
[2023-10-05:13:29:24:INFO] Determined delimiter of CSV input is ','
[13:29:24] S3DistributionType set as FullyReplicated
[13:29:24] 49484x10 matrix with 494840 entries loaded from /opt/ml/input/data/train?format=csv&label_column=0&delimiter=,
[2023-10-05:13:29:24:INFO] Determined delimiter of CSV input is ','
[13:29:24] S3DistributionType set as FullyReplicated
[13

In [None]:
xgb_deploy=xgb.deploy(initial_instance_count=1,instance_type="ml.m4.xlarge")

INFO:sagemaker:Creating model with name: xgboost-2023-10-05-13-32-29-634
INFO:sagemaker:Creating endpoint-config with name xgboost-2023-10-05-13-32-29-634
INFO:sagemaker:Creating endpoint with name xgboost-2023-10-05-13-32-29-634


-------!

Congratulations! You have successfully deployed your AWS SageMaker model. Confirm deployment by heading to the **SageMaker console** >> **Inference** >> **Endpoints**.

## 🧑‍🍳 Test the SageMaker Inference endpoint

In [None]:
# Inside a SageMaker notebook the instance's IAM role supplies credentials, so no keys are hardcoded.
# Elsewhere, configure credentials with the AWS CLI or environment variables instead.
sagemaker_runtime = boto3.client("sagemaker-runtime", region_name=session.boto_region_name)

# Name of the endpoint created by xgb.deploy(); you can also copy it from the console
endpoint_name = xgb_deploy.endpoint_name

# Send one CSV row: the ten feature values in training-column order, without the label
response = sagemaker_runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="text/csv",
    Body=bytes('1.,1.,1.,9.,0.,30.,1.,0.,1.,0.', 'utf-8'),
)

# For binary:logistic the body is the predicted probability of the positive class
response['Body'].read().decode('utf-8')
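
The endpoint expects one CSV line per record, with the ten feature values in training-column order and no label. It replies with the predicted probability as text. The helpers below are hypothetical and are not part of Boto3 or the SageMaker SDK:

- `build_csv_body` builds the request body and checks that no feature is missing.
- `parse_scores` converts the decoded response into floats and applies a 0.5 cut-off. The threshold is an assumption you may want to tune.

The parser accepts comma- or newline-separated scores, because the separator used for multi-row responses may differ between XGBoost container versions. The response strings in the usage lines are made up.

```python
FEATURE_ORDER = ["HighBP", "HighChol", "Smoker", "Age", "Sex", "BMI", "Fruits",
                 "HvyAlcoholConsump", "HeartDiseaseorAttack", "PhysActivity"]


def build_csv_body(features):
    """Serialize one record (dict of feature name -> value) as a CSV request body."""
    missing = [name for name in FEATURE_ORDER if name not in features]
    if missing:
        raise ValueError(f"Missing features: {missing}")
    return ",".join(str(float(features[name])) for name in FEATURE_ORDER)


def parse_scores(body_text, threshold=0.5):
    """Turn a decoded response such as '0.71' into (probability, label) pairs."""
    tokens = body_text.replace("\n", ",").split(",")
    scores = [float(tok) for tok in tokens if tok.strip()]
    return [(score, int(score >= threshold)) for score in scores]


# Feature values from the sample request above
record = dict(zip(FEATURE_ORDER, [1, 1, 1, 9, 0, 30, 1, 0, 1, 0]))
print(build_csv_body(record))  # 1.0,1.0,1.0,9.0,0.0,30.0,1.0,0.0,1.0,0.0
print(parse_scores("0.25\n0.75\n"))  # [(0.25, 0), (0.75, 1)]
```

In the notebook, you would pass `build_csv_body(record)` as `Body` together with `ContentType="text/csv"`. Then feed `response['Body'].read().decode('utf-8')` to `parse_scores`.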

## 🧹 Delete your endpoint

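Remember to delete your endpoint when you are done with this demo to avoid ongoing charges. `delete_endpoint()` removes the endpoint and, by default, its endpoint configuration. `delete_model()` removes the SageMaker model created during deployment:

In [None]:
xgb_deploy.delete_endpoint()
xgb_deploy.delete_model()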

See Modelbit's blog for more: https://www.modelbit.com/blog