<div align="center" dir="auto">
<p dir="auto">

<a href="https://colab.research.google.com/github/write-with-neurl/modelbit-articles/blob/main/modelbit-02/code/Modelbit_Sample_Deployment_Vs_SageMaker.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

</p>
</div>

# 🚀 Comparative Analysis: Deploying a model with Modelbit vs. SageMaker

## 🧑‍💻 Installation and Set Up

In this walkthrough, you will deploy a simple XGBoost classifier directly from this Google Colab notebook using [Modelbit][modelbit-getting-started].

> Modelbit simplifies deploying and managing your machine learning models in production. It integrates with version control tools like Git and GitLab, CI/CD tools like GitHub Actions and Azure DevOps, and machine learning tools like Weights & Biases and neptune.ai to move models from development to production quickly.

To get started with this notebook, create [an account][modelbit-create-an-account] with Modelbit if you don't already have one.

Install the Modelbit package via `pip` in your Google Colab (or Jupyter) notebook:

[modelbit-getting-started]: https://doc.modelbit.com/getting-started/

[modelbit-create-an-account]: https://app.modelbit.com/

In [2]:
# Using latest version of pip
!pip install --upgrade pip

!pip install modelbit


Collecting pip
  Downloading pip-23.3-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.1.2
    Uninstalling pip-23.1.2:
      Successfully uninstalled pip-23.1.2
Successfully installed pip-23.3

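After the install finishes, a quick check that the package is visible to the notebook's Python environment can save a confusing import error later. The sketch below uses only the standard library; `installed_version` is a made-up helper name for illustration, not part of Modelbit.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# `installed_version` is a hypothetical helper, not a Modelbit function.
version = installed_version("modelbit")
print(f"modelbit {version}" if version else "modelbit is not installed in this environment")
```

If the second message appears, rerunning the `pip install` cell (and restarting the runtime if needed) is usually the next thing to try.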
In [None]:

%%sh
ls

diabetes_data.csv
sample_data
test.csv
train.csv
validation.csv


## 🛢️ Download the sample data

The sample dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective is to predict, based on diagnostic measurements, whether a patient has diabetes.

**Features**

- Pregnancies: Number of times pregnant

- Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test

- BloodPressure: Diastolic blood pressure (mm Hg)

- SkinThickness: Triceps skin fold thickness (mm)

- Insulin: 2-Hour serum insulin (mu U/ml)

- BMI: Body mass index (weight in kg/(height in m)^2)

- DiabetesPedigreeFunction: Diabetes pedigree function

- Age: Age (years)

**Label**

- Outcome: Class variable (0 or 1)

Learn [more about the dataset source](https://www.kaggle.com/datasets/mathchi/diabetes-data-set) on Kaggle.

In [5]:
!wget https://raw.githubusercontent.com/write-with-neurl/modelbit-articles/main/modelbit-02/data/diabetes_data.csv

--2023-10-18 20:18:02--  https://raw.githubusercontent.com/write-with-neurl/modelbit-articles/main/modelbit-02/data/diabetes_data.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6347570 (6.1M) [text/plain]
Saving to: ‘diabetes_data.csv’


2023-10-18 20:18:02 (57.3 MB/s) - ‘diabetes_data.csv’ saved [6347570/6347570]



## 🧪 Quickly build a test model

In [6]:
# Import the necessary tools
import pandas as pd
import xgboost as xgb
from sklearn.metrics import accuracy_score

# Read the data
diabetes_dataset = pd.read_csv("diabetes_data.csv")

# Select the feature columns plus the label (last column) and drop incomplete rows
diabetes_selected = diabetes_dataset[['HighBP','HighChol','Smoker','Age','Sex','BMI','Fruits','HvyAlcoholConsump','HeartDiseaseorAttack','PhysActivity','Diabetes_binary']].dropna()

# Shuffle, then split into training (70%), validation (20%), and test (10%) sets.
# Cut points are based on the filtered frame, so they stay correct if dropna() removed rows.
shuffled = diabetes_selected.sample(frac=1, random_state=52)
train_end, validation_end = int(0.7 * len(shuffled)), int(0.9 * len(shuffled))
train = shuffled.iloc[:train_end]
validation = shuffled.iloc[train_end:validation_end]
test = shuffled.iloc[validation_end:]
train.to_csv("train.csv", index=False, header=False)
validation.to_csv("validation.csv", index=False, header=False)
test.to_csv("test.csv", index=False, header=False)

# Read the splits back from disk
train_data = pd.read_csv("train.csv", header=None)
validation_data = pd.read_csv("validation.csv", header=None)
test_data = pd.read_csv("test.csv", header=None)

# Split each set into features (X) and target (y); the label is the last column
X_train, y_train = train_data.iloc[:, :-1], train_data.iloc[:, -1]
X_validation, y_validation = validation_data.iloc[:, :-1], validation_data.iloc[:, -1]
X_test, y_test = test_data.iloc[:, :-1], test_data.iloc[:, -1]

# Define parameters for XGBoost
params = {
    'objective': 'binary:logistic',
    'max_depth': 3,
    'learning_rate': 0.1,
    'n_estimators': 100,
    'eval_metric': 'logloss'
}

# Create and train the XGBoost model
model = xgb.XGBClassifier(**params)
model.fit(X_train, y_train)

# Make predictions on the validation and test sets
validation_preds = model.predict(X_validation)
test_preds = model.predict(X_test)

# Evaluate model accuracy on both sets
validation_accuracy = accuracy_score(y_validation, validation_preds)
test_accuracy = accuracy_score(y_test, test_preds)
print(f"Validation accuracy: {validation_accuracy:.3f}")
print(f"Test accuracy: {test_accuracy:.3f}")
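
The split in the cell above comes from two cut points, at 70% and 90% of the shuffled rows, which yields roughly a 70/20/10 division into training, validation, and test sets. A small standard-library sketch of that arithmetic follows; `split_sizes` is an illustrative helper name and does not appear in the notebook.

```python
def split_sizes(n_rows: int, cuts=(0.7, 0.9)):
    """Return (train, validation, test) sizes for cut points given as fractions of n_rows."""
    first, second = (int(cut * n_rows) for cut in cuts)
    return first, second - first, n_rows - second


sizes = split_sizes(1000)
print(sizes)  # (700, 200, 100)
```

Because `int()` truncates, the three sizes always add up to the total row count, with any remainder landing in the later splits.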


## 🚢 Ship the model to a REST API endpoint with Modelbit

### 🔐 Log into Modelbit

Use the [`modelbit`](https://doc.modelbit.com/deployments/) library for model deployment and management. Modelbit lets you deploy ML models directly from your notebooks to a production environment served through REST APIs, with fully custom Python environments and backed by your git repo.

In [8]:
import modelbit


# Ensure you create a "dev" branch in Modelbit or use the "main" branch for your deployment
mb = modelbit.login(branch="main")

### 🛠️ Define the prediction function

In [9]:
# Arguments are positional and follow the order of the training feature columns
def predict(bloodpressure: float,
            cholesterol: float,
            smoker: float,
            age: float,
            sex: float,
            bmi: float,
            fruits: float,
            alcohol: float,
            heartattack: float,
            activity: float) -> list:
    return model.predict([[bloodpressure, cholesterol, smoker, age, sex, bmi, fruits, alcohol, heartattack, activity]])

mb.deploy(predict)

Uploading 'model': 100%|██████████| 19.8k/19.8k [00:00<00:00, 21.3kB/s]
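
Because `predict` takes its arguments positionally, callers have to send values in the same order as the training columns. One way to make that less error-prone is to build the row from named values. This is only a sketch: `FEATURE_ORDER` copies the feature column names selected in the training cell, and `to_feature_row` is a hypothetical helper, not part of Modelbit.

```python
# Column order copied from the training cell (label column excluded).
FEATURE_ORDER = [
    "HighBP", "HighChol", "Smoker", "Age", "Sex", "BMI",
    "Fruits", "HvyAlcoholConsump", "HeartDiseaseorAttack", "PhysActivity",
]


def to_feature_row(record: dict) -> list:
    """Order a dict of named features into the positional list `predict` expects."""
    missing = [name for name in FEATURE_ORDER if name not in record]
    if missing:
        raise KeyError(f"Missing features: {missing}")
    return [float(record[name]) for name in FEATURE_ORDER]


# Illustrative values for one patient record.
example = {
    "HighBP": 1, "HighChol": 1, "Smoker": 1, "Age": 9, "Sex": 0,
    "BMI": 30, "Fruits": 1, "HvyAlcoholConsump": 0,
    "HeartDiseaseorAttack": 1, "PhysActivity": 0,
}
row = to_feature_row(example)
print(row)
```

The resulting list can be unpacked into a local call such as `predict(*row)` or sent as the `data` field of a REST request.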


You can test your REST endpoint by [sending single or batch requests](https://doc.modelbit.com/deployments/rest-api/single-inference) to it for scoring.

Use the `requests` package to POST a request to the API and `json` to pretty-print the response:


> ⚠️ Replace the `<ENTER WORKSPACE NAME>` placeholder with your workspace name.

In [11]:
# Test the endpoint
import requests
import json

url = "https://<ENTER WORKSPACE NAME>.app.modelbit.com/v1/predict/latest"
headers = {
    'Content-Type': 'application/json'
}
data = {
    "data": [1.,  1.,  1.,  9.,  0., 30.,  1.,  0.,  1.,  0.,]
}

response = requests.post(url, headers=headers, json=data)
response_json = response.json()

print(json.dumps(response_json, indent=4))

{
    "data": [
        1
    ]
}
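
The request above scores a single row. For batch scoring, the sketch below assumes a request format in which `data` is a list of rows, each prefixed with an ID that is returned alongside its result. That format is an assumption here and should be checked against the linked Modelbit documentation before relying on it. `build_batch_payload` is an illustrative helper, and the second row contains made-up values.

```python
import json


def build_batch_payload(rows):
    """Prefix each feature row with a 1-based ID (assumed batch request format)."""
    return {"data": [[row_id, *row] for row_id, row in enumerate(rows, start=1)]}


rows = [
    [1.0, 1.0, 1.0, 9.0, 0.0, 30.0, 1.0, 0.0, 1.0, 0.0],  # same row as the single request
    [0.0, 0.0, 0.0, 5.0, 1.0, 24.0, 1.0, 0.0, 0.0, 1.0],  # made-up second row
]
payload = build_batch_payload(rows)
print(json.dumps(payload))
```

The payload would then be sent with the same `requests.post(url, headers=headers, json=payload)` call used for the single request.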


With two steps, `modelbit.login()` and `mb.deploy()`, you have a live production endpoint, created entirely from your notebook environment and without changing your existing stack, that:

- auto-scales,
- responds in real time,
- handles batch traffic.

Need to ship a new model to the endpoint? Simply update the `git` branch, and that's it! 😎
See Modelbit's blog for more: https://www.modelbit.com/blog