# Deploy ML model

Machine learning models are only valuable if they can be used to make predictions or classifications on new data. Therefore, deploying a machine learning model is an essential step in the machine learning workflow. This repository provides a practical guide on how to save a machine learning model using Pickle, a Python library that allows the serialization and deserialization of Python objects, and deploy it using Streamlit, a popular web application framework.

We'll start by training a simple classifier on a diabetes dataset and checking its accuracy. We'll then use Pickle to save the trained model to disk so it can be loaded later. Finally, we'll use Streamlit, a tool for building interactive web applications quickly, to create a simple web application that lets users input data and receive predictions from the saved machine learning model.

By the end of this guide, you'll have a good understanding of how to save a machine learning model using Pickle and deploy it using Streamlit. You'll be able to use these techniques to create your own web applications that use machine learning models to make predictions or classifications on new data.

In [1]:
# import some libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score

In [2]:
#load the data
diabetes_dataset = pd.read_csv('diabetes.csv') 

In [3]:
# In this case we have a slightly imbalanced dataset
diabetes_dataset['Outcome'].value_counts()

0    500
1    268
Name: Outcome, dtype: int64

0 --> Non-Diabetic

1 --> Diabetic
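
To put "slightly imbalanced" into numbers, the counts above can be turned into proportions. The sketch below rebuilds the label column from the printed counts (500 and 268) instead of reading `diabetes.csv`, so it runs on its own; on the real data, `diabetes_dataset['Outcome'].value_counts(normalize=True)` should give the same fractions.

```python
import pandas as pd

# Rebuild the label column from the counts printed above
# (assumption: the column contains no other values)
outcome = pd.Series([0] * 500 + [1] * 268, name="Outcome")

# normalize=True returns class fractions instead of raw counts
proportions = outcome.value_counts(normalize=True)
print(proportions.round(3))

# A model that always predicts the majority class (0) would reach this accuracy
majority_baseline = proportions.max()
print(f"Majority-class baseline accuracy: {majority_baseline:.3f}")
```

Roughly 65% of the rows are non-diabetic, so any accuracy score reported later is best read against that 65% baseline.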

In [4]:
diabetes_dataset.head()

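|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |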


As we can see, the dataset contains some columns, such as Pregnancies, SkinThickness and DiabetesPedigreeFunction, whose values are hard for a user to know, so for this example we are going to drop them. We also drop the Age column because, purely for demonstration purposes, we want a model that does not take age into consideration.

In [5]:
# Remove four columns by position: Pregnancies (0), SkinThickness (3), DiabetesPedigreeFunction (6) and Age (7)
diabetes_dataset.drop(diabetes_dataset.columns[[0,3,6,7]], axis=1, inplace=True)
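
Dropping by position breaks silently if the column order of the CSV ever changes. As a sketch of an alternative, the same step can be written with column names. It is run here on a small frame rebuilt from the first two rows shown by `head()`; the names `sample`, `columns_to_drop` and `reduced` are only for illustration.

```python
import pandas as pd

# First two rows of the dataset as printed by head() above
sample = pd.DataFrame(
    {
        "Pregnancies": [6, 1],
        "Glucose": [148, 85],
        "BloodPressure": [72, 66],
        "SkinThickness": [35, 29],
        "Insulin": [0, 0],
        "BMI": [33.6, 26.6],
        "DiabetesPedigreeFunction": [0.627, 0.351],
        "Age": [50, 31],
        "Outcome": [1, 0],
    }
)

# Same four columns as positions 0, 3, 6 and 7, but selected by name
columns_to_drop = ["Pregnancies", "SkinThickness", "DiabetesPedigreeFunction", "Age"]
reduced = sample.drop(columns=columns_to_drop)

print(list(reduced.columns))
```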

In [6]:
diabetes_dataset.head()

|   | Glucose | BloodPressure | Insulin | BMI | Outcome |
|---|---|---|---|---|---|
| 0 | 148 | 72 | 0 | 33.6 | 1 |
| 1 | 85 | 66 | 0 | 26.6 | 0 |
| 2 | 183 | 64 | 0 | 23.3 | 1 |
| 3 | 89 | 66 | 94 | 28.1 | 0 |
| 4 | 137 | 40 | 168 | 43.1 | 1 |


 ---

In [7]:
# separate the features (X) from the labels (Y)
X = diabetes_dataset.drop(columns='Outcome')
Y = diabetes_dataset['Outcome']

Now we are going to split our data into training and test sets, using 80% of the data for training and 20% for testing.

In [8]:
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.2, stratify=Y, random_state=2)
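
The `stratify=Y` argument keeps the class ratio the same in both splits, which matters here because the classes are imbalanced. A small sketch of the effect, using synthetic labels with the same 500/268 counts (the feature matrix is a placeholder, not the real data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels matching the class counts of the dataset; features are placeholders
y = np.array([0] * 500 + [1] * 268)
X = np.arange(len(y)).reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

# With stratification, both splits contain roughly 35% positive samples
print(len(y_train), len(y_test))
print(round(y_train.mean(), 3), round(y_test.mean(), 3))
```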

## Select algorithm and train model

In [9]:
# we use an SVM classifier with a linear kernel, a common choice for binary classification
classifier = svm.SVC(kernel='linear')

In [10]:
# train the support vector machine classifier
classifier.fit(X_train, Y_train)

## Model Evaluation

In this stage we evaluate the model to check that it is good enough for production. We use the accuracy score for this: the fraction of predictions the model got right.
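
As a quick illustration of the metric, here is `accuracy_score` applied to hand-made labels (hypothetical values, not taken from the dataset):

```python
from sklearn.metrics import accuracy_score

# Hypothetical true labels and predictions for four patients
y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]

# Three of the four predictions match, so the accuracy is 3 / 4
print(accuracy_score(y_true, y_pred))  # 0.75
```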

In [11]:
# accuracy score on the training data (true labels first, predictions second)
X_train_prediction = classifier.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)

In [12]:
print('Accuracy score of the training data : ', training_data_accuracy)

Accuracy score of the training data :  0.762214983713355


In [13]:
# accuracy score on the test data (true labels first, predictions second)
X_test_prediction = classifier.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)

In [14]:
print('Accuracy score of the test data : ', test_data_accuracy)

Accuracy score of the test data :  0.7662337662337663


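An accuracy of about 76% on both the training and test data is not bad, given that always predicting the majority class would score about 65% (500 of 768). You can try to improve this value by changing the algorithm or by tuning its hyperparameters. For this example we consider it good enough.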
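
One possible way to tune the model is a cross-validated grid search over the SVM's regularization parameter `C`. The sketch below is an assumption about how that could look, not part of the original notebook. It uses synthetic data from `make_classification` so that it runs on its own, and the candidate values for `C` are arbitrary.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the four-feature diabetes data
X, y = make_classification(
    n_samples=200, n_features=4, n_informative=3, n_redundant=1, random_state=2
)

# Try a few values of C with 5-fold cross-validation, scoring by accuracy
param_grid = {"C": [0.1, 1, 10]}
search = GridSearchCV(svm.SVC(kernel="linear"), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

On the real data, `X_train` and `Y_train` would take the place of the synthetic `X` and `y`, with the test set held back for the final check.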

Now it's time to make a single prediction. We will build one new sample by hand, in the same column order the model was trained on (Glucose, BloodPressure, Insulin, BMI), and ask the model to classify it.

In [15]:
# Hand-made sample the model has not seen before: high glucose, normal blood pressure, high insulin and slightly high BMI
input_data = (166,72,175,25.8)

# convert the input tuple to a NumPy array
input_data_as_numpy_array = np.asarray(input_data)

# predict() expects a 2D array of shape (n_samples, n_features); passing a 1D array raises
# "Expected 2D array, got 1D array instead", so reshape the single sample to one row
input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)
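
The difference between the two reshape calls is easiest to see from the resulting shapes. A minimal sketch using the same four values (the names `sample`, `one_row` and `one_column` are illustrative):

```python
import numpy as np

sample = np.asarray((166, 72, 175, 25.8))
print(sample.shape)  # (4,)

# One sample with four features: the layout predict() needs here
one_row = sample.reshape(1, -1)
print(one_row.shape)  # (1, 4)

# Four samples with one feature each: not what we want in this case
one_column = sample.reshape(-1, 1)
print(one_column.shape)  # (4, 1)
```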

In [16]:
prediction = classifier.predict(input_data_reshaped)
print(prediction)

if (prediction[0] == 0):
  print('The person is not diabetic')
else:
  print('The person is diabetic')

[1]
The person is diabetic
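
Because the classifier was fitted on a pandas DataFrame, recent scikit-learn versions warn about missing feature names when `predict` receives a bare NumPy array. One way to avoid that, sketched here as an assumption rather than the notebook's method, is to wrap the sample in a one-row DataFrame that uses the training column names. The tiny model below is fitted on the five `head()` rows only so the example runs on its own; its prediction says nothing about the real model.

```python
import pandas as pd
from sklearn import svm

feature_names = ["Glucose", "BloodPressure", "Insulin", "BMI"]

# The five rows printed by head() above, used only as a stand-in training set
X_small = pd.DataFrame(
    [[148, 72, 0, 33.6], [85, 66, 0, 26.6], [183, 64, 0, 23.3],
     [89, 66, 94, 28.1], [137, 40, 168, 43.1]],
    columns=feature_names,
)
y_small = [1, 0, 1, 0, 1]

toy_classifier = svm.SVC(kernel="linear").fit(X_small, y_small)

# A one-row DataFrame is already 2D, and its column names match the training data
new_sample = pd.DataFrame([[166, 72, 175, 25.8]], columns=feature_names)
toy_prediction = toy_classifier.predict(new_sample)

print(new_sample.shape)
print(toy_prediction)
```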




 ---

## Saving the trained model

Now we are going to save our model using pickle, the Python module for serializing and deserializing Python objects. Writing the trained classifier to a file means it can be loaded later to make predictions on new data without retraining.

In [17]:
import pickle


In [18]:
# Save the trained model to a file in pickle format
filename = 'trained_model.sav'
with open(filename, 'wb') as model_file:
    pickle.dump(classifier, model_file)
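
To use the model later it has to be read back with `pickle.load`. The round trip can be sketched as follows, with a tiny stand-in model fitted on the five `head()` rows and a temporary directory so that nothing is written to the working directory (both are choices made for this example only):

```python
import os
import pickle
import tempfile

from sklearn import svm

# Tiny stand-in model so the example is self-contained (values from head() above)
X_small = [[148, 72, 0, 33.6], [85, 66, 0, 26.6], [183, 64, 0, 23.3],
           [89, 66, 94, 28.1], [137, 40, 168, 43.1]]
y_small = [1, 0, 1, 0, 1]
model = svm.SVC(kernel="linear").fit(X_small, y_small)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "trained_model.sav")

    # The with-statement closes the file even if dump() raises
    with open(path, "wb") as f:
        pickle.dump(model, f)

    # Read the bytes back into a new, independent model object
    with open(path, "rb") as f:
        loaded_model = pickle.load(f)

# The restored model should give the same predictions as the original
same = (loaded_model.predict(X_small) == model.predict(X_small)).all()
print(same)
```

Only unpickle files you trust, since `pickle.load` can execute arbitrary code. It is also safest to load the model with the same scikit-learn version that was used to save it.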

And that's it! We have successfully saved our machine learning model using Pickle. The file main.py contains the code that loads the saved model and uses Streamlit to create a web application where users can input data and receive predictions from the model.
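
The contents of main.py are not reproduced here, so the following is only a sketch of what such a file might look like. The function names `label_for` and `run_app`, the page title, the widget labels and the default input values are all hypothetical. It assumes `trained_model.sav` sits next to the script and that the model expects Glucose, BloodPressure, Insulin and BMI in that order.

```python
import pickle

import pandas as pd

# Column order the model was trained on (assumption based on the notebook above)
FEATURES = ["Glucose", "BloodPressure", "Insulin", "BMI"]


def label_for(prediction):
    """Map the model output (0 or 1) to a readable message."""
    if prediction == 0:
        return "The person is not diabetic"
    return "The person is diabetic"


def run_app(model_path="trained_model.sav"):
    # Imported here so the helper above can be used without Streamlit installed
    import streamlit as st

    # Only load pickle files you created yourself
    with open(model_path, "rb") as f:
        model = pickle.load(f)

    st.title("Diabetes prediction")

    # One numeric input per feature; using floats keeps the widget types consistent
    values = [st.number_input(name, min_value=0.0, value=0.0) for name in FEATURES]

    if st.button("Predict"):
        sample = pd.DataFrame([values], columns=FEATURES)
        st.success(label_for(model.predict(sample)[0]))


# In main.py, call run_app() on the last line and start the app with:
#   streamlit run main.py
```

Keeping the label mapping in a plain function separates the prediction logic from the user interface, which makes it easy to check without starting a browser.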