**Predicting Diabetes Using Machine Learning**
---
Introduction:

Welcome to the "Predicting Diabetes Using Machine Learning" project! In this endeavor, we embark on a journey to harness the power of data and predictive modeling to tackle a crucial healthcare challenge: diabetes prediction. Diabetes, a chronic metabolic disorder, affects millions of individuals worldwide. Early detection and intervention are paramount to managing this condition effectively.

This project centers around the idea of utilizing machine learning techniques to predict whether a person is likely to have diabetes based on a set of input features. By analyzing relevant medical data and building a predictive model, we aim to create a tool that can assist healthcare professionals in identifying potential cases of diabetes early on.



In [1]:
# Built by Sarthak Mishra
# A fun self-project to test out data preprocessing and regression techniques
# August 2023

**Importing the Necessary Libraries**
---
---

Here we import the libraries the project needs, so we can use their predefined functions and keep our code readable and short.


**Libraries include:**


*   NumPy
*   Pandas
*   scikit-learn (sklearn)
*   Gradio (for the interactive demo)




In [None]:
!pip install gradio
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score


**Loading the Diabetes Dataset**
---
---
This is a collection of medical records stored as a ***CSV*** file with ***100,000 rows*** and ***9 columns*** (8 features plus the diabetes label):

1. **gender**: The sex of the patient.
2. **age**: The age of the patient in years.
3. **hypertension**: Whether the patient has hypertension (high blood pressure), 0 or 1.
4. **heart_disease**: Whether the patient has heart disease, 0 or 1.
5. **smoking_history**: The patient's smoking history, stored as text (for example never, former or current).
6. **bmi**: The body mass index (BMI) of the patient. BMI is a measure of body weight relative to height.
7. **HbA1c_level**: The HbA1c level of the patient. HbA1c is a measure of blood sugar control over the past 3 months.
8. **blood_glucose_level**: The blood glucose level of the patient. Blood glucose is a measure of the amount of sugar in the blood.
9. **diabetes**: Whether the patient has diabetes, 0 or 1.
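
Only some of these columns are numeric, and the text ones have to be encoded before a model can use them. The sketch below checks this on three made-up rows (hypothetical values, not taken from the real file) that share the CSV's column names; it uses `pd.api.types.is_numeric_dtype` so it works regardless of how a given pandas version stores strings.

```python
import pandas as pd

# Three hypothetical rows with the same columns as diabetes_prediction_dataset.csv
sample = pd.DataFrame({
    "gender": ["Female", "Male", "Female"],
    "age": [54.0, 28.0, 60.0],
    "hypertension": [0, 0, 1],
    "heart_disease": [0, 0, 0],
    "smoking_history": ["never", "current", "No Info"],
    "bmi": [27.32, 24.1, 30.5],
    "HbA1c_level": [6.6, 5.0, 7.5],
    "blood_glucose_level": [140, 100, 300],
    "diabetes": [0, 0, 1],
})

# Columns that are not numeric must be mapped to numbers before modelling
text_columns = [c for c in sample.columns if not pd.api.types.is_numeric_dtype(sample[c])]
print(text_columns)  # ['gender', 'smoking_history']
```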



In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
data = pd.read_csv("/content/drive/MyDrive/Dataset/diabetes_prediction_dataset.csv")

The last five rows of the dataset after loading the ***csv*** file into a pandas DataFrame:

In [None]:
data.tail() #Alternatively .head() can be used to take a look at the first five rows.


In [None]:
# Number of rows and columns in the dataset
data.shape

Next, we look at summary statistics of the dataset (count, mean, standard deviation, min/max and quartiles) to get further insights.


In [None]:
data.describe()
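
By default `describe()` summarises only the numeric columns, so the text columns (gender and smoking_history) do not show up in its output; passing `include="all"` adds them. A toy sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical mini-table mixing a text column and a numeric column
df = pd.DataFrame({"gender": ["Female", "Male", "Female"], "bmi": [27.32, 24.1, 30.5]})

numeric_summary = df.describe()            # numeric columns only (default)
full_summary = df.describe(include="all")  # also summarises the text column

print(numeric_summary.columns.tolist())  # ['bmi']
print(full_summary.columns.tolist())
```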

Diabetes
---
---
0 -> Not Diabetic

1 -> Diabetic


In [None]:
data['diabetes'].value_counts()
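
Counts are often easier to read as proportions. The sketch below uses made-up labels (not the real class counts) to show `value_counts(normalize=True)` and the accuracy a trivial "always predict the most common class" rule would reach, a useful reference point when reading the accuracy scores later on.

```python
import pandas as pd

# Hypothetical labels: 8 non-diabetic (0) and 2 diabetic (1) patients
labels = pd.Series([0, 0, 0, 0, 0, 0, 0, 0, 1, 1], name="diabetes")

# Share of each class instead of raw counts
shares = labels.value_counts(normalize=True)
print(shares)

# Always answering with the most common class is already right this often
baseline_accuracy = shares.max()
print(baseline_accuracy)  # 0.8
```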

As we can see from the data, smoking_history is stored as text, which the model can't work with directly, so we create a numerical mapping for each category.

In [None]:
data['num_smoking_history'] = data['smoking_history'].map({'No Info':-1,'never':0,'former':1,'current':2,'not current':3,'ever':4})
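
`Series.map()` with a dictionary looks up each value; any value that is not a key becomes NaN instead of raising an error. That is why it is worth checking for missing values after encoding. A small sketch with a hypothetical unseen category:

```python
import pandas as pd

smoking_map = {'No Info': -1, 'never': 0, 'former': 1, 'current': 2, 'not current': 3, 'ever': 4}

# Hypothetical values; 'unknown' is deliberately missing from the mapping
smoking = pd.Series(['never', 'current', 'unknown'])
encoded = smoking.map(smoking_map)
print(encoded.tolist())  # [0.0, 2.0, nan]
```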


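Now we drop the original text column, since its numerical version is in place. Note that drop() returns a new DataFrame and leaves data itself unchanged, so this cell only previews the result; the column is actually removed in the next cell.


In [None]:
data.drop('smoking_history', axis=1)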
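
To make the drop() behaviour concrete, here is a toy sketch (hypothetical two-row table): the original DataFrame keeps the column unless the result is assigned back.

```python
import pandas as pd

df = pd.DataFrame({"smoking_history": ["never", "former"], "num_smoking_history": [0, 1]})

preview = df.drop("smoking_history", axis=1)  # returns a new DataFrame
print(df.columns.tolist())       # ['smoking_history', 'num_smoking_history']
print(preview.columns.tolist())  # ['num_smoking_history']

df = df.drop("smoking_history", axis=1)       # reassign to keep the change
print(df.columns.tolist())       # ['num_smoking_history']
```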


Now we map gender to binary values: 0 = Female, 1 = Male. Any other value is not in the mapping and becomes NaN, which is then filled with 0.

In [None]:
data['num_gender'] = data['gender'].map({'Female': 0, 'Male': 1})

# Reorganising the data: move the numeric columns to the front
first_column = data.pop('num_gender')
data.insert(0, 'num_gender', first_column)
second_column = data.pop('num_smoking_history')
data.insert(1, 'num_smoking_history', second_column)

# Remove the original text columns (pop() modifies data in place)
data.pop('gender')
data.pop('smoking_history')

# Values missing from the mappings became NaN; fill them with 0
data = data.fillna(0)

# Sanity check: should be 0
data['num_gender'].isnull().sum()
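
The reordering above relies on `pop()` removing a column and returning it, and `insert()` placing a column at a given position. A toy sketch with one hypothetical row:

```python
import pandas as pd

df = pd.DataFrame({"gender": ["Female"], "age": [60.0], "num_gender": [0]})

# pop() removes the column and returns it; insert() puts it back at position 0
col = df.pop("num_gender")
df.insert(0, "num_gender", col)
print(df.columns.tolist())  # ['num_gender', 'gender', 'age']
```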

Now we split our data into the input features (X) and their labels (Y).

In [None]:
# Splitting into X: feature data (8 features) and Y: label (diabetes, 0 or 1).
# After training, the model M maps each row x_i to a prediction M(x_i),
# which we compare with the actual value Y_i to calculate the accuracy.

# Columns used as the input features X
param_X = [
    'num_gender',
    'num_smoking_history',
    'age',
    'hypertension',
    'heart_disease',
    'bmi',
    'HbA1c_level',
    'blood_glucose_level'
]

# A single column name (not a list) gives a 1-D Series, the shape sklearn expects for labels
param_Y = 'diabetes'

X = data[param_X]
Y = data[param_Y]
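
Selecting with a list of names, such as `data[['diabetes']]`, returns a one-column DataFrame, while a single name returns a Series. scikit-learn expects a 1-D label array, and fitting with a one-column DataFrame triggers a `DataConversionWarning`. A toy sketch (hypothetical values) of the two shapes:

```python
import pandas as pd

df = pd.DataFrame({"bmi": [27.32, 24.1, 30.5], "diabetes": [1, 0, 1]})

y_frame = df[["diabetes"]]  # list of names -> DataFrame, 2-D
y_series = df["diabetes"]   # single name   -> Series, 1-D
print(y_frame.shape, y_series.shape)  # (3, 1) (3,)
```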


Splitting the Xs and Ys
---
---
Now we split the data into training and test sets with an **80:20** train:test ratio, so the model learns from most of the data while a held-out portion is kept for an honest accuracy estimate.

We let the model train on each ***(X_train)_i*** together with its corresponding label ***(Y_train)_i***. Once it has gone through all of them, we introduce it to the data we kept ***hidden***, i.e. the test data, and compare its prediction for each ***(X_test)_j*** with ***(Y_test)_j*** to measure the model's accuracy.
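
With `test_size=0.2`, 100,000 rows would give 80,000 training rows and 20,000 test rows. The sketch below shows the same split on 10 hypothetical rows; the `stratify` variant is an optional refinement (not used in this notebook) that keeps the 0/1 proportions equal in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 10 hypothetical rows with 2 features each, and alternating labels
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.array([0, 1] * 5)

X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=0)
print(len(X_tr), len(X_te))  # 8 2

# Optional: stratify=y keeps the class proportions the same in train and test
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=0, stratify=y_toy)
print(np.bincount(y_te_s))  # [1 1]
```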

In [None]:
X_train , X_test , Y_train , Y_test = train_test_split(X,Y,test_size = 0.2 , random_state = 69420)

In [None]:
# Importing the logistic regression model.
# This is a built-in model from the sklearn library;
# the same algorithm could also be implemented by hand.

from sklearn.linear_model import LogisticRegression

# The features are not scaled, so allow more solver iterations than the default (100)
Model = LogisticRegression(max_iter=1000)
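
As noted in the comments above, logistic regression can also be written by hand. It predicts a probability $p = \sigma(w \cdot x + b)$ with $\sigma(z) = 1/(1+e^{-z})$ and classifies as 1 when $p \ge 0.5$. The sketch below fits $w$ and $b$ with plain batch gradient descent on the average log-loss. It is only an illustration: sklearn's `LogisticRegression` uses the lbfgs solver and L2 regularisation by default, and the function names here are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent on the average log-loss (no regularisation)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)  # predicted probabilities
        error = p - y           # gradient of the log-loss w.r.t. the linear score
        w -= lr * (X.T @ error) / len(y)
        b -= lr * error.mean()
    return w, b

def predict(X, w, b):
    """Class 1 when the predicted probability is at least 0.5."""
    return (sigmoid(X @ w + b) >= 0.5).astype(int)

# Hypothetical 1-feature data: negative values are class 0, positive are class 1
X_toy = np.array([[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

w, b = fit_logistic(X_toy, y_toy)
print(predict(X_toy, w, b))  # [0 0 0 1 1 1]
```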

In [None]:
# Similar to a mathematical function:
# we pass our input training data X_train to the model (treat it as a function M(X_train))
# and give it the expected output Y_train from the dataset.
Model.fit(X_train, Y_train)
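
Once fitted, a binary `LogisticRegression` can return a probability per row with `predict_proba` as well as a hard 0/1 label with `predict`; the label corresponds to a probability above 0.5. A toy sketch on hypothetical, well-separated data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical 1-feature data
X_toy = np.array([[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X_toy, y_toy)

proba = clf.predict_proba(X_toy)[:, 1]  # estimated P(label = 1) for each row
labels = clf.predict(X_toy)             # hard 0/1 decisions

# For a binary model, predict() matches thresholding the probability at 0.5
print((proba > 0.5).astype(int).tolist() == labels.tolist())  # True
```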

Eureka!! 😄
---
---
Our model is trained; now it's time to test it.


In [None]:
# Note: Y_test_pred is not Y_test; these are the model's own predictions
# based on what it has learned. We store them for later computation and analysis.
Y_test_pred = Model.predict(X_test)

# Creating a DataFrame to compare actual and predicted labels side by side
results = pd.DataFrame({'actual': Y_test.to_numpy().ravel(), 'predicted': Y_test_pred})
results.head()



# 🥁🥁🥁

---
Now for the scores...

In [None]:
# Train data score
Y_train_pred = Model.predict(X_train)
print("Model accuracy on train data : ", accuracy_score(Y_train, Y_train_pred))

In [None]:
# Test data score
print("Model accuracy on test data : ", accuracy_score(Y_test, Y_test_pred))
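
Accuracy is simply the fraction of predictions that match the true labels. A hand computation on hypothetical labels agrees with `accuracy_score`; comparing the train and test values this way also shows whether the model does noticeably better on data it has already seen.

```python
from sklearn.metrics import accuracy_score

# Hypothetical true labels and predictions for 4 patients
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]

# Accuracy = number of correct predictions / total number of predictions
manual = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(manual)                          # 0.75
print(accuracy_score(y_true, y_pred))  # 0.75
```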

In [None]:
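# Example patient: Female, 60, no hypertension, no heart disease, never smoked,
# BMI 27.32, HbA1c 7.5, blood glucose 300 --> labelled 1 (diabetic).
# The values must follow the training column order in param_X.
sample = pd.DataFrame([[0, 0, 60, 0, 0, 27.32, 7.5, 300]], columns=param_X)
print(Model.predict(sample)[0])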
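
Because the model depends on the column order, building the row from named values and then selecting `param_X` avoids mixing up positions (a missing name raises a `KeyError` instead of silently shifting values). A sketch with a hypothetical helper `make_patient_row`; `param_X` is repeated so the block runs on its own.

```python
import pandas as pd

# Feature order used for training (same list as param_X above)
param_X = ['num_gender', 'num_smoking_history', 'age', 'hypertension',
           'heart_disease', 'bmi', 'HbA1c_level', 'blood_glucose_level']

def make_patient_row(**values):
    """Hypothetical helper: one-row DataFrame with columns in param_X order."""
    return pd.DataFrame([values])[param_X]

row = make_patient_row(age=60, num_gender=0, hypertension=0, heart_disease=0,
                       num_smoking_history=0, bmi=27.32, HbA1c_level=7.5,
                       blood_glucose_level=300)
print(row.columns.tolist() == param_X)  # True
# Model.predict(row) would then receive the features in the trained order
```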

In [None]:

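import gradio as gr

title = "Diabetes Detector & Health Advisor"

def process_input(gender, age, hypertension, smoking_history, heart_disease, bmi, Hb, blood_glucose_lvl):
    # Build a one-row DataFrame in the column order the model was trained on (param_X)
    row = pd.DataFrame([[gender, smoking_history, age, hypertension, heart_disease,
                         bmi, Hb, blood_glucose_lvl]], columns=param_X)
    prediction = Model.predict(row)[0]
    return "Diabetic" if prediction == 1 else "Not diabetic"

model_gui = gr.Interface(
    fn=process_input,
    inputs=[
        gr.Slider(0, 1, step=1, label="Gender (0 = Female, 1 = Male)"),
        gr.Slider(0, 100, step=1, label="Age"),
        gr.Slider(0, 1, step=1, label="Hypertension (0 = No, 1 = Yes)"),
        gr.Slider(-1, 4, step=1, label="Smoking history (-1 = No Info ... 4 = ever)"),
        gr.Slider(0, 1, step=1, label="Heart disease (0 = No, 1 = Yes)"),
        gr.Slider(10, 50, step=0.01, label="BMI"),
        gr.Slider(3, 9, step=0.01, label="HbA1c level"),
        gr.Slider(50, 300, step=1, label="Blood glucose level"),
    ],
    outputs="text",
    title=title,
)
model_gui.launch()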