**Problem Statement**

This diabetes prediction model is a binary classifier that predicts whether an individual has diabetes from eight health measurements: number of pregnancies, glucose level, blood pressure, skin thickness, insulin, BMI, diabetes pedigree function, and age. Its aim is to flag individuals who are at risk of, or already affected by, diabetes. In a healthcare setting, such a model could support clinicians in early diagnosis and intervention, potentially improving patient outcomes through timely management and treatment tailored to individual needs.

Importing Dependencies

In [None]:
import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [None]:
# Loading the dataset into a pandas DataFrame
diabetes_dataset = pd.read_csv('/content/diabetes_dataset.csv')

In [None]:
# Checking the first 5 rows of the DataFrame
diabetes_dataset.head()

   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction  Age  Outcome
0            6      148             72             35        0  33.6                     0.627   50        1
1            1       85             66             29        0  26.6                     0.351   31        0
2            8      183             64              0        0  23.3                     0.672   32        1
3            1       89             66             23       94  28.1                     0.167   21        0
4            0      137             40             35      168  43.1                     2.288   33        1


In [None]:
# Checking the number of rows and columns in the DataFrame
diabetes_dataset.shape

(768, 9)

In [None]:
# Checking the statistical measures of the DataFrame
diabetes_dataset.describe()

       Pregnancies     Glucose  BloodPressure  SkinThickness     Insulin         BMI  DiabetesPedigreeFunction         Age     Outcome
count   768.000000  768.000000     768.000000     768.000000  768.000000  768.000000                768.000000  768.000000  768.000000
mean      3.845052  120.894531      69.105469      20.536458   79.799479   31.992578                  0.471876   33.240885    0.348958
std       3.369578   31.972618      19.355807      15.952218  115.244002    7.884160                  0.331329   11.760232    0.476951
min       0.000000    0.000000       0.000000       0.000000    0.000000    0.000000                  0.078000   21.000000    0.000000
25%       1.000000   99.000000      62.000000       0.000000    0.000000   27.300000                  0.243750   24.000000    0.000000
50%       3.000000  117.000000      72.000000      23.000000   30.500000   32.000000                  0.372500   29.000000    0.000000
75%       6.000000  140.250000      80.000000      32.000000  127.250000   36.600000                  0.626250   41.000000    1.000000
max      17.000000  199.000000     122.000000      99.000000  846.000000   67.100000                  2.420000   81.000000    1.000000


In [None]:
# Counting missing (null) values in each column
diabetes_dataset.isnull().sum()

Pregnancies                 0
Glucose                     0
BloodPressure               0
SkinThickness               0
Insulin                     0
BMI                         0
DiabetesPedigreeFunction    0
Age                         0
Outcome                     0
dtype: int64
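isnull() finds no gaps, but the describe() output above shows a minimum of 0 for Glucose, BloodPressure, SkinThickness, Insulin and BMI. Those values are not physiologically plausible and most likely stand for missing measurements that isnull() cannot detect. The sketch below counts such zeros and optionally marks them as NaN, using the five rows from head() as a small stand-in for the full DataFrame. The list of columns treated this way is an assumption; Pregnancies is left out because 0 is a valid count there.

```python
import numpy as np
import pandas as pd

# The five rows shown by head() above, standing in for diabetes_dataset.
sample = pd.DataFrame({
    "Pregnancies": [6, 1, 8, 1, 0],
    "Glucose": [148, 85, 183, 89, 137],
    "BloodPressure": [72, 66, 64, 66, 40],
    "SkinThickness": [35, 29, 0, 23, 35],
    "Insulin": [0, 0, 0, 94, 168],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
    "DiabetesPedigreeFunction": [0.627, 0.351, 0.672, 0.167, 2.288],
    "Age": [50, 31, 32, 21, 33],
    "Outcome": [1, 0, 1, 0, 1],
})

# Assumption: in these columns a 0 means "not measured" rather than a real value.
zero_as_missing = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]

# isnull() reports nothing, but counting zeros per column reveals hidden gaps.
zero_counts = (sample[zero_as_missing] == 0).sum()
print(zero_counts)

# One option: replace those zeros with NaN so they count as missing values.
cleaned = sample.copy()
cleaned[zero_as_missing] = cleaned[zero_as_missing].replace(0, np.nan)
print(cleaned.isnull().sum())
```

Whether to then drop, impute, or keep these rows is a modelling decision the notebook does not make.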

In [None]:
# Separating the data into features (x) and labels (y)
x = diabetes_dataset.drop(columns='Outcome')
y = diabetes_dataset['Outcome']

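The feature columns are on very different scales (for example, DiabetesPedigreeFunction ranges from 0.078 to 2.42, while Insulin ranges from 0 to 846), so we standardize them to zero mean and unit variance.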
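Standardization rescales each column as z = (x - mean) / std, using that column's own mean and standard deviation. A small check of that formula against scikit-learn's StandardScaler, using the Glucose and Insulin values from the five head() rows as sample input; note that StandardScaler uses the population standard deviation (ddof=0), which is also NumPy's default.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Glucose and Insulin values from the five rows shown by head().
features = np.array([
    [148, 0],
    [85, 0],
    [183, 0],
    [89, 94],
    [137, 168],
], dtype=float)

scaled = StandardScaler().fit_transform(features)

# Same result by hand: subtract each column's mean, divide by its (ddof=0) std.
manual = (features - features.mean(axis=0)) / features.std(axis=0)

print(np.allclose(scaled, manual))  # True
```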

In [None]:
scaler = StandardScaler()

In [None]:
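scaler.fit_transform(x)
# Fits the scaler to the feature data (per-column mean and standard deviation) and returns a standardized copy.
# The returned array is only displayed below; it is not assigned back to x, so the split and training that follow use the unscaled features.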

array([[ 0.63994726,  0.84832379,  0.14964075, ...,  0.20401277,
         0.46849198,  1.4259954 ],
       [-0.84488505, -1.12339636, -0.16054575, ..., -0.68442195,
        -0.36506078, -0.19067191],
       [ 1.23388019,  1.94372388, -0.26394125, ..., -1.10325546,
         0.60439732, -0.10558415],
       ...,
       [ 0.3429808 ,  0.00330087,  0.14964075, ..., -0.73518964,
        -0.68519336, -0.27575966],
       [-0.84488505,  0.1597866 , -0.47073225, ..., -0.24020459,
        -0.37110101,  1.17073215],
       [-0.84488505, -0.8730192 ,  0.04624525, ..., -0.20212881,
        -0.47378505, -0.87137393]])
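The cell above returns a standardized array but does not store it, and any scaling used for training must also be applied to every new input before predict(). A minimal sketch of one common way to keep these in step, assuming scikit-learn's Pipeline: the scaler is fitted on the training split only and is then reapplied automatically at prediction time. The data here is synthetic (make_classification, multiplied by hypothetical per-column scale factors) so the block runs without diabetes_dataset.csv; in the notebook, x and y would take its place.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 768 x 8 diabetes features.
X, y = make_classification(n_samples=768, n_features=8, random_state=2)
# Hypothetical scale factors so the columns span very different ranges.
X = X * np.array([3, 30, 20, 16, 115, 8, 0.3, 12])

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

# fit() learns the scaler's mean/std from x_train only, then trains the SVC on
# the scaled training data; predict()/score() reuse that same fitted scaler.
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(x_train, y_train)

scaler = model.named_steps["standardscaler"]
print("test accuracy:", model.score(x_test, y_test))
```

Fitting the scaler after the split keeps test-set statistics out of training; with a Pipeline, a raw input row passed to model.predict() is scaled the same way as the training data.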

Splitting the data into training and testing data

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.2,stratify=y,random_state=2)
# stratify=y keeps the ratio of diabetic (1) to non-diabetic (0) cases the same in the training and test sets, so neither split ends up dominated by one class
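The effect of stratify can be checked directly. The mean of Outcome in describe() is 0.348958, i.e. 268 of the 768 people are diabetic. This sketch builds labels with that same balance (the feature column is a placeholder) and confirms that both splits keep roughly the same share of positives.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Labels with the same class balance as Outcome: 500 non-diabetic, 268 diabetic.
y = np.array([0] * 500 + [1] * 268)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder single feature column

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

print(len(y_train), len(y_test))  # 614 154
# Share of diabetic cases overall, in training, and in test.
print(round(y.mean(), 3), round(y_train.mean(), 3), round(y_test.mean(), 3))
```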

Initialising and training the ML model

In [None]:
classifier = svm.SVC(kernel='linear')

In [None]:
# Training the model with Training Data
classifier.fit(x_train,y_train)
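With kernel='linear', the trained SVC reduces to a weight vector w (coef_) and a bias b (intercept_): a person is scored as w · x + b, and a positive score maps to classes_[1] (here, diabetes). This sketch checks that on synthetic 8-feature data (make_classification stands in for the diabetes features).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic binary problem with 8 features, standing in for the diabetes data.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

clf = SVC(kernel="linear").fit(X, y)

# Linear decision function: score = w . x + b for every sample.
w = clf.coef_.ravel()
b = clf.intercept_[0]
scores = X @ w + b

# Positive scores -> classes_[1], negative scores -> classes_[0].
manual_pred = np.where(scores > 0, clf.classes_[1], clf.classes_[0])

print(np.allclose(scores, clf.decision_function(X)))
print(np.array_equal(manual_pred, clf.predict(X)))
```

Because the decision rule is linear, the size of each entry of coef_ also hints at how strongly a feature pushes the prediction, but only when the features are on comparable scales.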

Testing the model and calculating its accuracy

In [None]:
# Predicting on the training data and measuring accuracy (accuracy_score takes y_true first, then y_pred)
training_prediction = classifier.predict(x_train)
training_accuracy = accuracy_score(y_train, training_prediction)

In [None]:
print("The model accuracy for training data is: ",training_accuracy)

The model accuracy for training data is:  0.7833876221498371


In [None]:
# Predicting on the test data and measuring accuracy
testing_prediction = classifier.predict(x_test)
testing_accuracy = accuracy_score(y_test, testing_prediction)


In [None]:
print("The model accuracy for testing data is: ",testing_accuracy)

The model accuracy for testing data is:  0.7727272727272727
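Accuracy alone is hard to interpret here: about 34.9% of people are diabetic (the mean of Outcome), so a model that always predicts 0 would already reach roughly 0.65, and accuracy does not show how many diabetic cases are missed. A sketch of a majority-class baseline, a confusion matrix, and sensitivity (recall for class 1), using ten hypothetical labels and predictions rather than the notebook's results; in the notebook, y_test and testing_prediction would take their place.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Hypothetical true labels and predictions for ten people.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 0])

acc = accuracy_score(y_true, y_pred)          # 7 of 10 correct -> 0.7
cm = confusion_matrix(y_true, y_pred)         # [[TN, FP], [FN, TP]] = [[5, 1], [2, 2]]
sensitivity = recall_score(y_true, y_pred)    # TP / (TP + FN) = 2 / 4 = 0.5

# Majority-class baseline: always predict the most frequent label.
majority = np.bincount(y_true).argmax()
baseline = accuracy_score(y_true, np.full_like(y_true, majority))  # 0.6

print(acc, baseline, sensitivity)
print(cm)
```

For a screening use case, the false-negative count (bottom-left of the confusion matrix) is usually the number to watch.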


Making our own prediction system

In [None]:
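# Actual feature values of a person whose recorded Outcome is 0 (no diabetes), in column order:
# Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age
input_data = [3, 126, 88, 41, 235, 39.3, 0.704, 27]

In [None]:
# Converting the input list into a NumPy array so it can be reshaped
input_array = np.asarray(input_data)

In [None]:
# Reshaping to (1, 8): predict() expects a 2D array with one row per sample
reshaped_array = input_array.reshape(1, -1)

In [None]:
# Making the prediction
prediction = classifier.predict(reshaped_array)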



In [None]:
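# Interpreting the prediction: 0 = no diabetes, 1 = diabetes
# predict() returns integer labels, so compare with the integer 0, not the string '0'
if prediction[0] == 0:
    print("The person doesn't have diabetes")
else:
    print("The person has diabetes")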

The person has diabetes
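The prediction steps can be wrapped in one reusable function. A minimal sketch, where predict_diabetes is a hypothetical helper name: it builds a one-row DataFrame with the training column names (giving the (1, 8) shape predict() needs) and compares the returned label with the integer 0. Comparing a NumPy integer label with the string '0' is always False, which would make every person appear diabetic. The model here is trained on synthetic make_classification data, so its output for this input carries no medical meaning; in the notebook, classifier would be passed instead.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.svm import SVC

FEATURES = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
            "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]

# Synthetic training data using the same column names as the diabetes features.
X, y = make_classification(n_samples=300, n_features=8, random_state=1)
model = SVC(kernel="linear").fit(pd.DataFrame(X, columns=FEATURES), y)


def predict_diabetes(model, values):
    """Hypothetical helper: classify one person's 8 feature values."""
    # One-row DataFrame: shape (1, 8) with the column names used in training.
    row = pd.DataFrame([values], columns=FEATURES)
    label = model.predict(row)[0]  # a NumPy integer such as np.int64(0)
    # Compare with the integer 0; label == '0' would never be True.
    if label == 0:
        return "The person doesn't have diabetes"
    return "The person has diabetes"


print(predict_diabetes(model, [3, 126, 88, 41, 235, 39.3, 0.704, 27]))
```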
