**Machine Learning Project | Diabetes Prediction System**

I’m excited to share my latest Machine Learning project where I built a Diabetes Prediction Model using real patient health data.

Project Overview

This system takes medical inputs such as glucose level, blood pressure, BMI, insulin, age, and other health indicators, then predicts whether a person is diabetic or non-diabetic using a trained ML model.

Sample Prediction

Based on the input data provided, the model successfully predicted

✅ Diabetic

⚙️ Technologies Used

Python

Scikit-learn

StandardScaler (Data Normalization)

Pandas & NumPy

Machine Learning Classification Algorithms

In [99]:
#Diabetes Prediction using Machine Learning

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score

Data Collection and data analysis

In [100]:
data = pd.read_csv('/content/Diabetes .csv')

In [101]:
data

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1
...,...,...,...,...,...,...,...,...,...
763,10,101,76,48,180,32.9,0.171,63,0
764,2,122,70,27,0,36.8,0.340,27,0
765,5,121,72,23,112,26.2,0.245,30,0
766,1,126,60,0,0,30.1,0.349,47,1


In [102]:
data.shape

(768, 9)

In [103]:
data.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


Statistical measures of the data

In [104]:
#getting stasticle mesasures of the data

data.describe()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
count,768.0,768.0,768.0,768.0,768.0,768.0,768.0,768.0,768.0
mean,3.845052,120.894531,69.105469,20.536458,79.799479,31.992578,0.471876,33.240885,0.348958
std,3.369578,31.972618,19.355807,15.952218,115.244002,7.88416,0.331329,11.760232,0.476951
min,0.0,0.0,0.0,0.0,0.0,0.0,0.078,21.0,0.0
25%,1.0,99.0,62.0,0.0,0.0,27.3,0.24375,24.0,0.0
50%,3.0,117.0,72.0,23.0,30.5,32.0,0.3725,29.0,0.0
75%,6.0,140.25,80.0,32.0,127.25,36.6,0.62625,41.0,1.0
max,17.0,199.0,122.0,99.0,846.0,67.1,2.42,81.0,1.0


In [105]:
#value count
data['Outcome'].value_counts()

Unnamed: 0_level_0,count
Outcome,Unnamed: 1_level_1
0,500
1,268


In [106]:
data.groupby('Outcome').mean()

Unnamed: 0_level_0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age
Outcome,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
0,3.298,109.98,68.184,19.664,68.792,30.3042,0.429734,31.19
1,4.865672,141.257463,70.824627,22.164179,100.335821,35.142537,0.5505,37.067164


In [107]:
#separting the data set
x = data.drop(columns = 'Outcome', axis = 1)
y = data['Outcome']

In [108]:
print(x)

     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  \
0              6      148             72             35        0  33.6   
1              1       85             66             29        0  26.6   
2              8      183             64              0        0  23.3   
3              1       89             66             23       94  28.1   
4              0      137             40             35      168  43.1   
..           ...      ...            ...            ...      ...   ...   
763           10      101             76             48      180  32.9   
764            2      122             70             27        0  36.8   
765            5      121             72             23      112  26.2   
766            1      126             60              0        0  30.1   
767            1       93             70             31        0  30.4   

     DiabetesPedigreeFunction  Age  
0                       0.627   50  
1                       0.351   31  


In [109]:
print(y)

0      1
1      0
2      1
3      0
4      1
      ..
763    0
764    0
765    0
766    1
767    0
Name: Outcome, Length: 768, dtype: int64


In [110]:
#data standardization

scaler = StandardScaler()


In [111]:
scaler.fit(x_train)   # X_train is a pandas DataFrame

In [112]:
standardized_data = scaler.transform(x)



In [113]:
print(standardized_data)

[[  5.99123942 149.00945133  71.22887685 ...  33.08490712   0.62367687
   50.05179198]
 [  1.00009371  85.57123675  65.29144819 ...  26.18748865   0.35124282
   31.03157878]
 [  7.9876977  184.25290387  63.31230531 ...  22.93584852   0.66809547
   32.03264264]
 ...
 [  4.99301028 121.82164508  71.22887685 ...  25.79335045   0.24661235
   30.03051493]
 [  1.00009371 126.85642401  59.35401954 ...  29.63619789   0.34926866
   47.04860042]
 [  1.00009371  93.62688304  69.24973396 ...  29.93180153   0.31570794
   23.02306797]]


In [114]:
x = standardized_data
y = data['Outcome']

In [115]:
print(x)
print(y)

[[  5.99123942 149.00945133  71.22887685 ...  33.08490712   0.62367687
   50.05179198]
 [  1.00009371  85.57123675  65.29144819 ...  26.18748865   0.35124282
   31.03157878]
 [  7.9876977  184.25290387  63.31230531 ...  22.93584852   0.66809547
   32.03264264]
 ...
 [  4.99301028 121.82164508  71.22887685 ...  25.79335045   0.24661235
   30.03051493]
 [  1.00009371 126.85642401  59.35401954 ...  29.63619789   0.34926866
   47.04860042]
 [  1.00009371  93.62688304  69.24973396 ...  29.93180153   0.31570794
   23.02306797]]
0      1
1      0
2      1
3      0
4      1
      ..
763    0
764    0
765    0
766    1
767    0
Name: Outcome, Length: 768, dtype: int64


In [116]:
#split data traing data and test data
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=0.2, stratify=y, random_state=2)


In [117]:
print(x.shape, x_test.shape, y_train.shape, y_test.shape)

(768, 8) (154, 8) (614,) (154,)


In [118]:
#training the model

classifier = svm.SVC(kernel='linear')

In [119]:
#Traning svm classifer
classifier.fit(x_train, y_train)

In [120]:
#Model evaluation

#accuracy score
x_train_prediction = classifier.predict(x_train)
training_data_accuracy = accuracy_score(x_train_prediction, y_train)


In [121]:
print('Accuracy score of the traning data: ', training_data_accuracy)

Accuracy score of the traning data:  0.7866449511400652


In [122]:
#test data accuracy score
x_test_prediction = classifier.predict(x_test)
test_data_accuracy = accuracy_score(x_test_prediction, y_test)


In [123]:
print('Accuracy score of the test data: ',test_data_accuracy)

Accuracy score of the test data:  0.7727272727272727


In [125]:
input_data = input('Enter your datas (comma-separated): ')

# Convert input string to a list of floats
input_data_list = [float(x) for x in input_data.split(',')]

# Change input data to numpy array
input_data_as_numpy_array = np.array(input_data_list)

# Reshaping array as we are predicting for one instance
input_data_reshaped = input_data_as_numpy_array.reshape(1, -1)

# Standardize data
std_data = scaler.transform(input_data_reshaped)
print(std_data)

prediction = classifier.predict(std_data)
print(prediction)

if (prediction[0] == 0):
  print('The person is not diabetic')
elif (prediction[0] == 1):
  print('The person is diabetic')

Enter your datas (comma-separated): 2,197,70,45,543,30.5,0.158,30
[[1.99832285e+00 1.98350285e+02 6.92497340e+01 4.46312487e+01
  5.27850289e+02 3.00303361e+01 1.60736393e-01 3.00305149e+01]]
[1]
The person is diabetic
