## Assignment No. 4
Implement the K-Nearest Neighbors algorithm on the diabetes.csv dataset. Compute the confusion
matrix, accuracy, error rate, precision, and recall on the given dataset.

Dataset link : https://www.kaggle.com/datasets/abdallamahgoub/diabetes

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score, accuracy_score

In [2]:
# Load dataset
data = pd.read_csv("diabetes.csv")

In [3]:
# Treat 0 as a missing value in these columns: replace it with NaN,
# then fill each NaN with the rounded mean of the column's remaining values
cols_to_replace = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]
for col in cols_to_replace:
    data[col] = data[col].replace(0, np.nan)
    data[col] = data[col].fillna(round(data[col].mean(skipna=True)))
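
To make the effect of this step concrete, here is a minimal, dependency-free sketch of the same zero-to-mean imputation on a single column. The helper name `impute_zeros_with_mean` and the sample values are hypothetical and chosen only for illustration. Like the cell above, it assumes that a zero marks a missing measurement.

```python
def impute_zeros_with_mean(values):
    """Replace zeros with the rounded mean of the non-zero entries."""
    # Keep only the entries that are treated as real observations
    observed = [v for v in values if v != 0]
    if not observed:
        # Nothing to average, so return the column unchanged
        return list(values)
    fill = round(sum(observed) / len(observed))
    return [fill if v == 0 else v for v in values]


# Hypothetical insulin readings; the zeros stand for missing measurements
sample_insulin = [0, 94, 168, 0, 88]
imputed = impute_zeros_with_mean(sample_insulin)
print(imputed)  # [117, 94, 168, 117, 88]
```

Note that the notebook computes the mean over the whole dataset before the train-test split, so the fill value also reflects rows that later end up in the test set.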

In [9]:
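# Same imputation, selecting the columns by position instead of by name.
# data.columns[1:-3] covers Glucose through BMI, so after the previous cell
# this loop finds no zeros left to replace.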
for column in data.columns[1:-3]:
    data[column] = data[column].replace(0, np.nan)
    data[column] = data[column].fillna(round(data[column].mean(skipna=True)))

data.head(10)

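|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | Pedigree | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148.0 | 72.0 | 35.0 | 156.0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85.0 | 66.0 | 29.0 | 156.0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183.0 | 64.0 | 29.0 | 156.0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89.0 | 66.0 | 23.0 | 94.0 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137.0 | 40.0 | 35.0 | 168.0 | 43.1 | 2.288 | 33 | 1 |
| 5 | 5 | 116.0 | 74.0 | 29.0 | 156.0 | 25.6 | 0.201 | 30 | 0 |
| 6 | 3 | 78.0 | 50.0 | 32.0 | 88.0 | 31.0 | 0.248 | 26 | 1 |
| 7 | 10 | 115.0 | 72.0 | 29.0 | 156.0 | 35.3 | 0.134 | 29 | 0 |
| 8 | 2 | 197.0 | 70.0 | 45.0 | 543.0 | 30.5 | 0.158 | 53 | 1 |
| 9 | 8 | 125.0 | 96.0 | 29.0 | 156.0 | 32.0 | 0.232 | 54 | 1 |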


In [5]:
# Features and Target
X = data.drop("Outcome", axis=1)
Y = data["Outcome"]

# Train-Test Split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# KNN model
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, Y_train)
knn_pred = knn.predict(X_test)
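
For intuition about what `KNeighborsClassifier` does with its default settings (Euclidean distance, equally weighted votes), the following is a simplified from-scratch sketch. It is not scikit-learn's implementation: the function `knn_predict` and the toy points are hypothetical, and tie-breaking and neighbor search are handled far more carefully in the library.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sort by distance and keep the labels of the k closest points
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most frequent label among the neighbors is the prediction
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical, already-standardized 2-D points forming two clusters
train_points = [(-1.0, -1.2), (-0.8, -0.9), (-1.1, -0.7),
                (1.0, 1.1), (0.9, 1.3), (1.2, 0.8)]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_points, train_labels, (0.7, 0.9), k=3))    # 1
print(knn_predict(train_points, train_labels, (-0.9, -1.0), k=3))  # 0
```

Because the vote depends on raw distances, features with large numeric ranges would dominate unscaled data, which is why the cell above standardizes the features before fitting.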

In [6]:
# Evaluation
print("Confusion Matrix\n", confusion_matrix(Y_test, knn_pred))
print("Accuracy Score:", accuracy_score(Y_test, knn_pred))
print("Recall Score:", recall_score(Y_test, knn_pred))
print("F1 Score:", f1_score(Y_test, knn_pred))
print("Precision Score:", precision_score(Y_test, knn_pred))

Confusion Matrix
 [[88 19]
 [18 29]]
Accuracy Score: 0.7597402597402597
Recall Score: 0.6170212765957447
F1 Score: 0.6105263157894737
Precision Score: 0.6041666666666666
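
The assignment also asks for the error rate, which the evaluation cell does not print. As a check, every metric can be derived directly from the confusion matrix shown above. The sketch below hard-codes those four counts and assumes scikit-learn's layout for binary labels 0 and 1: rows are actual classes, columns are predicted classes, so the matrix reads `[[TN, FP], [FN, TP]]`.

```python
# Counts copied from the confusion matrix printed above
tn, fp = 88, 19   # actual 0: predicted 0, predicted 1
fn, tp = 18, 29   # actual 1: predicted 0, predicted 1

total = tn + fp + fn + tp

accuracy = (tp + tn) / total      # share of correct predictions
error_rate = (fp + fn) / total    # share of wrong predictions, 1 - accuracy
precision = tp / (tp + fp)        # predicted positives that are correct
recall = tp / (tp + fn)           # actual positives that are found
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:   {accuracy:.4f}")    # 0.7597
print(f"Error rate: {error_rate:.4f}")  # 0.2403
print(f"Precision:  {precision:.4f}")   # 0.6042
print(f"Recall:     {recall:.4f}")      # 0.6170
print(f"F1:         {f1:.4f}")          # 0.6105
```

These values agree with the scores reported by scikit-learn above, and the error rate of about 0.2403 corresponds to 37 misclassified samples out of 154.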
