**Implement the K-Nearest Neighbors algorithm on the diabetes.csv dataset. Compute the confusion matrix, accuracy, error rate, precision, and recall on the given dataset.**
Dataset link: https://www.kaggle.com/datasets/abdallamahgoub/diabetes

**Step 1: Import Libraries and Load the Dataset**

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score

In [2]:
data = pd.read_csv('diabetes.csv')
print(data.head())

   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  \
0            6      148             72             35        0  33.6   
1            1       85             66             29        0  26.6   
2            8      183             64              0        0  23.3   
3            1       89             66             23       94  28.1   
4            0      137             40             35      168  43.1   

   Pedigree  Age  Outcome  
0     0.627   50        1  
1     0.351   31        0  
2     0.672   32        1  
3     0.167   21        0  
4     2.288   33        1  


**Step 2: Prepare the Data**

In [3]:
# Features and target variable
X = data.drop('Outcome', axis=1)
y = data['Outcome']

In [4]:
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [5]:
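# Standardize the features to zero mean and unit variance.
# The scaler is fit on the training set only, so no test-set statistics leak into training.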
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
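
To make the scaling step concrete, here is a minimal pure-Python sketch of the arithmetic that `StandardScaler` performs on a single column. It uses the five Glucose values printed by `data.head()`; the split into four "training" values and one "test" value is a hypothetical choice made only for illustration, and is not the split produced by `train_test_split`.

```python
from statistics import fmean, pstdev

# Glucose values from the five rows shown by data.head().
# The train/test grouping below is hypothetical, for illustration only.
train_glucose = [148, 85, 183, 89]
test_glucose = [137]

# "fit": learn the mean and standard deviation from the training values only.
mean = fmean(train_glucose)
std = pstdev(train_glucose)  # population std (ddof=0), as StandardScaler uses

# "transform": apply the same training statistics to both sets.
train_scaled = [(x - mean) / std for x in train_glucose]
test_scaled = [(x - mean) / std for x in test_glucose]

print([round(v, 3) for v in train_scaled])
print([round(v, 3) for v in test_scaled])
```

After scaling, the training column has mean 0 and standard deviation 1. The test values are shifted and scaled by the training statistics, so they generally do not have exactly those properties.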

**Step 3: Initialize and Train the KNN Model**

In [6]:
# Initialize the KNN classifier
knn = KNeighborsClassifier(n_neighbors=5)

In [7]:
# Train the model
knn.fit(X_train, y_train)

**Step 4: Make Predictions and Evaluate the Model**

In [8]:
# Make predictions on the test set
y_pred = knn.predict(X_test)
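
As a rough illustration of what happens inside `predict`, the sketch below implements the basic K-Nearest Neighbors rule in plain Python: measure the Euclidean distance from the query point to every training point, keep the k closest, and take a majority vote. The function name `knn_predict` and the toy points are hypothetical. scikit-learn's implementation uses optimized neighbor searches and its own tie-breaking, so this is a sketch of the idea rather than a drop-in equivalent.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=5):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point, paired with its label
    distances = [(dist(point, query), label)
                 for point, label in zip(train_points, train_labels)]
    # Keep the k closest training points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the labels of those neighbors
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters with labels 0 and 1
train_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
                (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_points, train_labels, (0.15, 0.15), k=3))  # prints 0
print(knn_predict(train_points, train_labels, (1.0, 0.95), k=3))   # prints 1
```

Because the vote depends directly on distances, features on larger scales would dominate the result, which is why the features are scaled before the model is fit.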

In [9]:
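# Confusion matrix: rows are actual classes, columns are predicted classes,
# so for labels 0 and 1 the layout is [[TN, FP], [FN, TP]]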
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)

Confusion Matrix:
[[79 20]
 [27 28]]


In [10]:
# Accuracy Score
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy*100:.2f}%")

Accuracy: 69.48%


In [11]:
# Error rate: fraction of test samples that were misclassified (1 - accuracy)
error_rate = 1 - accuracy
print(f"Error Rate: {error_rate*100:.2f}%")

Error Rate: 30.52%


In [12]:
# Precision: share of predicted positives that are actually positive, TP / (TP + FP)
precision = precision_score(y_test, y_pred)
print(f"Precision: {precision:.2f}")

Precision: 0.58


In [13]:
# Recall: share of actual positives that the model identifies, TP / (TP + FN)
recall = recall_score(y_test, y_pred)
print(f"Recall: {recall:.2f}")

Recall: 0.51
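
As a cross-check, the four metrics above can be recomputed by hand from the confusion matrix. The sketch below hard-codes the counts from the printed matrix, reading it as `[[TN, FP], [FN, TP]]` with class 1 (diabetic) as the positive class. The variable names are chosen here for illustration.

```python
# Counts copied from the confusion matrix printed above:
# [[79 20]
#  [27 28]]
tn, fp = 79, 20  # actual class 0: true negatives, false positives
fn, tp = 27, 28  # actual class 1: false negatives, true positives

total = tn + fp + fn + tp

accuracy = (tp + tn) / total    # correct predictions over all predictions
error_rate = (fp + fn) / total  # wrong predictions over all predictions
precision = tp / (tp + fp)      # correct positives over predicted positives
recall = tp / (tp + fn)         # correct positives over actual positives

print(f"Accuracy: {accuracy*100:.2f}%")      # Accuracy: 69.48%
print(f"Error Rate: {error_rate*100:.2f}%")  # Error Rate: 30.52%
print(f"Precision: {precision:.2f}")         # Precision: 0.58
print(f"Recall: {recall:.2f}")               # Recall: 0.51
```

The hand-computed values match the scikit-learn output. A recall of 0.51 means the model misses roughly half of the actual positive cases in the test set (27 of 55).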
