# KNN - Predict whether a person has diabetes or not


This cell mounts Google Drive in the Colab runtime so that the dataset stored there can be read from `/content/drive`.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### 🔧 Importing Required Libraries
This block imports necessary libraries such as `pandas`, `numpy`, and tools from `sklearn` for data processing and machine learning.

In [None]:
import pandas as pd
import numpy as np
# Preprocessing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Training
from sklearn.neighbors import KNeighborsClassifier

# Testing
from sklearn.metrics import confusion_matrix
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score

### 📥 Loading the Dataset
The dataset is loaded using `pandas.read_csv()`.

In [None]:
dataset = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Datasets/diabetes.csv')
print(f"length of dataset: {len(dataset)} ")
print(dataset.head())

length of dataset: 768 
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  \
0            6      148             72             35        0  33.6   
1            1       85             66             29        0  26.6   
2            8      183             64              0        0  23.3   
3            1       89             66             23       94  28.1   
4            0      137             40             35      168  43.1   

   DiabetesPedigreeFunction  Age  Outcome  
0                     0.627   50        1  
1                     0.351   31        0  
2                     0.672   32        1  
3                     0.167   21        0  
4                     2.288   33        1  


## Preprocessing the Data

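### 💡 Replacing Zero Values
In the five columns listed below, a value of 0 is treated as a missing measurement rather than a real reading. Each 0 is replaced with the mean of the column's remaining values, truncated to an integer.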

In [None]:
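# Replace zeroes, which stand in for missing values in these columns
zero_not_accepted = ['Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI']
for column in zero_not_accepted:
    # Mark zeros as missing so they do not pull the mean down
    dataset[column] = dataset[column].replace(0, np.nan)
    # Mean of the remaining values, truncated to an integer
    mean = int(dataset[column].mean(skipna=True))
    dataset[column] = dataset[column].replace(np.nan, mean)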

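The imputation step can be tried in isolation on a small frame. The sketch below is a minimal illustration, assuming made-up values (the `toy` frame and the helper `impute_zeros_with_mean` are hypothetical and not part of the notebook). It returns a copy instead of modifying the frame in place.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; 0 stands in for a missing measurement.
toy = pd.DataFrame({
    "Glucose": [148, 0, 183, 89],
    "BMI": [33.6, 26.6, 0.0, 28.1],
})

def impute_zeros_with_mean(frame, columns):
    """Return a copy where zeros in `columns` become the truncated mean of the non-zero values."""
    result = frame.copy()
    for column in columns:
        values = result[column].replace(0, np.nan)   # mark zeros as missing
        mean = int(values.mean(skipna=True))         # truncate, as the notebook does
        result[column] = values.fillna(mean)
    return result

imputed = impute_zeros_with_mean(toy, ["Glucose", "BMI"])
print(imputed)
```

Truncating the mean with `int()` mirrors the notebook; keeping the fractional mean would be an equally reasonable choice for a column such as `BMI`.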
### 🧪 Splitting Data into Train and Test Sets
Dividing the dataset into training and test subsets using an 80-20 split.

In [None]:
# Split dataset

X = dataset.iloc[:, 0:8] # Picks column 0-7 from dataset [X data]
y = dataset.iloc[:, 8] # Picks only column 8 from dataset [Y data]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.2)

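To see what the 80-20 split produces for 768 rows, the sketch below runs `train_test_split` on synthetic data of the same shape. The arrays `X_demo` and `y_demo` are random placeholders, not the diabetes data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the notebook's shape: 768 rows, 8 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(768, 8))
y_demo = rng.integers(0, 2, size=768)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, random_state=0, test_size=0.2
)
print(X_tr.shape, X_te.shape)  # (614, 8) (154, 8)
```

The 154 test rows match the total of the four confusion-matrix entries reported later (94 + 13 + 15 + 32). Passing `stratify=y` is an option for keeping the class proportions similar in both subsets; the notebook does not use it.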
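### 💡 Feature Scaling
The features are standardized to zero mean and unit variance, which matters for KNN because it compares samples by distance. The scaler is fitted on the training set only and then applied to the test set, so no test-set statistics are used during training.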

In [None]:
# Feature Scaling
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)

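As a rough check of what `StandardScaler` does, the sketch below compares its output with a manual `(x - mean) / std` computed from the training rows only. The small `train` and `test` arrays are invented for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values: three training rows, one test row, two features.
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
test = np.array([[4.0, 30.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train
test_scaled = scaler.transform(test)        # reuses the training statistics

# Manual equivalent: population standard deviation (ddof=0) of the training data.
mu = train.mean(axis=0)
sigma = train.std(axis=0)
manual_test = (test - mu) / sigma

print(np.allclose(test_scaled, manual_test))  # True
```

Because the test row is scaled with training statistics, its scaled values are not guaranteed to have zero mean or unit variance.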
## KNN Model

### 🤖 KNN Model Training
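This cell creates a K-Nearest Neighbors classifier that classifies each sample by majority vote among its 11 nearest training samples, measured by Euclidean distance.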

In [None]:
classifier = KNeighborsClassifier(n_neighbors=11, p=2, metric='euclidean')

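The voting rule behind the classifier can be sketched in a few lines of NumPy. This is an illustrative implementation under simple assumptions (brute-force distances, integer class labels), not how scikit-learn implements it internally. The helper `knn_predict` and the toy points are hypothetical.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def knn_predict(X_train, y_train, query, k):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    distances = np.sqrt(((X_train - query) ** 2).sum(axis=1))  # Euclidean distance
    nearest = np.argsort(distances)[:k]                         # indices of the k closest points
    votes = np.bincount(y_train[nearest])                       # count labels among neighbors
    return int(np.argmax(votes))

# Two well-separated made-up clusters.
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
query = np.array([4.5, 4.5])

manual = knn_predict(X_toy, y_toy, query, k=3)
model = KNeighborsClassifier(n_neighbors=3, metric='euclidean').fit(X_toy, y_toy)
library = int(model.predict(query.reshape(1, -1))[0])
print(manual, library)  # 1 1
```

An odd `n_neighbors`, such as the 11 used in the notebook, avoids tied votes in a two-class problem. With `metric='euclidean'`, the `p=2` argument is redundant, since `p` only parameterizes the Minkowski metric.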
### 💡 Fitting the Model
This cell fits the classifier to the scaled training data. For KNN, fitting mainly stores the training samples so that they can be searched at prediction time.

In [None]:
classifier.fit(X_train, y_train)

## Evaluating The Model

### 💡 Predicting on the Test Set
This cell predicts an `Outcome` label for every row of the scaled test set.

In [None]:
y_pred = classifier.predict(X_test)

### 📉 Evaluating the Model
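This cell prints the confusion matrix for the test predictions. Rows are the actual classes and columns are the predicted classes, so the output shows 94 true negatives, 13 false positives, 15 false negatives, and 32 true positives.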

In [None]:
cm = confusion_matrix(y_test, y_pred)
print(cm)

[[94 13]
 [15 32]]

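Accuracy, precision, and recall can be read off the confusion matrix directly. The sketch below copies the four counts from the printed output and assumes scikit-learn's layout (rows are actual classes, columns are predicted classes, labels in sorted order).

```python
import numpy as np

# Counts copied from the confusion matrix printed above.
cm = np.array([[94, 13],
               [15, 32]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # share of all predictions that are correct
precision = tp / (tp + fp)        # share of predicted positives that are correct
recall = tp / (tp + fn)           # share of actual positives that are found

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f}")
# accuracy=0.8182 precision=0.7111 recall=0.6809
```

The same accuracy would be returned by `accuracy_score(y_test, y_pred)`, which the notebook imports.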

### 💡 F1 Score
This cell prints the F1 score, the harmonic mean of precision and recall for the positive class (`Outcome` = 1).

In [None]:
print(f1_score(y_test, y_pred))

0.6956521739130435

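The printed F1 score can be reproduced by hand from the confusion-matrix counts. The sketch below uses the counts reported earlier (32 true positives, 13 false positives, 15 false negatives) and shows that the harmonic-mean form and the count-based form agree.

```python
# Counts taken from the confusion matrix reported earlier.
tp, fp, fn = 32, 13, 15

precision = tp / (tp + fp)
recall = tp / (tp + fn)

# F1 as the harmonic mean of precision and recall.
f1_from_pr = 2 * precision * recall / (precision + recall)

# Equivalent form written directly in terms of the counts.
f1_from_counts = 2 * tp / (2 * tp + fp + fn)

print(round(f1_from_counts, 4))  # 0.6957
```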