The dataset includes features such as **Cholesterol**, **Glucose**, and **BMI**, plus a **Diabetes** column, which serves as the target variable indicating whether the patient has diabetes.

We can now build a K-Nearest Neighbors (KNN) model to predict diabetes from these features.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the dataset
file_path = 'diabetes.csv'
data = pd.read_csv(file_path)

# Drop irrelevant columns
data = data.drop(columns=['Patient number'])

# Convert categorical variables to numerical (if any)
label_encoders = {}
categorical_columns = ['Gender', 'Diabetes']

for column in categorical_columns:
    label_encoders[column] = LabelEncoder()
    data[column] = label_encoders[column].fit_transform(data[column])

# Define the target variable and features
X = data.drop('Diabetes', axis=1)
y = data['Diabetes']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train a K-Nearest Neighbors (KNN) model
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Make predictions
y_pred = knn.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print(f'Accuracy: {accuracy * 100:.2f}%','\n')
print('Confusion Matrix:\n','\n', conf_matrix,'\n')
print('Classification Report:\n','\n', class_report)


Accuracy: 83.33% 

Confusion Matrix:
 
 [[ 5 11]
 [ 2 60]] 

Classification Report:
 
               precision    recall  f1-score   support

           0       0.71      0.31      0.43        16
           1       0.85      0.97      0.90        62

    accuracy                           0.83        78
   macro avg       0.78      0.64      0.67        78
weighted avg       0.82      0.83      0.81        78



**Explanation:**

**Preprocessing:**

* We drop the 'Patient number' column, as it's not useful for prediction.

* Categorical variables like 'Gender' and 'Diabetes' are converted to numerical values using 'LabelEncoder'.
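Because 'LabelEncoder' assigns integer codes in sorted label order, it helps to check which original label became 0 and which became 1 before reading the classification report. The sketch below shows one way to inspect that mapping. The label strings are hypothetical stand-ins, since the actual values in the 'Diabetes' column are not shown here.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical label values; the actual strings in diabetes.csv may differ.
labels = ['No diabetes', 'Diabetes', 'No diabetes', 'Diabetes']

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# classes_ holds the original labels in sorted order; each label's index is its code.
mapping = {label: int(code) for code, label in enumerate(encoder.classes_)}

print('Label to code:', mapping)
print('Encoded values:', encoded.tolist())
```

In the notebook itself, the same check would be `label_encoders['Diabetes'].classes_` after the encoding loop has run.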

**Model Training:**

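* The dataset is split into training (80%) and testing (20%) sets.

* Features are standardized because KNN classifies by distance, so features measured on larger scales would otherwise dominate. The scaler is fit on the training set only and then applied to the test set.

* A KNN model with 5 neighbors is trained on the training data.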
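To illustrate why standardization matters for a distance-based model, here is a minimal sketch using made-up values for two features (they are not taken from the dataset). It compares the distance from one patient to two others before and after scaling. The helper name `distances_to_query` is introduced only for this example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values for illustration only: columns are [Cholesterol, BMI].
X = np.array([
    [200.0, 22.0],  # row 0: query patient
    [210.0, 38.0],  # row 1: similar cholesterol, very different BMI
    [260.0, 23.0],  # row 2: different cholesterol, similar BMI
    [150.0, 30.0],
    [240.0, 26.0],
])

def distances_to_query(points):
    """Euclidean distance from row 0 to rows 1 and 2."""
    return np.linalg.norm(points[1:3] - points[0], axis=1)

raw = distances_to_query(X)
scaled = distances_to_query(StandardScaler().fit_transform(X))

print('Raw distances (row 1, row 2):   ', raw.round(2))
print('Scaled distances (row 1, row 2):', scaled.round(2))
```

On the raw values, row 1 is the nearer neighbor because the cholesterol difference dominates the distance. After scaling, both features contribute comparably and row 2 becomes the nearer one.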

**Evaluation:**

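The model's accuracy, confusion matrix, and classification report are printed to assess its performance. Accuracy is 83.33% (65 of 78 test samples), but the classes are imbalanced: recall is 0.97 for class 1 and only 0.31 for class 0, so 11 of the 16 class 0 samples are misclassified.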
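The figures in the classification report can be reproduced directly from the confusion matrix, which makes it easier to see where each number comes from. The sketch below hard-codes the matrix printed above. The majority-class baseline is an extra comparison added here for context, not part of the original output.

```python
import numpy as np

# Confusion matrix from the output above: rows are true classes, columns are predictions.
conf_matrix = np.array([[5, 11],
                        [2, 60]])

total = conf_matrix.sum()
correct = np.diag(conf_matrix)  # correctly classified samples per class

accuracy = correct.sum() / total                # (5 + 60) / 78
recall = correct / conf_matrix.sum(axis=1)      # per true class
precision = correct / conf_matrix.sum(axis=0)   # per predicted class

# Accuracy of always predicting the larger class, for comparison.
majority_baseline = conf_matrix.sum(axis=1).max() / total

print(f'Accuracy: {accuracy:.4f}')
print('Recall per class:', recall.round(2))
print('Precision per class:', precision.round(2))
print(f'Majority-class baseline: {majority_baseline:.4f}')
```

The baseline of roughly 79% shows that the 83.33% accuracy is only a modest gain over always predicting class 1, which is why the per-class recall is the more informative number here.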
