### K-Nearest Neighbors (KNN) Classification

K-Nearest Neighbors (KNN) is a simple, instance-based learning algorithm used for both classification and regression tasks. It makes predictions based on the k most similar training examples in the feature space.
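For regression, the simplest uniformly weighted form of KNN predicts the mean target value of the \( k \) nearest training points. The sketch below uses made-up one-dimensional data. It compares a manual computation with scikit-learn's `KNeighborsRegressor`, assuming its default uniform weights.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Made-up 1-D training data, for illustration only
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
y_train = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
x_new = np.array([[1.4]])
k = 3

# Manual version: average the targets of the k closest training points
distances = np.abs(X_train[:, 0] - x_new[0, 0])
nearest = np.argsort(distances)[:k]
manual_pred = y_train[nearest].mean()

# scikit-learn's regressor with default (uniform) weights should agree
reg = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
sk_pred = reg.predict(x_new)[0]

print(manual_pred, sk_pred)
```

The three nearest points to 1.4 are 1, 2 and 0, so both approaches average those three targets.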

#### Concept

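The main idea of KNN is to classify a new data point using the labels of the \( k \) training points closest to it. Closeness is usually measured with a distance metric such as the Euclidean distance \( d(\mathbf{x}, \mathbf{z}) = \sqrt{\sum_{j} (x_j - z_j)^2} \): the smaller the distance, the more similar the two points.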
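Euclidean distance sums squared differences feature by feature, so a feature measured in large units can dominate the result. This motivates the `StandardScaler` step in the notebook code. The sketch below uses made-up points and a hypothetical per-feature `spread`. `StandardScaler` would instead estimate each feature's standard deviation from the training data and also subtract the mean. Subtracting the mean leaves distances unchanged; only the division matters.

```python
import numpy as np

def euclidean(a, b):
    """Straight-line (L2) distance between two feature vectors."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

# Made-up points with two features on very different scales:
# feature 0 is in the hundreds, feature 1 lies between 0 and 1.
query = np.array([500.0, 0.10])
point_a = np.array([510.0, 0.90])  # close on feature 0, far on feature 1
point_b = np.array([540.0, 0.10])  # far on feature 0, identical on feature 1

# Raw distances: feature 0 dominates, so point_a looks nearer
raw_a, raw_b = euclidean(query, point_a), euclidean(query, point_b)

# Hypothetical per-feature spreads (assumed here, not estimated from data)
spread = np.array([100.0, 1.0])
scaled_a = euclidean(query / spread, point_a / spread)
scaled_b = euclidean(query / spread, point_b / spread)

print(f"raw:    a={raw_a:.3f}, b={raw_b:.3f}")
print(f"scaled: a={scaled_a:.3f}, b={scaled_b:.3f}")
```

Before scaling, `point_a` is the nearer neighbor. After scaling, `point_b` is, so the choice of the \( k \) neighbors can change.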

#### Steps:
1. Choose the number of neighbors \( k \).
2. Calculate the distance between the new data point and all training data points.
3. Select the \( k \) training points with the smallest distances (the \( k \) nearest neighbors).
4. Assign the class label held by the majority of these \( k \) neighbors.
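The four steps can be written directly in NumPy. This is a minimal from-scratch sketch for illustration, not scikit-learn's implementation, which has its own neighbor search and tie-breaking. The function name `knn_predict` and the toy data are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify one point by majority vote among its k nearest training points."""
    # Step 2: Euclidean distance from x_new to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Step 3: indices of the k smallest distances, closest first
    nearest = np.argsort(distances)[:k]
    # Step 4: majority vote; on a tie, Counter.most_common keeps first-seen order,
    # so the class of the closest tied neighbor wins
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Made-up data: two well-separated clusters
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.1, 1.0]), k=3))  # → 0
print(knn_predict(X_train, y_train, np.array([4.9, 5.0]), k=3))  # → 1
```

Step 1, choosing \( k \), is the `k` argument. An odd \( k \) avoids ties in binary problems.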

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the dataset (target 0 = malignant, 1 = benign)
breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target

# Convert to a DataFrame for easier inspection (optional)
df = pd.DataFrame(X, columns=breast_cancer.feature_names)
df['target'] = y

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize the features: fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Initialize and train the model
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_scaled, y_train)

# Make predictions on the scaled test set (the model was trained on scaled features)
y_pred = knn.predict(X_test_scaled)
# Probability of class 1 (benign), the "positive" class in the table below
y_pred_proba = knn.predict_proba(X_test_scaled)[:, 1]

# Baseline performance with the default majority-vote rule
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=breast_cancer.target_names))

# Custom thresholds
thresholds = np.arange(0, 1.1, 0.1)

# Initialize lists for confusion matrix components
tps = []
fps = []
tns = []
fns = []

# Count true positives, false positives, true negatives and false negatives at each threshold
for threshold in thresholds:
    # Predict class 1 whenever its probability reaches the threshold
    temp_prediction = (y_pred_proba >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, temp_prediction, labels=[0, 1]).ravel()
    tps.append(tp)
    fps.append(fp)
    tns.append(tn)
    fns.append(fn)

# Collect the thresholds and confusion-matrix counts in one DataFrame
df_confusion_matrix = pd.DataFrame({'Thresholds': thresholds,
                                    'TruePositive': tps, 'FalsePositive': fps,
                                    'TrueNegative': tns, 'FalseNegative': fns})

# Calculate recall, precision and F1 score
df_confusion_matrix["recall"] = df_confusion_matrix["TruePositive"] / (df_confusion_matrix["TruePositive"] + df_confusion_matrix["FalseNegative"])
df_confusion_matrix["precision"] = df_confusion_matrix["TruePositive"] / (df_confusion_matrix["TruePositive"] + df_confusion_matrix["FalsePositive"])
df_confusion_matrix["f1_score"] = 2 * (df_confusion_matrix["precision"] * df_confusion_matrix["recall"]) / (df_confusion_matrix["precision"] + df_confusion_matrix["recall"])
```
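The threshold loop works on the positive-class probability. With `KNeighborsClassifier`'s default `weights='uniform'`, this probability should be the fraction of the \( k \) neighbors labelled 1. The sketch below checks this on the same split using `kneighbors`, which returns the distances and indices of each query's nearest training points. It repeats the data preparation so it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train_s, y_train)

# Indices of the 5 nearest training points for every test point
_, neighbor_idx = knn.kneighbors(X_test_s)
# Fraction of those neighbors labelled 1 (the positive class)
vote_fraction = y_train[neighbor_idx].mean(axis=1)

proba = knn.predict_proba(X_test_s)[:, 1]
print("matches predict_proba:", np.allclose(vote_fraction, proba))
print("distinct probabilities:", np.unique(proba))  # multiples of 1/5 only
```

Because the probability can only take the values 0, 0.2, ..., 1.0 when \( k = 5 \), a grid of ten thresholds can produce at most six distinct sets of predictions.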



```python
df_confusion_matrix
```

| | Thresholds | TruePositive | FalsePositive | TrueNegative | FalseNegative | recall | precision | f1_score |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 0.0 | 108 | 63 | 0 | 0 | 1.0 | 0.631579 | 0.774194 |
| 1 | 0.1 | 108 | 10 | 53 | 0 | 1.0 | 0.915254 | 0.955752 |
| 2 | 0.2 | 108 | 10 | 53 | 0 | 1.0 | 0.915254 | 0.955752 |
| 3 | 0.3 | 106 | 6 | 57 | 2 | 0.981481 | 0.946429 | 0.963636 |
| 4 | 0.4 | 106 | 6 | 57 | 2 | 0.981481 | 0.946429 | 0.963636 |
| 5 | 0.5 | 105 | 4 | 59 | 3 | 0.972222 | 0.963303 | 0.967742 |
| 6 | 0.6 | 104 | 3 | 60 | 4 | 0.962963 | 0.971963 | 0.967442 |
| 7 | 0.7 | 104 | 3 | 60 | 4 | 0.962963 | 0.971963 | 0.967442 |
| 8 | 0.8 | 104 | 3 | 60 | 4 | 0.962963 | 0.971963 | 0.967442 |
| 9 | 0.9 | 93 | 2 | 61 | 15 | 0.861111 | 0.978947 | 0.916256 |
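Several rows in the table are identical (0.1 and 0.2, 0.3 and 0.4, 0.6 through 0.8). A plausible explanation, assuming the table came from the notebook code, has two parts. First, with `n_neighbors=5` the probability only takes six levels, so thresholds falling between the same two levels give the same predictions. Second, `np.arange` with a fractional step accumulates floating-point error, so the threshold displayed as 0.6 can be stored slightly above 0.6 (in Python, `6 * 0.1` evaluates to `0.6000000000000001`). A probability of exactly 0.6 then fails the `>=` test, which would explain why the 0.6 row matches 0.7 and 0.8 rather than 0.5. The sketch compares those thresholds with `np.arange(11) / 10`, where each value is a single correctly rounded division. The list of probability levels is illustrative, not read from the model.

```python
import numpy as np

# Thresholds built as in the notebook code
drifting = np.arange(0, 1.1, 0.1)
# Integer steps divided once: each value is the float closest to 0.0, 0.1, ..., 1.0
exact = np.arange(11) / 10

# Every probability a uniformly weighted KNN with k = 5 can output
proba_levels = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])

mismatches = []
for t_drift, t_exact in zip(drifting, exact):
    # Number of probability levels that would be classified as positive
    n_drift = int((proba_levels >= t_drift).sum())
    n_exact = int((proba_levels >= t_exact).sum())
    if n_drift != n_exact:
        mismatches.append(float(t_exact))
        print(f"threshold {t_exact:.1f} stored as {float(t_drift)!r}: "
              f"{n_drift} vs {n_exact} levels counted as positive")

print("thresholds affected:", mismatches)
```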


```python
# Import the graph_objects module from the Plotly library
import plotly.graph_objects as go

# Create a new figure for plotting
fig = go.Figure()

# Round the values in the df_confusion_matrix DataFrame to two decimal places
df_confusion_matrix_graph = df_confusion_matrix.round(2)

# One line per metric, plotted against the decision threshold
for column, label in [('recall', 'Recall'), ('precision', 'Precision'), ('f1_score', 'F1 Score')]:
    fig.add_trace(go.Scatter(x=df_confusion_matrix_graph['Thresholds'],
                             y=df_confusion_matrix_graph[column],
                             mode='lines+markers', name=label))
fig.update_layout(title='Recall, Precision and F1 Score vs. Threshold',
                  xaxis_title='Threshold', yaxis_title='Score')

# Display the figure
fig.show()
```
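Instead of a hand-written threshold loop, you can use scikit-learn's `precision_recall_curve`. It evaluates precision and recall at every distinct score the model produces (at most six values here) and uses the same `score >= threshold` rule. The sketch below repeats the data preparation so it runs on its own, then picks the threshold with the highest F1 score. Selecting by F1 is an assumption made for illustration; an application that must not miss a class might instead require a minimum recall.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler().fit(X_train)
knn = KNeighborsClassifier(n_neighbors=5).fit(scaler.transform(X_train), y_train)
y_pred_proba = knn.predict_proba(scaler.transform(X_test))[:, 1]

# One (precision, recall) pair per threshold, plus a final pair with no threshold
precision, recall, thresholds = precision_recall_curve(y_test, y_pred_proba)
precision, recall = precision[:-1], recall[:-1]

# F1 at each candidate threshold, guarding against 0/0
denom = precision + recall
f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(denom), where=denom > 0)

best = int(np.argmax(f1))
print(f"best threshold={thresholds[best]:.2f}, precision={precision[best]:.3f}, "
      f"recall={recall[best]:.3f}, F1={f1[best]:.3f}")
```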