# K-Nearest Neighbor Classification on the Iris Dataset

### Importing necessary libraries

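- **datasets** and **train_test_split** from scikit-learn for loading and splitting the data
- **KNeighborsClassifier** for KNN classification
- **confusion_matrix** and **accuracy_score** from **sklearn.metrics** for model evaluation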

In [1]:
# import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

### Data Loading and Splitting:

- Load the Iris dataset using **datasets.load_iris(return_X_y=True)**. With **return_X_y=True**, the function returns the feature matrix (**X**) and the target labels (**y**) as two separate arrays.
- Split the data into training and testing sets using **train_test_split**. With **test_size=0.2**, 80% of the data goes to training (**X_train**, **y_train**) and 20% to testing (**X_test**, **y_test**). **random_state=50** fixes the shuffle so the split is reproducible, and **stratify=y** keeps the class proportions the same in both subsets.

In [3]:
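# Load the Iris features (X) and class labels (y) as separate arrays
X, y = datasets.load_iris(return_X_y=True)

# Hold out 20% of the samples for testing, preserving class proportions
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=50, stratify=y)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)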

(120, 4) (30, 4) (120,) (30,)
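
The effect of **stratify** can be checked by counting the labels in each subset. The snippet below is a minimal sketch that repeats the split from the cell above; the names `train_counts` and `test_counts` are introduced here for illustration only.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=50, stratify=y
)

# Count how many samples of each class (0, 1, 2) landed in each subset.
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print(train_counts, test_counts)
```

Because Iris has 50 samples per class, a stratified 80/20 split should place 40 samples of each class in the training set and 10 of each in the test set.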


### KNN Classifier:

- Initialize a KNeighborsClassifier (**knn**) with the following settings:
    - **n_neighbors=11**: The number of neighbors to consider in the classification. In this case, it uses 11 nearest neighbors.
    - **weights='distance'**: Each neighbor's vote is weighted by the inverse of its distance to the query point, so closer neighbors have a greater influence than distant ones. The default, **weights='uniform'**, gives every neighbor an equal vote.
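
To make the difference between the two weighting schemes concrete, here is a small sketch on a made-up one-dimensional dataset. The data (`X_toy`, `y_toy`, `query`) is hypothetical and not part of the Iris example; it is chosen so that the two settings disagree.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: one point of class 0, two points of class 1.
X_toy = np.array([[0.0], [1.0], [2.0]])
y_toy = np.array([0, 1, 1])

# The query sits very close to the single class-0 point.
query = np.array([[0.1]])

uniform_knn = KNeighborsClassifier(n_neighbors=3, weights='uniform').fit(X_toy, y_toy)
distance_knn = KNeighborsClassifier(n_neighbors=3, weights='distance').fit(X_toy, y_toy)

# Uniform voting: class 1 wins with 2 votes against 1.
uniform_pred = uniform_knn.predict(query)[0]

# Distance weighting: class 0 gets weight 1/0.1 = 10, while class 1 gets
# 1/0.9 + 1/1.9 (about 1.64), so class 0 wins.
distance_pred = distance_knn.predict(query)[0]

print(uniform_pred, distance_pred)
```

With uniform weights the majority class wins regardless of distance; with distance weights a single very close neighbor can outvote several farther ones.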

### Model Training:

- Fit the KNN classifier on the training data (**X_train**, **y_train**) using **knn.fit**.

### Model Evaluation:

- Use the trained KNN model to make predictions on the test set (**X_test**) and store the predictions in **predKnn**.
- Calculate and print the accuracy score of the model using **accuracy_score** from scikit-learn. The accuracy score is the fraction of test samples classified correctly, a value between 0 and 1.
- Calculate and print the confusion matrix using **confusion_matrix** from scikit-learn. For a multi-class problem such as Iris, entry (i, j) counts the samples whose true class is i and whose predicted class is j. The diagonal holds the correct predictions, and every off-diagonal entry is a misclassification, which shows which classes the model confuses with each other.

In [4]:
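# Build a KNN classifier that uses 11 neighbors with inverse-distance weighting
knn = KNeighborsClassifier(n_neighbors=11, weights='distance')
knn.fit(X_train, y_train)

# Predict the test set and report accuracy and the confusion matrix
predKnn = knn.predict(X_test)
print(accuracy_score(y_test, predKnn))
print(confusion_matrix(y_test, predKnn))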

0.9666666666666667
[[10  0  0]
 [ 0  9  1]
 [ 0  0 10]]
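
The output above can be read more closely: 29 of the 30 test samples are on the diagonal, and the single error is a sample of class 1 (versicolor) predicted as class 2 (virginica). The sketch below re-enters the printed matrix by hand and derives accuracy, per-class recall, and per-class precision from it; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows = true class, columns = predicted class).
cm = np.array([
    [10, 0, 0],
    [0, 9, 1],
    [0, 0, 10],
])

# Accuracy: correct predictions (diagonal) divided by all predictions.
accuracy = np.trace(cm) / cm.sum()

# Recall per class: correct predictions divided by the true count (row sum).
recall = np.diag(cm) / cm.sum(axis=1)

# Precision per class: correct predictions divided by the predicted count (column sum).
precision = np.diag(cm) / cm.sum(axis=0)

print(accuracy)
print(recall)
print(precision)
```

Recall drops to 0.9 for class 1 because one of its ten samples was missed, and precision drops to 10/11 for class 2 because one of its eleven predictions was wrong.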
