# Extension 1 - Classify your own datasets

- Find datasets that you find interesting and run classification on them using your KNN algorithm (and, if applicable, Naive Bayes). Analyze the performance of your classifier.

In [37]:
# Import the required libraries and the project's KNN and Naive Bayes classifiers

import os
import random
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from knn import KNN
from naive_bayes import NaiveBayes

plt.style.use(['seaborn-v0_8-colorblind', 'seaborn-v0_8-darkgrid'])
plt.rcParams.update({'font.size': 20})

np.set_printoptions(suppress=True, precision=5)

# Automatically reload external modules
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [38]:
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

In [39]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [40]:
# Train and evaluate a KNN classifier
knn = KNN(num_classes=3)
knn.train(X_train, y_train)
y_pred = knn.predict(X_test, k=3)
knn_accuracy = accuracy_score(y_test, y_pred)
print(f"KNN accuracy: {knn_accuracy:.2f}")


KNN accuracy: 1.00


In [41]:
# Train and evaluate a Naive Bayes classifier
nb = NaiveBayes(num_classes=3) 
nb.train(X_train, y_train)
y_pred = nb.predict(X_test)
nb_accuracy = accuracy_score(y_test, y_pred)
print(f"Naive Bayes accuracy: {nb_accuracy:.2f}")


Naive Bayes accuracy: 0.90


# Report + Results

For my first extension, I ran my KNN and Naive Bayes classifiers on a dataset and compared their performance. After importing the required libraries, I loaded the Iris dataset from scikit-learn, which was also used in the previous implementation.

Next, I split the data into training and test sets using `train_test_split()` with `test_size=0.2` and `random_state=42`, which holds out 30 of the 150 Iris samples for testing. I then trained the KNN classifier and predicted with k=3, which gave an accuracy of 1.00: every test sample was classified correctly.
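To make the KNN step concrete, here is a minimal NumPy sketch of how a k-nearest-neighbor prediction can be computed. It is a stand-alone illustration rather than the `KNN` class imported above: the function name `knn_predict`, the choice of Euclidean distance, the tie-breaking rule, and the toy data are all assumptions made for this example.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Label each test row by majority vote among its k nearest training rows.

    Assumes integer class labels 0..C-1 (needed by np.bincount).
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    # Pairwise Euclidean distances, shape (n_test, n_train)
    diffs = X_test[:, None, :] - X_train[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Indices of the k closest training samples for each test sample
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Majority vote; argmax breaks ties in favor of the smaller label
    return np.array([np.bincount(y_train[row]).argmax() for row in nearest])

# Toy data (made up for illustration): two well-separated 2-D clusters
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
toy_pred = knn_predict(X_toy, y_toy, np.array([[0.05, 0.05], [5.0, 5.1]]), k=3)
print(toy_pred)  # [0 1]
```

With k=3, each prediction depends on only three training points, so the result is sensitive to local structure in the data; trying a few values of k is a cheap way to see how stable the accuracy is.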

I then trained and evaluated the Naive Bayes classifier on the same split and got an accuracy of 0.90. With a test set of 30 samples (20% of 150), that corresponds to 27 correct predictions. This is lower than the KNN result, but it is still a good one.
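A single accuracy number does not show which classes were confused. A confusion matrix does, and a minimal sketch of one is given below. The helper name `confusion_matrix_np` and the toy label arrays are assumptions for illustration; in the notebook, `y_test` and `y_pred` would take the place of the toy arrays.

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred, num_classes):
    """Count samples per (true class, predicted class) pair.

    Rows are true classes, columns are predicted classes.
    Assumes integer labels in the range 0..num_classes-1.
    """
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

# Toy labels (made up for illustration, not results from this notebook)
y_true_toy = np.array([0, 0, 1, 1, 2, 2])
y_pred_toy = np.array([0, 0, 1, 2, 2, 1])

cm = confusion_matrix_np(y_true_toy, y_pred_toy, num_classes=3)
# Accuracy is the fraction of samples on the diagonal
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(f"accuracy: {accuracy:.2f}")
```

Off-diagonal entries show where the errors fall, which would reveal whether the Naive Bayes mistakes are concentrated between two particular classes.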

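Overall, both classifiers worked well on the Iris dataset, with KNN outperforming Naive Bayes on this split. One possible explanation is how the features are related: Naive Bayes treats the features as independent given the class, while Iris measurements such as petal length and petal width are strongly correlated. The gap amounts to only three test samples from a single split, though, so I would read it cautiously. Nonetheless, it was interesting to see how the classifiers performed on a real dataset.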
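One way to check how the features are related is to look at their pairwise correlations, since Naive Bayes assumes the features are independent given the class. The sketch below uses `np.corrcoef` for this; the helper name `feature_correlations` and the toy matrix are assumptions for illustration. In the notebook, passing `X_train[y_train == c]` for each class `c` would give the within-class correlations, which is the quantity that matters for the independence assumption.

```python
import numpy as np

def feature_correlations(X):
    """Pearson correlation between every pair of feature columns."""
    # rowvar=False tells NumPy that columns are variables and rows are samples
    return np.corrcoef(np.asarray(X, dtype=float), rowvar=False)

# Toy data (made up for illustration): column 1 is exactly 2x column 0,
# column 2 is only loosely related to the others
X_toy = np.array([[1.0, 2.0, 5.0],
                  [2.0, 4.0, 3.0],
                  [3.0, 6.0, 6.0],
                  [4.0, 8.0, 2.0]])

corr = feature_correlations(X_toy)
print(np.round(corr, 2))
```

Entries close to 1 or -1 off the diagonal indicate strongly related features, which is where the independence assumption is most strained.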