<a href="https://colab.research.google.com/github/NUS-CS3244-AY2122S1-T42-Project/FashionMNIST/blob/master/kNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Import libraries**

In [18]:
import numpy as np
import pandas as pd
from skimage import filters, feature
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

**Canny Filter**

In [14]:
def canny(X_train, X_test):
  # Apply Canny edge detection to every train and test image.
  # For each image, the edge map is appended to the original image,
  # so each 784-value row becomes a 1568-value row.
  def apply_canny_filter(img):
    # img is a 1D array of 784 pixel values; returns a 1D array of 1568 values
    original_img = img.reshape(28, 28)
    cannied_img = feature.canny(original_img)  # boolean edge map, shape (28, 28)
    # Stack the edge map below the original image: shape (56, 28)
    combined_img = np.concatenate((original_img, cannied_img))
    # plt.imshow(combined_img, cmap='Greys')  # show concatenated image
    return combined_img.flatten()
  X_train_cannied = list(map(apply_canny_filter, X_train))
  X_test_cannied = list(map(apply_canny_filter, X_test))
  return (X_train_cannied, X_test_cannied)
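
As a quick sanity check of the feature layout, the sketch below runs the same reshape, filter, and concatenate steps on a single synthetic image. The square test image and the names `square`, `edges`, and `flat` are illustrative assumptions, not part of the notebook. The image is given as floats in [0, 1] because threshold and dtype handling in `feature.canny` can differ between scikit-image versions, so counting the detected edge pixels on real data is a cheap check worth running.

```python
import numpy as np
from skimage import feature

# Hypothetical test image: a bright 12x12 square on a dark 28x28 background,
# stored as a flat 784-element float vector like one row of the dataset.
square = np.zeros((28, 28), dtype=float)
square[8:20, 8:20] = 1.0
img = square.flatten()

# Same steps as apply_canny_filter.
original_img = img.reshape(28, 28)
edges = feature.canny(original_img)               # boolean edge map, shape (28, 28)
combined = np.concatenate((original_img, edges))  # stacked vertically: (56, 28)
flat = combined.flatten()

print(flat.shape)        # (1568,) = 784 raw pixels + 784 edge indicators
print(edges.dtype)       # bool
print(int(edges.sum()))  # number of edge pixels found; 0 would mean no edges
```

If the edge count is 0 for real images, the second half of every feature vector is constant and cannot change which neighbors are selected.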

**Raw Data**

In [21]:
# Read the CSV file
df = pd.read_csv("/fashion-mnist_train.csv")
df2 = pd.read_csv("/fashion-mnist_test.csv")

# raw data
X_train = df[df.columns[df.columns != 'label']].copy()
X_test = df2[df2.columns[df2.columns != 'label']].copy()

# Get the label
y_train = df['label'].copy()
y_test = df2['label'].copy()

# Manually enter the meaningful name of each label
label = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Transform the data into numpy array
X_train = X_train.to_numpy()
X_test = X_test.to_numpy()

# Transform the labels into lists
y_train = y_train.to_list()
y_test = y_test.to_list()

# Performance measures for different k-values
print("raw data: ")
for i in range(1, 10, 2):
    clf = KNeighborsClassifier(n_neighbors=i)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print("   k = ", i, ": ")
    print(classification_report(y_test, y_pred, target_names=label))

raw data: 
   k =  1 : 
              precision    recall  f1-score   support

 T-shirt/top       0.77      0.83      0.80      1000
     Trouser       0.98      0.98      0.98      1000
    Pullover       0.75      0.77      0.76      1000
       Dress       0.89      0.88      0.88      1000
        Coat       0.78      0.76      0.77      1000
      Sandal       0.99      0.86      0.92      1000
       Shirt       0.64      0.62      0.63      1000
     Sneaker       0.90      0.95      0.93      1000
         Bag       0.98      0.95      0.97      1000
  Ankle boot       0.90      0.97      0.94      1000

    accuracy                           0.86     10000
   macro avg       0.86      0.86      0.86     10000
weighted avg       0.86      0.86      0.86     10000

   k =  3 : 
              precision    recall  f1-score   support

 T-shirt/top       0.75      0.87      0.80      1000
     Trouser       0.99      0.97      0.98      1000
    Pullover       0.74      0.82      0.
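
The full classification reports are long, so a compact per-k summary can make the comparison across k-values easier to read. The sketch below collects one accuracy value per k using `accuracy_score`. It runs on small synthetic stand-in data (an assumption, so that it is self-contained); the names `X_demo_train`, `X_demo_test`, and `accuracy_by_k` are hypothetical and do not appear in the notebook.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption): 3 classes, 20 features,
# with each class shifted by its label so the classes are separable.
rng = np.random.default_rng(0)
y_demo_train = rng.integers(0, 3, size=300)
y_demo_test = rng.integers(0, 3, size=100)
X_demo_train = rng.normal(size=(300, 20)) + y_demo_train[:, None]
X_demo_test = rng.normal(size=(100, 20)) + y_demo_test[:, None]

# Same k-values as the loop above: 1, 3, 5, 7, 9.
accuracy_by_k = {}
for k in range(1, 10, 2):
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X_demo_train, y_demo_train)
    accuracy_by_k[k] = accuracy_score(y_demo_test, clf.predict(X_demo_test))

for k, acc in accuracy_by_k.items():
    print(f"k = {k}: accuracy = {acc:.3f}")
```

Swapping the synthetic arrays for `X_train`, `y_train`, `X_test`, and `y_test` would give one line per k for the Fashion-MNIST data.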

**Raw Data + Image Edges, using Canny filter**

In [22]:
X_train_cannied, X_test_cannied = canny(X_train, X_test)
# Performance measures for different k-values
print("raw data + image edges: ")
for i in range(1, 10, 2):
    clf = KNeighborsClassifier(n_neighbors=i)
    clf.fit(X_train_cannied, y_train)
    y_pred = clf.predict(X_test_cannied)
    print("   k = ", i, ": ")
    print(classification_report(y_test, y_pred, target_names=label))

raw data + image edges: 
   k =  1 : 
              precision    recall  f1-score   support

 T-shirt/top       0.77      0.83      0.80      1000
     Trouser       0.98      0.98      0.98      1000
    Pullover       0.75      0.77      0.76      1000
       Dress       0.89      0.88      0.88      1000
        Coat       0.78      0.76      0.77      1000
      Sandal       0.99      0.86      0.92      1000
       Shirt       0.64      0.62      0.63      1000
     Sneaker       0.90      0.95      0.93      1000
         Bag       0.98      0.95      0.97      1000
  Ankle boot       0.90      0.97      0.94      1000

    accuracy                           0.86     10000
   macro avg       0.86      0.86      0.86     10000
weighted avg       0.86      0.86      0.86     10000

   k =  3 : 
              precision    recall  f1-score   support

 T-shirt/top       0.75      0.87      0.80      1000
     Trouser       0.99      0.97      0.98      1000
    Pullover       0.74    

**Conclusion**

We observe that the model using raw images achieved an overall accuracy of around 86% for all k-values from 1 to 9, with a shorter training time than the model that also uses the image edges.
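
The notebook does not record timings, so the following is only a sketch of how the time difference could be measured with `time.perf_counter`. It uses random stand-in arrays with 784 and 1568 features (an assumption), and the helper `time_knn` is hypothetical. For k-NN, fitting mostly stores or indexes the data and much of the work happens at prediction time, so the sketch reports both.

```python
import time

import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def time_knn(X_tr, y_tr, X_te, k=3):
    """Return (fit_seconds, predict_seconds) for one k-NN model."""
    clf = KNeighborsClassifier(n_neighbors=k)
    start = time.perf_counter()
    clf.fit(X_tr, y_tr)
    fit_seconds = time.perf_counter() - start
    start = time.perf_counter()
    clf.predict(X_te)
    predict_seconds = time.perf_counter() - start
    return fit_seconds, predict_seconds


# Random stand-in data (assumption): 500 training and 100 test rows.
rng = np.random.default_rng(0)
y_tr = rng.integers(0, 10, size=500)
X_raw_tr = rng.random((500, 784))
X_raw_te = rng.random((100, 784))
# Doubled feature vectors, mimicking raw pixels plus an appended edge map.
X_wide_tr = np.hstack([X_raw_tr, rng.random((500, 784))])
X_wide_te = np.hstack([X_raw_te, rng.random((100, 784))])

timings = {
    "raw (784 features)": time_knn(X_raw_tr, y_tr, X_raw_te),
    "raw + edges (1568 features)": time_knn(X_wide_tr, y_tr, X_wide_te),
}
for name, (fit_s, pred_s) in timings.items():
    print(f"{name}: fit {fit_s:.4f} s, predict {pred_s:.4f} s")
```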

We also observe that adding the image edges to the raw images does not appear to improve model accuracy beyond 86%. The choice of k did not play a significant role here either.
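
One possible contributing factor, offered as an assumption rather than something the notebook tested: the edge map is boolean, so after concatenation it holds only 0s and 1s, while raw pixel values range up to 255. Under the Euclidean distance that `KNeighborsClassifier` uses by default, the edge half can then shift distances only slightly. The sketch below illustrates the scale gap with random stand-in arrays; none of these variable names come from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins (assumption): two images with pixels in 0..255
# and two binary edge maps with values 0 or 1.
pixels_a = rng.integers(0, 256, size=784)
pixels_b = rng.integers(0, 256, size=784)
edges_a = rng.integers(0, 2, size=784)
edges_b = rng.integers(0, 2, size=784)

# The squared Euclidean distance between the concatenated vectors
# splits into a pixel part and an edge part.
pixel_part = int(np.sum((pixels_a - pixels_b) ** 2))
edge_part = int(np.sum((edges_a - edges_b) ** 2))

print("pixel part:", pixel_part)
print("edge part: ", edge_part)  # at most 784, since each term is 0 or 1
print("edge share of squared distance:", edge_part / (pixel_part + edge_part))
```

If the edge features are meant to influence neighbor selection, rescaling them to the pixel range, or scaling both halves to [0, 1], would be one thing to try.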