In [1]:
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt



In [3]:
# Download Fashion MNIST dataset
fashion_mnist = keras.datasets.fashion_mnist
(trainImages, trainLabels), (testImages, testLabels) = fashion_mnist.load_data()

In [4]:
print("--------------------------")
print("Dimensions of Train Set")
print("Dimension(trImages)=",np.shape(trainImages))
print("There are", np.shape(trainImages)[0], "images where each image is", np.shape(trainImages)[1:], "in size")
print("There are", np.shape(np.unique(trainLabels))[0], "unique image labels")
print("--------------------------")
print("Dimensions of Test Set")
print("Dimension(tImages)=",np.shape(testImages), "Dimension(tLabels)=", np.shape(testLabels)[0])
print("--------------------------")

--------------------------
Dimensions of Train Set
Dimension(trImages)= (60000, 28, 28)
There are 60000 images where each image is (28, 28) in size
There are 10 unique image labels
--------------------------
Dimensions of Test Set
Dimension(tImages)= (10000, 28, 28) Dimension(tLabels)= 10000
--------------------------


K-Nearest Neighbor Algorithm
K-Nearest Neighbor (or KNN) is a non-parametric classification algorithm. Contrary to what the name suggests, a non-parametric model does not have few parameters: it has no fixed-size parameter set, and what it stores grows with the training data, so it can have a very large number of parameters. In the Fashion MNIST example, we will use the entire Train Set as the parameters of KNN.

The basic idea behind KNN is simple. Given a (test) vector or image to classify, find the $k$ vectors or images in the Train Set that are "closest" to it. These $k$ closest vectors or images carry $k$ labels. Assign the most frequent of those $k$ labels (a majority vote) to the (test) vector or image.
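The procedure above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the notebook's implementation: each image is flattened to a vector, closeness is Euclidean distance, and ties in the vote go to the smallest label (which is what `np.bincount(...).argmax()` does). `knn_predict` and the toy data are hypothetical, introduced only for this example.

```python
import numpy as np

def knn_predict(train_X, train_y, test_x, k):
    """Label one test vector by majority vote among its k nearest train vectors.

    Assumes train_X has shape (n, d), train_y holds non-negative integer labels,
    and test_x has shape (d,). Distance is Euclidean.
    """
    diffs = train_X.astype(np.float64) - test_x.astype(np.float64)
    dists = np.sqrt((diffs ** 2).sum(axis=1))   # one distance per train vector
    nearest = np.argsort(dists)[:k]             # indices of the k closest train vectors
    votes = np.bincount(train_y[nearest])       # how often each label occurs among them
    return int(votes.argmax())                  # most frequent label (ties -> smallest label)

# Hypothetical toy data: two clusters in 2-D with labels 0 and 1
train_X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
train_y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(train_X, train_y, np.array([0.5, 0.5]), k=3))  # prints 0
print(knn_predict(train_X, train_y, np.array([5.5, 5.2]), k=3))  # prints 1
```

Fashion MNIST images would be reshaped from 28 x 28 to 784-dimensional vectors before being passed in this way.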



Closeness Metric
The meaning of "closest" or "closeness" depends on the metric we choose; for instance:

Euclidean Distance between two vectors $x=\langle x_1,x_2,x_3\rangle$ and $y=\langle y_1,y_2,y_3\rangle$ is defined as $d_{ED} := \left\{(x_1-y_1)^2+(x_2-y_2)^2+(x_3-y_3)^2\right\}^{\frac{1}{2}}$. In academic literature, you may see this called the $L_2$ norm of $x-y$.
L1 Distance between two vectors $x=\langle x_1,x_2,x_3\rangle$ and $y=\langle y_1,y_2,y_3\rangle$ is defined as $d_{L1} := |x_1-y_1|+|x_2-y_2|+|x_3-y_3|$.
L0 Distance between two vectors $x=\langle x_1,x_2,x_3\rangle$ and $y=\langle y_1,y_2,y_3\rangle$ is defined as the number of non-zero elements in $x-y$.
In this article, we will use the Euclidean distance and L0 distance.
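To make the three definitions concrete, here they are evaluated with NumPy on a pair of small example vectors (chosen only for illustration). The last line cross-checks them against `np.linalg.norm`, whose `ord=1` and `ord=0` options give the L1 value and the count of non-zero entries for vectors.

```python
import numpy as np

x = np.array([1, 0, 3])
y = np.array([0, 0, 1])
diff = x - y                          # [1, 0, 2]

d_ed = np.sqrt(np.sum(diff ** 2))     # Euclidean (L2): sqrt(1 + 0 + 4) = sqrt(5), about 2.236
d_l1 = np.sum(np.abs(diff))           # L1: |1| + |0| + |2| = 3
d_l0 = np.count_nonzero(diff)         # L0: two non-zero entries in x - y

print(d_ed, d_l1, d_l0)
# Same three values via np.linalg.norm (returned as floats)
print(np.linalg.norm(diff), np.linalg.norm(diff, ord=1), np.linalg.norm(diff, ord=0))
```

One caveat when applying this to raw Fashion MNIST arrays: the images are `uint8`, so subtracting them directly wraps around instead of going negative. Casting to a signed or float type first (as the code cell below does with `tf.cast`) avoids that for the Euclidean and L1 distances.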

Finding $k$ Closest Vectors in Train Set
Given a vector (or image) from the Test Set, we cannot say which elements of the Train Set are closest without computing the metric against all of them. In the case of Fashion MNIST, we compute the "closeness" metric between the test vector and every one of the 60000 elements of the Train Set, which results in 60000 distance values per test image. The cost therefore grows linearly with the size of the Train Set: labeling all 10000 test images requires 10000 × 60000 = 600 million distance computations.
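A vectorized NumPy sketch of this brute-force search, assuming the binarized L0 distance used in the TensorFlow cell below (every non-zero pixel counts as 1). Broadcasting computes all distances in one call, and `np.argpartition` finds the $k$ smallest without fully sorting all of them; that is a swapped-in choice, since the notebook's graph sorts every distance with `argsort`. `k_closest_l0` and the random stand-in images are hypothetical.

```python
import numpy as np

def k_closest_l0(train_images, test_image, k):
    """Return indices of the k train images closest to test_image in L0 distance.

    Images are binarized first (non-zero pixel -> True), mirroring the TF graph.
    The returned indices are ordered from closest to farthest.
    """
    train_bin = train_images != 0                  # shape (n, 28, 28), boolean
    test_bin = test_image != 0                     # shape (28, 28), broadcast against train_bin
    dists = np.count_nonzero(train_bin != test_bin, axis=(1, 2))  # one L0 distance per train image
    candidates = np.argpartition(dists, k - 1)[:k]  # the k smallest, in arbitrary order
    return candidates[np.argsort(dists[candidates])]  # sort only those k

# Hypothetical stand-in data: 1000 random binary "images"
rng = np.random.default_rng(0)
train_images = rng.integers(0, 2, size=(1000, 28, 28), dtype=np.uint8)
test_image = train_images[42].copy()               # an exact copy has L0 distance 0
print(k_closest_l0(train_images, test_image, k=5)[0])  # prints 42
```

With the real data, `train_images` would be the 60000 x 28 x 28 Train Set and `test_image` one element of the Test Set; only the $k$ selected indices need to be sorted.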

Optimizing Parameter $k$
There is no formula that gives the best value of this parameter; the practical approach is to try several "good" candidate values for $k$ and pick the one that works best on held-out data. Let's review some extreme choices for $k$:

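If $k=1$, the label of the test vector or image is determined by a single element of the Train Set (its nearest neighbor), so one noisy or mislabeled training image is enough to cause an error.
If $k=60000$, the label of the test vector is determined by all elements of the Train Set, so the vote counts no longer depend on the test vector at all. If there is class imbalance, i.e., there are more images with a certain label in the Train Set, then every test vector will get that same majority label. (The Fashion MNIST Train Set is balanced, with 6000 images per class, so the vote becomes a 10-way tie and the result depends only on how ties are broken.)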
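One common way to make the choice of $k$ less ad hoc, sketched below under assumptions (the notebook itself simply fixes $k=11$): hold out part of the Train Set as a validation split, measure accuracy for each candidate $k$, and keep the best. The 1-D toy data are hypothetical and contain one deliberately mislabeled training point at 2.1, so $k=1$ is fooled while larger $k$ are not. `knn_label` and `select_k` are hypothetical helpers.

```python
import numpy as np

def knn_label(train_X, train_y, x, k):
    """Majority vote among the k nearest train points (Euclidean distance)."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return np.bincount(train_y[nearest]).argmax()

def select_k(train_X, train_y, val_X, val_y, candidates):
    """Return the candidate k with the highest validation accuracy, plus all accuracies."""
    accuracies = {}
    for k in candidates:
        preds = np.array([knn_label(train_X, train_y, x, k) for x in val_X])
        accuracies[k] = float(np.mean(preds == val_y))
    best_k = max(candidates, key=lambda k: accuracies[k])  # earliest candidate wins ties
    return best_k, accuracies

# Toy data: class 0 near 0..2, class 1 near 10..12, plus a mislabeled point at 2.1
train_X = np.array([[0.0], [1.0], [2.0], [2.1], [10.0], [11.0], [12.0]])
train_y = np.array([0, 0, 0, 1, 1, 1, 1])
val_X = np.array([[2.08], [11.0]])
val_y = np.array([0, 1])

best_k, accuracies = select_k(train_X, train_y, val_X, val_y, candidates=[1, 3, 5])
print(best_k, accuracies)  # prints 3 {1: 0.5, 3: 1.0, 5: 1.0}
```

For Fashion MNIST, the validation split would be carved out of the 60000 training images so that the Test Set stays untouched until the final evaluation.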

In [5]:
k = 11  # parameter k of K-nearest neighbors

# The graph uses the TF1-style API; tf.compat.v1 keeps it working on TF 2.x as well
if tf.executing_eagerly():
    tf.compat.v1.disable_eager_execution()

# Defining KNN Graph with L0 Norm
x = tf.compat.v1.placeholder(trainImages.dtype, shape=trainImages.shape)  # all train images, i.e., 60000 x 28 x 28
y = tf.compat.v1.placeholder(testImages.dtype, shape=testImages.shape[1:])  # a test image, 28 x 28

xThresholded = tf.clip_by_value(tf.cast(x, tf.int32), 0, 1)  # images are uint8, which many tf ops don't support, hence the typecast
yThresholded = tf.clip_by_value(tf.cast(y, tf.int32), 0, 1)  # clip_by_value maps every non-zero pixel to 1, leaving tensors of 0 and 1
computeL0Dist = tf.math.count_nonzero(xThresholded - yThresholded, axis=[1, 2])  # L0 distance to each train image: number of differing pixels
findKClosestTrImages = tf.argsort(computeL0Dist, direction='ASCENDING')  # train-image indices in order of ascending distance; first k used next
findLabelsKClosestTrImages = tf.gather(trainLabels, findKClosestTrImages[0:k])  # trainLabels[tensor] fails on a numpy array, hence tf.gather
findULabels, findIdx, findCounts = tf.unique_with_counts(findLabelsKClosestTrImages)  # distinct labels among the k closest Train images and their counts
findPredictedLabel = tf.gather(findULabels, tf.argmax(findCounts))  # most frequent label; tf.argmax resolves ties toward the first maximum

# Let's run the graph
numErrs = 0
numTestImages = np.shape(testLabels)[0]
plotErrors = True  # set to False to skip plotting every misclassified image

with tf.compat.v1.Session() as sess:
    for iTeI in range(0, numTestImages):  # iterate over each image in the test set
        predictedLabel = sess.run(findPredictedLabel, feed_dict={x: trainImages, y: testImages[iTeI]})

        if predictedLabel != testLabels[iTeI]:  # count misclassifications only
            numErrs += 1
            print(numErrs, "/", iTeI)
            print("\t\t", predictedLabel, "\t\t\t\t", testLabels[iTeI])

            if plotErrors:
                plt.figure(1)
                plt.subplot(1, 2, 1)
                plt.imshow(testImages[iTeI])
                plt.title('Test image, true label %i' % testLabels[iTeI])

                # show the first train image carrying the (wrong) predicted label
                iTrI = np.flatnonzero(trainLabels == predictedLabel)[0]
                plt.subplot(1, 2, 2)
                plt.imshow(trainImages[iTrI])
                plt.title('Train image with predicted label %i' % predictedLabel)
                plt.show()

print("# Classification Errors= ", numErrs, "% accuracy= ", 100.*(numTestImages-numErrs)/numTestImages)
      

