## K-nearest Neighbors (applied to Iris data set)
*Rachel Buttry*

*4 April 2018*

We can re-create the K-nearest neighbors model (for classification) in TensorFlow.

The following code was taken from or based on these examples:

* [PHYS T480 Scikit-Learn-Intro notebook](https://github.com/gtrichards/PHYS_T480/blob/master/Scikit-Learn-Intro.ipynb)
* [TensorFlow Iris example](https://github.com/tensorflow/models/blob/master/samples/core/get_started/iris_data.py)
* [KNN example 1](https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/2_BasicModels/nearest_neighbor.py)
* [KNN example 2](http://marubon-ds.blogspot.com/2017/09/knn-k-nearest-neighbors-by-tensorflow.html)

In [13]:
# Import TensorFlow (1.x API) and create our session
import tensorflow as tf
sess = tf.Session()

In [14]:
# Load the Iris data through scikit-learn
from sklearn.datasets import load_iris
import numpy as np
iris = load_iris()

x_vals = np.array([x[0:4] for x in iris.data])
y_vals = np.array(iris.target)

# train-test split
np.random.seed(1234)
train_perc = 0.5  # fraction of the data used for training
train_indices = np.random.choice(len(x_vals),
                                 int(round(len(x_vals) * train_perc)), replace=False)
test_indices = np.array(list(set(range(len(x_vals))) - set(train_indices)))

train_data = x_vals[train_indices]
test_data = x_vals[test_indices]
train_labels = y_vals[train_indices]
test_labels = y_vals[test_indices]
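
The same split can also be written with a local `np.random.RandomState` and `np.setdiff1d`. This is a sketch, and `split_indices` is a hypothetical helper. `np.setdiff1d` returns the test indices sorted, whereas the order produced by iterating a Python `set` is an implementation detail. Because `np.random.seed` seeds the same legacy generator, this should draw the same training indices as the cell above.

```python
import numpy as np

def split_indices(n_samples, train_frac, seed=1234):
    """Return (train_idx, test_idx) for a random split without replacement."""
    # Local generator: leaves NumPy's global random state untouched.
    rng = np.random.RandomState(seed)
    n_train = int(round(n_samples * train_frac))
    train_idx = rng.choice(n_samples, n_train, replace=False)
    # setdiff1d returns the complement sorted, so the test order is deterministic.
    test_idx = np.setdiff1d(np.arange(n_samples), train_idx)
    return train_idx, test_idx

train_idx, test_idx = split_indices(150, 0.5)
print(len(train_idx), len(test_idx))  # prints 75 75
```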

###### K-Nearest Neighbor(s)
For classification, we take the k nearest data points, look at their classes, and predict the class that occurs most frequently among them (a majority vote).
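
Here is a minimal pure-Python sketch of that majority vote. It is not the TensorFlow code used in this notebook. The function name `knn_predict` and the toy points are illustrative assumptions, and `math.dist` needs Python 3.8+.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    Uses Euclidean distance. Ties in the vote go to the label that appears
    first among the sorted neighbors, because Counter.most_common orders
    equal counts by first insertion (Python 3.7+).
    """
    # Sort training indices by distance to the query and keep the first k.
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )[:k]
    # Count the neighbors' labels, inserted nearest-first.
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

train = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
labels = ["a", "a", "b", "b", "b"]
print(knn_predict(train, labels, (0.2, 0.1), k=3))  # prints a
```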

Be aware that the code below uses Euclidean distance and uniform weights (the defaults of scikit-learn's `neighbors.KNeighborsClassifier`). The model can be adjusted to use a different distance metric or to weight each neighbor by its distance.
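
The sketch below shows both adjustments. The hypothetical `knn_predict_weighted` takes the distance function as a parameter (Euclidean or Manhattan here) and gives each neighbor a vote of 1/d. In scikit-learn, the corresponding options are the `metric` and `weights="distance"` parameters of `KNeighborsClassifier`. Letting exact matches (d = 0) decide the vote on their own is one common convention, assumed here rather than taken from this notebook.

```python
import math
from collections import defaultdict

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def knn_predict_weighted(train_points, train_labels, query, k, metric=euclidean):
    """Distance-weighted k-NN vote: each of the k neighbors votes with weight 1/d."""
    # Pair every training label with its distance to the query; keep the k closest.
    neighbors = sorted(
        zip((metric(p, query) for p in train_points), train_labels),
        key=lambda pair: pair[0],
    )[:k]
    scores = defaultdict(float)
    # 1/0 is undefined, so zero-distance neighbors (exact matches) vote alone.
    exact = [lbl for d, lbl in neighbors if d == 0]
    if exact:
        for lbl in exact:
            scores[lbl] += 1.0
    else:
        for d, lbl in neighbors:
            scores[lbl] += 1.0 / d
    # Highest total weight wins; ties go to the label inserted first (nearest).
    return max(scores, key=scores.get)

train = [(0.0,), (1.0,), (1.1,)]
labels = ["a", "b", "b"]
# A plain majority vote with k = 3 would pick "b" (two votes to one);
# weighting by 1/d lets the much closer "a" win.
print(knn_predict_weighted(train, labels, (0.1,), k=3))  # prints a
```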

In [15]:
k = 5  # number of neighbors we're considering

# tf Graph Input
xtr = tf.placeholder(tf.float32, name="training_data") 
xte = tf.placeholder(tf.float32, name="testing_data")
ytr = tf.placeholder(tf.float32, name="training_labels")

# Nearest-neighbor calculation using L2 (Euclidean) distance;
# the single test point xte is broadcast against every row of xtr
sqr_dist = tf.reduce_sum(tf.square(tf.subtract(xte, xtr)), axis=1)
distance = tf.sqrt(sqr_dist)

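# nearest k points: top_k of the negated distance picks the k smallest distances
_, top_k_indices = tf.nn.top_k(tf.negative(distance), k=k)
top_k_labels = tf.gather(ytr, top_k_indices)

# get uniform weights
weights = tf.constant(1./k)

# count how often each label occurs among the k neighbors
ordered_labels, indx, count = tf.unique_with_counts(top_k_labels)

# weighted majority vote (with uniform weights this is just the label count)
weighted_count = tf.multiply(tf.cast(count, tf.float32), weights)
pred = ordered_labels[tf.argmax(weighted_count)]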


# Alternative for k = 1 (single nearest neighbor): take the label at the minimum-distance index
#pred = tf.gather(ytr, tf.argmin(distance, 0))
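
To make the graph's logic easier to follow, here is a NumPy sketch of the same steps for one test point. The function name is hypothetical. `tf.unique_with_counts` keeps labels in order of first appearance, but `np.unique` sorts them. The sketch therefore reorders by first index, so vote ties go to the label whose member is nearest, as in the graph. The uniform 1/k weight is dropped because scaling every count by the same constant does not change the argmax.

```python
import numpy as np

def knn_predict_np(train_data, train_labels, test_point, k=5):
    """NumPy mirror of the TensorFlow KNN graph for a single test point."""
    # L2 distance from the test point to every training row (broadcasting,
    # like tf.subtract(xte, xtr) followed by a sum over axis 1).
    distance = np.sqrt(np.sum((test_point - train_data) ** 2, axis=1))
    # Indices of the k smallest distances, nearest first; a stable sort puts
    # the lower index first on equal distances, as tf.nn.top_k does.
    top_k_indices = np.argsort(distance, kind="stable")[:k]
    top_k_labels = train_labels[top_k_indices]
    # np.unique sorts labels; return_index restores first-occurrence order.
    labels, first_idx, counts = np.unique(
        top_k_labels, return_index=True, return_counts=True)
    order = np.argsort(first_idx)
    labels, counts = labels[order], counts[order]
    # Majority vote: argmax returns the first label with the largest count.
    return labels[np.argmax(counts)]

train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [1.2, 0.8]])
labels = np.array([0, 0, 1, 1, 1])
print(knn_predict_np(train, labels, np.array([0.2, 0.1]), k=3))  # prints 0
```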

In [16]:
accuracy = 0.0

# The graph uses only placeholders and constants (no tf.Variable),
# so no initializer is needed:
#init = tf.global_variables_initializer()
#sess.run(init)

# loop over test data
for i in range(len(test_data)):
    # Get nearest neighbor
    nn_index = sess.run(pred, {xtr:train_data, xte: test_data[i, :], ytr: train_labels})
    prediction = int(nn_index)#int for indexing the target names
    actual_label = test_labels[i]
    print("Test", i, "Prediction:", iris['target_names'][prediction], \
        "True Class:", iris['target_names'][actual_label])
    # Calculate accuracy
    if prediction == actual_label:
        accuracy += 1./len(test_data)

('Test', 0, 'Prediction:', 'virginica', 'True Class:', 'virginica')
('Test', 1, 'Prediction:', 'virginica', 'True Class:', 'virginica')
('Test', 2, 'Prediction:', 'setosa', 'True Class:', 'setosa')
('Test', 3, 'Prediction:', 'setosa', 'True Class:', 'setosa')
('Test', 4, 'Prediction:', 'virginica', 'True Class:', 'virginica')
('Test', 5, 'Prediction:', 'setosa', 'True Class:', 'setosa')
('Test', 6, 'Prediction:', 'setosa', 'True Class:', 'setosa')
('Test', 7, 'Prediction:', 'virginica', 'True Class:', 'virginica')
('Test', 8, 'Prediction:', 'virginica', 'True Class:', 'virginica')
('Test', 9, 'Prediction:', 'virginica', 'True Class:', 'virginica')
('Test', 10, 'Prediction:', 'virginica', 'True Class:', 'virginica')
('Test', 11, 'Prediction:', 'setosa', 'True Class:', 'setosa')
('Test', 12, 'Prediction:', 'setosa', 'True Class:', 'setosa')
('Test', 13, 'Prediction:', 'setosa', 'True Class:', 'setosa')
('Test', 14, 'Prediction:', 'setosa', 'True Class:', 'setosa')
('Test', 15, 'Predictio

In [17]:
print("Accuracy: %.2f" % accuracy)

Accuracy: 0.96
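
The loop above calls `sess.run` once per test point and builds the accuracy by adding 1/n repeatedly. The vectorized NumPy sketch below is an alternative: it predicts every test point at once and computes the accuracy as a mean of correct predictions. The function name is hypothetical, and it assumes integer labels 0..C-1, as in `iris.target`. Ties here go to the smallest class index, which may differ from the graph above.

```python
import numpy as np

def knn_predict_batch(train_data, train_labels, test_data, k=5, n_classes=None):
    """Vectorized k-NN majority vote for every test row at once (assumes k <= n_train)."""
    if n_classes is None:
        n_classes = int(train_labels.max()) + 1
    # Pairwise Euclidean distances, shape (n_test, n_train).
    diff = test_data[:, None, :] - train_data[None, :, :]
    distance = np.sqrt(np.sum(diff ** 2, axis=2))
    # Labels of the k nearest training points for each test row, shape (n_test, k).
    nearest = np.argsort(distance, axis=1, kind="stable")[:, :k]
    neighbor_labels = train_labels[nearest]
    # Count votes per class; np.add.at accumulates repeated (row, label) pairs.
    votes = np.zeros((len(test_data), n_classes), dtype=int)
    rows = np.repeat(np.arange(len(test_data)), k)
    np.add.at(votes, (rows, neighbor_labels.ravel()), 1)
    # Most-voted class per row (lowest class index on a tie).
    return votes.argmax(axis=1)

train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [1.2, 0.8]])
labels = np.array([0, 0, 1, 1, 1])
test = np.array([[0.2, 0.1], [1.0, 0.9]])
predictions = knn_predict_batch(train, labels, test, k=3)
print(predictions, np.mean(predictions == np.array([0, 1])))  # prints [0 1] 1.0
```

With the notebook's arrays, this would be `knn_predict_batch(train_data, train_labels, test_data, k=k)` followed by `np.mean(predictions == test_labels)`.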


In [6]:
# Visualize our model with TensorBoard: write the session's graph to disk
writer = tf.summary.FileWriter("./graphs/iris_knn", sess.graph)
writer.close()
tf.reset_default_graph()  # reset the default graph once the KNN graph is saved

The graph looks like this:
![KNN Graph](./graphs/iris_knn.png)

### The scikit-learn way
We can also use scikit-learn's built-in KNN model. Here it is run on the same train/test split to check whether the accuracy is comparable.

In [18]:
from sklearn import neighbors, datasets

iris = datasets.load_iris()  # reloaded here only for iris['target_names']

# create the model with the same k (Euclidean distance and uniform weights by default)
knn = neighbors.KNeighborsClassifier(n_neighbors=k)

# fit the model on the same training split used for the TensorFlow version
knn.fit(train_data, train_labels)

# predict each test point and accumulate accuracy
accuracy_sk = 0.0
for x_test, true_label in zip(test_data, test_labels):
    result = knn.predict(x_test.reshape(1, -1))[0]
    print("Prediction:", iris['target_names'][result],
          "True Class:", iris['target_names'][true_label])
    # Calculate accuracy
    if result == true_label:
        accuracy_sk += 1./len(test_data)

('Prediction:', 'virginica', 'True Class:', 'virginica')
('Prediction:', 'virginica', 'True Class:', 'virginica')
('Prediction:', 'setosa', 'True Class:', 'setosa')
('Prediction:', 'setosa', 'True Class:', 'setosa')
('Prediction:', 'virginica', 'True Class:', 'virginica')
('Prediction:', 'setosa', 'True Class:', 'setosa')
('Prediction:', 'setosa', 'True Class:', 'setosa')
('Prediction:', 'virginica', 'True Class:', 'virginica')
('Prediction:', 'virginica', 'True Class:', 'virginica')
('Prediction:', 'virginica', 'True Class:', 'virginica')
('Prediction:', 'virginica', 'True Class:', 'virginica')
('Prediction:', 'setosa', 'True Class:', 'setosa')
('Prediction:', 'setosa', 'True Class:', 'setosa')
('Prediction:', 'setosa', 'True Class:', 'setosa')
('Prediction:', 'setosa', 'True Class:', 'setosa')
('Prediction:', 'setosa', 'True Class:', 'setosa')
('Prediction:', 'setosa', 'True Class:', 'setosa')
('Prediction:', 'setosa', 'True Class:', 'setosa')
('Prediction:', 'virginica', 'True Class

In [19]:
print("Accuracy: %.2f" % accuracy_sk)

Accuracy: 0.96


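The accuracies are the same (0.96). Both models use k = 5, Euclidean distance, and uniform weights on the same train/test split.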