Pair Problem

Write a function that takes three arguments:

- A list of lists, X_train, where each inner list is of length three and represents the position of a Wookiee in space, along the traditional x, y, and z axes.
- A list of strings, y_train, the same length as the outer list X_train, where each string represents the color of the Wookiee at the corresponding position.
- A list of lists, X_test, where each inner list is of length three and represents the position in space of a Wookiee of unknown color.

The function should produce a list of strings, the same length as the outer list X_test, giving for each unknown Wookiee the color of the closest known Wookiee.

For example:

```python
X_train = [[1,   1,  1],
           [0,   0,  0],
           [-1, -1, -1],
           [10, 10, 10]]

y_train = ['red',
           'white',
           'blue',
           'chartreuse']

X_test = [[1.1, 1.1, 1.1]]

for result in your_function(X_train, y_train, X_test):
    print(result)
## red
```

Possible extensions:

- Does your solution work for any number of features, rather than exactly three, in both the training and test data?
- Does your solution handle ties?
- Can you add another parameter, k, to your solution so that it uses the k nearest Wookiees instead of only the nearest Wookiee?
- Can you add to your solution so that it has reasonable behavior if y_train is numeric?  
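The tie, k, and numeric-label extensions mostly concern how the labels of the k nearest Wookiees are combined into one answer. A minimal sketch of that step follows, assuming the labels arrive ordered nearest first. The name `combine_labels` and the tie-break rule (the tied label with the closest neighbor wins) are illustrative choices, not part of the problem statement.

```python
from collections import Counter
from numbers import Number


def combine_labels(labels):
    """Combine the labels of the k nearest neighbors, ordered nearest first.

    Hypothetical helper: numeric labels are averaged; anything else is
    decided by majority vote, with ties going to the label seen earliest
    (that is, the tied label with the closest neighbor).
    """
    if not labels:
        raise ValueError("need at least one neighbor label")
    if all(isinstance(label, Number) for label in labels):
        return sum(labels) / float(len(labels))
    counts = Counter(labels)
    best = max(counts.values())
    # Scan in nearest-first order so the closest tied label wins.
    for label in labels:
        if counts[label] == best:
            return label


print(combine_labels(['red', 'blue', 'blue']))  # blue
print(combine_labels(['red', 'blue']))          # red (tie, red is nearer)
print(combine_labels([1, 2, 6]))                # 3.0
```

Averaging is only one reasonable behavior for numeric labels; a median or a distance-weighted mean would fit the same interface.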

An extension of another kind:

- Are you confident that your solution is correct? How can you ensure that it is, and check that it stays correct in the future?
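One way to approach this is a handful of small automated checks that can be rerun whenever the solution changes. The sketch below is only an illustration: `nearest_color` is a hypothetical stand-in for your own function, and the three checks (the worked example, one prediction per test point, independence from training order) are a starting point rather than a complete test suite.

```python
def nearest_color(X_train, y_train, X_test):
    """Hypothetical stand-in for the function under test (1-nearest neighbor)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [y_train[min(range(len(X_train)),
                        key=lambda i: sq_dist(X_train[i], point))]
            for point in X_test]


X_train = [[1, 1, 1], [0, 0, 0], [-1, -1, -1], [10, 10, 10]]
y_train = ['red', 'white', 'blue', 'chartreuse']


def test_worked_example():
    assert nearest_color(X_train, y_train, [[1.1, 1.1, 1.1]]) == ['red']


def test_one_prediction_per_test_point():
    X_test = [[0.2, 0, 0], [9, 9, 9], [-2, -1, -1]]
    assert nearest_color(X_train, y_train, X_test) == ['white', 'chartreuse', 'blue']


def test_training_order_does_not_matter():
    X_test = [[0.2, 0, 0], [9, 9, 9]]
    expected = nearest_color(X_train, y_train, X_test)
    assert nearest_color(X_train[::-1], y_train[::-1], X_test) == expected


# Run the checks directly; a test runner could collect them instead.
for check in (test_worked_example,
              test_one_prediction_per_test_point,
              test_training_order_does_not_matter):
    check()
print("all checks passed")
```

Keeping checks like these in a file alongside the solution means they can be rerun after every change, which is what guards against the solution quietly breaking later.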

In [6]:
from sklearn.neighbors import KNeighborsClassifier
import math

In [2]:
def wookiee_color(X_train, y_train, X_test, k=1):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    return y_pred

In [3]:
X_train = [[1,   1,  1],
           [0,   0,  0],
           [-1, -1, -1],
           [10, 10, 10]]

y_train = ['red',
           'white',
           'blue',
           'chartreuse']

X_test = [[1.1, 1.1, 1.1]]

In [4]:
wookiee_color(X_train, y_train, X_test)

array(['red'], 
      dtype='|S10')

In [47]:
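def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def wookiee_color2(X_train, y_train, X_test, k=1):
    # Unfinished: only the first test point is handled, y_train and k are
    # not used yet, and nothing is returned.
    distances = []
    for x in X_train:
        distances.append(dist(x, X_test[0]))
    print(distances)
    # Sorting in place loses the link between each distance and its label.
    distances.sort()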

In [48]:
wookiee_color2(X_train, y_train, X_test)

[0.1732050807568879, 1.9052558883257653, 3.6373066958946425, 15.41525218736301]


In [None]:
dist(X_train[0], X_test[0])
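
The from-scratch attempt above (`wookiee_color2`) stops after computing distances for the first test point. One possible way to finish it is sketched below. `wookiee_color3` is a hypothetical name, and the majority vote with ties going to the nearer neighbor is an assumption rather than something the problem statement requires.

```python
import math
from collections import Counter


def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def wookiee_color3(X_train, y_train, X_test, k=1):
    """Brute-force k-nearest-neighbor prediction (hypothetical completion)."""
    predictions = []
    for point in X_test:
        # Keep each distance paired with its label so sorting preserves the link.
        # The index keeps the ordering well defined when distances are equal.
        ranked = sorted((dist(x, point), i, label)
                        for i, (x, label) in enumerate(zip(X_train, y_train)))
        nearest_labels = [label for _, _, label in ranked[:k]]
        counts = Counter(nearest_labels)
        best = max(counts.values())
        # Among the most common labels, take the one whose neighbor is closest.
        predictions.append(next(label for label in nearest_labels
                                if counts[label] == best))
    return predictions


X_train = [[1, 1, 1], [0, 0, 0], [-1, -1, -1], [10, 10, 10]]
y_train = ['red', 'white', 'blue', 'chartreuse']

print(wookiee_color3(X_train, y_train, [[1.1, 1.1, 1.1]]))       # ['red']
print(wookiee_color3(X_train, y_train, [[1.1, 1.1, 1.1]], k=3))  # ['red']
```

Because `dist` zips over the coordinates, this version works for any number of features, as long as the training and test points have the same dimension.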