#Building an image retrieval system with deep features


#Fire up GraphLab Create

In [921]:
import pandas as pd
import numpy as np

#Load the CIFAR-10 dataset

We will use a popular benchmark dataset in computer vision called CIFAR-10.  

(We've reduced the data to just 4 categories = {'cat','bird','automobile','dog'}.)

This dataset is already split into a training set and test set. In this simple retrieval example, there is no notion of "testing", so we will only use the training data.

In [922]:
image_train = pd.read_csv('image_train_data.csv')

#Computing deep features for our images

The two lines below allow us to compute deep features.  This computation takes a little while, so we have already computed them and saved the results as a column in the data you loaded. 

(Note that if you would like to compute such deep features and have a GPU on your machine, you should use the GPU enabled GraphLab Create, which will be significantly faster for this task.)

In [923]:
image_train.head()

Unnamed: 0,id,image,label,deep_features,image_array
0,24,Height: 32 Width: 32,bird,[0.242872 1.09545 0 0.39363 0 0 11.8949 0 0 0 ...,[73 77 58 71 68 50 77 69 44 120 116 83 125 120...
1,33,Height: 32 Width: 32,cat,[0.525088 0 0 0 0 0 9.94829 0 0 0 0 0 1.01264 ...,[7 5 8 7 5 8 5 4 6 7 4 7 11 5 9 11 5 9 17 11 1...
2,36,Height: 32 Width: 32,cat,[0.566016 0 0 0 0 0 9.9972 0 0 0 1.38345 0 0.7...,[169 122 65 131 108 75 193 196 192 218 221 222...
3,70,Height: 32 Width: 32,dog,[1.1298 0 0 0.778194 0 0.758051 9.83053 0 0 0....,[154 179 152 159 183 157 165 189 162 174 199 1...
4,90,Height: 32 Width: 32,bird,[1.71787 0 0 0 0 0 9.33936 0 0 0 0 0 0.412137 ...,[216 195 180 201 178 160 210 184 164 212 188 1...


Computing summary statistics of the data: Sketch summaries are techniques for computing summary statistics of data very quickly. Using the training data, compute the sketch summary of the ‘label’ column and interpret the results. What’s the least common category in the training data? Save this result to answer the quiz at the end.

In [924]:
# Count the rows per label, keeping labels in order of first appearance.
sketch_sum = image_train.groupby('label', sort=False).size().to_frame('count')
sketch_sum

label,count
bird,478
cat,509
dog,509
automobile,509


In [925]:
print('answer: the least common category in the training data is bird')

answer: the least common category in the training data is bird
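
For a single column, pandas can also tabulate category frequencies directly. The sketch below uses a small made-up frame (`toy` is an illustrative stand-in, not the course data) and relies only on the standard `value_counts` and `idxmin` calls:

```python
import pandas as pd

# Toy stand-in for the 'label' column; the values are invented for illustration.
toy = pd.DataFrame({
    'label': ['cat', 'dog', 'bird', 'cat', 'dog', 'automobile', 'automobile'],
})

# Count how many rows carry each label.
label_counts = toy['label'].value_counts()

# The least common category is the label with the smallest count.
least_common = label_counts.idxmin()

print(label_counts)
print('least common:', least_common)
```

Applied to `image_train['label']`, the same two calls should reproduce the counts in the table above.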


Creating category-specific image retrieval models: In most retrieval tasks, the data we have is unlabeled, so we call these unsupervised learning problems. However, we have labels in this image dataset, and will use these to create one model for each of the 4 image categories, {‘dog’, ‘cat’, ‘automobile’, ‘bird’}. To start, follow these steps:

Split the dataframe with the training data into 4 different SFrames. Each of these will contain data for 1 of the 4 categories above. Hint: if you use a logical filter to select the rows where the ‘label’ column equals ‘dog’, you can create an SFrame with only the data for images labeled ‘dog’.

In [926]:
image_train_bird=image_train[image_train['label']=='bird']
image_train_bird

Unnamed: 0,id,image,label,deep_features,image_array
0,24,Height: 32 Width: 32,bird,[0.242872 1.09545 0 0.39363 0 0 11.8949 0 0 0 ...,[73 77 58 71 68 50 77 69 44 120 116 83 125 120...
4,90,Height: 32 Width: 32,bird,[1.71787 0 0 0 0 0 9.33936 0 0 0 0 0 0.412137 ...,[216 195 180 201 178 160 210 184 164 212 188 1...
7,121,Height: 32 Width: 32,bird,[0 0.237535 0 0 0 0 9.9908 0 0 0 0 0 1.58587 0...,[93 96 88 102 106 97 117 121 111 118 122 110 1...
9,138,Height: 32 Width: 32,bird,[0.658936 0 0 0 0 0 9.93748 0 0 0 0 0 0.244432...,[205 193 195 200 187 193 202 190 193 173 162 1...
16,335,Height: 32 Width: 32,bird,[0 0 0 0 0 0 8.50707 0 0 0.699767 0.914553 0 1...,[160 159 154 162 161 156 169 168 163 173 172 1...
...,...,...,...,...,...
1983,49471,Height: 32 Width: 32,bird,[0.105465 0 0 0 0 0.145688 6.96681 0.972185 0 ...,[118 120 135 121 123 138 127 129 144 132 136 1...
1987,49573,Height: 32 Width: 32,bird,[0.609868 0 0 0 0 0 11.1038 0 0 0 0 0 2.32751 ...,[136 108 65 120 117 86 108 113 92 110 111 95 1...
1989,49617,Height: 32 Width: 32,bird,[1.06615 0 0 0.40126 0 0 9.30395 0 0 0 0 0 1.4...,[115 114 79 103 114 63 102 112 69 116 117 80 1...
1993,49800,Height: 32 Width: 32,bird,[0.301113 0 0.696338 0 0 0 9.68343 0.524159 0 ...,[150 147 128 151 146 127 156 150 129 159 152 1...


In [927]:
image_train_cat=image_train[image_train['label']=='cat']
image_train_dog=image_train[image_train['label']=='dog']
image_train_auto=image_train[image_train['label']=='automobile']

As in the image retrieval notebook you downloaded, you are going to create a nearest neighbor model using the 'deep_features' as the features, but this time create one such model for each category, using the corresponding subset of the training_data. You can call the model with the ‘dog’ data the dog_model, the one with the ‘cat’ data the cat_model, and so on.

In [928]:
mystring = image_train_bird['deep_features'].loc[0]
type(mystring)

str

In [929]:
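# Strip the surrounding brackets, split on whitespace, and convert to floats.
# (The np.float alias was removed in NumPy 1.24; the built-in float is equivalent.)
mystring=np.array(mystring[1:-1].split()).astype(float)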

In [930]:
mystring

array([ 0.242872,  1.09545 ,  0.      , ...,  0.      ,  0.      ,  0.      ])
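
Because the same string-to-array conversion is needed for every subset, it can be wrapped in a small helper. The function name `parse_feature_string` is an assumption made for this sketch; it expects the bracketed, space-separated format shown in the `deep_features` column:

```python
import numpy as np

def parse_feature_string(text):
    """Convert a string like '[0.5 0 1.25]' into a 1-D float array."""
    # Remove surrounding whitespace and brackets, then split on whitespace.
    return np.array(text.strip().strip('[]').split(), dtype=float)

vec = parse_feature_string('[0.242872 1.09545 0 0.39363]')
print(vec, vec.dtype)
```

With such a helper, a whole column could be converted with `column.apply(parse_feature_string)`.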

In [931]:
# Parse each feature string, e.g. '[0.24 1.09 0 ...]', into a float array.
# Building a new Series avoids writing into a slice of the DataFrame.
bird_features = image_train_bird['deep_features'].apply(
    lambda s: np.array(s[1:-1].split()).astype(float))


In [932]:
bird_features

0       [0.242872, 1.09545, 0.0, 0.39363, 0.0, 0.0, 11...
4       [1.71787, 0.0, 0.0, 0.0, 0.0, 0.0, 9.33936, 0....
7       [0.0, 0.237535, 0.0, 0.0, 0.0, 0.0, 9.9908, 0....
9       [0.658936, 0.0, 0.0, 0.0, 0.0, 0.0, 9.93748, 0...
16      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 8.50707, 0.0, 0...
                              ...                        
1983    [0.105465, 0.0, 0.0, 0.0, 0.0, 0.145688, 6.966...
1987    [0.609868, 0.0, 0.0, 0.0, 0.0, 0.0, 11.1038, 0...
1989    [1.06615, 0.0, 0.0, 0.40126, 0.0, 0.0, 9.30395...
1993    [0.301113, 0.0, 0.696338, 0.0, 0.0, 0.0, 9.683...
1998    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.30747, 0.0, 0...
Name: deep_features, Length: 478, dtype: object

In [933]:
bird_data = np.array([np.array(i) for i in bird_features])

In [934]:
bird_data

array([[ 0.242872,  1.09545 ,  0.      , ...,  0.      ,  0.      ,  0.      ],
       [ 1.71787 ,  0.      ,  0.      , ...,  0.      ,  0.      ,  0.      ],
       [ 0.      ,  0.237535,  0.      , ...,  0.      ,  0.      ,  0.      ],
       ..., 
       [ 1.06615 ,  0.      ,  0.      , ...,  2.28255 ,  0.      ,  0.      ],
       [ 0.301113,  0.      ,  0.696338, ...,  0.447935,  0.      ,
         0.980022],
       [ 0.      ,  0.      ,  0.      , ...,  0.      ,  0.      ,  0.      ]])

In [935]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.neighbors import NearestNeighbors
image_train_bird=image_train_bird.reset_index(drop=True)
bird_model = NearestNeighbors(metric='euclidean', algorithm='brute')
bird_model.fit(bird_data)

NearestNeighbors(algorithm='brute', leaf_size=30, metric='euclidean',
         metric_params=None, n_jobs=1, n_neighbors=5, p=2, radius=1.0)

In [936]:
image_test= pd.read_csv('image_test_data.csv')
image_test

Unnamed: 0,id,image,label,deep_features,image_array
0,0,Height: 32 Width: 32,cat,[1.13469 0 0 0 0.0366498 0 9.3536 0 0 0 0 0 0 ...,[158 112 49 159 111 47 165 116 51 166 118 53 1...
1,6,Height: 32 Width: 32,automobile,[0.231359 0 0 0 0 0.226023 8.85989 0 0 0 1.306...,[160 37 13 185 49 11 209 57 14 217 58 10 230 6...
2,8,Height: 32 Width: 32,cat,[0 0 0.0344192 0 0 0 11.0375 0 0 0 0 0 0 0 0 0...,[23 19 23 19 21 28 21 16 19 65 47 40 164 131 1...
3,9,Height: 32 Width: 32,automobile,[0 0 0 0 0 0 11.6065 0 0 0 1.54379 0 0.951929 ...,[217 215 209 210 208 202 205 208 191 199 202 1...
4,12,Height: 32 Width: 32,dog,[0.322317 0 1.24933 0 0 0 9.10822 0 0 0 0 0 2....,[91 64 30 82 58 30 87 73 59 89 87 83 95 92 80 ...
...,...,...,...,...,...
3995,9993,Height: 32 Width: 32,dog,[0 0.465366 0.226424 0 0 0 8.92032 0 0 0 0.586...,[95 98 92 93 92 90 96 89 92 92 84 89 83 80 81 ...
3996,9994,Height: 32 Width: 32,cat,[1.3185 0 0.714549 0 0 0 12.9369 0 0 0 0 0 0.0...,[68 44 13 55 27 1 81 46 11 109 70 23 139 107 5...
3997,9996,Height: 32 Width: 32,cat,[0.098765 0 0 0 0 0 8.47994 1.23358 0 0 0 0 0 ...,[81 57 43 91 69 53 98 75 63 106 80 64 108 85 6...
3998,9997,Height: 32 Width: 32,dog,[0 0 0 0 0 0 9.76592 0 0 0 0 0 0 1.01555 0 1.1...,[20 15 12 19 14 11 15 14 11 15 14 11 14 13 10 ...


In [937]:
image_test.iloc[0]

id                                                               0
image                                         Height: 32 Width: 32
label                                                          cat
deep_features    [1.13469 0 0 0 0.0366498 0 9.3536 0 0 0 0 0 0 ...
image_array      [158 112 49 159 111 47 165 116 51 166 118 53 1...
Name: 0, dtype: object

In [938]:
image_test[0:1]

Unnamed: 0,id,image,label,deep_features,image_array
0,0,Height: 32 Width: 32,cat,[1.13469 0 0 0 0.0366498 0 9.3536 0 0 0 0 0 0 ...,[158 112 49 159 111 47 165 116 51 166 118 53 1...


In [939]:
image_train_cat=image_train_cat.reset_index(drop=True)

In [940]:
# train a cat model
# Parse the feature strings into float arrays and store them back in the frame.
image_train_cat['deep_features'] = image_train_cat['deep_features'].apply(
    lambda s: np.array(s[1:-1].split()).astype(float))
cat_features = image_train_cat['deep_features']
cat_data = [np.array(i) for i in cat_features]
cat_model = NearestNeighbors(metric='euclidean', algorithm='brute')
cat_model.fit(cat_data)


NearestNeighbors(algorithm='brute', leaf_size=30, metric='euclidean',
         metric_params=None, n_jobs=1, n_neighbors=5, p=2, radius=1.0)

In [941]:
# train a dog model
image_train_dog=image_train_dog.reset_index(drop=True)
# Parse the feature strings into float arrays and store them back in the frame.
image_train_dog['deep_features'] = image_train_dog['deep_features'].apply(
    lambda s: np.array(s[1:-1].split()).astype(float))
dog_features = image_train_dog['deep_features']
dog_data = [np.array(i) for i in dog_features]
dog_model = NearestNeighbors(metric='euclidean', algorithm='brute')
dog_model.fit(dog_data)


NearestNeighbors(algorithm='brute', leaf_size=30, metric='euclidean',
         metric_params=None, n_jobs=1, n_neighbors=5, p=2, radius=1.0)

In [942]:
# train an auto model
image_train_auto=image_train_auto.reset_index(drop=True)
# Parse the feature strings into float arrays and store them back in the frame.
image_train_auto['deep_features'] = image_train_auto['deep_features'].apply(
    lambda s: np.array(s[1:-1].split()).astype(float))
auto_features = image_train_auto['deep_features']
auto_data = [np.array(i) for i in auto_features]
auto_model = NearestNeighbors(metric='euclidean', algorithm='brute')
auto_model.fit(auto_data)


NearestNeighbors(algorithm='brute', leaf_size=30, metric='euclidean',
         metric_params=None, n_jobs=1, n_neighbors=5, p=2, radius=1.0)
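
The per-category cells above repeat the same steps, so they could be folded into one loop. The sketch below is one possible arrangement, not the notebook's own code: `fit_category_models` and `toy_features` are hypothetical names, and the random features stand in for the real deep features.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def fit_category_models(features_by_label):
    """Fit one brute-force Euclidean nearest-neighbor index per label."""
    models = {}
    for label, features in features_by_label.items():
        model = NearestNeighbors(metric='euclidean', algorithm='brute')
        # fit() returns the fitted estimator itself.
        models[label] = model.fit(np.asarray(features, dtype=float))
    return models

# Random stand-in features: 6 'cat' rows and 5 'dog' rows, 4 dimensions each.
rng = np.random.default_rng(0)
toy_features = {
    'cat': rng.normal(size=(6, 4)),
    'dog': rng.normal(size=(5, 4)),
}

models = fit_category_models(toy_features)

# Querying with the first 'cat' training row should return that same row.
dist, idx = models['cat'].kneighbors(toy_features['cat'][:1], n_neighbors=1)
print(dist, idx)
```

Keeping the models in a dictionary keyed by label makes it easy to query every category in turn later on.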

What is the nearest ‘cat’ labeled image in the training data to the cat image above (the first image in the test data)? Save this result.

In [943]:
# Parse the test feature strings the same way as the training features.
image_test['deep_features'] = image_test['deep_features'].apply(
    lambda s: np.array(s[1:-1].split()).astype(float))
test_features = image_test['deep_features']
test_data = [np.array(i) for i in test_features]


In [944]:
test_data[0:1]

[array([ 1.13469 ,  0.      ,  0.      , ...,  0.279114,  0.      ,  0.      ])]

In [945]:
distances, indices = cat_model.kneighbors(test_data[0:1], n_neighbors=5)

In [946]:
neighbors_cat = pd.DataFrame({'distance':distances[0].tolist(), 'index':indices[0].tolist()})

In [947]:
neighbors_cat

Unnamed: 0,distance,index
0,34.623722,181
1,36.00688,467
2,36.520081,323
3,36.754848,269
4,36.873115,3


In [948]:
image_train_cat['index']=image_train_cat.index

In [949]:
image_train_cat

Unnamed: 0,id,image,label,deep_features,image_array,index
0,33,Height: 32 Width: 32,cat,"[0.525088, 0.0, 0.0, 0.0, 0.0, 0.0, 9.94829, 0...",[7 5 8 7 5 8 5 4 6 7 4 7 11 5 9 11 5 9 17 11 1...,0
1,36,Height: 32 Width: 32,cat,"[0.566016, 0.0, 0.0, 0.0, 0.0, 0.0, 9.9972, 0....",[169 122 65 131 108 75 193 196 192 218 221 222...,1
2,159,Height: 32 Width: 32,cat,"[0.0, 0.0, 0.0, 0.643275, 0.0, 0.0, 10.1772, 0...",[154 145 135 152 144 135 157 146 136 152 138 1...,2
3,331,Height: 32 Width: 32,cat,"[0.0, 0.0, 0.510964, 0.0, 0.0, 0.0, 11.2724, 0...",[45 65 92 72 95 110 106 132 129 106 132 129 10...,3
4,367,Height: 32 Width: 32,cat,"[1.38658, 0.0, 0.0, 0.0, 0.0, 0.182891, 10.395...",[168 151 143 145 130 124 134 134 136 178 173 1...,4
...,...,...,...,...,...,...
504,49832,Height: 32 Width: 32,cat,"[0.188276, 0.0, 0.0, 0.0, 0.0, 0.0, 8.06711, 0...",[80 72 60 77 69 58 78 70 59 80 72 61 79 71 60 ...,504
505,49840,Height: 32 Width: 32,cat,"[0.0344251, 0.0, 0.0, 0.0, 0.0, 0.0, 10.0338, ...",[150 147 143 162 158 157 154 150 151 156 151 1...,505
506,49896,Height: 32 Width: 32,cat,"[0.0, 0.0, 0.592454, 0.0, 0.0, 0.0, 7.29909, 0...",[181 187 187 185 190 190 190 194 195 190 194 1...,506
507,49958,Height: 32 Width: 32,cat,"[0.67496, 0.0, 0.0, 1.96409, 0.646847, 0.17025...",[102 142 139 97 136 133 98 137 134 98 137 135 ...,507


In [950]:
cat_neibor = pd.merge(neighbors_cat, image_train_cat, how='left', on='index') 
cat_neibor

Unnamed: 0,distance,index,id,image,label,deep_features,image_array
0,34.623722,181,16289,Height: 32 Width: 32,cat,"[0.964288, 0.0, 0.0, 0.0, 1.12516, 0.0, 9.3121...",[215 219 231 215 219 232 216 219 233 214 217 2...
1,36.00688,467,45646,Height: 32 Width: 32,cat,"[0.983678, 0.0, 0.0, 0.0, 0.0, 0.192085, 9.896...",[51 42 26 56 47 31 59 50 34 60 50 34 63 53 37 ...
2,36.520081,323,32139,Height: 32 Width: 32,cat,"[1.29409, 0.0, 0.0, 0.5138, 0.106392, 0.144628...",[217 220 205 221 227 218 195 209 205 156 176 1...
3,36.754848,269,25713,Height: 32 Width: 32,cat,"[0.536971, 0.0, 0.0, 0.0894459, 0.236474, 0.36...",[228 222 236 224 213 222 212 206 207 209 202 2...
4,36.873115,3,331,Height: 32 Width: 32,cat,"[0.0, 0.0, 0.510964, 0.0, 0.0, 0.0, 11.2724, 0...",[45 65 92 72 95 110 106 132 129 106 132 129 10...


In [966]:
print('answer: the nearest ‘cat’ labeled image in the training data to the cat image above is id 16289')

answer: the nearest ‘cat’ labeled image in the training data to the cat image above is id 16289
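
Since `kneighbors` returns row positions, the matching training rows can also be looked up with `iloc` instead of a merge on a helper column. A minimal sketch on invented data (the `train` frame, `features` array and `query` point are all assumptions for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Toy training set: four ids with 2-D features, invented for illustration.
train = pd.DataFrame({'id': [33, 36, 159, 331]})
features = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
model = NearestNeighbors(metric='euclidean', algorithm='brute').fit(features)

# Query with one point; ask for its two nearest training rows.
query = np.array([[0.9, 0.1]])
distances, indices = model.kneighbors(query, n_neighbors=2)

# indices[0] holds row positions, so iloc selects the matching rows in order.
neighbors = train.iloc[indices[0]].copy()
neighbors['distance'] = distances[0]
print(neighbors)
```

This relies on the frame and the feature matrix being in the same row order, which is why the training subsets are re-indexed before fitting.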


What is the nearest ‘dog’ labeled image in the training data to the cat image above (the first image in the test data)? Save this result.

In [952]:
test_data[0:1]

[array([ 1.13469 ,  0.      ,  0.      , ...,  0.279114,  0.      ,  0.      ])]

In [965]:
distances, indices = dog_model.kneighbors(test_data[0:1], n_neighbors=5)
neighbors_dog = pd.DataFrame({'distance':distances[0].tolist(), 'index':indices[0].tolist()})
image_train_dog['index']=image_train_dog.index
dog_neibor=pd.merge(neighbors_dog, image_train_dog, how='left', on='index') 
dog_neibor

Unnamed: 0,distance,index,id,image,label,deep_features,image_array
0,37.464261,159,16976,Height: 32 Width: 32,dog,"[0.755595, 0.0, 0.0, 0.0, 0.0, 0.0, 9.46039, 0...",[16 17 11 18 19 13 20 21 15 24 25 19 26 27 21 ...
1,37.566683,129,13387,Height: 32 Width: 32,dog,"[0.366494, 0.0, 0.0, 0.0, 0.0, 0.0, 8.91574, 0...",[255 255 255 255 255 255 255 255 255 255 255 2...
2,37.604727,362,35867,Height: 32 Width: 32,dog,"[0.305321, 0.0, 0.0, 0.0, 0.0, 0.0, 10.2703, 0...",[101 93 9 93 88 9 90 86 9 99 92 9 113 101 10 1...
3,37.706559,445,44603,Height: 32 Width: 32,dog,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 11.2647, 0.0, 0...",[8 25 9 29 39 22 66 75 53 70 82 57 75 87 64 76...
4,38.511329,67,6094,Height: 32 Width: 32,dog,"[0.470534, 0.0, 0.0, 0.0, 0.0, 0.0, 9.27895, 0...",[91 98 71 138 123 63 135 115 50 116 99 54 118 ...


In [964]:
print('answer: the nearest ‘dog’ labeled image in the training data to the cat image above is id 16976')

answer: the nearest ‘dog’ labeled image in the training data to the cat image above is id 16976


A simple example of nearest-neighbors classification: When we query a nearest neighbors model, the ‘distance’ column in the resulting table shows the computed distance between the input and each of the retrieved neighbors. In this question, you will use these distances to perform a simple classification.

For the first image in the test data (image_test[0:1]), which we used above, compute the mean distance between this image and its 5 nearest neighbors that were labeled ‘cat’ in the training data (similarly to what you did in the previous question). Save this result.

In [955]:
cat_neibor['distance'].mean()

36.15572932231886

In [956]:
print('answer: the mean distance between this image and its 5 nearest neighbors that were labeled ‘cat’ in the training data is', cat_neibor['distance'].mean())

answer: the mean distance between this image and its 5 nearest neighbors that were labeled ‘cat’ in the training data is 36.15572932231886


Similarly, for the first image in the test data (image_test[0:1]), which we used above, compute the mean distance between this image and its 5 nearest neighbors that were labeled ‘dog’ in the training data (similarly to what you did in the previous question). Save this result.

In [957]:
print('answer: the mean distance between this image and its 5 nearest neighbors that were labeled ‘dog’ in the training data is', dog_neibor['distance'].mean())

answer: the mean distance between this image and its 5 nearest neighbors that were labeled ‘dog’ in the training data is 37.770711933529554


On average, is the first image in the test data closer to its 5 nearest neighbors in the ‘cat’ data or in the ‘dog’ data? (In a later course, we will see that this is an example of what is called a k-nearest neighbors classifier, where we use the label of neighboring points to predict the label of a test point.)

In [958]:
print('answer: On average, the first image in the test data is closer to its 5 nearest neighbors in the ‘cat’ data than in the ‘dog’ data')

answer: On average, the first image in the test data is closer to its 5 nearest neighbors in the ‘cat’ data than in the ‘dog’ data
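
The comparison can be written as a tiny decision rule: predict the class whose neighbors are closest on average. The helper name `closest_class` is hypothetical; the two mean distances are the values computed above.

```python
def closest_class(mean_distances):
    """Return the label whose mean neighbor distance is smallest."""
    return min(mean_distances, key=mean_distances.get)

# Mean distances to the 5 nearest 'cat' and 'dog' training images (from above).
mean_distances = {'cat': 36.15572932231886, 'dog': 37.770711933529554}

predicted = closest_class(mean_distances)
print(predicted)
```

Here the rule picks ‘cat’, which agrees with the true label of the first test image.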


[Challenging Question] Computing nearest neighbors accuracy using SFrame operations: A nearest neighbor classifier predicts the label of a point as the most common label of its nearest neighbors. In this question, we will measure the accuracy of a 1-nearest-neighbor classifier, i.e., predict the output as the label of the nearest neighbor in the training data. Although there are simpler ways of computing this result, we will go step-by-step here to introduce you to more concepts in nearest neighbors and SFrames, which will be useful later in this Specialization.
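
For reference, one of the simpler ways alluded to above is scikit-learn's `KNeighborsClassifier` with `n_neighbors=1`. This is a sketch on invented 2-D points, not the step-by-step method this question walks through; the arrays `X_train`, `y_train`, `X_test` and `y_test` are assumptions for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: two 'cat' points near the origin, two 'dog' points near (5, 5).
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.8]])
y_train = np.array(['cat', 'cat', 'dog', 'dog'])

# Toy test data; the third point is labeled 'dog' but lies nearer the cats.
X_test = np.array([[0.1, 0.0], [4.9, 5.2], [2.0, 2.0]])
y_test = np.array(['cat', 'dog', 'dog'])

# 1-nearest-neighbor: each test point takes the label of its closest training point.
clf = KNeighborsClassifier(n_neighbors=1, algorithm='brute', metric='euclidean')
clf.fit(X_train, y_train)

# score() reports the fraction of test points labeled correctly.
accuracy = clf.score(X_test, y_test)
print(clf.predict(X_test), accuracy)
```

In this toy case two of the three test points are classified correctly, so the accuracy is 2/3.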

Training models: For this question, you will need the nearest neighbors models you learned above on the training data, i.e., the dog_model, cat_model, automobile_model and bird_model.

Splitting test data by label: Above, you split the train data SFrame into one SFrame for images labeled ‘dog’, another for those labeled ‘cat’, etc. Now, do the same for the test data. You can call the resulting SFrames image_test_bird, image_test_cat, image_test_dog and image_test_auto.

In [959]:
image_test_bird=image_test[image_test['label']=='bird']
image_test_cat=image_test[image_test['label']=='cat']
image_test_dog=image_test[image_test['label']=='dog']
image_test_auto=image_test[image_test['label']=='automobile']

Finding nearest neighbors in the training set for each part of the test set: Thus far, we have queried, e.g.,

In [960]:
test_dog_index=image_test[image_test['label']=='dog'].index

In [961]:
image_test[image_test['label']=='dog']

Unnamed: 0,id,image,label,deep_features,image_array
4,12,Height: 32 Width: 32,dog,"[0.322317, 0.0, 1.24933, 0.0, 0.0, 0.0, 9.1082...",[91 64 30 82 58 30 87 73 59 89 87 83 95 92 80 ...
5,16,Height: 32 Width: 32,dog,"[0.0, 0.0, 0.347357, 0.0, 0.0, 0.0, 9.98674, 0...",[95 76 78 92 77 78 89 77 77 86 75 75 89 79 78 ...
6,24,Height: 32 Width: 32,dog,"[1.31558, 0.0, 0.0, 0.0, 0.0, 0.0, 8.71812, 0....",[136 134 118 142 141 126 149 150 132 163 163 1...
8,31,Height: 32 Width: 32,dog,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.26019, 0.0, 0...",[127 130 81 130 133 88 135 137 96 132 134 96 1...
9,33,Height: 32 Width: 32,dog,"[0.130787, 0.727667, 0.0, 0.0, 0.0, 0.0, 10.11...",[118 113 81 122 117 83 116 104 71 103 92 61 11...
...,...,...,...,...,...
3988,9976,Height: 32 Width: 32,dog,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.5605, 0.0, 0...",[139 133 67 154 147 84 198 200 165 196 200 156...
3989,9977,Height: 32 Width: 32,dog,"[0.0, 0.0, 1.38838, 0.0, 0.0, 0.0, 9.66064, 0....",[11 7 6 11 7 6 11 7 6 11 7 6 11 7 6 12 6 6 12 ...
3992,9985,Height: 32 Width: 32,dog,"[0.805116, 0.0, 0.0, 0.0, 0.0, 0.0, 7.75086, 0...",[83 95 91 67 89 106 93 104 107 147 149 134 109...
3995,9993,Height: 32 Width: 32,dog,"[0.0, 0.465366, 0.226424, 0.0, 0.0, 0.0, 8.920...",[95 98 92 93 92 90 96 89 92 92 84 89 83 80 81 ...


In [976]:
image_test_dog=np.array([test_data[i] for i in test_dog_index])
image_test_dog

array([[ 0.322317 ,  0.       ,  1.24933  , ...,  0.       ,  0.       ,
         0.       ],
       [ 0.       ,  0.       ,  0.347357 , ...,  1.15925  ,  0.       ,
         0.       ],
       [ 1.31558  ,  0.       ,  0.       , ...,  1.35664  ,  0.       ,
         0.       ],
       ..., 
       [ 0.805116 ,  0.       ,  0.       , ...,  0.0568864,  0.       ,
         0.       ],
       [ 0.       ,  0.465366 ,  0.226424 , ...,  0.       ,  0.       ,
         0.0524249],
       [ 0.       ,  0.       ,  0.       , ...,  0.       ,  0.       ,
         0.       ]])

In [974]:
image_test_dog[1]

array([ 0.      ,  0.      ,  0.347357, ...,  1.15925 ,  0.      ,  0.      ])

In [991]:
image_test_dog[1].reshape(-1,2)

array([[ 0.      ,  0.      ],
       [ 0.347357,  0.      ],
       [ 0.      ,  0.      ],
       ..., 
       [ 0.      ,  0.      ],
       [ 0.      ,  1.15925 ],
       [ 0.      ,  0.      ]])

In [987]:
len(image_test_dog)

1000

In [1006]:
x,y=cat_model.kneighbors([image_test_dog[1]], n_neighbors=1)
x,y

(array([[ 38.83532566]]), array([[308]]))

In [1009]:
distance=[]
indices=[]
for i in range(len(image_test_dog)):
    x,y= cat_model.kneighbors([image_test_dog[i]], n_neighbors=1)
    distance.append(x)
    indices.append(y)
distance

[array([[ 36.41960343]]),
 array([[ 38.83532566]]),
 array([[ 36.97633928]]),
 array([[ 34.5750053]]),
 array([[ 34.77882715]]),
 array([[ 35.11715833]]),
 array([[ 40.6095825]]),
 array([[ 39.9036863]]),
 array([[ 38.0674688]]),
 array([[ 42.72587557]]),
 array([[ 40.07334974]]),
 array([[ 31.66335422]]),
 array([[ 37.71246648]]),
 array([[ 39.09035408]]),
 array([[ 49.57967062]]),
 array([[ 36.07738736]]),
 array([[ 36.50902209]]),
 array([[ 44.95243864]]),
 array([[ 33.43683405]]),
 array([[ 34.32455865]]),
 array([[ 34.71471183]]),
 array([[ 33.23747815]]),
 array([[ 34.42538715]]),
 array([[ 34.51304809]]),
 array([[ 37.98423897]]),
 array([[ 41.93784156]]),
 array([[ 39.47553429]]),
 array([[ 37.61346937]]),
 array([[ 36.46095599]]),
 array([[ 32.54458124]]),
 array([[ 36.83135593]]),
 array([[ 37.85902324]]),
 array([[ 41.78117334]]),
 array([[ 35.53206849]]),
 array([[ 33.12115719]]),
 array([[ 34.95202857]]),
 array([[ 33.21653943]]),
 array([[ 30.27480793]]),
 ...

In [1010]:
indices

[array([[0]]),
 array([[308]]),
 array([[58]]),
 array([[224]]),
 array([[84]]),
 array([[484]]),
 array([[157]]),
 array([[129]]),
 array([[464]]),
 array([[453]]),
 array([[466]]),
 array([[32]]),
 array([[41]]),
 array([[120]]),
 array([[425]]),
 array([[135]]),
 array([[107]]),
 array([[496]]),
 array([[45]]),
 array([[438]]),
 array([[464]]),
 array([[340]]),
 array([[34]]),
 array([[328]]),
 array([[499]]),
 array([[208]]),
 array([[176]]),
 array([[211]]),
 array([[145]]),
 array([[348]]),
 array([[346]]),
 array([[276]]),
 array([[43]]),
 array([[277]]),
 array([[383]]),
 array([[505]]),
 array([[419]]),
 array([[146]]),
 array([[344]]),
 array([[495]]),
 array([[126]]),
 array([[419]]),
 array([[158]]),
 array([[63]]),
 array([[475]]),
 array([[419]]),
 array([[35]]),
 array([[403]]),
 array([[396]]),
 array([[297]]),
 array([[244]]),
 array([[35]]),
 array([[32]]),
 array([[321]]),
 array([[7]]),
 array([[383]]),
 array([[81]]),
 array([[58]]),
 array([[122]]),
 array([[190]]),
 ...
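
The row-by-row loop above can likely be replaced by a single batched call, since `kneighbors` accepts a 2-D array of queries. The sketch below uses random stand-in data (`train` and `queries` are invented) and checks that both approaches agree:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Random stand-ins for the training features and the test queries.
rng = np.random.default_rng(0)
train = rng.normal(size=(20, 8))
queries = rng.normal(size=(5, 8))

model = NearestNeighbors(metric='euclidean', algorithm='brute').fit(train)

# One call handles all query rows; results have shape (n_queries, n_neighbors).
dist, idx = model.kneighbors(queries, n_neighbors=1)
nearest_distance = dist[:, 0]
nearest_index = idx[:, 0]

# The same distances, computed one query at a time as in the loop above.
looped = [model.kneighbors([q], n_neighbors=1)[0][0, 0] for q in queries]

print(np.allclose(nearest_distance, looped))
```

The batched form returns flat arrays directly, which are easier to place in a table column than a list of 1x1 arrays.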

Create an SFrame with the distances from ‘dog’ test examples to the respective nearest neighbors in each class in the training data: The ‘distance’ column in dog_cat_neighbors above contains the distance between each ‘dog’ image in the test set and its nearest ‘cat’ image in the training set. The question we want to answer is how many of the test set ‘dog’ images are closer to a ‘dog’ in the training set than to a ‘cat’, ‘automobile’ or ‘bird’. So, next we will create an SFrame containing just these distances per data point. The goal is to create an SFrame called dog_distances with 4 columns: