# Building an image retrieval system with deep features


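# Fire up pandas and NumPy
(This version of the notebook uses pandas, NumPy, and scikit-learn in place of GraphLab Create. For the original GraphLab setup instructions, see [Getting Started with SFrames](../Week%201/Getting%20Started%20with%20SFrames.ipynb).)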

```python
import numpy as np
import pandas as pd
```

# Load the CIFAR-10 dataset

We will use a popular benchmark dataset in computer vision called CIFAR-10.  

(We've reduced the data to just four categories: 'cat', 'bird', 'automobile', and 'dog'.)
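As a minimal sketch of how such a subset could be selected with pandas, assuming a DataFrame with a `label` column (the toy rows below are invented for illustration and are not the real CIFAR-10 data):

```python
import pandas as pd

# Hypothetical toy frame standing in for a full CIFAR-10 label column.
full = pd.DataFrame({
    "id": [0, 1, 2, 3, 4, 5],
    "label": ["cat", "frog", "bird", "ship", "automobile", "dog"],
})

# Keep only the four categories used in this notebook.
categories = ["cat", "bird", "automobile", "dog"]
reduced = full[full["label"].isin(categories)].reset_index(drop=True)

print(reduced["label"].tolist())  # ['cat', 'bird', 'automobile', 'dog']
```

`isin` builds a boolean mask, and `reset_index(drop=True)` renumbers the surviving rows from 0.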

This dataset is already split into a training set and test set. In this simple retrieval example, there is no notion of "testing", so we will only use the training data.

```python
# The training and test splits are stored in separate CSV files.
image_train = pd.read_csv('image_train_data.csv')
image_test = pd.read_csv('image_test_data.csv')
```

# Computing deep features for our images

The two commented-out lines below show how the deep features can be computed with GraphLab Create. This computation takes a while, so we have already computed the features and saved them as the `deep_features` column of the data you loaded.

(If you would like to compute such deep features yourself and have a GPU on your machine, use the GPU-enabled version of GraphLab Create, which will be significantly faster for this task.)

```python
# Requires GraphLab Create; kept for reference only, since the features are precomputed.
# deep_learning_model = graphlab.load_model('http://s3.amazonaws.com/GraphLab-Datasets/deeplearning/imagenet_model_iter45')
# image_train['deep_features'] = deep_learning_model.extract_features(image_train)
```

```python
image_train.head()
```

|   | id | image | label | deep_features | image_array |
|---|----|-------|-------|---------------|-------------|
| 0 | 24 | Height: 32 Width: 32 | bird | [0.242872 1.09545 0 0.39363 0 0 11.8949 0 0 0 ... | [73 77 58 71 68 50 77 69 44 120 116 83 125 120... |
| 1 | 33 | Height: 32 Width: 32 | cat | [0.525088 0 0 0 0 0 9.94829 0 0 0 0 0 1.01264 ... | [7 5 8 7 5 8 5 4 6 7 4 7 11 5 9 11 5 9 17 11 1... |
| 2 | 36 | Height: 32 Width: 32 | cat | [0.566016 0 0 0 0 0 9.9972 0 0 0 1.38345 0 0.7... | [169 122 65 131 108 75 193 196 192 218 221 222... |
| 3 | 70 | Height: 32 Width: 32 | dog | [1.1298 0 0 0.778194 0 0.758051 9.83053 0 0 0.... | [154 179 152 159 183 157 165 189 162 174 199 1... |
| 4 | 90 | Height: 32 Width: 32 | bird | [1.71787 0 0 0 0 0 9.33936 0 0 0 0 0 0.412137 ... | [216 195 180 201 178 160 210 184 164 212 188 1... |


# Train a nearest-neighbors model for retrieving images using deep features

We will now build a simple image retrieval system that finds the nearest neighbors for any image.
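Conceptually, the retrieval step is a k-nearest-neighbor search: compute the distance from the query's feature vector to every training vector and keep the k smallest. A brute-force NumPy sketch of that idea follows; `brute_force_knn` is a hypothetical helper, not part of scikit-learn, which may use a tree-based index rather than a full scan when `algorithm='auto'` (results should agree up to ties).

```python
import numpy as np

def brute_force_knn(X, query, k=5):
    """Return (distances, indices) of the k rows of X closest to query.

    X has shape (n_samples, n_features); query has shape (n_features,).
    Distances are Euclidean, matching scikit-learn's default
    metric='minkowski' with p=2.
    """
    diffs = X - query                                # broadcast query over every row
    dists = np.sqrt((diffs ** 2).sum(axis=1))        # one distance per row
    order = np.argsort(dists, kind="stable")[:k]     # indices of the k smallest
    return dists[order], order

# Random stand-in data: 100 "images" with 8 features each.
rng = np.random.default_rng(0)
X = rng.random((100, 8))
d, idx = brute_force_knn(X, X[18], k=5)
print(idx[0], d[0])  # 18 0.0: the query row is its own nearest neighbor
```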

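```python
# KNeighborsClassifier exposes kneighbors(); we use it only for neighbor lookup, not classification.
from sklearn.neighbors import KNeighborsClassifier
```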

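```python
# The CSV stores each feature vector as a string like "[0.24 1.09 0 ...]".
# Strip the brackets and split on any run of whitespace to recover the floats.
image_train['deep_features'] = image_train['deep_features'].apply(lambda x: [float(i) for i in x[1:-1].split()])
image_test['deep_features'] = image_test['deep_features'].apply(lambda x: [float(i) for i in x[1:-1].split()])
```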
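The `deep_features` column comes back from the CSV as text, which is why it has to be parsed. The sketch below shows why `split()` is safer than `split(' ')`, assuming the strings were produced by NumPy's `str()` (an assumption; how the CSV was written is not stated here). NumPy pads short numbers, so entries can be separated by more than one space.

```python
import numpy as np

# A feature vector rendered as text, e.g. '[0.5 0.  2. ]' on NumPy >= 1.14.
vec = np.array([0.5, 0.0, 2.0])
text = str(vec)

# split(' ') yields empty strings wherever spaces repeat, and float('') raises ValueError.
tokens_single = text[1:-1].split(' ')

# split() with no argument splits on any whitespace run (including newlines) and drops empties.
parsed = np.array([float(t) for t in text[1:-1].split()])

print(tokens_single)
print(parsed)
```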

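```python
# Stack the per-image feature lists into a 2-D array of shape (n_images, n_features).
train_deep_features = np.array(image_train['deep_features'].tolist())
# The image ids serve as the fit() targets; only the neighbor lookup is used below.
train_y = image_train['id'].values
```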

```python
train_deep_features
```

```
array([[ 0.242872 ,  1.09545  ,  0.       , ...,  0.       ,  0.       ,
         0.       ],
       [ 0.525088 ,  0.       ,  0.       , ...,  0.       ,  0.       ,
         0.50845  ],
       [ 0.566016 ,  0.       ,  0.       , ...,  0.       ,  0.       ,
         0.       ],
       ...,
       [ 0.558163 ,  0.       ,  1.0511   , ...,  0.0337387,  0.       ,
         0.       ],
       [ 0.67496  ,  0.       ,  0.       , ...,  0.       ,  0.       ,
         0.       ],
       [ 1.07502  ,  0.       ,  0.       , ...,  0.       ,  0.       ,
         0.       ]])
```

```python
knn_model = KNeighborsClassifier()
knn_model.fit(train_deep_features, train_y)
```

```
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform')
```
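The repr lists the defaults scikit-learn applied: `n_neighbors=5`, which is why each query below returns five rows, and `metric='minkowski'` with `p=2`. For two feature vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$ the Minkowski distance is

$$
D_p(\mathbf{x}, \mathbf{y}) = \left( \sum_{j=1}^{d} \lvert x_j - y_j \rvert^{p} \right)^{1/p},
\qquad
D_2(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{j=1}^{d} (x_j - y_j)^2},
$$

so with $p = 2$ the `dist` values reported by `kneighbors` are ordinary Euclidean distances between deep-feature vectors.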

# Use the image retrieval model with deep features to find similar images
Let's find images similar to a cat from the training set (row 18).

```python
# Deep features of training image 18, labeled 'cat'.
cat = train_deep_features[18]
```

```python
cat
```

```
array([ 1.04404,  0.     ,  0.     , ...,  0.     ,  0.     ,  0.     ])
```

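```python
# kneighbors expects a 2-D array of shape (n_queries, n_features), so reshape the single row.
dist, ind = knn_model.kneighbors(cat.reshape(1, -1))
```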
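scikit-learn's `kneighbors` expects a 2-D array of shape `(n_queries, n_features)`; recent versions raise a `ValueError` when given a single 1-D row such as `cat`. The shapes involved, shown on a small toy array rather than the real features:

```python
import numpy as np

X = np.arange(12, dtype=float).reshape(4, 3)   # toy stand-in: 4 images, 3 features

row = X[2]                   # shape (3,): a single 1-D feature vector
as_2d = row.reshape(1, -1)   # shape (1, 3): one query with three features
sliced = X[2:3]              # slicing also keeps two dimensions

print(row.shape, as_2d.shape, sliced.shape)   # (3,) (1, 3) (1, 3)
```

Either 2-D form works as a query; the `show_neighbors` lambda further down uses the slicing form.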



```python
dist
```

```
array([[  0.        ,  36.94031242,  38.4634892 ,  39.75596657,  39.7865973 ]])
```

```python
ind
```

```
array([[  18,  288, 1565, 1468, 1633]])
```

```python
# ind[0] holds the row positions of the neighbors of the first (only) query.
label = image_train['label'].iloc[ind[0]]
label
```

```
18      cat
288     cat
1565    cat
1468    cat
1633    cat
Name: label, dtype: object
```

```python
pd.DataFrame({'dist': dist[0], 'ind': ind[0], 'label': label})
```

|      | dist      | ind  | label |
|------|-----------|------|-------|
| 18   | 0.0       | 18   | cat   |
| 288  | 36.940312 | 288  | cat   |
| 1565 | 38.463489 | 1565 | cat   |
| 1468 | 39.755967 | 1468 | cat   |
| 1633 | 39.786597 | 1633 | cat   |


To save typing, we wrap these steps in a small helper function that returns the nearest neighbors as a table:

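```python
def get_images_from_ids(query_result):
    """Turn a kneighbors() (dist, ind) result into a table of distances, row indices, and labels."""
    dist, ind = query_result
    label = image_train['label'].iloc[ind[0]]
    return pd.DataFrame({'dist': dist[0], 'ind': ind[0], 'label': label})
```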
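`get_images_from_ids` reads the global `image_train`, so it only works for that one DataFrame. A sketch of a variant that takes the label column as a parameter (the name `neighbors_table` is hypothetical), exercised on made-up arrays shaped like `kneighbors` output:

```python
import numpy as np
import pandas as pd

def neighbors_table(query_result, labels):
    """Hypothetical variant of get_images_from_ids that takes the label
    Series as an argument instead of reading the global image_train."""
    dist, ind = query_result
    # kneighbors returns 2-D arrays; row 0 holds the results for the first query.
    return pd.DataFrame(
        {"dist": dist[0], "ind": ind[0], "label": labels.iloc[ind[0]].to_numpy()}
    )

# Made-up kneighbors-style output for one query, plus a toy label column.
labels = pd.Series(["cat", "dog", "cat", "bird"])
fake_result = (np.array([[0.0, 1.5, 2.0]]), np.array([[2, 0, 3]]))
table = neighbors_table(fake_result, labels)
print(table)
```

Because the labels are converted with `to_numpy()`, the table gets a fresh 0-based index; the `ind` column still records which training rows were returned.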

```python
cat_neighbors = get_images_from_ids(knn_model.kneighbors(cat.reshape(1, -1)))
```



```python
cat_neighbors.head()
```

|      | dist      | ind  | label |
|------|-----------|------|-------|
| 18   | 0.0       | 18   | cat   |
| 288  | 36.940312 | 288  | cat   |
| 1565 | 38.463489 | 1565 | cat   |
| 1468 | 39.755967 | 1468 | cat   |
| 1633 | 39.786597 | 1633 | cat   |


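Very cool results: the closest match is the query image itself (distance 0.0), because it is part of the training set, and the other four neighbors are also cats.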
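Because the query image is in the training set, it always comes back first at distance 0.0. If only the other matches are wanted, one option is to request `k + 1` neighbors (the `n_neighbors` argument of `kneighbors`) and drop the first column; scikit-learn's `kneighbors()` called with no `X` also returns neighbors of every training point without counting each point as its own neighbor. The same idea in plain NumPy, as a sketch with a hypothetical helper on random stand-in data:

```python
import numpy as np

def neighbors_excluding_self(X, i, k=4):
    """Return (indices, distances) of the k nearest rows to X[i], skipping row i.

    Hypothetical helper: brute-force Euclidean search over all rows.
    """
    dists = np.linalg.norm(X - X[i], axis=1)
    dists[i] = np.inf                              # rule out the query row itself
    order = np.argsort(dists, kind="stable")[:k]
    return order, dists[order]

rng = np.random.default_rng(1)
X = rng.random((50, 6))
idx, d = neighbors_excluding_self(X, 18, k=4)
print(idx, d)
```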

## Finding similar images to a car

```python
# Deep features of training image 8, labeled 'automobile'.
car = train_deep_features[8]
```

```python
get_images_from_ids(knn_model.kneighbors(car.reshape(1, -1))).head()
```

|      | dist      | ind  | label      |
|------|-----------|------|------------|
| 8    | 0.0       | 8    | automobile |
| 372  | 32.310817 | 372  | automobile |
| 1757 | 33.925375 | 1757 | automobile |
| 1343 | 35.02313  | 1343 | automobile |
| 1009 | 35.376827 | 1009 | automobile |


# Just for fun, let's create a lambda to find and show the nearest neighbors of any training image

```python
# Slicing i:i+1 keeps the query 2-D, as kneighbors expects.
show_neighbors = lambda i: get_images_from_ids(knn_model.kneighbors(train_deep_features[i:i+1])).head()
```

```python
show_neighbors(8)
```

|      | dist      | ind  | label      |
|------|-----------|------|------------|
| 8    | 0.0       | 8    | automobile |
| 372  | 32.310817 | 372  | automobile |
| 1757 | 33.925375 | 1757 | automobile |
| 1343 | 35.02313  | 1343 | automobile |
| 1009 | 35.376827 | 1009 | automobile |


```python
show_neighbors(26)
```

|      | dist      | ind  | label      |
|------|-----------|------|------------|
| 26   | 0.0       | 26   | automobile |
| 457  | 34.715476 | 457  | automobile |
| 377  | 35.793145 | 377  | automobile |
| 1576 | 36.665344 | 1576 | automobile |
| 1280 | 36.878414 | 1280 | automobile |
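Eyeballing tables works for a handful of queries. One rough way to summarize retrieval quality over many queries is the fraction of each query's neighbors (excluding itself) that share its label, often called precision at k. A NumPy sketch on toy data; the function name and the two synthetic clusters are made up, and the full pairwise distance matrix it builds needs O(n²) memory:

```python
import numpy as np

def label_precision_at_k(X, labels, k=4):
    """Mean fraction of each row's k nearest other rows that share its label."""
    # Squared pairwise Euclidean distances, shape (n, n).
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    np.fill_diagonal(d2, np.inf)               # never count a row as its own neighbor
    nn = np.argsort(d2, axis=1)[:, :k]         # k nearest other rows for each query
    matches = labels[nn] == labels[:, None]    # True where the neighbor's label agrees
    return matches.mean()

# Two well-separated toy clusters standing in for two image categories.
rng = np.random.default_rng(2)
a = rng.normal(0.0, 0.1, size=(10, 3))
b = rng.normal(5.0, 0.1, size=(10, 3))
X = np.vstack([a, b])
labels = np.array(["cat"] * 10 + ["automobile"] * 10)
print(label_precision_at_k(X, labels, k=4))   # 1.0 for these separated clusters
```

On real deep features the score would fall below 1.0 wherever neighbors cross category boundaries.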
