# Using deep features to build an image classifier

# Fire up GraphLab Create

In [1]:
import graphlab



# Load a common image analysis dataset

We will use a popular benchmark dataset in computer vision called CIFAR-10.  

(We've reduced the data to just 4 categories = {'cat','bird','automobile','dog'}.)

This dataset is already split into a training set and test set.  

In [2]:
image_train = graphlab.SFrame('image_train_data/')
image_test = graphlab.SFrame('image_test_data/')

[INFO] GraphLab Create v1.8.2 started. Logging: /tmp/graphlab_server_1457063835.log


# Exploring the image data

In [3]:
graphlab.canvas.set_target('ipynb')

In [4]:
image_train['image'].show()

# Train a classifier on the raw image pixels

We start by training a classifier on just the raw pixels of the image.

In [8]:
raw_pixel_model = graphlab.logistic_classifier.create(image_train,target='label',
                                              features=['image_array'])

PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.



# Make a prediction with the simple model based on raw pixels

In [9]:
image_test[0:3]['image'].show()

In [10]:
image_test[0:3]['label']

dtype: str
Rows: 3
['cat', 'automobile', 'cat']

In [11]:
raw_pixel_model.predict(image_test[0:3])

dtype: str
Rows: 3
['bird', 'cat', 'bird']

The model makes wrong predictions for all three images.
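As a quick check, the three predictions can be compared against the true labels by hand. The sketch below uses plain Python lists holding the values printed above; the helper name `fraction_correct` is hypothetical and is not part of GraphLab Create.

```python
def fraction_correct(true_labels, predicted_labels):
    """Return the fraction of positions where the prediction equals the true label."""
    matches = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return matches / float(len(true_labels))

# Values copied from the cell outputs above.
true_labels = ['cat', 'automobile', 'cat']   # image_test[0:3]['label']
raw_predictions = ['bird', 'cat', 'bird']    # raw_pixel_model.predict(image_test[0:3])

raw_accuracy = fraction_correct(true_labels, raw_predictions)
print(raw_accuracy)  # 0.0
```

Three images are far too few to judge a model, which is why the next section evaluates on the whole test set.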

# Evaluating raw pixel model on test data

In [12]:
raw_pixel_model.evaluate(image_test)

{'accuracy': 0.477, 'auc': 0.7183869583333347, 'confusion_matrix': Columns:
 	target_label	str
 	predicted_label	str
 	count	int
 
 Rows: 16
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |     bird     |       dog       |  164  |
 |     dog      |       cat       |  259  |
 |     bird     |    automobile   |  139  |
 |  automobile  |    automobile   |  624  |
 |     cat      |       dog       |  264  |
 |     dog      |       dog       |  403  |
 |     dog      |    automobile   |  109  |
 |     bird     |       bird      |  515  |
 |  automobile  |       bird      |  111  |
 |     bird     |       cat       |  182  |
 +--------------+-----------------+-------+
 [16 rows x 3 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns., 'f1_score': 0.47527414683219893, 'log_loss': 1.2138609505674112, 'precision': 0.

The accuracy of this model is poor: only about 48%.

# Can we improve the model using deep features?

We only have 2005 data points, so it is not possible to train a deep neural network effectively with so little data.  Instead, we will use transfer learning: using deep features trained on the full ImageNet dataset, we will train a simple model on this small dataset.
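The pattern can be illustrated with a toy sketch in plain Python: a fixed feature function stands in for the pretrained network and is never updated, and only a very simple model is fit on top of its outputs. Everything here is an assumption made for illustration: `frozen_features`, the toy data, and the nearest-centroid classifier (swapped in for the logistic classifier used in this notebook to keep the example short).

```python
def frozen_features(pixels):
    """Stand-in for a pretrained network: a fixed mapping that is never retrained."""
    mean = sum(pixels) / float(len(pixels))
    spread = max(pixels) - min(pixels)
    return [mean, spread]

def fit_centroids(examples):
    """Fit the simple model: one mean feature vector per label."""
    sums, counts = {}, {}
    for pixels, label in examples:
        feats = frozen_features(pixels)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for j, value in enumerate(feats):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, pixels):
    """Predict the label whose centroid is closest in feature space."""
    feats = frozen_features(pixels)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(feats, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))

# Hypothetical toy data: four tiny "images" with made-up labels.
toy_train = [
    ([10, 12, 11, 9], 'dark'),
    ([14, 13, 15, 12], 'dark'),
    ([200, 210, 190, 205], 'bright'),
    ([220, 215, 225, 210], 'bright'),
]

centroids = fit_centroids(toy_train)
prediction = predict(centroids, [198, 202, 207, 195])
print(prediction)  # bright
```

Only `fit_centroids` learns anything from the small dataset; the feature function is reused as-is, which is the essence of training a simple model on deep features.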

In [13]:
len(image_train)

2005

## Computing deep features for our images

The two lines below allow us to compute deep features.  This computation takes a little while, so we have already computed them and saved the results as a column in the data you loaded. 

(Note that if you would like to compute such deep features and have a GPU on your machine, you should use the GPU enabled GraphLab Create, which will be significantly faster for this task.)

In [14]:
deep_learning_model = graphlab.load_model('http://s3.amazonaws.com/GraphLab-Datasets/deeplearning/imagenet_model_iter45')
image_train['deep_features'] = deep_learning_model.extract_features(image_train)

As we can see, the column deep_features already contains the pre-computed deep features for this data. 

In [15]:
image_train.head()

id,image,label,deep_features,image_array
24,Height: 32 Width: 32,bird,"[0.242872238159, 1.09545278549, 0.0, ...","[73.0, 77.0, 58.0, 71.0, 68.0, 50.0, 77.0, 69.0, ..."
33,Height: 32 Width: 32,cat,"[0.525088429451, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[7.0, 5.0, 8.0, 7.0, 5.0, 8.0, 5.0, 4.0, 6.0, 7.0, ..."
36,Height: 32 Width: 32,cat,"[0.566015422344, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[169.0, 122.0, 65.0, 131.0, 108.0, 75.0, ..."
70,Height: 32 Width: 32,dog,"[1.12979567051, 0.0, 0.0, 0.778195261955, 0.0, ...","[154.0, 179.0, 152.0, 159.0, 183.0, 157.0, ..."
90,Height: 32 Width: 32,bird,"[1.71786916256, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[216.0, 195.0, 180.0, 201.0, 178.0, 160.0, ..."
97,Height: 32 Width: 32,automobile,"[1.57818508148, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[33.0, 44.0, 27.0, 29.0, 44.0, 31.0, 32.0, 45.0, ..."
107,Height: 32 Width: 32,dog,"[0.0, 0.0, 0.22067707777, 0.0, 0.0, 0.0, ...","[97.0, 51.0, 31.0, 104.0, 58.0, 38.0, 107.0, 61.0, ..."
121,Height: 32 Width: 32,bird,"[0.0, 0.237533032894, 0.0, 0.0, 0.0, 0.0, ...","[93.0, 96.0, 88.0, 102.0, 106.0, 97.0, 117.0, ..."
136,Height: 32 Width: 32,automobile,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 7.573782444, 0.0, ...","[35.0, 59.0, 53.0, 36.0, 56.0, 56.0, 42.0, 62.0, ..."
138,Height: 32 Width: 32,bird,"[0.658935666084, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[205.0, 193.0, 195.0, 200.0, 187.0, 193.0, ..."


# Given the deep features, let's train a classifier

In [16]:
deep_features_model = graphlab.logistic_classifier.create(image_train,
                                                         features=['deep_features'],
                                                         target='label')

PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.



# Apply the deep features model to first few images of test set

In [17]:
image_test[0:3]['image'].show()

In [18]:
deep_features_model.predict(image_test[0:3])

dtype: str
Rows: 3
['cat', 'automobile', 'cat']

The classifier with deep features gets all of these images right!

# Compute test_data accuracy of deep_features_model

As we can see, deep features give significantly better accuracy: about 78%, compared with about 48% for the raw-pixel model.

In [23]:
deep_features_model.evaluate(image_test)

{'accuracy': 0.78375, 'auc': 0.9399961666666654, 'confusion_matrix': Columns:
 	target_label	str
 	predicted_label	str
 	count	int
 
 Rows: 16
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |  automobile  |       cat       |   15  |
 |     bird     |       dog       |   69  |
 |     cat      |       bird      |   70  |
 |  automobile  |       dog       |   6   |
 |     cat      |    automobile   |   34  |
 |     dog      |       bird      |   37  |
 |     bird     |       cat       |  128  |
 |     dog      |    automobile   |   17  |
 |     dog      |       dog       |  738  |
 |     cat      |       dog       |  230  |
 +--------------+-----------------+-------+
 [16 rows x 3 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns., 'f1_score': 0.7839808001874151, 'log_loss': 0.5710076000855692, 'precision': 0

# Computing summary statistics of the data

In [27]:
sketch = image_train['label'].sketch_summary()

In [28]:
sketch


+------------------+-------+----------+
|       item       | value | is exact |
+------------------+-------+----------+
|      Length      |  2005 |   Yes    |
| # Missing Values |   0   |   Yes    |
| # unique values  |   4   |    No    |
+------------------+-------+----------+

Most frequent items:
+-------+------------+-----+-----+------+
| value | automobile | cat | dog | bird |
+-------+------------+-----+-----+------+
| count |    509     | 509 | 509 | 478  |
+-------+------------+-----+-----+------+


## What’s the least common category in the training data?
The least common category in the training data is **bird**, with 478 images versus 509 for each of the other three categories.
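The same answer can be read off programmatically. This sketch copies the counts from the sketch summary above into a plain dictionary (it does not query the `sketch` object itself) and picks the label with the smallest count.

```python
# Counts copied from the "Most frequent items" table above.
label_counts = {'automobile': 509, 'cat': 509, 'dog': 509, 'bird': 478}

least_common = min(label_counts, key=label_counts.get)
total = sum(label_counts.values())

print(least_common)  # bird
print(total)         # 2005, matching len(image_train)
```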

# Creating category-specific image retrieval models

## Split the SFrame with the training data into 4 different SFrames

In [109]:
automobile_train = image_train[image_train['label'] == 'automobile']
cat_train        = image_train[image_train['label'] == 'cat']
dog_train        = image_train[image_train['label'] == 'dog']
bird_train       = image_train[image_train['label'] == 'bird']

Create a nearest neighbor model for each of the above datasets, using 'deep_features' as the features:

In [31]:
automobile_model = graphlab.nearest_neighbors.create(automobile_train,features=['deep_features'], label='id')
cat_model        = graphlab.nearest_neighbors.create(cat_train,features=['deep_features'], label='id')
dog_model        = graphlab.nearest_neighbors.create(dog_train,features=['deep_features'], label='id')
bird_model       = graphlab.nearest_neighbors.create(bird_train,features=['deep_features'], label='id')

In [114]:
graphlab.canvas.set_target('ipynb')
def get_images_from_ids(query_result):
    return image_train.filter_by(query_result['reference_label'],'id')
show_neighbors = lambda i: get_images_from_ids(cat_model.query(image_train[i:i+1]))['image'].show()

In [119]:
cat = image_test[0:1]
cat.show()

## What is the nearest ‘cat’ labeled image in the training data to the cat image above (the first image in the test data)?

In [129]:
cat_cat_neighbors = cat_model.query(image_test[0:1])
print(cat_cat_neighbors)
get_images_from_ids(cat_cat_neighbors).show()

+-------------+-----------------+---------------+------+
| query_label | reference_label |    distance   | rank |
+-------------+-----------------+---------------+------+
|      0      |      16289      | 34.6237245482 |  1   |
|      0      |      45646      | 36.0068831055 |  2   |
|      0      |      32139      | 36.5200814714 |  3   |
|      0      |      25713      | 36.7548515281 |  4   |
|      0      |       331       | 36.8731201117 |  5   |
+-------------+-----------------+---------------+------+
[5 rows x 4 columns]
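To make the shape of this result concrete, here is a minimal brute-force sketch of a nearest-neighbor query in plain Python. It assumes Euclidean distance and a small hypothetical reference set; GraphLab Create's own implementation and default distance may differ, and the names `euclidean`, `query`, and `toy_reference` are illustrative only.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def query(reference, point, k=5):
    """Return the k closest reference items as (reference_label, distance, rank)."""
    scored = sorted((euclidean(vec, point), ref_id) for ref_id, vec in reference.items())
    return [(ref_id, dist, rank) for rank, (dist, ref_id) in enumerate(scored[:k], start=1)]

# Hypothetical reference set: id -> feature vector.
toy_reference = {
    101: [0.0, 0.0],
    102: [3.0, 4.0],
    103: [1.0, 1.0],
    104: [6.0, 8.0],
}

neighbors = query(toy_reference, [0.0, 0.0], k=3)
for row in neighbors:
    print(row)
```

As in the table above, rows come back ordered by increasing distance, with rank 1 being the closest reference item.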



## What is the nearest ‘dog’ labeled image in the training data to the cat image above (the first image in the test data)?

In [130]:
get_images_from_ids(cat_cat_neighbors)['image'].show()

In [131]:
cat_dog_neighbors = dog_model.query(cat)
print(cat_dog_neighbors)
get_images_from_ids(cat_dog_neighbors)['image'].show()

# 3. A simple example of nearest-neighbors classification

For the first image in the test data (`image_test[0:1]`), which we used above, **compute the mean distance between this image and its 5 nearest neighbors that were labeled ‘cat’ in the training data**.

In [62]:
cat_cat_neighbors['distance'].mean()

36.15573215296916

For the first image in the test data (`image_test[0:1]`), which we used above, **compute the mean distance between this image and its 5 nearest neighbors that were labeled ‘dog’ in the training data**.

In [63]:
cat_dog_neighbors['distance'].mean()

37.77071247656341

**On average, is the first image in the test data closer to its 5 nearest neighbors in the ‘cat’ data or in the ‘dog’ data?**

Since the mean distance to the 'cat' neighbors (about 36.16) is less than the mean distance to the 'dog' neighbors (about 37.77), **the first image is closer to its 5 nearest neighbors in the 'cat' data**.
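The cat-side mean can be reproduced by hand from the five distances printed in the query table earlier. The sketch below copies those values into a plain list; the dog-side mean is taken directly from the output above, since the individual dog distances are not printed.

```python
# Distances copied from the cat_model.query(image_test[0:1]) table.
cat_cat_distances = [
    34.6237245482,
    36.0068831055,
    36.5200814714,
    36.7548515281,
    36.8731201117,
]

cat_cat_mean = sum(cat_cat_distances) / len(cat_cat_distances)
cat_dog_mean = 37.77071247656341  # reported by cat_dog_neighbors['distance'].mean()

closer_to = 'cat' if cat_cat_mean < cat_dog_mean else 'dog'
print(round(cat_cat_mean, 4))  # 36.1557
print(closer_to)               # cat
```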

# 4. Computing nearest neighbors accuracy using SFrame operations

A nearest neighbor classifier predicts the label of a point as the most common label of its nearest neighbors. In this question, we will measure the accuracy of a 1-nearest-neighbor classifier, i.e., predict the output as the label of the nearest neighbor in the training data. Although there are simpler ways of computing this result, we will go step-by-step here to introduce you to more concepts in nearest neighbors and SFrames, which will be useful later in this Specialization.

* **Training models**: For this question, you will need the nearest neighbors models you learned above on the training data, i.e., the dog_model, cat_model, automobile_model and bird_model.
* **Splitting test data by label**: Above, you split the train data SFrame into one SFrame for images labeled ‘dog’, another for those labeled ‘cat’, etc. Now, do the same for the test data. You can call the resulting SFrames `image_test_cat, image_test_dog, image_test_bird, image_test_automobile`.
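Before going step by step, it may help to see the whole idea in miniature. The following plain-Python sketch of a 1-nearest-neighbor classifier assumes Euclidean distance and uses hypothetical two-dimensional toy data; `predict_1nn` and the toy sets are illustrative names, not GraphLab Create APIs.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict_1nn(training, point):
    """Return the label of the training example closest to `point`."""
    _, nearest_label = min(training, key=lambda item: euclidean(item[0], point))
    return nearest_label

# Hypothetical toy data: (feature_vector, label) pairs.
toy_training = [
    ([0.0, 0.0], 'cat'),
    ([0.5, 0.2], 'cat'),
    ([5.0, 5.0], 'dog'),
    ([5.5, 4.8], 'dog'),
]
toy_test = [
    ([0.2, 0.1], 'cat'),
    ([4.9, 5.2], 'dog'),
    ([2.0, 2.0], 'dog'),  # lies closer to the cat cluster, so it is misclassified
]

correct = sum(1 for vec, label in toy_test if predict_1nn(toy_training, vec) == label)
accuracy = correct / float(len(toy_test))
print(correct)  # 2
```

Searching the whole training set for the single nearest point gives the same prediction as querying one model per class with `k=1` and choosing the class with the smallest distance, which is the route taken below.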

In [65]:
image_test_cat = image_test[image_test['label'] == 'cat']
image_test_dog = image_test[image_test['label'] == 'dog']
image_test_bird = image_test[image_test['label'] == 'bird']
image_test_automobile = image_test[image_test['label'] == 'automobile']

In [67]:
dog_cat_neighbors = cat_model.query(image_test_dog, k=1)
dog_cat_neighbors['distance']

dtype: float
Rows: 1000
[36.419608941315715, 38.835326766067844, 36.97634488374303, 34.57500615171566, 34.77882262619482, 35.11715824103536, 40.60958513268354, 39.9036859490617, 38.067474089102575, 42.72587380814982, 40.0733455492491, 31.66335160329702, 37.71246797975448, 39.09036263161471, 49.579679117113166, 36.07738703838557, 36.50902562050289, 44.952439964006174, 33.43682585875031, 34.32455744042027, 34.71471125239844, 33.23747699124214, 34.42538527944865, 34.51304142336406, 37.9842449780713, 41.937835565530875, 39.475535430067325, 37.613473828098975, 36.460958495862734, 32.54458590633176, 36.83135865607554, 37.85902587035358, 41.78116745376098, 35.532073884223564, 33.12115259308907, 34.95202733923359, 33.216541005966654, 30.274806510119348, 35.256572141437815, 36.64845801974247, 33.67638181699052, 32.42639588336553, 39.64035274393126, 44.35133035473476, 38.262874984143714, 37.19424773611843, 38.416204787691235, 33.47475643950759, 35.28479740151353, 43.05319712340215, 31.5868986222

The query above finds 1 neighbor (that’s what `k=1` does) for each dog test image (`image_test_dog`) in the cat portion of the training data (used to train the `cat_model`).

**The question we want to answer is how many of the test set ‘dog’ images are closer to a ‘dog’ in the training set than to a ‘cat’, ‘automobile’ or ‘bird’.** So, next we will create an SFrame containing just these distances per data point.

In [84]:
dog_distances = graphlab.SFrame({
        'dog-dog':        dog_model.query(image_test_dog, k=1)['distance'],
        'dog-cat':        cat_model.query(image_test_dog, k=1)['distance'],
        'dog-automobile': automobile_model.query(image_test_dog, k=1)['distance'],
        'dog-bird':       bird_model.query(image_test_dog, k=1)['distance']
    })

cat_distances = graphlab.SFrame({
        'cat-dog':        dog_model.query(image_test_cat, k=1)['distance'],
        'cat-cat':        cat_model.query(image_test_cat, k=1)['distance'],
        'cat-automobile': automobile_model.query(image_test_cat, k=1)['distance'],
        'cat-bird':       bird_model.query(image_test_cat, k=1)['distance']
    })

In [71]:
dog_distances

dog-automobile,dog-bird,dog-cat,dog-dog
41.9579712598,41.7538680747,36.4196089413,33.4773628317
46.0021334736,41.3382942608,38.8353267661,32.8458506185
42.9462302647,38.6157605656,36.9763448837,35.0397072372
41.6866089424,37.0892255104,34.5750061517,33.9010314165
39.2269686105,38.2722896979,34.7788226262,37.4849231295
40.5845151397,39.1462093453,35.117158241,34.9451650714
45.1067385141,40.5230392066,40.6095851327,39.0957285891
41.3221159523,38.1947906379,39.9036859491,37.7696184861
41.8244610106,40.1567136635,38.0674740891,35.1089149251
45.4976892981,45.5597967729,42.7258738081,43.2422849


## Computing the number of correct predictions using 1-nearest neighbors for the dog class

* Consider one row of the SFrame `dog_distances`. Let’s call this variable `row`.

In [75]:
row = dog_distances[0]

In [77]:
row['dog-cat']

36.419608941315715

In [86]:
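def is_dog_correct(row):
    # 1 if the nearest training 'dog' is strictly closer than the nearest
    # training image of every other class, else 0.
    dd = row['dog-dog']
    if dd < row['dog-cat'] and dd < row['dog-bird'] and dd < row['dog-automobile']:
        return 1
    else:
        return 0

def is_cat_correct(row):
    # Same rule for test images whose true label is 'cat'.
    cc = row['cat-cat']
    if cc < row['cat-dog'] and cc < row['cat-bird'] and cc < row['cat-automobile']:
        return 1
    else:
        return 0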

* Use the `.apply()` method to apply the function `is_dog_correct` to each row of the SFrame.

In [82]:
dog_correct = dog_distances.apply(is_dog_correct)

In [83]:
dog_correct.sum()

678
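As a sanity check on the rule, it can be applied by hand to the ten rows of `dog_distances` printed earlier. This sketch rebuilds those rows as plain dictionaries; `dog_is_nearest` is a hypothetical helper that states the same condition as `is_dog_correct` in terms of a minimum.

```python
columns = ['dog-automobile', 'dog-bird', 'dog-cat', 'dog-dog']

# The ten rows shown in the dog_distances output above.
printed_rows = [
    [41.9579712598, 41.7538680747, 36.4196089413, 33.4773628317],
    [46.0021334736, 41.3382942608, 38.8353267661, 32.8458506185],
    [42.9462302647, 38.6157605656, 36.9763448837, 35.0397072372],
    [41.6866089424, 37.0892255104, 34.5750061517, 33.9010314165],
    [39.2269686105, 38.2722896979, 34.7788226262, 37.4849231295],
    [40.5845151397, 39.1462093453, 35.117158241, 34.9451650714],
    [45.1067385141, 40.5230392066, 40.6095851327, 39.0957285891],
    [41.3221159523, 38.1947906379, 39.9036859491, 37.7696184861],
    [41.8244610106, 40.1567136635, 38.0674740891, 35.1089149251],
    [45.4976892981, 45.5597967729, 42.7258738081, 43.2422849],
]

def dog_is_nearest(row):
    """1 if 'dog-dog' is strictly smaller than every other distance in the row."""
    others = [value for name, value in row.items() if name != 'dog-dog']
    return 1 if row['dog-dog'] < min(others) else 0

rows = [dict(zip(columns, values)) for values in printed_rows]
flags = [dog_is_nearest(row) for row in rows]
head_correct = sum(flags)
print(flags)         # [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
print(head_correct)  # 8
```

In both misclassified rows (the fifth and the tenth) the nearest training cat is closer than the nearest training dog.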

In [87]:
cat_distances.apply(is_cat_correct).sum()

548

**Accuracy of predicting dog in the test data** What is the accuracy of the 1-nearest neighbor classifier at classifying "dog" images from the test set?

In [93]:
678 / 1000.0

0.678

In [95]:
deep_features_model.evaluate(image_test_dog)

{'accuracy': 0.738, 'auc': nan, 'confusion_matrix': Columns:
 	target_label	str
 	predicted_label	str
 	count	int
 
 Rows: 4
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |     dog      |       cat       |  208  |
 |     dog      |       bird      |   37  |
 |     dog      |       dog       |  738  |
 |     dog      |    automobile   |   17  |
 +--------------+-----------------+-------+
 [4 rows x 3 columns], 'f1_score': 0.21231300345224396, 'log_loss': 0.6775307430935967, 'precision': 0.25, 'recall': 0.738, 'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 	class	int
 
 Rows: 400004
 
 Data:
 +-----------+-------+-----+---+------+-------+
 | threshold |  fpr  | tpr | p |  n   | class |
 +-----------+-------+-----+---+------+-------+
 |    0.0    |  1.0  | nan | 0 | 1000 |   0   |
 |   1e-05   | 0.989 | nan | 0 | 1000 |   0   |
 |   2e-05   | 0.985 | nan | 0 | 

In [108]:
image_test[0:1].show()

In [110]:
deep_features_model.predict(image_test[0:1])

dtype: str
Rows: 1
['cat']

In [128]:
get_images_from_ids(dog_model.query(image_test[0:1])).show()