# Image features exercise
*Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the [assignments page](http://vision.stanford.edu/teaching/cs231n/assignments.html) on the course website.*

We have seen that we can achieve reasonable performance on an image classification task by training a linear classifier on the pixels of the input image. In this exercise we will show that we can improve our classification performance by training linear classifiers not on raw pixels but on features that are computed from the raw pixels.

All of your work for this exercise will be done in this notebook.

In [1]:
# __future__ imports must come before any other statement
from __future__ import print_function

import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

## Load data
Similar to previous exercises, we will load CIFAR-10 data from disk.

In [3]:
from cs231n.features import color_histogram_hsv, hog_feature

def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000):
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
    
    # Subsample the data
    mask = list(range(num_training, num_training + num_validation))
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = list(range(num_training))
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = list(range(num_test))
    X_test = X_test[mask]
    y_test = y_test[mask]
    
    return X_train, y_train, X_val, y_val, X_test, y_test

X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()

## Extract Features
For each image we will compute a Histogram of Oriented
Gradients (HOG) as well as a color histogram using the hue channel in HSV
color space. We form our final feature vector for each image by concatenating
the HOG and color histogram feature vectors.

Roughly speaking, HOG should capture the texture of the image while ignoring
color information, and the color histogram represents the color of the input
image while ignoring texture. As a result, we expect that using both together
ought to work better than using either alone. Verifying this assumption would
be a good thing to try for the bonus section.
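
As a rough illustration of the color half of the feature vector, the sketch below builds a normalized histogram over the hue channel. It is not the `color_histogram_hsv` implementation from `cs231n.features`. The name `hue_histogram_sketch`, the use of the standard-library `colorsys` module, and the assumption that pixel values lie in [0, 255] are choices made for this example only.

```python
import colorsys
import numpy as np

def hue_histogram_sketch(img, nbin=10):
    """Normalized hue histogram of an H x W x 3 RGB image (hypothetical helper).

    Assumes RGB values in [0, 255]. Hue is mapped to [0, 1) by colorsys.
    """
    pixels = np.asarray(img, dtype=np.float64).reshape(-1, 3) / 255.0
    # Per-pixel conversion is slow but keeps the sketch dependency-free.
    hues = np.array([colorsys.rgb_to_hsv(r, g, b)[0] for r, g, b in pixels])
    hist, _ = np.histogram(hues, bins=nbin, range=(0.0, 1.0))
    return hist / hist.sum()

# A pure-green image puts all of its mass into a single hue bin.
green = np.zeros((4, 4, 3))
green[..., 1] = 255
print(hue_histogram_sketch(green))
```

Because only hue is binned, two images with the same colors but different spatial layouts produce the same vector, which is the sense in which this feature ignores texture.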

The `hog_feature` and `color_histogram_hsv` functions both operate on a single
image and return a feature vector for that image. The `extract_features`
function takes a set of images and a list of feature functions and evaluates
each feature function on each image, storing the results in a matrix where
each row is the concatenation of all feature vectors for a single image.
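
The behavior described above can be sketched in a few lines. This is a simplified stand-in, not the `extract_features` code shipped in `cs231n.features`; the function name `extract_features_sketch` and the two toy feature functions are made up for illustration.

```python
import numpy as np

def extract_features_sketch(imgs, feature_fns):
    """Apply every feature function to every image.

    Returns an (N, D) matrix: one row per image, where D is the total
    length of all feature vectors concatenated together.
    """
    rows = []
    for img in imgs:
        parts = [np.ravel(fn(img)) for fn in feature_fns]
        rows.append(np.concatenate(parts))
    return np.stack(rows)

# Toy feature functions (hypothetical): per-channel mean and global maximum.
channel_mean = lambda img: img.mean(axis=(0, 1))   # length 3
global_max = lambda img: np.array([img.max()])     # length 1

imgs = np.random.rand(5, 32, 32, 3)
feats = extract_features_sketch(imgs, [channel_mean, global_max])
print(feats.shape)  # (5, 4)
```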

In [4]:
from cs231n.features import *

num_color_bins = 10 # Number of bins in the color histogram
feature_fns = [hog_feature, lambda img: color_histogram_hsv(img, nbin=num_color_bins)]
X_train_feats = extract_features(X_train, feature_fns, verbose=True)
X_val_feats = extract_features(X_val, feature_fns)
X_test_feats = extract_features(X_test, feature_fns)

# Preprocessing: Subtract the mean feature
mean_feat = np.mean(X_train_feats, axis=0, keepdims=True)
X_train_feats -= mean_feat
X_val_feats -= mean_feat
X_test_feats -= mean_feat

# Preprocessing: Divide by standard deviation. This ensures that each feature
# has roughly the same scale.
std_feat = np.std(X_train_feats, axis=0, keepdims=True)
X_train_feats /= std_feat
X_val_feats /= std_feat
X_test_feats /= std_feat

# Preprocessing: Add a bias dimension.
# These bias-augmented copies are used only by the SVM; the neural network
# later in the notebook uses the features without the extra column.
X_train_feats_svm = np.hstack([X_train_feats, np.ones((X_train_feats.shape[0], 1))])
X_val_feats_svm = np.hstack([X_val_feats, np.ones((X_val_feats.shape[0], 1))])
X_test_feats_svm = np.hstack([X_test_feats, np.ones((X_test_feats.shape[0], 1))])

Done extracting features for 1000 / 49000 images
Done extracting features for 2000 / 49000 images
Done extracting features for 3000 / 49000 images
Done extracting features for 4000 / 49000 images
Done extracting features for 5000 / 49000 images
Done extracting features for 6000 / 49000 images
Done extracting features for 7000 / 49000 images
Done extracting features for 8000 / 49000 images
Done extracting features for 9000 / 49000 images
Done extracting features for 10000 / 49000 images
Done extracting features for 11000 / 49000 images
Done extracting features for 12000 / 49000 images
Done extracting features for 13000 / 49000 images
Done extracting features for 14000 / 49000 images
Done extracting features for 15000 / 49000 images
Done extracting features for 16000 / 49000 images
Done extracting features for 17000 / 49000 images
Done extracting features for 18000 / 49000 images
Done extracting features for 19000 / 49000 images
Done extracting features for 20000 / 49000 images
...
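
The preprocessing in the cell above computes the mean and standard deviation on the training features only and reuses them for the validation and test features. A minimal sketch of that pattern follows. The helper name `standardize_sketch` and the `eps` guard against constant features are additions for this example and are not part of the notebook's code.

```python
import numpy as np

def standardize_sketch(train, *others, eps=1e-8):
    """Zero-mean, unit-variance scaling using training statistics only."""
    mean = train.mean(axis=0, keepdims=True)
    std = train.std(axis=0, keepdims=True)
    # Assumption: leave constant features unscaled instead of dividing by ~0.
    std = np.where(std < eps, 1.0, std)
    return [(a - mean) / std for a in (train, *others)]

rng = np.random.default_rng(0)
train = rng.standard_normal((100, 4)) * 3.0 + 2.0
val = rng.standard_normal((20, 4)) * 3.0 + 2.0
train_s, val_s = standardize_sketch(train, val)
print(train_s.mean(axis=0), train_s.std(axis=0))
```

Using training statistics for every split keeps information from the validation and test sets out of the preprocessing step.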

## Train SVM on features
Using the multiclass SVM code developed earlier in the assignment, train SVMs on top of the features extracted above; this should achieve better results than training SVMs directly on top of raw pixels.
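
For reference, the multiclass SVM (hinge) loss that such a classifier minimizes can be sketched as below. This is an independent illustration, not the `LinearSVM` code from `cs231n.classifiers`. The margin of 1, the `reg * sum(W**2)` form of the penalty, and the name `svm_loss_sketch` are assumptions of this example.

```python
import numpy as np

def svm_loss_sketch(W, X, y, reg=0.0, delta=1.0):
    """Average multiclass hinge loss plus an L2 penalty.

    W: (D, C) weights, X: (N, D) features, y: (N,) integer labels.
    """
    n = X.shape[0]
    scores = X @ W                                   # (N, C)
    correct = scores[np.arange(n), y][:, None]       # (N, 1)
    margins = np.maximum(0.0, scores - correct + delta)
    margins[np.arange(n), y] = 0.0                   # skip the true class
    return margins.sum() / n + reg * np.sum(W * W)

# With all-zero weights every wrong class contributes a margin of delta,
# so the loss is C - 1 for C classes.
X = np.random.randn(6, 4)
y = np.array([0, 1, 2, 3, 4, 5])
print(svm_loss_sketch(np.zeros((4, 10)), X, y))
```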

In [None]:
# Use the validation set to tune the learning rate and regularization strength

from cs231n.classifiers.linear_classifier import LinearSVM

learning_rates = [3e-8, 4e-8, 5e-8, 6e-8, 1e-7]
regularization_strengths = [3e5, 5e5, 6e5, 7e5, 8e5, 9e5, 1e6, 2e6]

results = {}
best_val = -1
best_svm = None

################################################################################
# TODO:                                                                        #
# Use the validation set to set the learning rate and regularization strength. #
# This should be identical to the validation that you did for the SVM; save    #
# the best trained classifier in best_svm. You might also want to play         #
# with different numbers of bins in the color histogram. If you are careful    #
# you should be able to get accuracy of near 0.44 on the validation set.       #
################################################################################
for l_rate in learning_rates:
    for reg_val in regularization_strengths:
        svm_now = LinearSVM()
        loss_info = svm_now.train(X_train_feats_svm, y_train, learning_rate=l_rate, reg=reg_val,
                      num_iters=1000, verbose=True)
        val_acc_now = (svm_now.predict(X_val_feats_svm) == y_val).mean()
        train_acc_now = (svm_now.predict(X_train_feats_svm) == y_train).mean()
        results[(l_rate, reg_val)] = (train_acc_now, val_acc_now)
        if val_acc_now > best_val:
            best_val = val_acc_now
            best_svm = svm_now
################################################################################
#                              END OF YOUR CODE                                #
################################################################################

# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy))
    
print('best validation accuracy achieved during cross-validation: %f' % best_val)
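
Once `results` has been filled in, the winning hyperparameter pair can be read straight from the dictionary. The sketch below assumes the same layout as the cell above, `(lr, reg) -> (train_accuracy, val_accuracy)`. The numbers in `toy_results` are placeholders invented for the example, not measured accuracies.

```python
def best_setting(results):
    """Return the key whose value has the highest validation accuracy."""
    return max(results, key=lambda k: results[k][1])

# Placeholder values purely for illustration.
toy_results = {
    (3e-8, 3e5): (0.40, 0.39),
    (5e-8, 5e5): (0.42, 0.43),
    (1e-7, 1e6): (0.41, 0.41),
}
lr, reg = best_setting(toy_results)
print('best lr %e, best reg %e' % (lr, reg))
```

Selecting on validation accuracy rather than training accuracy is what keeps the test set untouched until the final evaluation.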

In [None]:
# Evaluate your trained SVM on the test set
y_test_pred = best_svm.predict(X_test_feats_svm)
test_accuracy = np.mean(y_test == y_test_pred)
print(test_accuracy)

In [None]:
# An important way to gain intuition about how an algorithm works is to
# visualize the mistakes that it makes. In this visualization, we show examples
# of images that are misclassified by our current system. The first column
# shows images that our system labeled as "plane" but whose true label is
# something other than "plane".

examples_per_class = 6
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for cls, cls_name in enumerate(classes):
    idxs = np.where((y_test != cls) & (y_test_pred == cls))[0]
    # Guard against classes with fewer false positives than examples_per_class
    idxs = np.random.choice(idxs, min(examples_per_class, len(idxs)), replace=False)
    for i, idx in enumerate(idxs):
        plt.subplot(examples_per_class, len(classes), i * len(classes) + cls + 1)
        plt.imshow(X_test[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls_name)
plt.show()

### Inline question 1:
Describe the misclassification results that you see. Do they make sense?
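
A confusion matrix is one way to back up the visual impression with counts. The sketch below is a generic NumPy version; the name `confusion_matrix_sketch` is hypothetical and the labels are toy values, not predictions from this notebook.

```python
import numpy as np

def confusion_matrix_sketch(y_true, y_pred, num_classes):
    """cm[i, j] counts examples with true class i that were predicted as j."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)   # unbuffered add handles repeated pairs
    return cm

# Toy labels for illustration only.
cm = confusion_matrix_sketch([0, 1, 2, 2], [0, 2, 2, 1], num_classes=3)
print(cm)
```

Large off-diagonal entries point to the class pairs that the features fail to separate, for example classes that share dominant colors or similar outlines.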

## Neural Network on image features
Earlier in this assignment we saw that training a two-layer neural network on raw pixels achieved better classification performance than linear classifiers on raw pixels. In this notebook we have seen that linear classifiers on image features outperform linear classifiers on raw pixels.

For completeness, we should also try training a neural network on image features. This approach should outperform all previous approaches: you should easily be able to achieve over 55% classification accuracy on the test set; our best model achieves about 60% classification accuracy.
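
A useful sanity check before a long training run is the initial loss. The sketch below assumes an affine, ReLU, affine, softmax architecture, which this notebook does not spell out for `TwoLayerNet`. The function name and the 1e-4 weight scale are likewise assumptions. With near-zero weights the predicted distribution is almost uniform, so the loss should start close to ln(10) ≈ 2.302585 for 10 classes.

```python
import numpy as np

def two_layer_loss_sketch(X, y, W1, b1, W2, b2):
    """Mean softmax cross-entropy of an affine-ReLU-affine network."""
    hidden = np.maximum(0.0, X @ W1 + b1)
    scores = hidden @ W2 + b2
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(X.shape[0]), y].mean()

rng = np.random.default_rng(0)
D, H, C = 154, 80, 10            # feature dim, hidden units, classes
X = rng.standard_normal((5, D))
y = rng.integers(0, C, size=5)
W1 = 1e-4 * rng.standard_normal((D, H))
W2 = 1e-4 * rng.standard_normal((H, C))
loss = two_layer_loss_sketch(X, y, W1, np.zeros(H), W2, np.zeros(C))
print(loss, np.log(C))
```

A loss that stays near this value for many iterations indicates that the network has not yet moved away from uniform predictions.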

In [5]:
print(X_train_feats.shape)
dim_m = X_train_feats.shape[1]

(49000, 154)


In [8]:
from cs231n.classifiers.neural_net import TwoLayerNet

input_dim = dim_m
hidden_dim_set = [80, 100, 200]
num_classes = 10

learning_rate_set = [1e-2, 5e-2, 1e-1]
regularization_values = [1e-5, 2e-5]
best_net = None
best_val = -1
results = {}
################################################################################
# TODO: Train a two-layer neural network on image features. You may want to    #
# cross-validate various parameters as in previous sections. Store your best   #
# model in the best_net variable.                                              #
################################################################################
for h_size in hidden_dim_set:
    for l_rate in learning_rate_set:
        for reg_val in regularization_values:
            net_now = TwoLayerNet(input_dim, h_size, num_classes)
            info = net_now.train(X_train_feats, y_train, X_val_feats, y_val,
                                num_iters=6000, batch_size=200,
                                learning_rate=l_rate, learning_rate_decay=0.95,
                                reg=reg_val, verbose=True)
            val_acc_now = (net_now.predict(X_val_feats) == y_val).mean()
            results[(h_size, l_rate, reg_val)] = val_acc_now
            if val_acc_now > best_val:
                best_val = val_acc_now
                best_net = net_now
                
            print('h_size = %d, l_rate = %f, reg = %f, val_accuracy = %f'%(h_size, l_rate, reg_val, val_acc_now))
################################################################################
#                              END OF YOUR CODE                                #
################################################################################

iteration 0 / 6000: loss 2.302585
iteration 100 / 6000: loss 2.302586
iteration 200 / 6000: loss 2.302447
iteration 300 / 6000: loss 2.302591
iteration 400 / 6000: loss 2.302051
iteration 500 / 6000: loss 2.302834
iteration 600 / 6000: loss 2.302621
iteration 700 / 6000: loss 2.302645
iteration 800 / 6000: loss 2.302349
iteration 900 / 6000: loss 2.302574
iteration 1000 / 6000: loss 2.302425
iteration 1100 / 6000: loss 2.302647
iteration 1200 / 6000: loss 2.302794
iteration 1300 / 6000: loss 2.301927
iteration 1400 / 6000: loss 2.301378
iteration 1500 / 6000: loss 2.301599
iteration 1600 / 6000: loss 2.300318
iteration 1700 / 6000: loss 2.300562
iteration 1800 / 6000: loss 2.298410
iteration 1900 / 6000: loss 2.296570
iteration 2000 / 6000: loss 2.288473
iteration 2100 / 6000: loss 2.289331
iteration 2200 / 6000: loss 2.269360
iteration 2300 / 6000: loss 2.248211
iteration 2400 / 6000: loss 2.249519
iteration 2500 / 6000: loss 2.226750
iteration 2600 / 6000: loss 2.200916
...

iteration 3700 / 6000: loss 1.245435
iteration 3800 / 6000: loss 1.362219
iteration 3900 / 6000: loss 1.373728
iteration 4000 / 6000: loss 1.355308
iteration 4100 / 6000: loss 1.279421
iteration 4200 / 6000: loss 1.234934
iteration 4300 / 6000: loss 1.204442
iteration 4400 / 6000: loss 1.393225
iteration 4500 / 6000: loss 1.133001
iteration 4600 / 6000: loss 1.221867
iteration 4700 / 6000: loss 1.337860
iteration 4800 / 6000: loss 1.229570
iteration 4900 / 6000: loss 1.309838
iteration 5000 / 6000: loss 1.222484
iteration 5100 / 6000: loss 1.304973
iteration 5200 / 6000: loss 1.088907
iteration 5300 / 6000: loss 1.286846
iteration 5400 / 6000: loss 1.302916
iteration 5500 / 6000: loss 1.224008
iteration 5600 / 6000: loss 1.089661
iteration 5700 / 6000: loss 1.380415
iteration 5800 / 6000: loss 1.183661
iteration 5900 / 6000: loss 1.250041
h_size = 80, l_rate = 0.050000, reg = 0.000020, val_accuracy = 0.532000
iteration 0 / 6000: loss 2.302585
iteration 100 / 6000: loss 2.302182
...

iteration 1200 / 6000: loss 2.302717
iteration 1300 / 6000: loss 2.302642
iteration 1400 / 6000: loss 2.302528
iteration 1500 / 6000: loss 2.301818
iteration 1600 / 6000: loss 2.299649
iteration 1700 / 6000: loss 2.298053
iteration 1800 / 6000: loss 2.296834
iteration 1900 / 6000: loss 2.293723
iteration 2000 / 6000: loss 2.289948
iteration 2100 / 6000: loss 2.281158
iteration 2200 / 6000: loss 2.268808
iteration 2300 / 6000: loss 2.264516
iteration 2400 / 6000: loss 2.229900
iteration 2500 / 6000: loss 2.226062
iteration 2600 / 6000: loss 2.181031
iteration 2700 / 6000: loss 2.184517
iteration 2800 / 6000: loss 2.141960
iteration 2900 / 6000: loss 2.113737
iteration 3000 / 6000: loss 2.101258
iteration 3100 / 6000: loss 2.114308
iteration 3200 / 6000: loss 2.048792
iteration 3300 / 6000: loss 2.069403
iteration 3400 / 6000: loss 2.007422
iteration 3500 / 6000: loss 1.967986
iteration 3600 / 6000: loss 1.990026
iteration 3700 / 6000: loss 1.987187
iteration 3800 / 6000: loss 1.901992
...

iteration 4900 / 6000: loss 1.175767
iteration 5000 / 6000: loss 1.009770
iteration 5100 / 6000: loss 1.112186
iteration 5200 / 6000: loss 1.173423
iteration 5300 / 6000: loss 1.080095
iteration 5400 / 6000: loss 1.116512
iteration 5500 / 6000: loss 1.017724
iteration 5600 / 6000: loss 1.038225
iteration 5700 / 6000: loss 1.126907
iteration 5800 / 6000: loss 1.099063
iteration 5900 / 6000: loss 1.013308
h_size = 100, l_rate = 0.100000, reg = 0.000010, val_accuracy = 0.579000
iteration 0 / 6000: loss 2.302585
iteration 100 / 6000: loss 2.302951
iteration 200 / 6000: loss 2.225083
iteration 300 / 6000: loss 1.947368
iteration 400 / 6000: loss 1.658534
iteration 500 / 6000: loss 1.565952
iteration 600 / 6000: loss 1.490900
iteration 700 / 6000: loss 1.497449
iteration 800 / 6000: loss 1.372893
iteration 900 / 6000: loss 1.330334
iteration 1000 / 6000: loss 1.379027
iteration 1100 / 6000: loss 1.383271
iteration 1200 / 6000: loss 1.395010
iteration 1300 / 6000: loss 1.317466
...

iteration 2400 / 6000: loss 1.221302
iteration 2500 / 6000: loss 1.321285
iteration 2600 / 6000: loss 1.359364
iteration 2700 / 6000: loss 1.350446
iteration 2800 / 6000: loss 1.470606
iteration 2900 / 6000: loss 1.365002
iteration 3000 / 6000: loss 1.327609
iteration 3100 / 6000: loss 1.174521
iteration 3200 / 6000: loss 1.452576
iteration 3300 / 6000: loss 1.285922
iteration 3400 / 6000: loss 1.231659
iteration 3500 / 6000: loss 1.309039
iteration 3600 / 6000: loss 1.349799
iteration 3700 / 6000: loss 1.261426
iteration 3800 / 6000: loss 1.368280
iteration 3900 / 6000: loss 1.383172
iteration 4000 / 6000: loss 1.392520
iteration 4100 / 6000: loss 1.256294
iteration 4200 / 6000: loss 1.296722
iteration 4300 / 6000: loss 1.149826
iteration 4400 / 6000: loss 1.383378
iteration 4500 / 6000: loss 1.389521
iteration 4600 / 6000: loss 1.331955
iteration 4700 / 6000: loss 1.197713
iteration 4800 / 6000: loss 1.392210
iteration 4900 / 6000: loss 1.250947
iteration 5000 / 6000: loss 1.268163
...

In [9]:
# Run your neural net classifier on the test set. You should be able to
# get more than 55% accuracy.

test_acc = (best_net.predict(X_test_feats) == y_test).mean()
print(test_acc)

0.547


# Bonus: Design your own features!

You have seen that simple image features can improve classification performance. So far we have tried HOG and color histograms, but other types of features may be able to achieve even better classification performance.

For bonus points, design and implement a new type of feature and use it for image classification on CIFAR-10. Explain how your feature works and why you expect it to be useful for image classification. Implement it in this notebook, cross-validate any hyperparameters, and compare its performance to the HOG + Color histogram baseline.
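
As a starting point only, here is one hypothetical feature with the same single-image interface as `hog_feature`: a coarse grayscale thumbnail obtained by average pooling. The name `tiny_gray_feature` and the 8x8 output size are invented for this example. Whether it helps on CIFAR-10 is untested here and would need the cross-validation described above.

```python
import numpy as np

def tiny_gray_feature(img, size=8):
    """Average-pool a grayscale version of an H x W x 3 image to size x size.

    Returns a flat vector of length size * size. Rows and columns that do not
    fit evenly into the pooling grid are cropped.
    """
    gray = np.asarray(img, dtype=np.float64).mean(axis=2)
    bh, bw = gray.shape[0] // size, gray.shape[1] // size
    gray = gray[:bh * size, :bw * size]
    pooled = gray.reshape(size, bh, size, bw).mean(axis=(1, 3))
    return pooled.ravel()

img = np.zeros((32, 32, 3))
img[:, 16:, :] = 255.0           # right half white, left half black
feat = tiny_gray_feature(img)
print(feat.shape)  # (64,)
```

Because it takes one image and returns one vector, a function like this could be appended to the `feature_fns` list alongside the HOG and color histogram features.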

# Bonus: Do something extra!
Use the material and code we have presented in this assignment to do something interesting. Was there another question we should have asked? Did any cool ideas pop into your head as you were working on the assignment? This is your chance to show off!