# Image features exercise
*Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the [assignments page](https://course.cse.ust.hk/comp4901j/Password_Only/programs/assignment1/index.html) on the course website.*

We have seen that we can achieve reasonable performance on an image classification task by training a linear classifier on the pixels of the input image. In this exercise we will show that we can improve our classification performance by training linear classifiers not on raw pixels but on features that are computed from the raw pixels.

All of your work for this exercise will be done in this notebook.

In [2]:
import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

from __future__ import print_function

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading extenrnal modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

## Load data
Similar to previous exercises, we will load CIFAR-10 data from disk.

In [3]:
from cs231n.features import color_histogram_hsv, hog_feature

def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000):
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
    
    # Subsample the data
    mask = list(range(num_training, num_training + num_validation))
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = list(range(num_training))
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = list(range(num_test))
    X_test = X_test[mask]
    y_test = y_test[mask]
    
    return X_train, y_train, X_val, y_val, X_test, y_test

X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()

## Extract Features
For each image we will compute a Histogram of Oriented
Gradients (HOG) as well as a color histogram using the hue channel in HSV
color space. We form our final feature vector for each image by concatenating
the HOG and color histogram feature vectors.

Roughly speaking, HOG should capture the texture of the image while ignoring
color information, and the color histogram represents the color of the input
image while ignoring texture. As a result, we expect that using both together
ought to work better than using either alone. Verifying this assumption would
be a good thing to try for the bonus section.

The `hog_feature` and `color_histogram_hsv` functions both operate on a single
image and return a feature vector for that image. The extract_features
function takes a set of images and a list of feature functions and evaluates
each feature function on each image, storing the results in a matrix where
each column is the concatenation of all feature vectors for a single image.

In [4]:
from cs231n.features import *

num_color_bins = 10 # Number of bins in the color histogram
feature_fns = [hog_feature, lambda img: color_histogram_hsv(img, nbin=num_color_bins)]
X_train_feats = extract_features(X_train, feature_fns, verbose=True)
X_val_feats = extract_features(X_val, feature_fns)
X_test_feats = extract_features(X_test, feature_fns)

# Preprocessing: Subtract the mean feature
mean_feat = np.mean(X_train_feats, axis=0, keepdims=True)
X_train_feats -= mean_feat
X_val_feats -= mean_feat
X_test_feats -= mean_feat

# Preprocessing: Divide by standard deviation. This ensures that each feature
# has roughly the same scale.
std_feat = np.std(X_train_feats, axis=0, keepdims=True)
X_train_feats /= std_feat
X_val_feats /= std_feat
X_test_feats /= std_feat

# Preprocessing: Add a bias dimension
X_train_feats = np.hstack([X_train_feats, np.ones((X_train_feats.shape[0], 1))])
X_val_feats = np.hstack([X_val_feats, np.ones((X_val_feats.shape[0], 1))])
X_test_feats = np.hstack([X_test_feats, np.ones((X_test_feats.shape[0], 1))])

Done extracting features for 1000 / 49000 images
Done extracting features for 2000 / 49000 images
Done extracting features for 3000 / 49000 images
Done extracting features for 4000 / 49000 images
Done extracting features for 5000 / 49000 images
Done extracting features for 6000 / 49000 images
Done extracting features for 7000 / 49000 images
Done extracting features for 8000 / 49000 images
Done extracting features for 9000 / 49000 images
Done extracting features for 10000 / 49000 images
Done extracting features for 11000 / 49000 images
Done extracting features for 12000 / 49000 images
Done extracting features for 13000 / 49000 images
Done extracting features for 14000 / 49000 images
Done extracting features for 15000 / 49000 images
Done extracting features for 16000 / 49000 images
Done extracting features for 17000 / 49000 images
Done extracting features for 18000 / 49000 images
Done extracting features for 19000 / 49000 images
Done extracting features for 20000 / 49000 images
Done extr

## Train SVM on features
Using the multiclass SVM code developed earlier in the assignment, train SVMs on top of the features extracted above; this should achieve better results than training SVMs directly on top of raw pixels.

In [35]:
# Use the validation set to tune the learning rate and regularization strength

from cs231n.classifiers.linear_classifier import LinearSVM

learning_rates = [1e-5, 1e-6, 1e-7, 1e-8, 1e-9]
regularization_strengths = [0.5e2, 2e2,0.5e3, 2e3,0.5e4, 2e4,0.5e5, 2e5,0.5e6, 2e6, 0.5e7, 2e7]

results = {}
best_val = -1
best_svm = None

pass
################################################################################
# TODO:                                                                        #
# Use the validation set to set the learning rate and regularization strength. #
# This should be identical to the validation that you did for the SVM; save    #
# the best trained classifer in best_svm. You might also want to play          #
# with different numbers of bins in the color histogram. If you are careful    #
# you should be able to get accuracy of near 0.44 on the validation set.       #
################################################################################
def accuracy(label, pred):
    return np.mean(label == pred)

for lr in learning_rates:
    for reg in regularization_strengths:
        svm = LinearSVM()
        lost_hist = []
        loss_hist = svm.train(X_train_feats, y_train, learning_rate=lr, reg=reg,
                              num_iters=1500, verbose=True)
        
        temp_val = accuracy(y_train, svm.predict(X_train_feats))
        if(temp_val > best_val):
            best_val = temp_val
            best_svm = svm
        results[(lr, reg)] = (temp_val, best_val)
        
        print("# lr: {} # reg: {}".format(lr, reg))        
        print('# current_accuracy: {} # Best_accuracy: {}'.format(temp_val, best_val))
################################################################################
#                              END OF YOUR CODE                                #
################################################################################

# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy))
    
print('best validation accuracy achieved during cross-validation: %f' % best_val)

iteration 0 / 1500: loss 9.073733
iteration 100 / 1500: loss 9.000410
iteration 200 / 1500: loss 8.939066
iteration 300 / 1500: loss 8.865559
iteration 400 / 1500: loss 8.855626
iteration 500 / 1500: loss 8.793242
iteration 600 / 1500: loss 8.816699
iteration 700 / 1500: loss 8.739201
iteration 800 / 1500: loss 8.778080
iteration 900 / 1500: loss 8.719224
iteration 1000 / 1500: loss 8.628713
iteration 1100 / 1500: loss 8.714343
iteration 1200 / 1500: loss 8.661715
iteration 1300 / 1500: loss 8.636730
iteration 1400 / 1500: loss 8.711811
# lr: 1e-05 # reg: 50.0
# current_accuracy: 0.4143061224489796 # Best_accuracy: 0.4143061224489796
iteration 0 / 1500: loss 9.297178
iteration 100 / 1500: loss 9.082846
iteration 200 / 1500: loss 8.987475
iteration 300 / 1500: loss 8.942681
iteration 400 / 1500: loss 8.906447
iteration 500 / 1500: loss 8.915561
iteration 600 / 1500: loss 8.917067
iteration 700 / 1500: loss 8.924625
iteration 800 / 1500: loss 8.909856
iteration 900 / 1500: loss 8.905508


iteration 800 / 1500: loss 9.008206
iteration 900 / 1500: loss 9.008171
iteration 1000 / 1500: loss 9.003046
iteration 1100 / 1500: loss 8.998521
iteration 1200 / 1500: loss 8.982427
iteration 1300 / 1500: loss 8.982304
iteration 1400 / 1500: loss 8.971050
# lr: 1e-06 # reg: 50.0
# current_accuracy: 0.2570612244897959 # Best_accuracy: 0.415
iteration 0 / 1500: loss 9.315409
iteration 100 / 1500: loss 9.299370
iteration 200 / 1500: loss 9.279884
iteration 300 / 1500: loss 9.240036
iteration 400 / 1500: loss 9.222997
iteration 500 / 1500: loss 9.197494
iteration 600 / 1500: loss 9.161821
iteration 700 / 1500: loss 9.157997
iteration 800 / 1500: loss 9.127150
iteration 900 / 1500: loss 9.120134
iteration 1000 / 1500: loss 9.117472
iteration 1100 / 1500: loss 9.092971
iteration 1200 / 1500: loss 9.071061
iteration 1300 / 1500: loss 9.056678
iteration 1400 / 1500: loss 9.053967
# lr: 1e-06 # reg: 200.0
# current_accuracy: 0.28781632653061223 # Best_accuracy: 0.415
iteration 0 / 1500: loss 9

iteration 800 / 1500: loss 9.089011
iteration 900 / 1500: loss 9.067618
iteration 1000 / 1500: loss 9.074768
iteration 1100 / 1500: loss 9.068913
iteration 1200 / 1500: loss 9.074093
iteration 1300 / 1500: loss 9.060275
iteration 1400 / 1500: loss 9.066141
# lr: 1e-07 # reg: 50.0
# current_accuracy: 0.10748979591836735 # Best_accuracy: 0.4169591836734694
iteration 0 / 1500: loss 9.308701
iteration 100 / 1500: loss 9.303281
iteration 200 / 1500: loss 9.312397
iteration 300 / 1500: loss 9.302011
iteration 400 / 1500: loss 9.305584
iteration 500 / 1500: loss 9.328328
iteration 600 / 1500: loss 9.308763
iteration 700 / 1500: loss 9.289174
iteration 800 / 1500: loss 9.296363
iteration 900 / 1500: loss 9.288261
iteration 1000 / 1500: loss 9.286583
iteration 1100 / 1500: loss 9.275864
iteration 1200 / 1500: loss 9.291438
iteration 1300 / 1500: loss 9.270368
iteration 1400 / 1500: loss 9.284714
# lr: 1e-07 # reg: 200.0
# current_accuracy: 0.09263265306122449 # Best_accuracy: 0.4169591836734694

iteration 900 / 1500: loss 9.054544
iteration 1000 / 1500: loss 9.074951
iteration 1100 / 1500: loss 9.055560
iteration 1200 / 1500: loss 9.079400
iteration 1300 / 1500: loss 9.074065
iteration 1400 / 1500: loss 9.065924
# lr: 1e-08 # reg: 50.0
# current_accuracy: 0.1070204081632653 # Best_accuracy: 0.4169591836734694
iteration 0 / 1500: loss 9.295692
iteration 100 / 1500: loss 9.312640
iteration 200 / 1500: loss 9.307061
iteration 300 / 1500: loss 9.286578
iteration 400 / 1500: loss 9.298598
iteration 500 / 1500: loss 9.292689
iteration 600 / 1500: loss 9.303342
iteration 700 / 1500: loss 9.298051
iteration 800 / 1500: loss 9.316222
iteration 900 / 1500: loss 9.296034
iteration 1000 / 1500: loss 9.312820
iteration 1100 / 1500: loss 9.307414
iteration 1200 / 1500: loss 9.296965
iteration 1300 / 1500: loss 9.305983
iteration 1400 / 1500: loss 9.316243
# lr: 1e-08 # reg: 200.0
# current_accuracy: 0.1106938775510204 # Best_accuracy: 0.4169591836734694
iteration 0 / 1500: loss 9.809561
ite

iteration 400 / 1500: loss 9.321014
iteration 500 / 1500: loss 9.327795
iteration 600 / 1500: loss 9.329932
iteration 700 / 1500: loss 9.312018
iteration 800 / 1500: loss 9.307743
iteration 900 / 1500: loss 9.307260
iteration 1000 / 1500: loss 9.315504
iteration 1100 / 1500: loss 9.326311
iteration 1200 / 1500: loss 9.323400
iteration 1300 / 1500: loss 9.318900
iteration 1400 / 1500: loss 9.323684
# lr: 1e-09 # reg: 200.0
# current_accuracy: 0.09763265306122448 # Best_accuracy: 0.4171020408163265
iteration 0 / 1500: loss 9.731431
iteration 100 / 1500: loss 9.748319
iteration 200 / 1500: loss 9.746346
iteration 300 / 1500: loss 9.754664
iteration 400 / 1500: loss 9.754106
iteration 500 / 1500: loss 9.746293
iteration 600 / 1500: loss 9.740960
iteration 700 / 1500: loss 9.744469
iteration 800 / 1500: loss 9.737495
iteration 900 / 1500: loss 9.756267
iteration 1000 / 1500: loss 9.739408
iteration 1100 / 1500: loss 9.745424
iteration 1200 / 1500: loss 9.729001
iteration 1300 / 1500: loss 9

In [25]:
# Evaluate your trained SVM on the test set
y_test_pred = best_svm.predict(X_test_feats)
test_accuracy = np.mean(y_test == y_test_pred)
print(test_accuracy)

0.42


In [None]:
# An important way to gain intuition about how an algorithm works is to
# visualize the mistakes that it makes. In this visualization, we show examples
# of images that are misclassified by our current system. The first column
# shows images that our system labeled as "plane" but whose true label is
# something other than "plane".

examples_per_class = 8
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for cls, cls_name in enumerate(classes):
    idxs = np.where((y_test != cls) & (y_test_pred == cls))[0]
    idxs = np.random.choice(idxs, examples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt.subplot(examples_per_class, len(classes), i * len(classes) + cls + 1)
        plt.imshow(X_test[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls_name)
plt.show()

### Inline question 1:
Describe the misclassification results that you see. Do they make sense?

## Neural Network on image features
Earlier in this assigment we saw that training a two-layer neural network on raw pixels achieved better classification performance than linear classifiers on raw pixels. In this notebook we have seen that linear classifiers on image features outperform linear classifiers on raw pixels. 

For completeness, we should also try training a neural network on image features. This approach should outperform all previous approaches: you should easily be able to achieve over 55% classification accuracy on the test set; our best model achieves about 60% classification accuracy.

In [None]:
print(X_train_feats.shape)

In [None]:
from cs231n.classifiers.neural_net import TwoLayerNet

input_dim = X_train_feats.shape[1]
hidden_dim = 500
num_classes = 10

net = TwoLayerNet(input_dim, hidden_dim, num_classes)
best_net = None

################################################################################
# TODO: Train a two-layer neural network on image features. You may want to    #
# cross-validate various parameters as in previous sections. Store your best   #
# model in the best_net variable.                                              #
################################################################################
pass
################################################################################
#                              END OF YOUR CODE                                #
################################################################################

In [None]:
# Run your neural net classifier on the test set. You should be able to
# get more than 55% accuracy.

test_acc = (net.predict(X_test_feats) == y_test).mean()
print(test_acc)

# Bonus: Design your own features!

You have seen that simple image features can improve classification performance. So far we have tried HOG and color histograms, but other types of features may be able to achieve even better classification performance.

For bonus points, design and implement a new type of feature and use it for image classification on CIFAR-10. Explain how your feature works and why you expect it to be useful for image classification. Implement it in this notebook, cross-validate any hyperparameters, and compare its performance to the HOG + Color histogram baseline.

# Bonus: Do something extra!
Use the material and code we have presented in this assignment to do something interesting. Was there another question we should have asked? Did any cool ideas pop into your head as you were working on the assignment? This is your chance to show off!