# [Scene Recognition with Bag-of-Words](https://www.cc.gatech.edu/~hays/compvision/proj4/)
For this project, you will need to report performance for three
combinations of features and classifiers. We suggest you implement them in
this order:
1. Tiny image features and nearest neighbor classifier
2. Bag of SIFT features and nearest neighbor classifier
3. Bag of SIFT features and linear SVM classifier

The starter code is initialized to 'placeholder' just so that the starter
code does not crash when run unmodified and you can get a preview of how
results are presented.

## Setup

In [52]:
# Set up parameters, image paths and category list
%matplotlib notebook
%load_ext autoreload
%autoreload 2

import cv2
import numpy as np
import os.path as osp
import pickle
from random import shuffle
import matplotlib.pyplot as plt
from utils import *
import student_code as sc
import warnings
warnings.filterwarnings('ignore')


# This is the list of categories / directories to use. The categories are
# somewhat sorted by similarity so that the confusion matrix looks more
# structured (indoor and then urban and then rural).
categories = ['Kitchen', 'Store', 'Bedroom', 'LivingRoom', 'Office', 'Industrial', 'Suburb',
              'InsideCity', 'TallBuilding', 'Street', 'Highway', 'OpenCountry', 'Coast',
              'Mountain', 'Forest']
# This list of shortened category names is used later for visualization
abbr_categories = ['Kit', 'Sto', 'Bed', 'Liv', 'Off', 'Ind', 'Sub',
                   'Cty', 'Bld', 'St', 'HW', 'OC', 'Cst',
                   'Mnt', 'For']

# Number of training examples per category to use. Max is 100. For
# simplicity, we assume this is the number of test cases per category, as
# well.
num_train_per_cat = 100

# This function returns lists containing the file path for each train
# and test image, as well as lists with the label of each train and
# test image. By default all four of these lists will have 1500 elements
# where each element is a string.
data_path = osp.join('..', 'data')
train_image_paths, test_image_paths, train_labels, test_labels = get_image_paths(
    data_path, categories, num_train_per_cat)



## Section 1: Tiny Image features with Nearest Neighbor classifier

### Section 1a: Represent each image with the Tiny Image feature

Each function to construct features should return an N x d numpy array, where N is the number of paths passed to the function and d is the dimensionality of each image representation. See the starter code for each function for more details.
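
As a rough illustration of the N x d convention, the sketch below builds tiny image features from arrays that are already loaded. It is not the starter code's `get_tiny_images`: the helper names are hypothetical, the inputs are grayscale arrays rather than file paths, and plain block averaging stands in for a proper resize such as `cv2.resize`. The 16 x 16 size and the zero-mean, unit-length normalization are assumptions.

```python
import numpy as np

def tiny_image_feature(image, size=16):
    """Shrink a grayscale image to size x size and normalize it.

    Hypothetical helper: block averaging stands in for a real resize,
    and the image is cropped so that it divides evenly into blocks.
    """
    image = np.asarray(image, dtype=np.float64)
    h, w = image.shape
    bh, bw = h // size, w // size              # block height and width
    cropped = image[:bh * size, :bw * size]
    # Average each bh x bw block to get a size x size thumbnail.
    tiny = cropped.reshape(size, bh, size, bw).mean(axis=(1, 3))
    feat = tiny.ravel()
    feat = feat - feat.mean()                  # zero mean
    norm = np.linalg.norm(feat)
    return feat / norm if norm > 0 else feat   # unit length

def tiny_image_features(images, size=16):
    """Stack one feature per image into an N x d array (d = size * size)."""
    return np.vstack([tiny_image_feature(img, size) for img in images])
```

With the default size, each image becomes a 256-dimensional row, so 1500 images would give a 1500 x 256 array.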

In [2]:
print('Using the TINY IMAGE representation for images')
train_image_feats = sc.get_tiny_images(train_image_paths)
test_image_feats = sc.get_tiny_images(test_image_paths)

Using the TINY IMAGE representation for images


### Section 1b: Classify each test image by training and using the Nearest Neighbor classifier

Each function to classify test features will return an N element list, where N is the number of test cases and each entry is a string indicating the predicted category for each test image. Each entry in 'predicted_categories' must be one of the 15 strings in 'categories', 'train_labels', and 'test_labels'. See the starter code for each function for more details.
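
A minimal sketch of a 1-NN classifier with this interface is shown below. The function name is hypothetical and the Euclidean distance is an assumption; the starter code's `nearest_neighbor_classify` may use a different metric or more neighbors.

```python
import numpy as np

def nearest_neighbor_labels(train_feats, train_labels, test_feats):
    """Return the label of the closest training feature for each test row."""
    train_feats = np.asarray(train_feats, dtype=np.float64)
    test_feats = np.asarray(test_feats, dtype=np.float64)
    # Squared Euclidean distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2.
    d2 = (
        (test_feats ** 2).sum(axis=1)[:, None]
        - 2.0 * test_feats @ train_feats.T
        + (train_feats ** 2).sum(axis=1)[None, :]
    )
    nearest = d2.argmin(axis=1)                # index of the closest training row
    return [train_labels[i] for i in nearest]
```

The result is a plain list of category strings, one per test row, which matches the format described above.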

In [3]:
print('Using NEAREST NEIGHBOR classifier to predict test set categories')

predicted_categories = sc.nearest_neighbor_classify(train_image_feats, train_labels, test_image_feats)

Using NEAREST NEIGHBOR classifier to predict test set categories


### Section 1c: Build a confusion matrix and score the recognition system

(You do not need to code anything in this section.)

If we wanted to evaluate our recognition method properly we would train
and test on many random splits of the data. You are not required to do so
for this project.

This function will create a confusion matrix and various image
thumbnails each time it is called. View the confusion matrix to help interpret
your classifier performance. Where is it making mistakes? Are the
confusions reasonable?
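
For intuition, a row-normalized confusion matrix and the accuracy derived from it can be computed along these lines. This is a sketch with hypothetical helper names, not the code inside `show_results`, which may normalize or score differently.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, categories):
    """Row-normalized confusion matrix: rows are true classes, columns are predictions."""
    index = {cat: i for i, cat in enumerate(categories)}
    counts = np.zeros((len(categories), len(categories)))
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t], index[p]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1                # avoid dividing by zero for empty classes
    return counts / row_sums

def mean_per_class_accuracy(cm):
    """Average of the diagonal of a row-normalized confusion matrix."""
    return float(np.mean(np.diag(cm)))
```

Large off-diagonal entries show which pairs of categories the classifier confuses. When every category has the same number of test images, the mean of the diagonal equals the overall accuracy.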

Interpreting your performance with 100 training examples per category:
- accuracy  =   0 -> Your code is broken (probably not the classifier's fault! A classifier would have to be amazing to perform this badly).
- accuracy ~= .07 -> Your performance is chance. Something is broken or you ran the starter code unchanged.
- accuracy ~= .20 -> Rough performance with tiny images and nearest neighbor classifier. Performance goes up a few percentage points with K-NN instead of 1-NN.
- accuracy ~= .20 -> Rough performance with tiny images and linear SVM classifier. The linear classifiers will have a lot of trouble trying to separate the classes and may be unstable (e.g. everything classified to one category)
- accuracy ~= .50 -> Rough performance with bag of SIFT and nearest neighbor classifier. Can reach .60 with K-NN and different distance metrics.
- accuracy ~= .60 -> You've gotten things roughly correct with bag of SIFT and a linear SVM classifier.
- accuracy >= .70 -> You've also tuned your parameters well. E.g. number of clusters, SVM regularization, number of patches sampled when building vocabulary, size and step for dense SIFT features.
- accuracy >= .80 -> You've added in spatial information somehow or you've added additional, complementary image features. This represents state of the art in Lazebnik et al. 2006.
- accuracy >= .85 -> You've done extremely well. This is the state of the art in the 2010 SUN database paper from fusing many features. Don't trust this number unless you actually measure many random splits.
- accuracy >= .90 -> You used modern deep features trained on much larger image databases.
- accuracy >= .96 -> You can beat a human at this task. This isn't a realistic number. Some accuracy calculation is broken or your classifier is cheating and seeing the test labels.

In [4]:
show_results(train_image_paths, test_image_paths, train_labels, test_labels, categories, abbr_categories,
             predicted_categories)


## Section 2: Bag of SIFT features with Nearest Neighbor classifier

### Section 2a: Represent each image with the Bag of SIFT feature

The cell below builds a vocabulary only if no file named `vocab_filename` exists. To create a new vocabulary, either set `vocab_filename` to a name that is not already in use or delete the old file.
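
Once a vocabulary of cluster centers exists, each image is described by how often each visual word occurs among its local descriptors. The sketch below shows that step for a single image, assuming the descriptors have already been extracted. The function name is hypothetical, and both the Euclidean word assignment and the L1 normalization are assumptions rather than what `get_bags_of_sifts` necessarily does.

```python
import numpy as np

def bag_of_words_histogram(descriptors, vocab):
    """Histogram of nearest visual words for one image, L1-normalized.

    descriptors: (n, d) local descriptors from one image (e.g. n x 128 for SIFT).
    vocab:       (k, d) cluster centers, i.e. the visual words.
    """
    descriptors = np.asarray(descriptors, dtype=np.float64)
    vocab = np.asarray(vocab, dtype=np.float64)
    # Squared distance from every descriptor to every visual word.
    diffs = descriptors[:, None, :] - vocab[None, :, :]
    d2 = (diffs ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                  # nearest word per descriptor
    hist = np.bincount(words, minlength=len(vocab)).astype(np.float64)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

Stacking one such histogram per image gives an N x k feature matrix, where k is the vocabulary size (200 in the cell below).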

In [5]:
print('Using the BAG-OF-SIFT representation for images')

vocab_filename = 'vocab.pkl'
if not osp.isfile(vocab_filename):
    # Construct the vocabulary
    print('No existing visual word vocabulary found. Computing one from training images')
    vocab_size = 200  # Larger values will work better (to a point) but be slower to compute
    vocab = sc.build_vocabulary(train_image_paths, vocab_size)
    with open(vocab_filename, 'wb') as f:
        pickle.dump(vocab, f)
        print('{:s} saved'.format(vocab_filename))

train_image_feats = sc.get_bags_of_sifts(train_image_paths, vocab_filename)
test_image_feats = sc.get_bags_of_sifts(test_image_paths, vocab_filename)

Using the BAG-OF-SIFT representation for images
Bag of sift size:  (1500, 200)
Bag of sift size:  (1500, 200)


### Section 2b: Classify each test image by training and using the Nearest Neighbor classifier

In [6]:
print('Using NEAREST NEIGHBOR classifier to predict test set categories')
predicted_categories = sc.nearest_neighbor_classify(train_image_feats, train_labels, test_image_feats)

Using NEAREST NEIGHBOR classifier to predict test set categories


### Section 2c: Build a confusion matrix and score the recognition system

In [7]:
show_results(train_image_paths, test_image_paths, train_labels, test_labels, categories, abbr_categories,
             predicted_categories)


## Section 3: Bag of SIFT features and SVM classifier
We will reuse the bag of SIFT features from Section 2a.

The difference is that this time we will classify them with a support vector machine (SVM).
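
To make the idea concrete, here is a small one-vs-rest linear SVM trained with full-batch subgradient descent on the hinge loss. It is a teaching sketch under stated assumptions, not the implementation behind `sc.svm_classify`: the function names, learning rate, and epoch count are hypothetical, and a fixed number of epochs replaces a tolerance-based stopping rule. In practice a library solver such as scikit-learn's `LinearSVC` is the usual choice.

```python
import numpy as np

def train_one_vs_rest_svm(feats, labels, categories, C=1.0, lr=0.01, epochs=500):
    """Fit one linear SVM per category with full-batch subgradient descent.

    Each classifier minimizes 0.5 * ||w||^2 + C * sum(max(0, 1 - y * (X w + b))).
    """
    X = np.asarray(feats, dtype=np.float64)
    labels = np.asarray(labels)
    W = np.zeros((len(categories), X.shape[1]))
    b = np.zeros(len(categories))
    for k, cat in enumerate(categories):
        y = np.where(labels == cat, 1.0, -1.0)     # this class versus the rest
        for _ in range(epochs):
            margins = y * (X @ W[k] + b[k])
            active = margins < 1                    # samples inside or beyond the margin
            grad_w = W[k] - C * (y[active, None] * X[active]).sum(axis=0)
            grad_b = -C * y[active].sum()
            W[k] -= lr * grad_w
            b[k] -= lr * grad_b
    return W, b

def predict_one_vs_rest_svm(W, b, feats, categories):
    """Pick the category whose SVM gives the highest score."""
    scores = np.asarray(feats, dtype=np.float64) @ W.T + b
    return [categories[i] for i in scores.argmax(axis=1)]
```

Here `C` trades margin width against training errors: larger values penalize misclassified training points more heavily.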

### Section 3a: Classify each test image by training and using the SVM classifiers

In [30]:
print('Using SVM classifier to predict test set categories')
predicted_categories = sc.svm_classify(train_image_feats, train_labels, test_image_feats)

Using SVM classifier to predict test set categories


### Section 3b: Build a confusion matrix and score the recognition system

In [31]:
show_results(train_image_paths, test_image_paths, train_labels, test_labels, categories, abbr_categories,
             predicted_categories)


## EXPERIMENTAL DESIGN SECTION

### Section 4a: Cross-Validation

Use cross-validation to measure performance rather than the fixed test / train split provided by the starter code. Randomly pick 100 training and 100 testing images for each iteration and report average performance and standard deviations.
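
One way to set up such repeated random splits is sketched below. It assumes all features and labels have been pooled into one array each, and that the classifier is passed in as a function. The helper name and its parameters are hypothetical, and this is not the implementation of `sc.cross_validation_splits`.

```python
import numpy as np

def random_split_scores(feats, labels, classify_fn, n_iters=15, n_per_class=100, seed=0):
    """Mean and standard deviation of accuracy over repeated random splits.

    classify_fn(train_feats, train_labels, test_feats) must return predicted labels.
    Each iteration draws n_per_class training and n_per_class test items per class.
    """
    feats = np.asarray(feats)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(n_iters):
        train_idx, test_idx = [], []
        for cat in np.unique(labels):
            idx = rng.permutation(np.flatnonzero(labels == cat))
            train_idx.extend(idx[:n_per_class])
            test_idx.extend(idx[n_per_class:2 * n_per_class])
        predicted = classify_fn(feats[train_idx], labels[train_idx], feats[test_idx])
        accuracies.append(np.mean(np.asarray(predicted) == labels[test_idx]))
    return float(np.mean(accuracies)), float(np.std(accuracies))
```

For this notebook, the pooled inputs could be formed with something like `np.vstack([train_image_feats, test_image_feats])` and the concatenated label lists.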

Run the two blocks below to perform the experiment and to see results using the best parameters.

In [53]:
iter = 15
best_hyperparameters, best_result = sc.cross_validation_splits(train_image_feats, train_labels, test_image_feats, test_labels, categories, iter)
print("Best hyperparameters found: C = %d, tol = %f. Accuracy: %f" % (best_hyperparameters[0],best_hyperparameters[1], best_result[0] * 100))

Results of accuracy and standard deviation with (C,tol) combinations: 
(5, 0.001)  :  (0.7057627070780067, 0.01853445526956051)
(3, 0.0001)  :  (0.6867143668024299, 0.022633728965847987)
(1, 0.0001)  :  (0.6691467729718527, 0.01658429195607021)
(9, 1e-06)  :  (0.7119494827218646, 0.017319250330920653)
(7, 1e-05)  :  (0.7049577573340321, 0.021727422852158725)
(9, 0.0001)  :  (0.705394732389214, 0.013823978901827301)
(7, 0.001)  :  (0.707957281932551, 0.020436611610602555)
(3, 0.001)  :  (0.696014277052372, 0.030380084014608105)
(7, 0.0001)  :  (0.7003419704181971, 0.024929476360439157)
(3, 1e-05)  :  (0.6938278523977331, 0.02241636746000017)
(5, 0.0001)  :  (0.7068920832731739, 0.0252347318333805)
(1, 1e-05)  :  (0.6641520513576271, 0.020686256541322046)
(9, 0.001)  :  (0.7015255819067637, 0.01344561067151784)
(9, 1e-05)  :  (0.7067715118576221, 0.01638143901565866)
(3, 1e-06)  :  (0.7031741472678983, 0.01289781091026515)
(1, 1e-06)  :  (0.6609473321758838, 0.013708004385636591)
(7, 1e-

In [54]:
#Results of best hyperparameters on the given training and testing dataset
tol = best_hyperparameters[1]
c = best_hyperparameters[0]
print('Using SVM classifier to predict test set categories')
predicted_categories = sc.svm_classify(train_image_feats, train_labels, test_image_feats, tol, c)

show_results(train_image_paths, test_image_paths, train_labels, test_labels, categories, abbr_categories,
             predicted_categories)

Using SVM classifier to predict test set categories



### Section 4b: Use of Validation set along with training and testing set

Add a validation set to your training process to tune learning parameters. This validation set could either be a subset of the training set or some of the otherwise unused test set.
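
The sketch below shows one way to tune hyperparameters on a validation set carved out of the training data, so that the test set stays untouched until the final evaluation. The helper name, the 20% validation fraction, and the `fit_predict` callback are all hypothetical, and this is not how `sc.cross_validation_splits` is necessarily implemented.

```python
import itertools
import numpy as np

def select_on_validation(train_feats, train_labels, fit_predict, grid,
                         val_fraction=0.2, seed=0):
    """Pick the hyperparameters with the best accuracy on a held-out validation set.

    fit_predict(train_x, train_y, eval_x, **params) must return predicted labels.
    grid maps parameter names to lists of candidate values.
    """
    X = np.asarray(train_feats)
    y = np.asarray(train_labels)
    order = np.random.default_rng(seed).permutation(len(y))
    n_val = int(len(y) * val_fraction)
    val_idx, fit_idx = order[:n_val], order[n_val:]
    best_params, best_acc = None, -1.0
    names = sorted(grid)
    # Try every combination of candidate values.
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        predicted = fit_predict(X[fit_idx], y[fit_idx], X[val_idx], **params)
        acc = float(np.mean(np.asarray(predicted) == y[val_idx]))
        if acc > best_acc:                     # ties keep the earlier combination
            best_params, best_acc = params, acc
    return best_params, best_acc
```

A grid such as `{'C': [1, 3, 5, 7, 9], 'tol': [1e-3, 1e-4, 1e-5, 1e-6]}` would cover the 20 combinations reported in the output below.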

Run the two blocks below to perform the experiment and to see results using the best parameters.

In [56]:
best_hyperparameters, best_result = sc.cross_validation_splits(train_image_feats, train_labels, test_image_feats, test_labels, categories, iter, validation=True)
print("Best hyperparameters found: C = %d, tol = %f. Accuracy: %f" % (best_hyperparameters[0],best_hyperparameters[1], best_result[0] * 100))

Results of accuracy with (C,tol) combinations: 
(5, 0.001)  :  (0.6321180088041144,)
(3, 0.0001)  :  (0.61300472654888,)
(1, 0.0001)  :  (0.5764904465666424,)
(9, 1e-06)  :  (0.6317889975042305,)
(7, 1e-05)  :  (0.6346688231868224,)
(9, 0.0001)  :  (0.6719845342829215,)
(7, 0.001)  :  (0.6709498985273812,)
(3, 0.001)  :  (0.6338437924719552,)
(7, 0.0001)  :  (0.6328003734987401,)
(3, 1e-05)  :  (0.6034422678036312,)
(5, 0.0001)  :  (0.6665652908181644,)
(1, 1e-05)  :  (0.6075953118739331,)
(9, 0.001)  :  (0.6454798253730921,)
(9, 1e-05)  :  (0.6368512694408135,)
(3, 1e-06)  :  (0.6427640690501831,)
(1, 1e-06)  :  (0.6001620952068673,)
(7, 1e-06)  :  (0.6470817592682028,)
(1, 0.001)  :  (0.6120456429586197,)
(5, 1e-06)  :  (0.5977451035318437,)
(5, 1e-05)  :  (0.6440927015589905,)
Best hyperparameters found: C = 9, tol = 0.000100. Accuracy: 67.198453


In [57]:
#Results of best hyperparameters on the given training and testing dataset
tol = best_hyperparameters[1]
c = best_hyperparameters[0]
print('Using SVM classifier to predict test set categories')
predicted_categories = sc.svm_classify(train_image_feats, train_labels, test_image_feats, tol, c)

show_results(train_image_paths, test_image_paths, train_labels, test_labels, categories, abbr_categories,
             predicted_categories)

Using SVM classifier to predict test set categories



### Section 4c: Varying Vocabulary sizes

Experiment with many different vocabulary sizes and report performance. E.g. 10, 20, 50, 100, 200, 400, 1000, 10000.

Run the two blocks below to perform the experiment and to see results using the best parameter.

In [67]:
test_labels = np.array(test_labels)
print('Testing Vocabulary of different sizes')
size = [20, 50, 100, 200, 400, 1000, 10000]
result = {}
for idx in size:
    vocab_filename = 'vocab-'+str(idx)+'.pkl'
    if not osp.isfile(vocab_filename):
        # Construct the vocabulary
        vocab_size = idx  # Larger values will work better (to a point) but be slower to compute
        vocab = sc.build_vocabulary(train_image_paths, vocab_size)
        with open(vocab_filename, 'wb') as f:
            pickle.dump(vocab, f)

    train_image_feats = sc.get_bags_of_sifts(train_image_paths, vocab_filename)
    test_image_feats = sc.get_bags_of_sifts(test_image_paths, vocab_filename)
    
    predicted_categories = sc.svm_classify(train_image_feats, train_labels, test_image_feats)
    acc = sc.evaluate_svm_performance(test_labels, predicted_categories, categories)
    # Key by the loop variable: vocab_size is only assigned when the vocabulary file is missing
    result[idx] = acc

print("Results of accuracy with different vocabulary sizes: ")
for key in result:
    print(key, " : ", result[key])

Testing Vocabulary of different sizes
Results of accuracy with different vocabulary sizes: 
400  :  0.676
1000  :  0.6686666666666666
50  :  0.5773333333333334
100  :  0.6506666666666667
200  :  0.702
20  :  0.5046666666666667
10000  :  0.4773333333333333


In [69]:
best_size = max(result, key=result.get)  # vocabulary size with the highest accuracy
best_results = result[best_size]
print("Best size: ", best_size)
vocab_filename = 'vocab-'+str(best_size)+'.pkl'
if not osp.isfile(vocab_filename):
    # Construct the vocabulary
    vocab_size = best_size  # Larger values will work better (to a point) but be slower to compute
    vocab = sc.build_vocabulary(train_image_paths, vocab_size)
    with open(vocab_filename, 'wb') as f:
        pickle.dump(vocab, f)

train_image_feats = sc.get_bags_of_sifts(train_image_paths, vocab_filename)
test_image_feats = sc.get_bags_of_sifts(test_image_paths, vocab_filename)

predicted_categories = sc.svm_classify(train_image_feats, train_labels, test_image_feats)

show_results(train_image_paths, test_image_paths, train_labels, test_labels, categories, abbr_categories,
             predicted_categories)

Best size:  200

