This project covers the whole image classification pipeline: both feature representation and classification had to be implemented.
Five .m files must be completed in this project: get_tiny_images.m, nearest_neighbor_classify.m, build_vocabulary.m, get_bags_of_sifts.m, and svm_classify.m.
- Using tiny images as feature representation
To build a tiny image feature, simply resize the original image to a very small square resolution, e.g. 16x16. (The SIFT-based features described later are extracted with the Matlab interface of VLFeat, which makes this easy.)
In my get_tiny_images.m, the main code is as follows:
image_feats = [];
dim = 16; % side length of the tiny image
for i = 1:size(image_paths, 1)
    img = imread(image_paths{i, 1});
    img = imresize(img, [dim dim]);
    % flatten the tiny image into a single row vector, row by row
    new_img = double(reshape(img', 1, []));
    % normalize to zero mean, then unit length
    new_img = new_img - mean(new_img);
    new_img = new_img / norm(new_img);
    image_feats = [image_feats; new_img];
end
Note that each new_img vector is normalized to zero mean and unit length.
- Classifying by Nearest Neighbor
One of the easiest ways to deal with a classification problem is the nearest neighbor approach: simply find the nearest training image for each test image. However, this approach is easily affected by noise.
In my nearest_neighbor_classify.m, the main code is as follows:
% pairwise distances: rows are test images, columns are training images
dist = vl_alldist2(train_image_feats', test_image_feats');
dist = dist';
predicted_categories = [];
for i = 1:size(test_image_feats, 1)
    % label of the single closest training image
    [~, I] = min(dist(i, :));
    predicted_categories = [predicted_categories; train_labels(I, 1)];
end
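Since a single nearest neighbor is easily thrown off by noise, a common remedy is to vote among the k closest training images instead. A minimal sketch of such a k-NN variant (the value k = 7 and the voting code are my own illustration, not part of the original implementation):
k = 7; % assumed value, worth tuning
dist = vl_alldist2(train_image_feats', test_image_feats')'; % num_test x num_train
predicted_categories = cell(size(test_image_feats, 1), 1);
for i = 1:size(test_image_feats, 1)
    [~, order] = sort(dist(i, :)); % training images, nearest first
    neighbor_labels = train_labels(order(1:k));
    % pick the label that occurs most often among the k neighbors
    [unique_labels, ~, idx] = unique(neighbor_labels);
    [~, winner] = max(accumarray(idx(:), 1));
    predicted_categories{i} = unique_labels{winner};
end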
- Turning to a better method: SIFT and Bag-of-Features
Beyond easy but poorly performing representations like tiny images, a more popular and much better approach to feature representation is the combination of SIFT and Bag-of-Features. To implement it, one must first build a vocabulary, which is essentially a set of clustered visual words.
In my build_vocabulary.m file:
step = 15;
bin_size = 8;
features = [];
for i = 1:length(image_paths)
    img = single(imread(image_paths{i}));
    if size(img, 3) > 1
        img = rgb2gray(img);
    end
    % dense SIFT: one descriptor every 'step' pixels
    [locations, SIFT_features] = vl_dsift(img, 'fast', 'step', step, 'size', bin_size);
    features = [features, SIFT_features];
end
% cluster all descriptors into vocab_size visual words
[centers, assignments] = vl_kmeans(double(features), vocab_size);
vocab = centers';
Note that one can tune the parameters (step, bin_size, etc.) to improve the result.
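Since k-means over all the descriptors is the slow step, it is convenient to cache the vocabulary and reuse it while experimenting with these parameters. A minimal sketch, assuming the vocabulary is stored in a file named vocab.mat:
% cache the vocabulary so later runs can skip the expensive k-means step
if ~exist('vocab.mat', 'file')
    vocab = build_vocabulary(image_paths, vocab_size);
    save('vocab.mat', 'vocab'); % vocab.mat is an assumed cache file name
else
    load('vocab.mat'); % restores the 'vocab' variable
end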
Next, assign each SIFT feature to its nearest visual word, and then build a histogram representation for each picture.
In my get_bags_of_sifts.m file:
% build a kd-tree over the vocabulary for fast nearest-word queries
forest = vl_kdtreebuild(vocab');
image_feats = zeros(length(image_paths), vocab_size);
for i = 1:length(image_paths)
    img = single(imread(image_paths{i}));
    if size(img, 3) > 1
        img = rgb2gray(img);
    end
    [locations, SIFT_features] = vl_dsift(img, 'step', step, 'size', bin_size);
    % index of the nearest visual word for each descriptor
    [index, dist] = vl_kdtreequery(forest, vocab', double(SIFT_features));
    feature_hist = hist(double(index), vocab_size);
    feature_hist = feature_hist ./ sum(feature_hist); % L1 normalization
    % feature_hist = feature_hist ./ norm(feature_hist); % (L2 alternative)
    image_feats(i, :) = feature_hist;
end
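For reference, the nearest-word assignment inside the loop can also be done exhaustively with vl_alldist2 instead of the kd-tree; this variant is exact but slower for large vocabularies (a sketch, not the code I used):
% exact nearest-visual-word assignment, one column per SIFT descriptor
D = vl_alldist2(double(vocab'), double(SIFT_features)); % vocab_size x num_descriptors
[~, index] = min(D, [], 1); % index of the nearest visual word
feature_hist = hist(double(index), vocab_size);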
- Classification with Support Vector Machine
One of the most popular and most accurate approaches to classification is the Support Vector Machine (SVM). A linear SVM can perform well when applied to high-dimensional data such as these histograms.
In my svm_classify.m file, I first construct the +1/-1 labels of the training data for each one-vs-all classifier:
scores = [];
for i = 1:num_categories
    % +1 for training images of category i, -1 for all the others
    matching_indices = double(strcmp(categories(i), train_labels));
    matching_indices(matching_indices == 0) = -1;
Then, still inside the same loop, train a linear SVM classifier with vl_svmtrain() and record its scores on the test features:
    [w, b] = vl_svmtrain(train_image_feats', matching_indices, lambda);
    scores = [scores; (w' * test_image_feats' + b)];
end
Finally, for each test image pick the category with the highest score (it does not matter if all scores are below zero):
% get maximum scores
[max_values, max_indices] = max(scores);
predicted_categories = categories(max_indices');
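The free parameter lambda has a noticeable effect on accuracy. A minimal sketch of how one might pick it on a held-out validation split (the candidate values, the split variables, and the extra lambda argument to svm_classify are all hypothetical):
% sweep lambda on a held-out validation split (all names hypothetical)
lambdas = [1e-5 1e-4 1e-3 1e-2 1e-1];
best_acc = 0;
for l = lambdas
    % assumes svm_classify accepts lambda as an extra argument
    predictions = svm_classify(train_feats, train_lbls, val_feats, l);
    acc = mean(strcmp(predictions, val_lbls));
    if acc > best_acc
        best_acc = acc;
        best_lambda = l;
    end
end
fprintf('best lambda = %g (validation accuracy %.3f)\n', best_lambda, best_acc);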
In addition, I also implemented a non-linear SVM (RBF kernel) with the well-known LibSVM library, with automated tuning of the parameters C and gamma. In my libsvm_classify.m:
addpath('~/lib/libsvm-316');
addpath('~/lib/libsvm-316/matlab');
% grid of parameters
folds = 5;
[C,gamma] = meshgrid(-5:2:15, -15:2:3);
% grid search with cross-validation; with the '-v' option, LibSVM's
% svmtrain returns the cross-validation accuracy instead of a model
cv_acc = zeros(numel(C), 1);
for i = 1:numel(C)
    cv_acc(i) = svmtrain(total_matching_indices, train_image_feats, sprintf('-c %f -g %f -v %d', 2^C(i), 2^gamma(i), folds));
end
% pair (C,gamma) with best accuracy
[~,idx] = max(cv_acc);
% train model using best_C and best_gamma
best_C = 2^C(idx);
best_gamma = 2^gamma(idx);
SVM = svmtrain(total_matching_indices, train_image_feats, sprintf('-c %f -g %f', best_C, best_gamma));
% total_matching_indices doubles as the test labels here, since the test set likewise contains 100 images per class in order
[predicted_label, acc, dic_value] = svmpredict(total_matching_indices, test_image_feats, SVM);
fprintf('Best C = 2^%d, Best gamma = 2^%d\n', C(idx), gamma(idx));
predicted_categories = categories(predicted_label);
Download the repository, open Matlab, and change the working folder to homework3/code. Then set the image path to homework3/data.
Finally, click Run!
Tiny images + nearest neighbor: Accuracy (mean of diagonal of confusion matrix) is 0.201
Bag of SIFT + nearest neighbor: Accuracy (mean of diagonal of confusion matrix) is 0.463
Bag of SIFT + linear SVM: Accuracy (mean of diagonal of confusion matrix) is 0.614
Bag of SIFT + RBF-kernel SVM (LibSVM): Accuracy (mean of diagonal of confusion matrix) is 0.615