
Meng-Jiun Chiou (邱盟竣) NCTU

Project 3 / Scene recognition with bag of words

Overview

This project covers the whole image classification pipeline, and requires implementing every stage of it, including feature representation and classification.

Implementation

There are five .m files to complete in this project: get_tiny_images.m, nearest_neighbor_classify.m, build_vocabulary.m, get_bags_of_sifts.m, and svm_classify.m.

  • Using tiny images as feature representation

With VLFeat, one can easily extract SIFT descriptors from any image; in my implementation I use the MATLAB interface of VLFeat to obtain them. To build a tiny-image feature, simply resize the original image to a very small square resolution, e.g. 16x16.

The main code in my get_tiny_images.m looks like this:

image_feats = [];
dim = 16;                               % tiny image is dim x dim (avoid shadowing the built-in length())

for i = 1:size(image_paths, 1)
    img = imread(image_paths{i, 1});
    img = imresize(img, [dim dim]);     % shrink to a tiny square image

    % flatten the tiny image row by row into a 1 x (dim*dim) vector
    new_img = [];
    for j = 1:dim
        new_img = [new_img, img(j, :)];
    end

    new_img = double(new_img);
    new_img = new_img ./ sum(new_img);  % rescale so the entries sum to one
    new_img = new_img - mean(new_img);  % shift to zero mean

    image_feats = [image_feats; new_img];
end

Note that each tiny-image vector is normalized: divided by its sum and then shifted to zero mean before being stacked into image_feats.

  • Classifying by the Nearest Neighbor

One of the easiest ways to handle a classification problem is the nearest-neighbor approach: for each test image, simply find the closest training image and copy its label. However, this approach is easily affected by noise.

The main code in my nearest_neighbor_classify.m looks like this:

% pairwise distances between every training and test feature vector
dist = vl_alldist2(train_image_feats', test_image_feats');
dist = dist';   % now dist(i, j) is the distance from test image i to training image j

predicted_categories = [];
for i = 1:size(test_image_feats, 1)
    [~, I] = min(dist(i, :));                 % index of the closest training image
    label = train_labels(I, 1);               % copy its label
    predicted_categories = [predicted_categories; label];
end
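The single nearest neighbor above is sensitive to noisy matches. A common mitigation is to vote among the k nearest training images instead of keeping only the closest one; here is a minimal sketch of that variant (the value of k and the voting code are my own illustrative choices, not part of the submitted files):

k = 5;   % illustrative choice, not tuned

% test-by-train distance matrix, as above
dist = vl_alldist2(train_image_feats', test_image_feats')';

predicted_categories = cell(size(test_image_feats, 1), 1);
for i = 1:size(test_image_feats, 1)
    [~, order] = sort(dist(i, :));               % training images from closest to farthest
    nearest_labels = train_labels(order(1:k));   % labels of the k closest training images

    % vote: keep the label that occurs most often among the k neighbors
    [unique_labels, ~, idx] = unique(nearest_labels(:));
    [~, winner] = max(accumarray(idx, 1));
    predicted_categories{i} = unique_labels{winner};
end

With k = 1 this reduces to the nearest-neighbor rule above; ties are broken in favor of the label that comes first in the sorted unique list.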
  • Turning to a better method: SIFT and Bag-of-Features

Beyond easy but poorly performing approaches like tiny images, a more popular and much better feature representation is the combination of SIFT and Bag-of-Features. To implement it, one must first build a vocabulary, which is essentially a set of visual-word cluster centers.

In my build_vocabulary.m file,

step = 15;      % dense SIFT sampling step (in pixels)
bin_size = 8;   % SIFT spatial bin size
features = [];

for i = 1:length(image_paths)
    img = single( imread(image_paths{i}) );
    if size(img, 3) > 1
        img = rgb2gray(img);            % dense SIFT works on a single-channel image
    end
    % densely sampled SIFT descriptors; 'fast' uses a flat (rather than Gaussian) window
    [locations, SIFT_features] = vl_dsift(img, 'fast', 'step', step, 'size', bin_size);
    features = [features, SIFT_features];
end

% cluster all descriptors into vocab_size visual words
[centers, assignments] = vl_kmeans(double(features), vocab_size);
vocab = centers';

Note that one can choose different parameters (step, bin_size, etc.) to optimize the result.

Next, assign the nearest visual word to each SIFT feature, then build a histogram representation of each picture. In my get_bags_of_sifts.m file,

vocab_size = size(vocab, 1);
forest = vl_kdtreebuild(vocab');    % kd-tree over the visual words for fast lookup

for i = 1:length(image_paths)
    img = single( imread(image_paths{i}) );
    if size(img, 3) > 1
        img = rgb2gray(img);
    end

    [locations, SIFT_features] = vl_dsift(img, 'step', step, 'size', bin_size);

    % nearest visual word for every descriptor
    [index, dist] = vl_kdtreequery(forest, vocab', double(SIFT_features));

    % histogram over the vocabulary, normalized so that it sums to one
    feature_hist = hist(double(index), vocab_size);
    feature_hist = feature_hist ./ sum(feature_hist);
    % feature_hist = feature_hist ./ norm(feature_hist);   % alternative: unit L2 norm

    image_feats(i, :) = feature_hist;
end
  • Classification with Support Vector Machine

One of the most popular and most accurate classification approaches is the Support Vector Machine (SVM). A linear SVM works well on high-dimensional data such as these histograms. Since an SVM is a binary classifier, I train one SVM per category in a one-vs-all fashion.

In my svm_classify.m file, I first construct binary (+1/-1) labels for the training data, one set per category:

for i = 1:num_categories
    % +1 for training images of this category, -1 for everything else
    matching_indices = strcmp(categories(i), train_labels);
    matching_indices = double(matching_indices);
    for j = 1:size(train_labels, 1)
        if (matching_indices(j) == 0)
            matching_indices(j) = -1;
        end
    end

Then, still inside the per-category loop, train a linear SVM classifier with vl_svmtrain() and record its scores on the test features:

[w, b] = vl_svmtrain(train_image_feats', matching_indices, lambda);
scores = [scores; (w' * test_image_feats' + b)];   % one row of scores per category

Finally, pick the category with the highest score for each test image (it is fine even if that score is below zero):

% get maximum scores
[max_values, max_indices] = max(scores);
predicted_categories = categories(max_indices');
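Putting the fragments above together, a minimal one-vs-all sketch could look like the following (the variable definitions and the lambda value are illustrative; the exact loop in my svm_classify.m may be organized slightly differently):

categories = unique(train_labels);          % category names as a cell array of strings
num_categories = length(categories);
lambda = 0.0001;                            % illustrative regularization value, not tuned here

scores = [];
for i = 1:num_categories
    % binary labels: +1 for this category, -1 for all others
    matching_indices = 2 * double(strcmp(categories(i), train_labels)) - 1;

    % train one linear SVM for this category (vl_svmtrain expects D x N data)
    [w, b] = vl_svmtrain(train_image_feats', matching_indices, lambda);

    % score every test image under this category's SVM
    scores = [scores; (w' * test_image_feats' + b)];
end

% each column of scores holds one test image's scores over all categories
[~, max_indices] = max(scores);
predicted_categories = categories(max_indices');

This matches the behavior of the separate snippets: the +1/-1 label construction, the per-category training, and the final argmax over the stacked scores.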

In addition, I also implemented a non-linear SVM (RBF kernel) with the well-known LibSVM library, and automated the tuning of the parameters C and gamma through a cross-validated grid search. In my libsvm_classify.m:

addpath('~/lib/libsvm-316');
addpath('~/lib/libsvm-316/matlab');

% grid of parameters (exponents of 2)
folds = 5;
[C, gamma] = meshgrid(-5:2:15, -15:2:3);

% grid search with cross-validation
cv_acc = zeros(numel(C), 1);
for i = 1:numel(C)
    cv_acc(i) = svmtrain(total_matching_indices, train_image_feats, ...
        sprintf('-c %f -g %f -v %d', 2^C(i), 2^gamma(i), folds));
end

% pair (C, gamma) with the best cross-validation accuracy
[~, idx] = max(cv_acc);

% train the final model using best_C and best_gamma
best_C = 2^C(idx);
best_gamma = 2^gamma(idx);

SVM = svmtrain(total_matching_indices, train_image_feats, sprintf('-c %f -g %f', best_C, best_gamma));
% total_matching_indices also serves as the test labels here, since both sets contain 100 images per class in the same order
[predicted_label, acc, dec_values] = svmpredict(total_matching_indices, test_image_feats, SVM);
fprintf('Best C = 2^%d, Best gamma = 2^%d\n', C(idx), gamma(idx));

predicted_categories = categories(predicted_label);

Installation

Download the repository, open MATLAB, and change the working folder to homework3/code. Then make sure the image paths point to homework3/data.

Finally, click Run!
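For reference, the setup from the MATLAB prompt might look roughly like this (the VLFeat location and the top-level script name are assumptions about the local layout, so adjust them to your machine):

% Both the VLFeat path and the script name below are assumptions; edit as needed.
addpath(genpath('~/lib/vlfeat/toolbox'));   % make vl_dsift, vl_kmeans, vl_svmtrain, etc. visible
cd('homework3/code');                       % the code reads images relative to this folder
proj3;                                      % assumed top-level script; equivalent to clicking Run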

Results

Tiny images representation and nearest neighbor classifier


Accuracy (mean of diagonal of confusion matrix) is 0.201

(Sample training images and sample true-positive thumbnails from the results webpage are omitted in the tables below; only the labels shown under the false-positive and false-negative thumbnails are kept.)

| Category name | Accuracy | False positives (true labels) | False negatives (wrong predicted labels) |
| --- | --- | --- | --- |
| Kitchen | 0.060 | Office, InsideCity | Highway, TallBuilding |
| Store | 0.010 | Office, Industrial | Coast, Coast |
| Bedroom | 0.130 | Office, Industrial | Mountain, OpenCountry |
| LivingRoom | 0.060 | Store, Industrial | Street, Coast |
| Office | 0.070 | Kitchen, Bedroom | Forest, Kitchen |
| Industrial | 0.030 | Suburb, Forest | Coast, TallBuilding |
| Suburb | 0.220 | Street, Store | Street, Street |
| InsideCity | 0.070 | Bedroom, Mountain | Mountain, Forest |
| TallBuilding | 0.170 | Industrial, Industrial | Highway, LivingRoom |
| Street | 0.380 | Bedroom, LivingRoom | Highway, Highway |
| Highway | 0.580 | Street, Street | Coast, OpenCountry |
| OpenCountry | 0.380 | Highway, Store | Highway, Highway |
| Coast | 0.470 | Store, Highway | Industrial, Highway |
| Mountain | 0.180 | Store, Office | Coast, Coast |
| Forest | 0.200 | Store, LivingRoom | Coast, Coast |

Bag of SIFT representation and nearest neighbor classifier


Accuracy (mean of diagonal of confusion matrix) is 0.463

| Category name | Accuracy | False positives (true labels) | False negatives (wrong predicted labels) |
| --- | --- | --- | --- |
| Kitchen | 0.360 | Office, InsideCity | Office, Store |
| Store | 0.400 | InsideCity, TallBuilding | Suburb, Suburb |
| Bedroom | 0.120 | Kitchen, TallBuilding | Kitchen, InsideCity |
| LivingRoom | 0.270 | Coast, Bedroom | InsideCity, Office |
| Office | 0.700 | LivingRoom, Bedroom | Kitchen, Kitchen |
| Industrial | 0.310 | Street, TallBuilding | Store, Street |
| Suburb | 0.850 | Coast, OpenCountry | Store, InsideCity |
| InsideCity | 0.270 | Bedroom, TallBuilding | Store, Industrial |
| TallBuilding | 0.360 | LivingRoom, Bedroom | Bedroom, Industrial |
| Street | 0.500 | Bedroom, InsideCity | Industrial, Industrial |
| Highway | 0.750 | Coast, Mountain | Kitchen, Street |
| OpenCountry | 0.360 | Coast, Industrial | Suburb, Mountain |
| Coast | 0.350 | Industrial, OpenCountry | Suburb, OpenCountry |
| Mountain | 0.460 | Highway, Highway | Suburb, Forest |
| Forest | 0.890 | Coast, Mountain | Suburb, Suburb |

Bag of SIFT representation and linear SVM classifier


Accuracy (mean of diagonal of confusion matrix) is 0.614

| Category name | Accuracy | False positives (true labels) | False negatives (wrong predicted labels) |
| --- | --- | --- | --- |
| Kitchen | 0.570 | LivingRoom, Bedroom | Bedroom, Office |
| Store | 0.440 | InsideCity, LivingRoom | Forest, Mountain |
| Bedroom | 0.350 | Street, Kitchen | TallBuilding, Store |
| LivingRoom | 0.180 | Kitchen, Office | TallBuilding, Street |
| Office | 0.900 | InsideCity, LivingRoom | TallBuilding, Bedroom |
| Industrial | 0.400 | Forest, OpenCountry | TallBuilding, Mountain |
| Suburb | 0.960 | Industrial, Industrial | Coast, Coast |
| InsideCity | 0.470 | Store, OpenCountry | Highway, TallBuilding |
| TallBuilding | 0.690 | Store, Industrial | Street, Mountain |
| Street | 0.540 | LivingRoom, LivingRoom | Highway, TallBuilding |
| Highway | 0.820 | Street, Coast | Bedroom, Coast |
| OpenCountry | 0.380 | Highway, TallBuilding | Bedroom, Coast |
| Coast | 0.810 | Highway, OpenCountry | Mountain, Suburb |
| Mountain | 0.810 | TallBuilding, TallBuilding | Street, Suburb |
| Forest | 0.890 | TallBuilding, OpenCountry | Mountain, Mountain |

Bag of SIFT representation and Non-linear SVM classifier (LibSVM with RBF kernel)


Accuracy (mean of diagonal of confusion matrix) is 0.615

| Category name | Accuracy | False positives (true labels) | False negatives (wrong predicted labels) |
| --- | --- | --- | --- |
| Kitchen | 0.540 | Office, Office | Street, InsideCity |
| Store | 0.490 | TallBuilding, Industrial | Highway, InsideCity |
| Bedroom | 0.360 | Store, InsideCity | Store, LivingRoom |
| LivingRoom | 0.340 | Bedroom, Office | Bedroom, Industrial |
| Office | 0.750 | LivingRoom, LivingRoom | Bedroom, Kitchen |
| Industrial | 0.470 | Street, LivingRoom | Store, LivingRoom |
| Suburb | 0.920 | OpenCountry, OpenCountry | Industrial, Bedroom |
| InsideCity | 0.500 | Industrial, TallBuilding | Store, TallBuilding |
| TallBuilding | 0.570 | InsideCity, InsideCity | Store, Mountain |
| Street | 0.610 | Store, OpenCountry | Store, Industrial |
| Highway | 0.820 | Industrial, Coast | LivingRoom, Street |
| OpenCountry | 0.530 | Street, Forest | TallBuilding, Highway |
| Coast | 0.710 | Highway, Mountain | OpenCountry, InsideCity |
| Mountain | 0.730 | Bedroom, OpenCountry | Suburb, Suburb |
| Forest | 0.890 | Mountain, OpenCountry | Mountain, OpenCountry |