Created by Tsung-Yu Lin, Aruni RoyChowdhury and Subhransu Maji at UMass Amherst
This repository contains the code for reproducing the results in B-CNN [ICCV 2015] and Improved B-CNN [BMVC 2017] papers:
```
@inproceedings{lin2015bilinear,
    Author = {Tsung-Yu Lin and Aruni RoyChowdhury and Subhransu Maji},
    Title = {Bilinear CNNs for Fine-grained Visual Recognition},
    Booktitle = {International Conference on Computer Vision (ICCV)},
    Year = {2015}
}

@inproceedings{lin2017impbcnn,
    Author = {Tsung-Yu Lin and Subhransu Maji},
    Title = {Improved Bilinear Pooling with CNNs},
    Booktitle = {British Machine Vision Conference (BMVC)},
    Year = {2017}
}
```
The code is tested on Ubuntu 14.04 with an NVIDIA Titan X GPU and MATLAB R2014b. The release on 12/15/17 adds the implementation of our BMVC paper using matrix normalization. The bilinear pooling and matrix normalization layers are wrapped into a separate bcnn-package.
Link to the project page.
Method | Birds | Birds + box | Aircrafts | Cars |
---|---|---|---|---|
B-CNN [M,M] | 78.1% | 80.4% | 77.9% | 86.5% |
B-CNN [D,M] | 84.1% | 85.1% | 83.9% | 91.3% |
B-CNN [D,D] | 84.0% | 84.8% | 84.1% | 90.6% |
- Dataset details:
- Birds: CUB-200-2011 dataset. Birds + box uses bounding-boxes at training and test time.
- Aircrafts: FGVC aircraft dataset
- Cars: Stanford cars dataset
- These results are with domain specific fine-tuning. For more details see the updated B-CNN tech report.
- The pre-trained models are available (see below).
This code depends on VLFEAT, MatConvNet, and bcnn-package, which are pre-defined as submodules of this project. To download them, type:
>> git submodule init
>> git submodule update
Follow the instructions on the VLFEAT and MatConvNet project pages to install them first. Our code is built on MatConvNet version 1.0-beta19. To retrieve this version using git, cd to the MatConvNet folder and type:
>> git fetch --tags
>> git checkout tags/v1.0-beta19
Once these are installed, edit setup.m to run the corresponding setup scripts.
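If you are unsure what those setup scripts look like, a minimal setup.m body might run the following (the submodule paths below are assumptions; adjust them to your checkout locations):

```matlab
% Minimal setup sketch: add VLFEAT, MatConvNet, and bcnn-package to the path.
run('vlfeat/toolbox/vl_setup');        % VLFEAT setup script
run('matconvnet/matlab/vl_setupnn');   % MatConvNet setup script
addpath('bcnn-package');               % bilinear pooling / normalization layers
```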
The implementation of the bilinear combination layer in symmetric and asymmetric CNNs is included in the bcnn-package. This code contains scripts to fine-tune models and run experiments on several fine-grained recognition datasets. We also provide pre-trained models.
ImageNet LSVRC 2012 pre-trained models: We use vgg-m and vgg-verydeep-16 as our base models. The format of pre-trained models on MatConvNet has evolved over time; in this project we use models in the beta18 format. Please download the models from the MatConvNet pre-trained models page.
Fine-tuned models: We provide three fine-tuned B-CNN models ([M,M], [D,M], and [D,D]) and SVM models trained on the respective B-CNN features for each of the CUB-200-2011, FGVC Aircraft, and Stanford Cars datasets. These can be downloaded individually here. The fine-tuned models for B-CNN with matrix square-root normalization will be added soon.
You can also download all the model files as a tar.gz here.
To run experiments, download the datasets from their respective sources and edit the model_setup.m file to point it to the location of each dataset. For instance, you can point to the birds dataset directory by setting opts.cubDir = 'data/cub'.
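For reference, the dataset options might look like this (only opts.cubDir is confirmed by the text above; the Aircrafts and Cars field names are assumptions, so check model_setup.m for the exact names):

```matlab
% Dataset locations in model_setup.m (field names for Aircrafts/Cars assumed).
opts.cubDir      = 'data/cub';            % CUB-200-2011
opts.aircraftDir = 'data/fgvc-aircraft';  % FGVC aircraft (assumed field name)
opts.carsDir     = 'data/cars';           % Stanford cars (assumed field name)
```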
The script bird_demo takes an image and runs our pre-trained fine-grained bird classifier to predict the top five species, showing some example images of the highest-scoring class. If you haven't already done so, download our pre-trained B-CNN [D,M] and SVM models for this demo and place them in data/models. In addition, download the CUB-200-2011 dataset to data/cub. You can follow our default settings or edit opts in the script to point it to the models and dataset. If you have a GPU installed on your machine, set opts.useGpu=true to speed up the computation. You should see the following output when you run bird_demo():
>> bird_demo();
0.09s to load imdb.
1.63s to load models into memory.
Top 5 prediction for test_image.jpg:
064.Ring_billed_Gull
059.California_Gull
147.Least_Tern
062.Herring_Gull
060.Glaucous_winged_Gull
3.80s to make predictions [GPU=0]
To run it on your own images, use bird_demo('imgPath', 'favorite-bird.jpg'). Classification takes roughly 4s per image on a CPU on my laptop. On an NVIDIA K40 GPU with larger batch sizes you should get a throughput of roughly 8 images/second with the B-CNN [D,M] model.
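To classify a whole folder of your own images, you can wrap the demo in a loop (a sketch; only the 'imgPath' option is documented above, and the folder name is illustrative):

```matlab
% Run the bird demo over every jpg in a folder (illustrative loop).
imgs = dir(fullfile('my-birds', '*.jpg'));
for i = 1:numel(imgs)
    bird_demo('imgPath', fullfile('my-birds', imgs(i).name));
end
```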
run_experiments.m extracts B-CNN features and trains an SVM classifier on fine-grained categories. The following shows how to set up the B-CNN models:
- Symmetric B-CNN: extracts the self outer-product of features at 'layera'.

  ```matlab
  bcnn.opts = {...
      'type', 'bcnn', ...
      'modela', PRETRAINMODEL, ...
      'layera', 14,...
      'modelb', [], ...
      'layerb', [],...
      } ;
  ```

- Cross-layer B-CNN: extracts the outer-product between features at 'layera' and 'layerb' using the same CNN.

  ```matlab
  bcnn.opts = {...
      'type', 'bcnn', ...
      'modela', PRETRAINMODEL, ...
      'layera', 14,...
      'modelb', [], ...
      'layerb', 12,...
      } ;
  ```

- Asymmetric B-CNN: extracts the outer-product between features from CNN 'modela' at 'layera' and CNN 'modelb' at 'layerb'.

  ```matlab
  bcnn.opts = {...
      'type', 'bcnn', ...
      'modela', PRETRAINMODEL_A, ...
      'layera', 30,...
      'modelb', PRETRAINMODEL_B, ...
      'layerb', 14,...
      } ;
  ```

- Fine-tuned B-CNN: if you fine-tune a B-CNN network (see next section), you can evaluate the model using:

  ```matlab
  bcnn.opts = {...
      'type', 'bcnn', ...
      'modela', FINE-TUNED_MODEL, ...
      'layera', [],...
      'modelb', [], ...
      'layerb', [],...
      } ;
  ```

- B-CNN with matrix square-root:

  ```matlab
  impbcnn.opts = {...
      'type', 'impbcnn', ...
      'model', PRETRAINMODEL, ...
      'layer', 14, ...
      'pow', 0.5, ...
      'sigma', 1, ...
      'method', 'schulz', ... % 'schulz' for fast approximation or 'svd'
      'maxIter', 5, ...       % used for 'schulz' only
      } ;
  ```
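For intuition, here is what bilinear pooling computes on two feature maps, followed by the usual signed square-root and L2 normalization. This is an illustrative sketch of the math from the papers, not the repository's code; the shapes are examples:

```matlab
% Bilinear pooling sketch: sum-pooled outer product of two feature maps.
A = randn(28*28, 512);     % features from 'modela' at 'layera' (h*w x c1)
B = randn(28*28, 512);     % features from 'modelb' at 'layerb' (h*w x c2)
phi = A' * B;              % c1 x c2 bilinear feature
phi = phi(:)';             % flatten into a descriptor
phi = sign(phi) .* sqrt(abs(phi));   % signed square-root normalization
phi = phi / norm(phi);               % L2 normalization
```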
See run_experiments_bcnn_train.m for fine-tuning a B-CNN model. Note that this code caches all the intermediate results during fine-tuning, which takes about 200GB of disk space.
Here are the steps to fine-tune a B-CNN [M,M] model on the CUB dataset:
1. Download the CUB-200-2011 dataset (see link above).
2. Edit opts.cubDir=CUBROOT in model_setup.m, where CUBROOT is the location of the CUB dataset.
3. Download the imagenet-vgg-m model (see link above).
4. Set the path of the model in run_experiments_bcnn_train.m. For example, set PRETRAINMODEL='data/model/imagenet-vgg-m.mat' to use Oxford's VGG-M model trained on the ImageNet LSVRC 2012 dataset. You also have to set bcnnmm.opts to:

   ```matlab
   bcnnmm.opts = {...
       'type', 'bcnn', ...
       'modela', PRETRAINMODEL, ...
       'layera', 14,...
       'modelb', PRETRAINMODEL, ...
       'layerb', 14,...
       'shareWeight', true,...
       } ;
   ```

   The option shareWeight=true means the bilinear model uses the same CNN to extract both features, resulting in a symmetric model. For asymmetric models, set shareWeight=false; note that this roughly doubles the GPU memory requirement. The cnn_train() provided by MatConvNet requires a validation set, so you need to prepare one for datasets without a pre-defined validation split.
5. Once the fine-tuning is complete, train a linear SVM on the extracted features to evaluate the model. See run_experiments.m for training/testing using SVMs. Simply set MODELPATH to the location of the fine-tuned model, e.g. MODELPATH='data/ft-models/bcnn-cub-mm.mat', and set bcnnmm.opts to:

   ```matlab
   bcnnmm.opts = {...
       'type', 'bcnn', ...
       'modela', MODELPATH, ...
       'layera', [],...
       'modelb', [], ...
       'layerb', [],...
       } ;
   ```

6. Type >> run_experiments() on the MATLAB command line. The results will be saved in opts.resultPath.
1. Follow steps 1 to 3 described above for fine-tuning B-CNN models.
2. Set the path of the model in run_experiments_bcnn_train.m. For example, set PRETRAINMODEL='data/model/imagenet-vgg-m.mat' to use Oxford's VGG-M model trained on the ImageNet LSVRC 2012 dataset. You also have to set impbcnnm.opts to:

   ```matlab
   impbcnnm.opts = {...
       'type', 'impbcnn', ...
       'model', PRETRAINMODEL, ...
       'layer', 14, ...
       'pow', 0.5, ...
       'sigma', 1, ...
       'method', 'svd', ... % 'schulz' for fast approximation or 'svd'
       'bpMethod', 'lyap', ...
       'maxIter', 5, ...    % used for 'schulz' only
       } ;
   ```

   The option method specifies the approach used to compute the matrix square-root. The svd approach computes it by taking the square-root of the eigenvalues, and the SVD matrices are cached for the backward computation. The fast approximation schulz, which solves a Newton iteration, is an order of magnitude faster in the forward pass; however, the backward pass still requires an SVD computation, so it offers no efficiency benefit during training.
3. Once the fine-tuning is complete, train a linear SVM on the extracted features to evaluate the model. See run_experiments.m for training/testing using SVMs. Simply set MODELPATH to the location of the fine-tuned model and set impbcnnm.opts to:

   ```matlab
   impbcnnm.opts = {...
       'type', 'impbcnn', ...
       'model', MODELPATH, ...
       'layer', [], ...
       'pow', 0.5, ...
       'sigma', 1, ...
       'method', 'schulz', ... % 'schulz' for fast approximation or 'svd'
       'maxIter', 5, ...       % used for 'schulz' only
       } ;
   ```

4. Type >> run_experiments() on the MATLAB command line. The results will be saved in opts.resultPath.
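For intuition, the 'schulz' option approximates the matrix square-root with a Newton-Schulz iteration, which can be sketched in a few lines (illustrative only, not the repository's implementation; the input matrix here is a random SPD example):

```matlab
% Newton-Schulz iteration for the matrix square-root of an SPD matrix A.
A = cov(randn(100, 8)) + 1e-2*eye(8);    % example SPD input
nA = norm(A, 'fro');
Y = A / nA;  Z = eye(8);                 % normalize so the iteration converges
for k = 1:5                              % cf. the 'maxIter' option
    T = 0.5 * (3*eye(8) - Z*Y);
    Y = Y * T;                           % Y -> A^(1/2) / sqrt(nA)
    Z = T * Z;                           % Z -> A^(-1/2) * sqrt(nA)
end
sqrtA = sqrt(nA) * Y;                    % approximates sqrtm(A)
% sanity check: norm(sqrtA*sqrtA - A, 'fro') should be small
```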
The asymmetric B-CNN model is implemented using two networks whose feature outputs are bilinearly combined, followed by normalization and softmax loss layers. The network is constructed using the DagNN structure. You can find the details in initializeNetworksTwoStreams() and bcnn_train_dag().
When the same network is used to extract both features, the symmetric B-CNN model is implemented as a single network consisting of bilinearpool, sqrt, and l2norm layers on top of the convolutional layers. This implementation is about twice as fast and more memory-efficient than the asymmetric implementation. You can find the details in initializeNetworkSharedWeights() and bcnn_train_simplenn().
The code for B-CNN is implemented in the following MATLAB functions:
- vl_bilinearnn(): extends vl_simplenn() of the MatConvNet library to include the bilinear layers.
- vl_nnbilinearpool(): bilinear feature pooling via the outer product of a feature with itself.
- vl_nnbilinearclpool(): bilinear feature pooling via the outer product of two different features. The current version only supports two feature outputs of the same resolution.
- vl_nnsqrt(): signed square-root normalization.
- vl_nnl2norm(): L2 normalization.
The code can be used for other classification datasets as well. You have to implement a corresponding imdb = <dataset-name>_get_database() function that returns the imdb structure in the right format. Take a look at the cub_get_database.m file as an example.
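As a starting point, a hypothetical <dataset-name>_get_database() might fill in fields like these (the schema below is an assumption modeled on typical MatConvNet imdbs; check cub_get_database.m for the exact field names):

```matlab
% Hypothetical imdb skeleton for a custom fine-grained dataset.
imdb.imageDir     = 'data/mydataset/images';
imdb.images.name  = {'img001.jpg', 'img002.jpg'};  % relative image paths
imdb.images.label = [1 2];                          % class index per image
imdb.images.set   = [1 3];                          % e.g. 1=train, 2=val, 3=test
imdb.classes.name = {'class-a', 'class-b'};
```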
We thank MatConvNet and VLFEAT teams for creating and maintaining these excellent packages.