This repository contains a PyTorch implementation of
@inproceedings{Dixit17a,
author = {M.~Dixit and R.~Kwitt and M.~Niethammer and N.~Vasconcelos},
title = {AGA : Attribute-Guided Augmentation},
booktitle = {CVPR},
year = 2017}
To reproduce our results on SUNRGBD from scratch, we need the following packages:
A script to download the SelectiveSearch code is included in this repository; it is shamelessly taken from Shaoqing Ren's SPP_net GitHub repository. To use SelectiveSearch, start MATLAB and run
cd <PATH_TO_AGA_DIR>/3rdparty/selective_search
fetch_selective_search
img = imread('<PATH_TO_AGA_DIR>/datasets/SUNRGBD/image_00001.jpg');
boxes = selective_search_boxes(img);
save('<PATH_TO_AGA_DIR>/datasets/SUNRGBD/image_00001_ss_boxes.mat', 'boxes');
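To sanity-check the saved proposals from Python, you can load the .mat file with SciPy. A minimal sketch (the variable name boxes matches the MATLAB save above; the column convention of SelectiveSearch proposals is typically [y1 x1 y2 x2], but verify this for your setup):

import scipy.io as sio

# Load the proposals saved by selective_search_boxes above (variable 'boxes').
boxes = sio.loadmat('/data/images/image_00001_ss_boxes.mat')['boxes']
# One proposal per row; SelectiveSearch typically stores [y1, x1, y2, x2].
print(boxes.shape)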
We provide a couple of files with already-prepared SUNRGBD data as well as a Fast-RCNN model finetuned to a selection of 20 object classes. These files can be downloaded via
For the original SUNRGBD metadata, see also here.
We will assume, from now on, that the images are unpacked into /data/images/. The finetuned model has to be put into the <PATH_TO_FAST_RCNN_DIR>/data/fast_rcnn_models folder.
Say you have unpacked the original SUNRGBD images to /data/images, i.e., the directory should contain image_00001.jpg to image_10335.jpg. We also assume that you have run SelectiveSearch (see the paragraph above) on each image (in MATLAB) and stored the bounding box proposals for each image as image_00001_ss_boxes.mat in the same folder (i.e., /data/images).
First, we create a file which contains all image filenames without extension.
cd /data/images
find . -name '*.jpg' -exec basename {} .jpg \; > image_list.txt
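If you prefer Python over find, the same list can be produced with a small sketch like this:

from pathlib import Path

# Collect all image filenames (without extension) into image_list.txt.
img_dir = Path('/data/images')
names = sorted(p.stem for p in img_dir.glob('*.jpg'))
(img_dir / 'image_list.txt').write_text('\n'.join(names) + '\n')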
Next, we run the Fast-RCNN detector with
cd <PATH_TO_AGA_DIR>
cp misc/object_features.py <PATH_TO_FAST_RCNN_DIR>/tools
cd <PATH_TO_FAST_RCNN_DIR>
python tools/object_features.py
object_features.py uses the SelectiveSearch bounding boxes and creates (in /data/images), for each image, a file <IMAGE_NAME>_bbox_features.mat which contains Fast-RCNN (default: FC7) features together with detection scores for each object class.
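Roughly, the per-box class scores and FC7 activations are obtained via Fast-RCNN's im_detect. The sketch below is only an illustration of that step, not the actual script; network loading, the blob name fc7 (VGG-style model definitions), and the saved field names are assumptions:

from fast_rcnn.test import im_detect  # from the Fast-RCNN repository

def extract_bbox_features(net, im, proposals):
    # Run the detector on the SelectiveSearch proposals; 'scores' has one row
    # per box and one column per class (including __background__).
    scores, _ = im_detect(net, im, proposals)
    # FC7 activations for the processed boxes can be read from the network blobs
    # (note: Fast-RCNN may deduplicate boxes internally).
    fc7 = net.blobs['fc7'].data.copy()
    return scores, fc7

# The results would then be saved per image, e.g., via scipy.io.savemat,
# as <IMAGE_NAME>_bbox_features.mat (field names depend on object_features.py).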
We then create, for each image, a Python pickle file <IMAGE_NAME>_fc7.pkl that contains Fast-RCNN FC7 activations for all bounding boxes which overlap with the ground truth by IoU > 0.7 and have a detection score > 0.5 for some object class. Detections for __background__ and others are excluded from this process. Further, these files contain annotations (per remaining bounding box) for depth and pose.
cd <PATH_TO_AGA_DIR>
python collect_train.py \
--img_meta datasets/SUNRGBD/SUNRGBD_meta.pkl \
--img_list /data/images/image_list.txt \
--img_base /data/images \
--bbox_postfix _ss_boxes.mat \
--data_postfix _bbox_features.mat \
--outfile_postfix _fc7.pkl \
--object_class_file datasets/SUNRGBD/SUNRGBD_objects.pkl
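The IoU > 0.7 / score > 0.5 filtering described above boils down to something like the following sketch (hypothetical helper names and box layout; collect_train.py implements the actual logic):

import numpy as np

def iou(box, gt):
    # Boxes as [x1, y1, x2, y2].
    x1, y1 = max(box[0], gt[0]), max(box[1], gt[1])
    x2, y2 = min(box[2], gt[2]), min(box[3], gt[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((box[2] - box[0]) * (box[3] - box[1]) +
             (gt[2] - gt[0]) * (gt[3] - gt[1]) - inter)
    return inter / union

def keep_box(box, gt_boxes, scores, excluded=(0,)):
    # 'excluded' indexes the __background__ and others classes (assumed indices).
    # Keep a detection if it overlaps some ground-truth box with IoU > 0.7 and
    # its best remaining class score exceeds 0.5.
    best_overlap = max(iou(box, gt) for gt in gt_boxes)
    best_score = np.delete(scores, excluded).max()
    return best_overlap > 0.7 and best_score > 0.5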
To train the attribute strength predictor (here: for depth), we first need to collect adequate training data:
mkdir /data/output
cd <PATH_TO_AGA_DIR>
python collect_rho_data.py \
--img_list /data/images/image_list.txt \
--img_base /data/images \
--beg_index 0 \
--end_index 5335 \
--attribute depth \
--data_postfix _fc7.pkl \
--save /data/output/rho_train_fc7.pkl \
--verbose
This will create a file /data/output/rho_train_fc7.pkl which contains all relevant training data, collected from images 0 through 5335 (essentially, half of the SUNRGBD data). By default, collect_rho_data.py also samples activations from all object classes in a balanced manner (with the sample size equal to the smallest number of activations per object class). We will later use the --no_sampling option to create evaluation data for the attribute strength predictor.
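The balanced sampling works roughly as follows (a sketch under the assumption that activations are grouped per object class as NumPy arrays; the actual data layout in collect_rho_data.py may differ):

import numpy as np

def balanced_sample(features_per_class):
    # Draw the same number of activations from every object class, equal to
    # the size of the smallest class (sampling without replacement).
    n = min(len(f) for f in features_per_class.values())
    return {c: f[np.random.choice(len(f), n, replace=False)]
            for c, f in features_per_class.items()}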
Next, we can train the regressor (a simple MLP, see the paper) for 150 epochs:
cd <PATH_TO_AGA_DIR>
python train_rho.py \
--log_file /data/output/rho_model_fc7.log \
--data_file /data/output/rho_train_fc7.pkl \
--save /data/output/rho_model_fc7.pht \
--epochs 150 \
--verbose
By default, we train with an initial learning rate of 0.001; the learning rate schedule halves it every 50 epochs.
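As a rough illustration of this setup in PyTorch (the layer sizes below are placeholders, not necessarily the architecture from the paper/code):

import torch
import torch.nn as nn

# A small MLP regressor for attribute strength (sizes are placeholders).
rho = nn.Sequential(
    nn.Linear(4096, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 1))

optimizer = torch.optim.SGD(rho.parameters(), lr=0.001, momentum=0.9)
# Halve the learning rate every 50 epochs, as described above.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)
criterion = nn.MSELoss()
# Per epoch: forward/backward over the training data, then scheduler.step().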
Now that we have a trained attribute strength predictor, we can go ahead
and train the synthesis function(s). For that, we again collect adequate
training data first. In detail, we collect activations from all objects with
attribute strengths within certain intervals. In case of depth, for example,
these intervals are [0m,1m], [0.5m,1.5m], ..., [4.5m,5.5m], but other
binning strategies are of course possible. Our predefined interval file is
<PATH_TO_AGA_DIR>/datasets/SUNRGBD/SUNRGBD_depth_intervals.txt. We also ensure that we only use data from objects that have at least 100 activations in an interval.
cd <PATH_TO_AGA_DIR>
python collect_phi_data.py \
--interval_file datasets/SUNRGBD/SUNRGBD_depth_intervals.txt \
--img_list /data/images/image_list.txt \
--img_base /data/images \
--beg_index 0 \
--end_index 5335 \
--data_postfix _fc7.pkl \
--attribute depth \
--save /data/output/phi_train_fc7.pkl \
--min_count 100 \
--verbose
This will create a file /data/output/phi_train_fc7.pkl that holds the collected training data for the synthesis functions; the corresponding per-interval data is stored in the folder /data/output/phi_train_fc7.
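For intuition, assigning activations to the overlapping depth intervals amounts to something like this sketch (the actual interval file parsing and data handling are in collect_phi_data.py):

def bin_by_intervals(depths, intervals):
    # intervals: list of (lo, hi) pairs, e.g., [(0.0, 1.0), (0.5, 1.5), ..., (4.5, 5.5)].
    # Since intervals overlap, one activation can fall into several bins.
    return [[i for i, d in enumerate(depths) if lo <= d <= hi]
            for (lo, hi) in intervals]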
Before training the synthesis functions, we pretrain using all available data that we previously collected for training the regressor. This step simply trains the encoder-decoder to map from FC7 activations to FC7 activations. We found this to be beneficial, as training the synthesis functions can be tricky if little data is available per interval.
cd <PATH_TO_AGA_DIR>
python pretrain_phi.py \
--data_file /data/output/rho_train_fc7.pkl \
--epochs 150 \
--learning_rate 0.01 \
--save /data/output/phi_model_fc7_pretrained.pht \
--verbose
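Conceptually, pretraining simply fits an encoder-decoder to reconstruct FC7 activations. A minimal sketch (layer sizes are placeholders; the actual network is defined in this repository):

import torch
import torch.nn as nn

# Encoder-decoder over FC7 activations; pretraining reconstructs the input.
phi = nn.Sequential(
    nn.Linear(4096, 512), nn.ReLU(),   # encoder (placeholder sizes)
    nn.Linear(512, 4096))              # decoder

optimizer = torch.optim.SGD(phi.parameters(), lr=0.01, momentum=0.9)
criterion = nn.MSELoss()

def pretrain_step(x):
    # One optimization step of the FC7 -> FC7 mapping.
    optimizer.zero_grad()
    loss = criterion(phi(x), x)
    loss.backward()
    optimizer.step()
    return loss.item()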
Then, we can start the actual training:
cd <PATH_TO_AGA_DIR>
python train_phi.py \
--pretrained_rho /data/output/rho_model_fc7.pht \
--pretrained_phi /data/output/phi_model_fc7_pretrained.pht \
--data_file /data/output/phi_train_fc7.pkl \
--save /data/output/phi_model_fc7 \
--learning_rate 0.01 \
--epochs 150 \
--verbose
The trained models will be dumped to /data/output/phi_model_fc7 and a metadata file /data/output/phi_model_fc7.pkl will be created.
To demonstrate synthesis, we reproduce the results from Table 3 of the paper. For convenience, we provide a selection of images from SUNRGBD (left-out during training of the regressor + synthesis) that contain objects that we have never seen before. In particular, the object classes are: picture, whiteboard, fridge, counter, books, stove, cabinet, printer, computer, ottoman. This is the T0 set from the paper. The data can be downloaded from
We next synthesize FC7 activations (using our trained synthesis functions for the attribute depth). To do so, we first collect all the data and then run the synthesis.
cd <PATH_TO_AGA_DIR>
python collect_eval.py \
--img_list /data/T0/image_list.txt \
--img_base /data/T0 \
--data_postfix _bbox_features.mat \
--label_postfix _ss_labels.mat \
--object_class_file datasets/SUNRGBD/SUNRGBD_one_shot_classes_T0.pkl \
--outfile_postfix _fc7.pkl
cd <PATH_TO_AGA_DIR>
python synthesis.py \
--img_list /data/T0/image_list.txt \
--img_base /data/T0 \
--phi_model /data/output/phi_model_fc7 \
--rho_model /data/output/rho_model_fc7.pht \
--data_postfix _fc7.pkl \
--syn_postfix _fc7_AGA_depth.pkl \
--verbose
This will create synthesis files <IMAGE_NAME>_fc7_AGA_depth.pkl for every image in /data/T0, containing the synthesized features.
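Very roughly, synthesis applies the trained synthesis networks to an original FC7 activation to obtain features at other attribute strengths, and rho can be used to check the predicted strength of each synthesized feature. The sketch below only illustrates this idea (model loading, the mapping from targets to networks, and the data layout are assumptions; synthesis.py implements the actual procedure):

import torch

def synthesize(x, phi_models, rho):
    # x: one FC7 activation (1 x 4096 tensor);
    # phi_models: dict mapping a target depth to a trained synthesis network.
    out = {}
    with torch.no_grad():
        for target, phi in phi_models.items():
            x_syn = phi(x)                       # synthesized activation
            out[target] = (x_syn, rho(x_syn))    # plus its predicted depth
    return out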
Finally, we can run the one/few-shot object recognition experiment(s) from Table 3 of the paper.
cd <PATH_TO_AGA_DIR>/experiments
python SUNRGBD_few_shot.py \
--img_list /data/T0/image_list.txt \
--img_base /data/T0 \
--data_postfix _fc7_AGA_depth.pkl \
--runs 100 \
--shots 1
This will run 100 trials of selecting one-shot instances from each object class (in T0 we have 10 classes) and training a linear SVM and a 1-NN classifier using (1) only the one-shot instances as well as (2) the one-shot instances plus the synthesized features. Testing is done using all remaining original activations. The --shots argument allows you to specify how many original samples you want to use for the experiments. If, for example, you specify --shots 5, you can run a 5-shot object-recognition experiment. In that case, the AGA-synthesized results will be based on the original 5 samples per class + AGA-synthesized features. Note: if you specify the --omit_original flag, the same experiment is performed, but training of the SVM and 1-NN will only use synthetic data (without original samples).
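The evaluation itself is standard. With scikit-learn, one trial might look like the following sketch (hypothetical variable names; SUNRGBD_few_shot.py is the actual implementation):

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier

def run_trial(X_shot, y_shot, X_syn, y_syn, X_test, y_test, use_aga=True):
    # (1) train on the one/few-shot instances only, or
    # (2) additionally include the AGA-synthesized features.
    X_tr, y_tr = X_shot, y_shot
    if use_aga:
        X_tr = np.vstack([X_shot, X_syn])
        y_tr = np.concatenate([y_shot, y_syn])
    svm_acc = LinearSVC().fit(X_tr, y_tr).score(X_test, y_test)
    knn_acc = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr).score(X_test, y_test)
    return svm_acc, knn_acc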
This should produce (using the settings from above) classification results
(w/o AGA and w AGA) similar to:
[00001] SVM (w/o AGA): 35.76 | [00001] SVM (w AGA): 47.19 | [00001] 1NN (w/o AGA): 32.12 | [00001] 1NN (w AGA): 43.27 |
[00002] SVM (w/o AGA): 35.00 | [00002] SVM (w AGA): 40.06 | [00002] 1NN (w/o AGA): 35.14 | [00002] 1NN (w AGA): 40.48 |
[00003] SVM (w/o AGA): 45.73 | [00003] SVM (w AGA): 42.14 | [00003] 1NN (w/o AGA): 42.80 | [00003] 1NN (w AGA): 41.28 |
...
Average: SVM (w/o AGA): 35.96 | SVM (w AGA): 39.64 | 1NN (w/o AGA): 35.31 | 1NN (w AGA): 37.61 |
Let's run the same experiment, but this time without the one-shot instances included in the training data (i.e., using the --omit_original flag):
[00001] SVM (w/o AGA): 35.76 | [00001] SVM (w AGA): 43.55 | [00001] 1NN (w/o AGA): 32.12 | [00001] 1NN (w AGA): 41.57 |
[00002] SVM (w/o AGA): 41.57 | [00002] SVM (w AGA): 40.25 | [00002] 1NN (w/o AGA): 35.14 | [00002] 1NN (w AGA): 39.16 |
[00003] SVM (w/o AGA): 38.64 | [00003] SVM (w AGA): 41.99 | [00003] 1NN (w/o AGA): 42.80 | [00003] 1NN (w AGA): 40.01 |
...
Average: SVM (w/o AGA): 33.50 | SVM (w AGA): 40.12 | 1NN (w/o AGA): 35.31 | 1NN (w AGA): 36.73 |
- In the paper, we do not use an adaptive learning rate strategy. Here, for all training, we halve the learning rate every 50 epochs.
- In addition to the results from the paper, we now also provide results for a 1-NN classifier, as this more directly assesses the quality of the synthesis results.