This project is a PyTorch implementation of the baseline R-FCN from the Detect to Track paper. This repository is influenced by the following implementations:
- jwyang/faster-rcnn.pytorch, based on PyTorch
- rbgirshick/py-faster-rcnn, based on PyCaffe + NumPy
- longcw/faster_rcnn_pytorch, based on PyTorch + NumPy
- endernewton/tf-faster-rcnn, based on TensorFlow + NumPy
- ruotianluo/pytorch-faster-rcnn, based on PyTorch + TensorFlow + NumPy
Our implementation draws heavily on jwyang/faster-rcnn.pytorch. As in that implementation, this repository has the following qualities:
- It is pure PyTorch code. We convert all the NumPy implementations to PyTorch.
- It supports multi-image batch training. We revise all the layers, including the dataloader, RPN, RoI pooling, etc., to support multiple images per minibatch.
- It supports multi-GPU training. We use a multi-GPU wrapper (nn.DataParallel here) so that one or more GPUs can be used flexibly, as a benefit of the two features above.
- It is memory efficient. We limit the aspect ratio of the images in each roidb and group images with similar aspect ratios into a minibatch. As such, we can train ResNet-101 with batch size 2 (4 images) on two Titan Xs (12 GB each).
- It supports four pooling methods: RoI pooling, RoI align, RoI crop, and position-sensitive RoI pooling. More importantly, we modify all of them to support multi-image batch training.
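The memory-efficiency point above rests on a simple idea: if every image in a minibatch has a similar aspect ratio, padding to a common shape wastes little memory. A minimal sketch in plain Python (the helper name is hypothetical, not this repo's API):

```python
# Sketch of aspect-ratio grouping: sort images by width/height ratio so
# each minibatch pads to a similar shape with little wasted memory.

def group_by_aspect_ratio(sizes, batch_size):
    """sizes: list of (width, height); returns minibatches of image indices."""
    ratios = [w / float(h) for (w, h) in sizes]
    order = sorted(range(len(sizes)), key=lambda i: ratios[i])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

sizes = [(500, 375), (375, 500), (1000, 600), (600, 1000)]
print(group_by_aspect_ratio(sizes, batch_size=2))  # → [[3, 1], [0, 2]]
```

The real dataloader additionally flips images and caps extreme ratios, but the grouping principle is the same.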
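Position-sensitive RoI pooling is the distinctive R-FCN operation: the backbone emits k×k×C score maps, and bin (i, j) of each RoI pools only from its own group of C channels. The NumPy sketch below illustrates the idea only; the channel layout and the bin-boundary rounding are assumptions for illustration, not the behavior of this repo's CUDA kernel:

```python
import numpy as np

def ps_roi_pool(features, roi, k):
    """Position-sensitive RoI average pooling (illustrative sketch).

    features: (k*k*C, H, W) score maps; roi: (x1, y1, x2, y2) in
    feature-map coordinates. Returns (C, k, k), where bin (i, j)
    reads only its own dedicated group of C channels.
    """
    n_maps, H, W = features.shape
    C = n_maps // (k * k)
    x1, y1, x2, y2 = roi
    bin_w = (x2 - x1) / float(k)
    bin_h = (y2 - y1) / float(k)
    out = np.zeros((C, k, k))
    for i in range(k):
        for j in range(k):
            ys = int(np.floor(y1 + i * bin_h))
            ye = max(ys + 1, int(np.ceil(y1 + (i + 1) * bin_h)))
            xs = int(np.floor(x1 + j * bin_w))
            xe = max(xs + 1, int(np.ceil(x1 + (j + 1) * bin_w)))
            group = i * k + j  # channel group dedicated to this bin (assumed layout)
            for c in range(C):
                out[c, i, j] = features[group * C + c, ys:ye, xs:xe].mean()
    return out
```

Because each bin reads a disjoint channel group, the per-RoI computation after the backbone is nearly free, which is what makes R-FCN fast.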
- Python 2.7
- PyTorch 0.3.0 (0.4.0 may work, but hasn't been tested)
- CUDA 8.0 or higher
The R-FCN network weights are initialized using ImageNet-pretrained ResNet-101 weights.
The pretrained resnet-101 model can be accessed from
Below are instructions for training an RFCN network on Imagenet VID+DET.
cd pytorch-detect-rfcn
mkdir data
Untar the file:
tar xf ILSVRC2015.tar.gz
We'll refer to this directory as
Make sure the directory structure looks something like:
|--ILSVRC2015
|----Annotations
|------DET
|--------train
|--------val
|------VID
|--------train
|--------val
|----Data
|------DET
|--------train
|--------val
|------VID
|--------train
|--------val
|----ImageSets
|------DET
|------VID
Create a soft link under
ln -s $DATAPATH/ILSVRC2015 ./ILSVRC
Create a directory called
and place the pretrained models into this directory.
Before training, set the correct directory to save and load the trained models.
The default is
Change the arguments "save_dir" and "load_dir" in trainval_net.py and
test_net.py to adapt to your environment.
To train an RFCN D&T model with resnet-101 on Imagenet VID, simply run:
CUDA_VISIBLE_DEVICES=$GPU_ID python trainval_net.py \
    --cuda \
    --dataset imagenet_vid \
    --cag \
    --lr $LEARNING_RATE \
    --bs $BATCH_SIZE \
where 'bs' is the batch size (default 1). BATCH_SIZE and WORKER_NUMBER can be set according to your GPU memory. On two Titan Xps with 12 GB of memory each, the batch size can be up to 2 (4 images, 2 per GPU).
Imagenet VID+DET (Train/Test: imagenet_vid_train+imagenet_det_train/imagenet_vid_val, scale=600, PS ROI Pooling).
As pointed out by ruotianluo/pytorch-faster-rcnn, choose the right -arch flag to compile the CUDA code:
| GPU model | architecture |
| --- | --- |
| GTX 1080 (Ti) | sm_61 |
| Grid K520 (AWS g2.2xlarge) | sm_30 |
| Tesla K80 (AWS p2.xlarge) | sm_37 |
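In jwyang-style build scripts, the architecture is usually passed to nvcc through a CUDA_ARCH variable in lib/make.sh. The excerpt below is a hypothetical illustration of that pattern (the variable name and nvcc invocation are assumptions about this repo's script), shown for a GTX 1080 (Ti):

```shell
# Hypothetical excerpt from lib/make.sh: set the -gencode flags for your
# GPU (see the table above) before the nvcc compile lines.
CUDA_ARCH="-gencode arch=compute_61,code=sm_61"   # GTX 1080 (Ti) -> sm_61
nvcc -c -o nms_kernel.cu.o nms_kernel.cu -x cu -Xcompiler -fPIC $CUDA_ARCH
```

Check your actual make.sh for the exact variable and file names before editing.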
Install all the python dependencies using pip:
pip install -r requirements.txt
Compile the CUDA dependencies with the following commands:
cd lib
sh make.sh
This compiles all the modules you need, including NMS, PSROI_POOLING, ROI_Pooling, ROI_Align, and ROI_Crop. The default version is compiled with Python 2.7; if you are using a different Python version, please compile the modules yourself.
As pointed out in this issue, if you encounter errors during compilation, you may have forgotten to export the CUDA paths to your environment.
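For example, for a typical Linux install the exports look like the following (the /usr/local/cuda location is an assumption; adjust it to where your CUDA toolkit actually lives):

```shell
# Make nvcc and the CUDA libraries visible before running sh make.sh.
export CUDA_HOME=/usr/local/cuda
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
```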