R-C3D: Region Convolutional 3D Network for Temporal Activity Detection

By Huijuan Xu, Abir Das and Kate Saenko (Boston University).


We propose a fast end-to-end Region Convolutional 3D Network (R-C3D) for activity detection in continuous video streams. The network encodes the frames with fully-convolutional 3D filters, proposes activity segments, then classifies and refines them based on pooled features within their boundaries.
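To make "pooled features within their boundaries" concrete, here is a minimal sketch of pooling a variable-length segment into a fixed number of temporal bins. It is an illustration only, not the repository's layer: the function name `temporal_roi_max_pool`, the use of max pooling, and the single-channel list input are all assumptions made for the example.

```python
def temporal_roi_max_pool(features, start, end, num_bins):
    """Max-pool features[start:end] into num_bins values along time.

    features: list of floats, one per time step (a single channel).
    start, end: segment boundaries as indices, end exclusive.
    Hypothetical helper for illustration; not taken from the R-C3D code.
    """
    if not 0 <= start < end <= len(features):
        raise ValueError("segment must lie inside the feature sequence")
    length = end - start
    pooled = []
    for b in range(num_bins):
        # Bin b covers [lo, hi); boundaries are spread evenly over the segment.
        lo = start + (b * length) // num_bins
        hi = start + ((b + 1) * length + num_bins - 1) // num_bins  # ceiling
        hi = max(hi, lo + 1)  # every bin sees at least one time step
        pooled.append(max(features[lo:hi]))
    return pooled
```

Whatever the segment length, the output has `num_bins` entries, which is what lets proposals of different durations feed the same classifier.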


R-C3D is released under the MIT License (refer to the LICENSE file for details).

Citing R-C3D

If you find R-C3D useful in your research, please consider citing:

    title = {R-C3D: Region Convolutional 3D Network for Temporal Activity Detection},
    author = {Huijuan Xu and Abir Das and Kate Saenko},
    booktitle = {Proceedings of the International Conference on Computer Vision (ICCV)},
    year = {2017}

This repo builds on Faster R-CNN, C3D, and the ActivityNet dataset. Please cite the following papers as well:

Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. "Faster R-CNN: Towards real-time object detection with region proposal networks." In Advances in neural information processing systems, pp. 91-99. 2015.

Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3d convolutional networks." In Proceedings of the IEEE international conference on computer vision, pp. 4489-4497. 2015.

Caba Heilbron, Fabian, Victor Escorcia, Bernard Ghanem, and Juan Carlos Niebles. "Activitynet: A large-scale video benchmark for human activity understanding." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 961-970. 2015.


Contents

  1. Installation
  2. Preparation
  3. Training
  4. Testing


Installation

  1. Clone the R-C3D repository.

    git clone --recursive
  2. Build Caffe3d with pycaffe (see: Caffe installation instructions).

    Note: Caffe must be built with Python support!

    cd ./caffe3d
    # If you have all of the requirements installed and your Makefile.config is in place, simply run:
    make -j8 && make pycaffe
  3. Build R-C3D lib folder.

    cd ./lib    
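Since the note above stresses that Caffe must be built with Python support, a quick check after the build can save time. The sketch below assumes Python 3 and assumes that pycaffe ends up in `./caffe3d/python`, as in the usual Caffe layout; the helper name `module_available` is hypothetical.

```python
import importlib.util
import os
import sys


def module_available(name, extra_dir=None):
    """Return True if a top-level module called `name` can be located.

    extra_dir is optionally added to sys.path first. This only locates
    the module; it does not import it or verify that it loads.
    """
    if extra_dir is not None:
        path = os.path.abspath(extra_dir)
        if os.path.isdir(path) and path not in sys.path:
            sys.path.insert(0, path)
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "./caffe3d/python" is an assumed location, not confirmed by this README.
    found = module_available("caffe", extra_dir="./caffe3d/python")
    print("pycaffe found" if found else "pycaffe not found; check the build")
```

Run it from the R-C3D root folder. A successful `import caffe` in the interpreter you train with remains the definitive test.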


Preparation

  1. Download the ground truth annotations and videos in the ActivityNet dataset.

    cd ./preprocess/activityNet/
    # Download the ground truth annotations in the ActivityNet dataset.
    # Download the videos in ActivityNet dataset into ./preprocess/activityNet/videos.
  2. Extract frames from the downloaded videos at 25 fps.

    # training video frames are saved in ./preprocess/activityNet/frames/training/
    # validation video frames are saved in ./preprocess/activityNet/frames/validation/ 
  3. Generate the pickle data for training and testing the R-C3D model.

    # generate training data
    # generate validation data
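The frame extraction in step 2 could look roughly like the sketch below. It assumes `ffmpeg` is installed and on the `PATH`; the function names, the one-folder-per-video layout, and the `image_%05d.jpg` naming are hypothetical and may differ from what the repository's own scripts expect.

```python
import os
import subprocess


def build_extract_command(video_path, frames_root, fps=25):
    """Build an ffmpeg command that dumps one video's frames as JPEGs.

    Returns (out_dir, cmd). Frames go to <frames_root>/<video id>/.
    """
    video_id = os.path.splitext(os.path.basename(video_path))[0]
    out_dir = os.path.join(frames_root, video_id)
    pattern = os.path.join(out_dir, "image_%05d.jpg")  # hypothetical naming
    cmd = [
        "ffmpeg", "-i", video_path,
        "-vf", "fps={}".format(fps),  # resample to a fixed frame rate
        "-q:v", "2",                  # high JPEG quality
        pattern,
    ]
    return out_dir, cmd


def extract_frames(video_path, frames_root, fps=25):
    """Create the output folder and run ffmpeg; raises if ffmpeg fails."""
    out_dir, cmd = build_extract_command(video_path, frames_root, fps)
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run(cmd, check=True)
    return out_dir
```

For example, `extract_frames("./preprocess/activityNet/videos/<video>.mp4", "./preprocess/activityNet/frames/training")` would write that video's frames under the training folder named above.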


Training

  1. Download the pretrained C3D classification model to ./pretrain/.

    The C3D model weights, pretrained on Sports1M and fine-tuned on the ActivityNet dataset, are provided in: caffemodel.

  2. In the R-C3D root folder, run:



Testing

  1. Download a sample R-C3D model to ./snapshot/.

    An R-C3D model trained on the ActivityNet dataset is provided in: caffemodel.

    The provided R-C3D model achieves an Average-mAP of 14.4% on the validation set.

  2. In the R-C3D root folder, generate the prediction log file on the validation set.

  3. Generate the results.json file from the prediction log file.

    cd ./experiments/activitynet/test
    python test_log_<iters>.txt.*
  4. Get the detection evaluation result.

    cd ./experiments/activitynet/test/Evaluation
    python data/activity_net.v1-3.min.json ../results.json
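For readers unfamiliar with the metric reported in step 4, the sketch below shows the temporal IoU (tIoU) between two segments and how an Average-mAP is formed from per-threshold mAP values. It assumes the usual ActivityNet convention of averaging over tIoU thresholds 0.5, 0.55, ..., 0.95; the function names are hypothetical, and the per-threshold mAP computation is left to the evaluation script.

```python
def temporal_iou(pred, gt):
    """tIoU between two segments given as (start, end) in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


# Assumed ActivityNet thresholds: 0.5, 0.55, ..., 0.95 (ten values).
TIOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def average_map(map_per_threshold):
    """Mean of the mAP values, one per tIoU threshold."""
    if not map_per_threshold:
        raise ValueError("need at least one mAP value")
    return sum(map_per_threshold) / len(map_per_threshold)
```

A detection counts as correct at a given threshold only if its tIoU with a ground-truth segment of the same class reaches that threshold, so Average-mAP rewards tight boundaries as well as correct labels.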


Code for the THUMOS'14 and Charades datasets is available in the corresponding folders.