DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection

Official code implementation for the paper "DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection" (AAAI 2021).

The code is built on the architecture of zylo117/Yet-Another-EfficientDet-Pytorch. We also follow the data pre-processing and model evaluation methods of BigRedT/no_frills_hoi_det and vt-vl-lab/iCAN. We sincerely thank the authors for their excellent work.


  • Training and Test for V-COCO dataset
  • Training and Test for HICO-DET dataset
  • Demonstration on images
  • Demonstration on videos
  • More efficient voting strategy for inference using GPU


The code was tested with python 3.6, pytorch 1.5.1, torchvision 0.6.1, CUDA 10.2, and Ubuntu 18.04.


  1. Clone this repository:

    git clone
  2. Install pytorch and torchvision:

    pip install torch==1.5.1 torchvision==0.6.1
  3. Install other necessary packages:

    pip install pycocotools numpy opencv-python tqdm tensorboard tensorboardX pyyaml webcolors

Data Preparation

V-COCO Dataset:

Download V-COCO dataset following the official instructions.

You can find the file new_prior_mask.pkl here. Each element in it is the prior probability that a verb (e.g. eat) is associated with an object category (e.g. apple). You should also download the combined training and validation set annotations instances_trainval2014.json here, and put it in datasets/vcoco/coco/annotations.
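To make the role of the prior mask concrete, here is a minimal, self-contained sketch. The verb names, object names, and the 2-D verb-by-object layout below are illustrative assumptions; the actual structure of new_prior_mask.pkl may differ.

```python
import pickle
import numpy as np

# Hypothetical illustration of a verb-object prior matrix in the spirit of
# new_prior_mask.pkl (assumed layout: rows = verbs, columns = object categories).
verbs = ["eat", "hold"]
objects = ["apple", "skateboard"]
prior = np.array([
    [0.9, 0.0],   # "eat" is likely paired with "apple", not "skateboard"
    [0.6, 0.7],   # "hold" is plausible with both
])

# Round-trip through pickle, mirroring how such a .pkl file would be loaded.
blob = pickle.dumps(prior)
restored = pickle.loads(blob)

# Look up the prior probability that "eat" pairs with "apple".
p = restored[verbs.index("eat"), objects.index("apple")]
print(p)  # 0.9
```

At inference time such a prior can be multiplied with the predicted interaction scores to suppress implausible verb-object pairs.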

HICO-DET Dataset:

Download HICO-DET dataset from the official website.

We transform the annotations of HICO-DET dataset to JSON format following BigRedT/no_frills_hoi_det. You can directly download the processed annotations from here.

We count the number of training samples for each category in hico_processed/hico-det_verb_count.json. It serves as a per-class weight when computing the loss.
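One common way to turn such per-category counts into loss weights is inverse-frequency weighting; the sketch below uses toy counts and an assumed normalization scheme, not the repo's exact formula.

```python
import json
import numpy as np

# Toy verb counts standing in for hico_processed/hico-det_verb_count.json
# (hypothetical values, not real HICO-DET statistics).
counts = {"ride": 1200, "wash": 45, "hold": 5000}

verbs = sorted(counts)
freq = np.array([counts[v] for v in verbs], dtype=np.float64)

# Inverse-frequency weighting: rare verbs get larger weights.
weights = freq.sum() / (len(freq) * freq)
weights /= weights.mean()  # normalize so the weights average to 1

for verb, w in zip(verbs, weights):
    print(f"{verb}: {w:.3f}")
```

Rare categories (here, "wash") receive the largest weight, which counteracts the long-tailed verb distribution during training.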

Dataset Structure:

Make sure to put the files in the following structure:

|-- datasets
|   |-- vcoco
|   |   |-- data
|   |   |   |-- splits
|   |   |   |-- vcoco
|   |   |-- coco
|   |   |   |-- images
|   |   |   |-- annotations
|   |   |-- new_prior_mask.pkl
|   |-- hico_20160224_det
|   |   |-- images
|   |   |-- hico_processed


Demonstration on Images

CUDA_VISIBLE_DEVICES=0 python --image_path /path/to/a/single/image

Demonstration on Videos

Coming soon.

Pre-trained Weights

You can download the pre-trained weights for V-COCO dataset (vcoco_best.pth) and HICO-DET dataset (hico-det_best.pth) here.


Download the pre-trained weights of our backbone (efficientdet-d3_vcoco.pth and efficientdet-d3_hico-det.pth) here, and save them in the weights/ directory.

Training on V-COCO Dataset

CUDA_VISIBLE_DEVICES=0,1,2,3 python -p vcoco --batch_size 32 --load_weights weights/efficientdet-d3_vcoco.pth

Training on HICO-DET Dataset

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -p hico-det --batch_size 48 --load_weights weights/efficientdet-d3_hico-det.pth

You may also adjust the saving directory and the number of GPUs in projects/vcoco.yaml and projects/hico-det.yaml, or create your own project files in projects/.
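For orientation, a project file might look like the fragment below. This is a hypothetical sketch following the YAML project-file convention of Yet-Another-EfficientDet-Pytorch; the field names here are assumptions and may differ from the actual keys in vcoco.yaml and hico-det.yaml.

```yaml
# projects/my_project.yaml -- hypothetical sketch, not the shipped config
project_name: my_project
saved_path: logs/    # assumed: where checkpoints and tensorboard logs go
num_gpus: 4          # assumed: must match the GPUs listed in CUDA_VISIBLE_DEVICES
```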


Test on V-COCO Dataset

CUDA_VISIBLE_DEVICES=0 python -w /path/to/checkpoint

Test on HICO-DET Dataset

CUDA_VISIBLE_DEVICES=0 python -w /path/to/checkpoint

Then follow the same evaluation procedure as vt-vl-lab/iCAN to evaluate the results on the HICO-DET dataset.


If you find our paper or code useful for your research, please cite the following paper:

    @inproceedings{fang2021dirv,
      title={DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection},
      author={Fang, Hao-Shu and Xie, Yichen and Shao, Dian and Lu, Cewu},
      booktitle={The AAAI Conference on Artificial Intelligence (AAAI)},
      year={2021}
    }

