DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection

Official code implementation for the paper "DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection" (AAAI 2021).

The code builds on the architecture of zylo117/Yet-Another-EfficientDet-Pytorch. We also follow some of the data pre-processing and model evaluation methods in BigRedT/no_frills_hoi_det and vt-vl-lab/iCAN. We sincerely thank the authors for their excellent work.

Checklist

  • Training and Test for V-COCO dataset
  • Training and Test for HICO-DET dataset
  • Demonstration on images
  • Demonstration on videos
  • More efficient voting strategy for inference using GPU

Prerequisites

The code was tested with Python 3.6, PyTorch 1.5.1, torchvision 0.6.1, CUDA 10.2, and Ubuntu 18.04.

Installation

  1. Clone this repository:

    git clone https://github.com/MVIG-SJTU/DIRV.git
    
  2. Install PyTorch and torchvision:

    pip install torch==1.5.1 torchvision==0.6.1
    
  3. Install other necessary packages:

    pip install pycocotools numpy opencv-python tqdm tensorboard tensorboardX pyyaml webcolors
    

Data Preparation

V-COCO Dataset:

Download the V-COCO dataset by following the official instructions.

You can find the file new_prior_mask.pkl here. Each element in it is the prior probability that a verb (e.g. eat) is associated with an object category (e.g. apple). You should also download the annotations for the combined training and validation sets, instances_trainval2014.json, here, and put the file in datasets/vcoco/coco/annotations.
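
As an illustration of how such a prior can be used, here is a minimal sketch rather than the repository's own code. It assumes the prior can be read as a table indexed by object category and verb; the actual layout of new_prior_mask.pkl may differ. The names `PRIOR` and `apply_prior` and the toy categories are hypothetical.

```python
# Hypothetical toy prior: PRIOR[object][verb] is the prior probability that
# the verb is associated with the object category. The real
# new_prior_mask.pkl may use a different layout (e.g. arrays indexed by id).
PRIOR = {
    "apple": {"eat": 1.0, "ride": 0.0},
    "bicycle": {"eat": 0.0, "ride": 1.0},
}


def apply_prior(verb_scores, object_name, prior=PRIOR):
    """Scale each predicted verb score by the prior for the given object."""
    object_prior = prior[object_name]
    return {verb: score * object_prior.get(verb, 0.0)
            for verb, score in verb_scores.items()}


# Verb scores predicted for a detected apple: "ride" is suppressed.
scores = {"eat": 0.8, "ride": 0.6}
masked = apply_prior(scores, "apple")
```

With this toy prior, implausible pairs such as riding an apple are zeroed out while plausible pairs keep their predicted score.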

HICO-DET Dataset:

Download the HICO-DET dataset from the official website.

We convert the HICO-DET annotations to JSON format, following BigRedT/no_frills_hoi_det. You can directly download the processed annotations from here.

The number of training samples in each category is stored in hico_processed/hico-det_verb_count.json. It serves as a weight when calculating the loss.
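
One common way to turn per-category counts into loss weights is inverse-frequency weighting. The sketch below shows that idea only; the formula, the function name `verb_weights`, and the assumed `{verb_name: count}` layout of the JSON file are all assumptions, and the repository may compute its weights differently.

```python
import json


def verb_weights(verb_counts):
    """Return per-verb loss weights inversely proportional to sample count.

    Assumption: inverse frequencies normalised so that the mean weight is 1.
    """
    inverse = {verb: 1.0 / max(count, 1) for verb, count in verb_counts.items()}
    mean = sum(inverse.values()) / len(inverse)
    return {verb: value / mean for verb, value in inverse.items()}


# Hypothetical counts standing in for hico_processed/hico-det_verb_count.json.
counts = json.loads('{"ride": 400, "hold": 100}')
weights = verb_weights(counts)
```

Under this scheme, rare categories contribute more to the loss than frequent ones, which counteracts class imbalance in the training set.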

Dataset Structure:

Make sure to put the files in the following structure:

|-- datasets
|   |-- vcoco
|   |   |-- data
|   |   |   |-- splits
|   |   |   |-- vcoco
|   |   |
|   |   |-- coco
|   |   |   |-- images
|   |   |   |-- annotations
|   |   |-- new_prior_mask.pkl
|   |-- hico_20160224_det
|   |   |-- images
|   |   |-- hico_processed
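
Before training, it can help to check that the layout matches the tree above. The following is a small helper sketch that is not part of the repository; the function name `missing_paths` is hypothetical, and the paths are copied from the tree, relative to the repository root.

```python
from pathlib import Path

# Paths taken from the directory tree above, relative to the repository root.
EXPECTED_PATHS = [
    "datasets/vcoco/data/splits",
    "datasets/vcoco/data/vcoco",
    "datasets/vcoco/coco/images",
    "datasets/vcoco/coco/annotations",
    "datasets/vcoco/new_prior_mask.pkl",
    "datasets/hico_20160224_det/images",
    "datasets/hico_20160224_det/hico_processed",
]


def missing_paths(root="."):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    for path in missing_paths():
        print(f"missing: {path}")
```

Running the script from the repository root prints one line per missing file or directory and prints nothing when the layout is complete.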

Demonstration

Demonstration on Images

CUDA_VISIBLE_DEVICES=0 python demo.py --image_path /path/to/a/single/image

Demonstration on Videos

Coming soon.

Pre-trained Weights

You can download the pre-trained weights for the V-COCO dataset (vcoco_best.pth) and the HICO-DET dataset (hico-det_best.pth) here.

Training

Download the pre-trained weights of our backbone (efficientdet-d3_vcoco.pth and efficientdet-d3_hico-det.pth) here, and save them in the weights/ directory.

Training on V-COCO Dataset

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py -p vcoco --batch_size 32 --load_weights weights/efficientdet-d3_vcoco.pth

Training on HICO-DET Dataset

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train.py -p hico-det --batch_size 48 --load_weights weights/efficientdet-d3_hico-det.pth

You may also adjust the saving directory and the number of GPUs in projects/vcoco.yaml and projects/hico-det.yaml, or create your own projects in projects/.

Test

Test on V-COCO Dataset

CUDA_VISIBLE_DEVICES=0 python test_vcoco.py -w /path/to/the/checkpoint

Test on HICO-DET Dataset

CUDA_VISIBLE_DEVICES=0 python test_hico-det.py -w /path/to/the/checkpoint

Then follow the same procedure as in vt-vl-lab/iCAN to evaluate the results on the HICO-DET dataset.

Citation

If you find our paper or code useful for your research, please cite the following paper:

@inproceedings{fang2020dirv,
      title={DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection},
      author={Fang, Hao-Shu and Xie, Yichen and Shao, Dian and Lu, Cewu},
      year={2021},
      booktitle={The AAAI Conference on Artificial Intelligence (AAAI)}
}
