DiffHOI: Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model

Project Page | Paper | Data (Coming Soon)

SynHOI dataset Visualization

🔥 Key Features

DiffHOI: The first framework that leverages the generative and representational capabilities of a text-to-image diffusion model to benefit the HOI detection task.
SynHOI dataset: A class-balanced, large-scale, and high-diversity synthetic HOI dataset.

⚔️ We are dedicated to enhancing and expanding the SynHOI dataset. We will release it soon, together with more powerful models for HICO-DET and V-COCO through SynHOI-Pretraining.

🐟 Installation

Install the dependencies.

pip install -r requirements.txt

Clone and build CLIP.

git clone https://github.com/openai/CLIP.git && cd CLIP && python setup.py develop && cd ..

Compile the CUDA operators for deformable attention.

cd models/DiffHOI_L/ops
python setup.py build install
cd ../../..

Download the checkpoint of Stable-Diffusion (we use v1-5 by default). Please also follow its instructions to install the required packages.
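
Once the dependencies, CLIP, and the CUDA operators are installed, a quick import check can catch a failed build before training starts. The following is a minimal sketch, not part of this repository. The module names are assumptions: `clip` is the package name installed by the OpenAI CLIP repository, and `MultiScaleDeformableAttention` is the extension name commonly used by Deformable DETR-style ops, which may differ here.

```python
import importlib.util

# Assumed module names (see the note above); adjust them to your environment.
REQUIRED_MODULES = ["torch", "clip", "MultiScaleDeformableAttention"]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or partially initialised modules.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All expected modules can be located.")
```

The check only locates the modules; it does not verify that the CUDA operators run correctly on your GPU.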

🦈 Data preparation

HICO-DET

The HICO-DET dataset can be downloaded here. After downloading, unpack the tarball (hico_20160224_det.tar.gz) into the data directory.

Instead of the original annotation files, we use the annotation files provided by the PPDM authors, which can be downloaded from here. Place the downloaded annotation files as follows.

data
 └─ hico_20160224_det
     |─ annotations
     |   |─ trainval_hico.json
     |   |─ test_hico.json
     |   └─ corre_hico.npy
     :
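
To confirm that the annotation files are where the tree above expects them, a small path check can help. This is a sketch under the assumption that it is run from the repository root. The helper name `missing_files` is hypothetical; the file names are taken from the tree above.

```python
from pathlib import Path

# File names as listed in the directory tree above.
HICO_ANNOTATION_FILES = ["trainval_hico.json", "test_hico.json", "corre_hico.npy"]


def missing_files(root, relative_paths):
    """Return the relative paths that are not regular files under root."""
    root = Path(root)
    return [p for p in relative_paths if not (root / p).is_file()]


if __name__ == "__main__":
    annotation_dir = Path("data") / "hico_20160224_det" / "annotations"
    missing = missing_files(annotation_dir, HICO_ANNOTATION_FILES)
    print("Missing:", missing if missing else "none")
```

The same helper can be pointed at the V-COCO directory by passing a different root and file list.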

V-COCO

First, clone the V-COCO repository from here and follow its instructions to generate the file instances_vcoco_all_2014.json. Next, download the prior file prior.pickle from here. Create the directories and place the files as follows.

DiffHOI
 |─ data
 │   └─ v-coco
 |       |─ data
 |       |   |─ instances_vcoco_all_2014.json
 |       |   :
 |       |─ prior.pickle
 |       |─ images
 |       |   |─ train2014
 |       |   |   |─ COCO_train2014_000000000009.jpg
 |       |   |   :
 |       |   └─ val2014
 |       |       |─ COCO_val2014_000000000042.jpg
 |       |       :
 |       |─ annotations
 :       :

The annotation files have to be converted to the HOIA format. The conversion can be run as follows.

PYTHONPATH=data/v-coco \
        python convert_vcoco_annotations.py \
        --load_path data/v-coco/data \
        --prior_path data/v-coco/prior.pickle \
        --save_path data/v-coco/annotations

Note that only Python 2 can be used for this conversion, because vsrl_utils.py in the v-coco repository raises an error under Python 3.

The V-COCO annotations in the HOIA format (corre_vcoco.npy, test_vcoco.json, and trainval_vcoco.json) will be generated in the annotations directory.
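
After the conversion, it can be useful to check that the generated JSON files are not empty. The sketch below counts images, boxes, and human-object pairs. It assumes the HOIA-style layout used by the PPDM annotation files, where each entry is a dict with an `annotations` list and a `hoi_annotation` list. These key names are an assumption and should be checked against your generated files. The function names are hypothetical, and the script is meant to run under Python 3, separately from the Python 2 conversion step.

```python
import json


def summarize_hoia(entries):
    """Count images, boxes, and HOI pairs in a list of HOIA-style entries.

    Assumes each entry has an "annotations" list (boxes) and a
    "hoi_annotation" list (human-object pairs); missing keys count as empty.
    """
    return {
        "images": len(entries),
        "boxes": sum(len(e.get("annotations", [])) for e in entries),
        "hoi_pairs": sum(len(e.get("hoi_annotation", [])) for e in entries),
    }


def summarize_file(path):
    """Load a JSON annotation file and summarize it."""
    with open(path, "r", encoding="utf-8") as f:
        return summarize_hoia(json.load(f))


if __name__ == "__main__":
    # Toy entry for illustration only; this is not real V-COCO data.
    toy = [{
        "file_name": "example.jpg",
        "annotations": [
            {"bbox": [0, 0, 10, 10], "category_id": 1},
            {"bbox": [5, 5, 20, 20], "category_id": 2},
        ],
        "hoi_annotation": [{"subject_id": 0, "object_id": 1, "category_id": 3}],
    }]
    print(summarize_hoia(toy))
```

For the real files, `summarize_file("data/v-coco/annotations/trainval_vcoco.json")` would report the counts for the training split, provided the key names match.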

🚢 Pre-trained model

Download the pretrained DETR detector model for ResNet50 and put it in the params directory. Then convert its parameters for each dataset with the following commands.

python ./tools/convert_parameters.py \
        --load_path params/detr-r50-e632da11.pth \
        --save_path params/detr-r50-pre-2branch-hico.pth \
        --num_queries 64

python ./tools/convert_parameters.py \
        --load_path params/detr-r50-e632da11.pth \
        --save_path params/detr-r50-pre-2branch-vcoco.pth \
        --dataset vcoco \
        --num_queries 64

Download the pretrained Deformable DETR detector model for Swin-L and put it in the params directory.

🚀 Results and Models

😎 DiffHOI on HICO-DET.

Model	Full (D)	Rare (D)	Non-rare (D)	Full (KO)	Rare (KO)	Non-rare (KO)	Download	Config
DiffHOI-S (R50)	34.41	31.07	35.40	37.31	34.56	38.14	model	config
DiffHOI-L (Swin-L)	40.63	38.10	41.38	43.14	40.24	44.01	model	config

⭐ Training

After the preparation, you can start training with the following commands.

HICO-DET

sh ./run/hico_s.sh

V-COCO

sh ./run/vcoco_s.sh

Zero-shot

sh ./run/hico_s_zs_nf_uc.sh

⭐ Testing

HICO-DET

sh ./run/hico_s_eval.sh

sh ./run/hico_l_eval.sh

Citation

Please consider citing our paper if it helps your research.

@article{yang2023boosting,
  title={Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model},
  author={Yang, Jie and Li, Bingliang and Yang, Fengyu and Zeng, Ailing and Zhang, Lei and Zhang, Ruimao},
  journal={arXiv preprint arXiv:2305.12252},
  year={2023}
}

Acknowledgement

This repo is mainly based on GEN-VLKT (licensed under MIT, Copyright (c) [2022] [Yue Liao]) and DINO (licensed under Apache 2.0, Copyright (c) [2022] [IDEA-Research]). We thank the authors for their well-organized code!
