Our paper Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models has been accepted to NeurIPS 2023.
Check INSTALL.md for installation instructions.
Check DATASET.md for dataset preprocessing instructions.
Run the following scripts:
bash scripts/extract_clip_obj_feature.sh
bash scripts/draw_imgs_and_generate_spatial_logits.sh
bash scripts/infer.sh
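The three commands above can be chained so that a failure in one step stops the rest. The following is a minimal sketch, not part of the repository: it assumes the scripts are meant to run in the order listed and that the working directory is the repository root. The helper names `run_step` and `run_pipeline` and the `DRY_RUN` switch are hypothetical, introduced only for illustration.

```shell
# Sketch only: run_step, run_pipeline and DRY_RUN are hypothetical helpers,
# not provided by the repository.

# Run one script, or just print the command when DRY_RUN=1.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "bash $1"
    else
        bash "$1"
    fi
}

# Assumption: the listing order is the required execution order.
# The && chain stops at the first step that exits with a non-zero status.
run_pipeline() {
    run_step scripts/extract_clip_obj_feature.sh &&
    run_step scripts/draw_imgs_and_generate_spatial_logits.sh &&
    run_step scripts/infer.sh
}
```

After sourcing these definitions, calling `run_pipeline` from the repository root runs all three steps. Setting `DRY_RUN=1` first prints the commands without executing them, which is a quick way to check the order before starting a long feature-extraction run.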
We provide the extracted CLIP visual features, visual cue descriptions, and some spatial information, which you can download from here*.
If this project helps your research, please consider citing our paper in your publications.
@article{li2023zero,
title={Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models},
author={Li, Lin and Xiao, Jun and Chen, Guikun and Shao, Jian and Zhuang, Yueting and Chen, Long},
journal={arXiv preprint arXiv:2305.12476},
year={2023}
}
Our codebase is based on Scene-Graph-Benchmark.pytorch.