Implementation for DiffusionVG: Exploring Iterative Refinement with Diffusion Models for Video Grounding.
In this work, we propose a novel diffusion-based framework that formulates video grounding as a conditioned generation task and enhances predictions through iterative refinement.
The full paper can be found at: https://arxiv.org/abs/2310.17189
Results on Charades-STA:

| Feature | R@1 IoU=0.3 | R@1 IoU=0.5 | R@1 IoU=0.7 |
| --- | --- | --- | --- |
| I3D | 76.53 | 62.30 | 40.05 |
| VGG | 70.38 | 57.13 | 35.06 |

Results on ActivityNet Captions:

| Feature | R@1 IoU=0.3 | R@1 IoU=0.5 | R@1 IoU=0.7 |
| --- | --- | --- | --- |
| C3D | 65.02 | 47.27 | 27.87 |
- 2024/10/23: The implementation has been simplified!
- 2024/03/16: The code for this work is fully open-sourced!
- 2024/03/13: This work is accepted to ICME 2024.
- 2023/10/26: A demo implementation of DiffusionVG is released (demo.py).
We recommend using Conda to manage your environment and installing recent versions of torch and transformers. To set up the environment, run the following commands:
git clone https://github.com/MasterVito/DiffusionVG.git && cd DiffusionVG
conda create -n diffusionvg python=3.10.14
conda activate diffusionvg
pip install -r requirements.txt
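To sanity-check the installation, you can verify that torch and transformers import correctly:

python -c "import torch, transformers; print(torch.__version__, transformers.__version__)"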
To help users quickly review the implementation of DiffusionVG, we have condensed the inference process into a demo script, which can be run as follows:
python demo.py
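To convey what the demo is doing conceptually, here is a minimal, self-contained sketch of DDIM-style iterative span refinement at inference time. Everything in it (the denoiser stub, the noise schedule, the span parameterization as a normalized center/width pair) is an illustrative assumption, not the repo's actual code; see demo.py for the real implementation.

```python
import torch

def denoiser_stub(noisy_span, t, video_feats, query_feats):
    # Stand-in for the conditional denoising network; the real model
    # attends over video and query features to predict the clean span.
    return noisy_span.clamp(0.0, 1.0)

@torch.no_grad()
def refine_span(video_feats, query_feats, steps=5):
    # Standard DDPM noise schedule (illustrative hyperparameters).
    betas = torch.linspace(1e-4, 2e-2, 1000)
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)
    timesteps = torch.linspace(len(betas) - 1, 0, steps).long()

    span = torch.randn(2)  # start from pure noise: (center, width)
    for i, t in enumerate(timesteps):
        # Predict the clean span from the current noisy estimate.
        pred_x0 = denoiser_stub(span, t, video_feats, query_feats)
        if i + 1 < len(timesteps):
            # Deterministic DDIM step toward the next timestep.
            eps = (span - alphas_bar[t].sqrt() * pred_x0) / (1 - alphas_bar[t]).sqrt()
            a_next = alphas_bar[timesteps[i + 1]]
            span = a_next.sqrt() * pred_x0 + (1 - a_next).sqrt() * eps
        else:
            span = pred_x0
    return span  # final (center, width), normalized to [0, 1]

print(refine_span(video_feats=torch.zeros(1), query_feats=torch.zeros(1)))
```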
For Charades-STA, we utilize the I3D features from 2D-TAN.
For ActivityNet Captions, we utilize the C3D features from the official website.
After downloading the features, unzip them into the "features" folder, which is the default location. You can point to a custom directory via the "vid_feature_path" parameter.
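For example, assuming "vid_feature_path" is exposed as a command-line flag (check config.py for the exact interface; the path below is a placeholder):

python demo.py --vid_feature_path /path/to/your/features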
Start training DiffusionVG with the following command. For details on parameter configuration, refer to the script and config.py; evaluation is integrated into the training process.
bash scripts/run_train.sh
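To illustrate what a diffusion-style training step for span prediction can look like, here is a hypothetical sketch; the stand-in model, L1 loss, and noise schedule are assumptions for exposition and do not mirror the actual training code behind scripts/run_train.sh.

```python
import torch
import torch.nn.functional as F

# Standard DDPM noise schedule (illustrative hyperparameters).
betas = torch.linspace(1e-4, 2e-2, 1000)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def training_step(model, video_feats, query_feats, gt_span):
    # Corrupt the ground-truth (center, width) span at a random timestep,
    # following the closed form of q(x_t | x_0).
    t = torch.randint(0, len(betas), ())
    eps = torch.randn_like(gt_span)
    noisy_span = alphas_bar[t].sqrt() * gt_span + (1 - alphas_bar[t]).sqrt() * eps
    # The conditional denoiser reconstructs the clean span from the noisy
    # one; supervising the reconstruction yields the training loss.
    pred_span = model(noisy_span, t, video_feats, query_feats)
    return F.l1_loss(pred_span, gt_span)

# Stand-in denoiser so the snippet runs end to end.
model = lambda noisy, t, v, q: noisy.clamp(0.0, 1.0)
print(training_step(model, torch.zeros(1), torch.zeros(1), torch.tensor([0.4, 0.2])))
```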
If you find our work helpful to your research, please consider citing our paper using the following format:
@inproceedings{liang2024exploring,
title={Exploring iterative refinement with diffusion models for video grounding},
author={Liang, Xiao and Shi, Tao and Liang, Yaoyuan and Tao, Te and Huang, Shao-Lun},
booktitle={2024 IEEE International Conference on Multimedia and Expo (ICME)},
pages={1--6},
year={2024},
organization={IEEE}
}