This is the implementation repository for our SIGIR 2022 paper, You Need to Read Again: Multi-granularity Perception Network for Moment Retrieval in Videos.
Clone the repository and move into its folder:
git clone https://github.com/Huntersxsx/MGPN.git
cd MGPN
To use this source code, you need Python 3.7+ and the following Python packages:
- pytorch 1.1.0
- torchvision 0.3.0
- torchtext
- easydict
- terminaltables
- tqdm
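Before training, it can help to confirm that these packages are importable. The following is a minimal sketch and not part of the repository. It assumes the import names are `torch`, `torchvision`, `torchtext`, `easydict`, `terminaltables`, and `tqdm`; the helper name `missing_packages` is hypothetical, and the check does not verify version numbers.

```python
import importlib.util

# Import names assumed for the packages listed above (pytorch is imported as "torch").
REQUIRED = ["torch", "torchvision", "torchtext", "easydict", "terminaltables", "tqdm"]


def missing_packages(names=REQUIRED):
    """Return the subset of `names` that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

An empty result only means the packages can be located; the installed versions should still be compared against the list above.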
We use the data offered by 2D-TAN, and the extracted features can be found at Box.
The folder structure should be as follows:
.
├── checkpoints
│ ├── best
│ │ ├── TACoS
│ │ ├── ActivityNet
│ │ └── Charades
├── data
│ ├── TACoS
│ │ ├── tall_c3d_features.hdf5
│ │ └── ...
│ ├── ActivityNet
│ │ ├── sub_activitynet_v1-3.c3d.hdf5
│ │ └── ...
│ ├── Charades-STA
│ │ ├── charades_vgg_rgb.hdf5
│ │ └── ...
│
├── experiments
│
├── lib
│ ├── core
│ ├── datasets
│ └── models
│
└── moment_localization
Please download the visual features from Box and save them to the data/ folder.
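Once the features are downloaded, a quick check that they landed in the expected places may save a failed training run. This is a minimal sketch, not part of the repository: it only looks for the three feature files named in the folder structure above, and the names `EXPECTED_FEATURES` and `missing_features` are hypothetical.

```python
from pathlib import Path

# Feature files named in the folder structure above, relative to the data/ folder.
EXPECTED_FEATURES = {
    "TACoS": "tall_c3d_features.hdf5",
    "ActivityNet": "sub_activitynet_v1-3.c3d.hdf5",
    "Charades-STA": "charades_vgg_rgb.hdf5",
}


def missing_features(data_root="data"):
    """Return the expected feature file paths that do not exist under data_root."""
    root = Path(data_root)
    return [
        str(root / dataset / filename)
        for dataset, filename in EXPECTED_FEATURES.items()
        if not (root / dataset / filename).is_file()
    ]


if __name__ == "__main__":
    for path in missing_features():
        print("Missing feature file:", path)
```

Run it from the repository root so that the default `data` path resolves correctly. Other files that each dataset folder may need (shown as `...` in the tree) are not checked.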
Use the following commands for training:
- For TACoS dataset, run:
sh run_tacos.sh
- For ActivityNet-Captions dataset, run:
sh run_activitynet.sh
- For Charades-STA dataset, run:
sh run_charades.sh
TACoS | Rank1@0.3 | Rank1@0.5 | Rank5@0.3 | Rank5@0.5 |
---|---|---|---|---|
MGPN256 | 48.81 | 36.74 | 71.46 | 59.24 |
ActivityNet | Rank1@0.5 | Rank1@0.7 | Rank5@0.5 | Rank5@0.7 |
---|---|---|---|---|
MGPN256 | 47.92 | 30.47 | 78.15 | 63.56 |
Charades (I3D) | Rank1@0.5 | Rank1@0.7 | Rank5@0.5 | Rank5@0.7 |
---|---|---|---|---|
MGPN256 | 60.82 | 41.16 | 89.77 | 64.73 |
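For readers unfamiliar with the metric, Rank n@m is conventionally the percentage of queries for which at least one of the top-n predicted moments has a temporal IoU of at least m with the ground-truth moment. The sketch below illustrates that conventional definition; it is not the repository's evaluation code, the function names are hypothetical, and whether the threshold comparison is `>=` or `>` is an assumption.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two (start, end) intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def rank_n_at_m(predictions, ground_truths, n, m):
    """Percentage of queries whose top-n predictions include a moment with IoU >= m.

    predictions: one list per query of (start, end) moments, best-scored first.
    ground_truths: one (start, end) moment per query.
    The >= comparison is an assumption; an evaluation script may use > instead.
    """
    hits = 0
    for ranked, gt in zip(predictions, ground_truths):
        if any(temporal_iou(p, gt) >= m for p in ranked[:n]):
            hits += 1
    return 100.0 * hits / len(ground_truths)


# Toy example with two queries and two ranked predictions each.
predictions = [[(0, 10), (5, 15)], [(20, 30), (0, 5)]]
ground_truths = [(5, 15), (0, 6)]
print(rank_n_at_m(predictions, ground_truths, n=1, m=0.5))  # 0.0
print(rank_n_at_m(predictions, ground_truths, n=5, m=0.5))  # 100.0
```

In the toy example neither top-1 prediction reaches IoU 0.5, while the second-ranked prediction does for both queries, so Rank1@0.5 is 0 and Rank5@0.5 is 100.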
We greatly appreciate the 2D-TAN repository. Please remember to cite the following papers:
@article{sun2022you,
title={You Need to Read Again: Multi-granularity Perception Network for Moment Retrieval in Videos},
author={Sun, Xin and Wang, Xuan and Gao, Jialin and Liu, Qiong and Zhou, Xi},
journal={arXiv preprint arXiv:2205.12886},
year={2022}
}
@inproceedings{gao2021relation,
title={Relation-aware Video Reading Comprehension for Temporal Language Grounding},
author={Gao, Jialin and Sun, Xin and Xu, Mengmeng and Zhou, Xi and Ghanem, Bernard},
booktitle={Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
pages={3978--3988},
year={2021}
}
@InProceedings{2DTAN_2020_AAAI,
author = {Zhang, Songyang and Peng, Houwen and Fu, Jianlong and Luo, Jiebo},
title = {Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language},
booktitle = {AAAI},
year = {2020}
}