Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos
These are the newly proposed splits for the paper "Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos" (ACM MM 2022).
We define a new task, Temporal Emotion Localization in videos (TEL), to provide a new benchmark for research on emotion understanding and video-and-language reasoning. The proposed splits are based on the MovieGraphs dataset, which contains detailed graph-based annotations of social situations for 7637 clips from 51 movies. We extract the 239 available emotion labels from all clips and group them into 18 discrete emotion classes. Because only a few emotion labels in the MovieGraphs dataset have temporal annotations, we developed an annotation tool and asked human annotators to provide the temporal boundaries of emotions in the test split.
If you find this project helpful to your research, please cite our work:
@inproceedings{li2022dilated,
title={Dilated context integrated network with cross-modal consensus for temporal emotion localization in videos},
author={Li, Juncheng and Xie, Junlin and Zhu, Linchao and Qian, Long and Tang, Siliang and Zhang, Wenqiao and Shi, Haochen and Zhang, Shengyu and Wei, Longhui and Tian, Qi and Zhuang, Yueting},
booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
year={2022}
}