This repository is the implementation of "Discriminative Latent Semantic Graph for Video Captioning" (ACM MM 2021).
- Create two empty folders, `data` and `caption-eval`.
- Download the visual and text features of MSVD and MSR-VTT (from RMN), and put them in the `data` folder.
- Download the evaluation tool, and put it in the `caption-eval` folder.
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 train_debug.py
```
The figure above shows the training loss of the proposed D-LSG model. After the generator loss drops in the mid-stage of training, the caption loss decreases and all evaluation metrics improve. This demonstrates that using the semantic information of a given sentence as discriminative information is effective for the video captioning task. The optimal setting has not been uploaded yet (we forgot to save it, and will update the repository with the optimal setting soon).
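The interaction between the two losses described above can be sketched as a weighted sum of a caption (cross-entropy) loss and a generator (adversarial) loss. This is an illustrative sketch only: the function names, the non-saturating GAN form of the generator loss, and the `lambda_gen` weight are assumptions, not the paper's exact formulation.

```python
import math

def cross_entropy(probs, target_idx):
    # Negative log-likelihood of the ground-truth word under the
    # model's probability vector (standard caption loss per step).
    return -math.log(probs[target_idx])

def generator_loss(d_score_on_fake):
    # Non-saturating GAN generator loss, -log D(fake): an assumed
    # stand-in for the discriminative (semantic) loss in the text.
    return -math.log(d_score_on_fake)

def total_loss(word_probs, targets, d_scores, lambda_gen=1.0):
    # Average caption loss over the sentence plus a weighted
    # generator loss; lambda_gen is a hypothetical trade-off weight.
    cap = sum(cross_entropy(p, t) for p, t in zip(word_probs, targets)) / len(targets)
    gen = sum(generator_loss(s) for s in d_scores) / len(d_scores)
    return cap + lambda_gen * gen
```

As the generator loss term falls during training, the combined objective is increasingly dominated by the caption term, which matches the trend described above.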
Our code is based on https://github.com/tgc1997/RMN. Thanks for their great work!