
Doubts about data sources for the text modality #18

Open
XYI-xue opened this issue Apr 17, 2024 · 2 comments


XYI-xue commented Apr 17, 2024

Hi author, I would like to ask how you obtained the transcribed text corresponding to the videos in the SumMe and TVSum datasets. Was it created manually, or did you use an existing model? I am very much looking forward to your answer.


boheumd (Owner) commented Apr 20, 2024

Hi, you can refer to the implementation details section of the main paper. For the SumMe and TVSum datasets, we adopt the pre-trained image captioning model GPT-2 to generate a caption for each frame.
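
For readers who want to reproduce this step, here is a minimal sketch of per-frame captioning with a publicly available ViT+GPT-2 captioning checkpoint. The checkpoint name (nlpconnect/vit-gpt2-image-captioning) and the frame-sampling interval are assumptions; the thread does not specify which exact model or sampling rate was used.

```python
# Minimal sketch: caption sampled video frames with a ViT+GPT-2 captioning model.
# The checkpoint name and sampling interval are assumptions, not the paper's exact setup.
import cv2  # pip install opencv-python
from PIL import Image
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

model_id = "nlpconnect/vit-gpt2-image-captioning"  # assumed checkpoint
model = VisionEncoderDecoderModel.from_pretrained(model_id)
processor = ViTImageProcessor.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

def caption_video(path, every_n_frames=15):
    """Yield (frame_index, caption) for every n-th frame of a video."""
    cap = cv2.VideoCapture(path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n_frames == 0:
            # OpenCV decodes frames as BGR; the processor expects RGB images.
            image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            pixel_values = processor(images=image, return_tensors="pt").pixel_values
            output_ids = model.generate(pixel_values, max_length=16, num_beams=4)
            yield idx, tokenizer.decode(output_ids[0], skip_special_tokens=True)
        idx += 1
    cap.release()

for i, text in caption_video("video.mp4"):  # hypothetical input file
    print(i, text)
```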


XYI-xue (Author) commented Apr 24, 2024

> Hi, you can refer to the implementation details section of the main paper. For the SumMe and TVSum datasets, we adopt the pre-trained image captioning model GPT-2 to generate a caption for each frame.

Thank you very much for your reply! I would also like to ask: how did you obtain the original video data in the datasets? I looked at the dataset file you provided and found that it only contains the feature values for each video. Could you please provide a link or file where I can obtain the original, playable video data?
Thanks for your help, and looking forward to your reply. (☆▽☆)
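
For anyone checking what the released dataset file actually contains, here is a quick inspection sketch. It assumes the features ship as an HDF5 archive keyed by video name (a common layout for SumMe/TVSum feature releases); the filename below is hypothetical.

```python
# Minimal sketch: list the per-video arrays inside the released feature file.
# Assumes an HDF5 archive with one group per video; the filename is hypothetical.
import h5py

with h5py.File("dataset_tvsum.h5", "r") as f:  # hypothetical filename
    for video_name, group in f.items():
        print(video_name)
        for key, dset in group.items():
            print(f"  {key}: shape={dset.shape}, dtype={dset.dtype}")
```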
