Hi author, I would like to ask how you obtained the transcribed text corresponding to the videos in the SumMe and TVSum datasets. Was it created manually, or did you use an existing model? I am very much looking forward to your answer.
Hi, you can refer to the implementation details section in the main paper. For the SumMe and TVSum datasets, we adopt the pre-trained GPT-2-based image captioning model to generate a caption for each frame.
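For reference, the per-frame captioning step could be sketched roughly as follows. The thread does not name the exact checkpoint or sampling rate, so the `nlpconnect/vit-gpt2-image-captioning` model (a ViT encoder with a GPT-2 decoder) and the one-frame-per-second sampling below are assumptions, not the authors' confirmed setup:

```python
"""Hypothetical sketch of generating per-frame captions for SumMe/TVSum videos.

Assumptions (not stated in the thread): the Hugging Face checkpoint
"nlpconnect/vit-gpt2-image-captioning" and sampling one frame per second.
"""


def sample_frame_indices(n_frames: int, fps: float, every_sec: float = 1.0) -> list:
    """Pick one frame index per `every_sec` seconds of video."""
    step = max(1, round(fps * every_sec))
    return list(range(0, n_frames, step))


def caption_video(video_path: str) -> list:
    """Caption sampled frames of a video; returns one caption string per frame."""
    # Heavy dependencies are imported lazily so the pure helper above
    # stays usable without them.
    import cv2                      # pip install opencv-python
    from PIL import Image           # pip install pillow
    from transformers import pipeline  # pip install transformers

    # Assumed checkpoint: ViT image encoder + GPT-2 caption decoder.
    captioner = pipeline("image-to-text",
                         model="nlpconnect/vit-gpt2-image-captioning")

    cap = cv2.VideoCapture(video_path)
    n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0

    captions = []
    for idx in sample_frame_indices(n_frames, fps):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            break
        # OpenCV decodes frames as BGR; the captioner expects RGB.
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        result = captioner(Image.fromarray(rgb))
        captions.append(result[0]["generated_text"])
    cap.release()
    return captions
```

The frame captions can then be aggregated into a per-video transcript, with each caption aligned to its frame index for downstream use.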
Thank you very much for your reply! I would also like to ask how you obtained the original video data in the dataset. I looked at the dataset files you provided and found that they only contain the extracted feature values for each video. Could you please provide a link or file where I can obtain the original, playable videos?
Thanks for your help; looking forward to your reply. (☆▽☆)