
Long video test results did not meet expectations #38

Open
ffiioonnaa opened this issue Jun 10, 2024 · 2 comments

@ffiioonnaa

Hi, thanks for your work!
When I used the demo script to test the highlight detection and temporal grounding tasks on my own video, I found that the output timestamps differ on every run. Also, when I input a 30-minute video, the predicted timestamps are often small, e.g., in the tens or low hundreds of seconds.

@rahulkrprajapati

Hey @ffiioonnaa, I ran into the same issue. It might be because frame sampling is capped at 96 frames, and changing the sampling rate would affect accuracy. What I did instead was split the video into 2-5 minute chunks, run the same prompt on each chunk, and then shift each chunk's timestamps back to global time by adding the number of seconds that had elapsed in the previous chunks (see the sketch below).

The accuracy and the timestamps were still not too good for me but it does seem to perform better this way for longer videos.
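
For reference, a minimal sketch of that chunking workflow in Python. It assumes ffmpeg is on PATH and that a `run_demo(chunk_path, prompt)` wrapper around the repo's demo script exists and returns `(start, end)` timestamps in seconds relative to the chunk; both are assumptions for illustration, not the repo's actual API.

```python
# Sketch of the chunking workaround described above. `run_demo` is a
# hypothetical wrapper around the demo script, not part of this repo.
import subprocess

CHUNK_SECONDS = 300  # 5-minute chunks, near the model's comfort zone


def split_video(path, total_seconds, chunk_seconds=CHUNK_SECONDS):
    """Cut the video into chunks; return (chunk_path, offset_seconds) pairs."""
    chunks = []
    for i, start in enumerate(range(0, total_seconds, chunk_seconds)):
        out = f"chunk_{i:03d}.mp4"
        subprocess.run(
            ["ffmpeg", "-y", "-ss", str(start), "-t", str(chunk_seconds),
             "-i", path, "-c", "copy", out],
            check=True,
        )
        chunks.append((out, start))
    return chunks


def grounded_timestamps(path, total_seconds, prompt):
    """Run the same prompt on every chunk, then shift results to global time."""
    results = []
    for chunk_path, offset in split_video(path, total_seconds):
        for start, end in run_demo(chunk_path, prompt):  # hypothetical wrapper
            results.append((start + offset, end + offset))
    return results
```

Note that stream-copied chunks (`-c copy`) cut on keyframes, so chunk boundaries can be off by a second or two; re-encoding instead of copying gives exact cuts at the cost of speed.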

@RenShuhuai-Andy
Owner

Hi, thanks for your interest.

As shown in Table 1 of our paper, the average video duration in the training data is 190 seconds, so the model performs best on videos around that length. When the video is much longer (e.g., half an hour), performance may deteriorate.
