Hello, can this method retrieve a video in real time?
The paper says "On YouTube Highlights and TVSum, we obtain clip-level visual features using an I3D [4] pre-trained on Kinetics 400 [13]". Does this mean that, for an unseen test video, the audio and visual features have to be extracted offline separately?
How can the highlighted parts of a video be retrieved in real time?
Thanks a lot.
We use pre-trained (and frozen) feature extractors for both visual and audio features, so the features can be pre-extracted even at the training stage. When testing, you may likewise use these pre-trained expert models to extract the features first, and then feed them into UMT to detect the highlights.
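To make the two-stage pipeline concrete, here is a minimal PyTorch sketch. The modules below are placeholders (plain linear layers) standing in for the frozen I3D/audio experts and the UMT model from this repo; only the structure of the pipeline is the point, not the exact architectures or feature dimensions.

```python
import torch
import torch.nn as nn

# Placeholder stand-ins for the frozen pre-trained experts and the UMT model.
# In the real pipeline these would be the I3D visual extractor, the audio
# extractor, and the UMT network; linear layers keep this sketch runnable.
visual_extractor = nn.Linear(1024, 512).eval()  # stand-in for frozen I3D
audio_extractor = nn.Linear(128, 512).eval()    # stand-in for frozen audio expert
umt = nn.Linear(1024, 1).eval()                 # stand-in for the UMT detector

@torch.no_grad()
def detect_highlights(visual_clips, audio_clips):
    # Stage 1: pre-extract clip-level features with the frozen experts.
    visual_feats = visual_extractor(visual_clips)           # [num_clips, 512]
    audio_feats = audio_extractor(audio_clips)              # [num_clips, 512]
    # Stage 2: feed the pre-extracted features into the detector.
    fused = torch.cat([visual_feats, audio_feats], dim=-1)  # [num_clips, 1024]
    return umt(fused).squeeze(-1)                           # per-clip scores

# Example: 20 clips of already-pooled visual and audio inputs.
scores = detect_highlights(torch.randn(20, 1024), torch.randn(20, 128))
top3 = scores.topk(3).indices  # indices of the 3 most highlight-like clips
```

Because the extractors are frozen, real-time use is bounded mainly by how fast the expert models can run over incoming clips; UMT itself only consumes the already-extracted features.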
By the way, Equation 1 in the paper looks similar to non-local attention. However, Equation 1 uses the cross product; why not the dot product? What does the cross product mean in Equation 1?
Eq. 1 is exactly the same as non-local attention. The cross-product symbol denotes matrix multiplication between the q and k matrices, i.e. [N_q × d] × [d × N_k] → [N_q × N_k]. This operation is identical to taking dot products along the feature dimension.
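A quick PyTorch sketch (not taken from the repo) showing that the matrix product QKᵀ is just all pairwise dot products along the feature dimension, as in non-local attention:

```python
import torch

N_q, N_k, d = 4, 6, 32
q = torch.randn(N_q, d)  # query features
k = torch.randn(N_k, d)  # key features

# Matrix multiplication: [N_q, d] @ [d, N_k] -> [N_q, N_k]
scores = q @ k.T

# Entry (i, j) is exactly the dot product of query i and key j.
assert torch.allclose(scores[1, 2], torch.dot(q[1], k[2]))

# Non-local / scaled dot-product attention then softmaxes these scores.
attn = torch.softmax(scores / d ** 0.5, dim=-1)  # [N_q, N_k]
```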