
retrieve a video in real time #26

Closed
Lynneyyq opened this issue Sep 6, 2022 · 3 comments

Comments

@Lynneyyq

Lynneyyq commented Sep 6, 2022

Hello, can this method retrieve a video in real time?

The paper says "On YouTube Highlights and TVSum, we obtain clip-level visual features using an I3D [4] pre-trained on Kinetics 400 [13]", which suggests that if an unseen video is to be evaluated, the audio and visual features should be extracted offline separately.
How can the highlighted parts of a video be retrieved in real time?

Thanks a lot.

@yeliudev
Member

We use pre-trained (and frozen) feature extractors for both visual and audio features, so these features can be pre-extracted, even at the training stage. When testing, you may also use these pre-trained expert models to extract the features first, then feed the features into UMT to detect the highlights.
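The two-stage pipeline described above can be sketched as follows. The extractor functions here are hypothetical stand-ins (the real pipeline would run frozen pretrained networks such as I3D for video), and the feature dimensions are illustrative, not taken from the paper; the point is only that both streams are extracted offline, aligned clip-by-clip, before being fed to the detector.

```python
import numpy as np

# Hypothetical stand-in for a frozen visual extractor (e.g. I3D):
# in practice each clip is run through the network and pooled to one vector.
def extract_visual_features(num_clips, dim=1024):
    return np.random.randn(num_clips, dim).astype(np.float32)

# Hypothetical stand-in for a frozen audio extractor on the same clip grid.
def extract_audio_features(num_clips, dim=128):
    return np.random.randn(num_clips, dim).astype(np.float32)

num_clips = 60  # e.g. a video split into 60 fixed-length clips
visual = extract_visual_features(num_clips)  # shape (60, 1024)
audio = extract_audio_features(num_clips)    # shape (60, 128)

# A highlight detector like UMT then scores each clip from the two streams;
# the key requirement is that the streams are aligned clip-by-clip.
assert visual.shape[0] == audio.shape[0] == num_clips
```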

@Lynneyyq
Author

Lynneyyq commented Sep 23, 2022

Thanks for your reply!

By the way, Equation 1 in the paper is similar to non-local attention. However, Equation 1 uses a cross product; why not a dot product? What is the meaning of the cross product in Equation 1?

@yeliudev
Member

Eq. 1 is exactly the same as non-local attention. The cross product denotes the matrix multiplication between the q and k matrices, i.e. [N_q * d] x [d * N_k] -> [N_q * N_k]. This operation is equivalent to taking dot products along the feature dimension.
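This equivalence can be checked directly. In the sketch below (with illustrative sizes, not values from the paper), the matrix product `Q @ K.T` produces the [N_q * N_k] score matrix, and each entry is exactly the dot product of one query row with one key row:

```python
import numpy as np

# Illustrative sizes: N_q queries and N_k keys, each of feature dimension d.
N_q, N_k, d = 4, 6, 8
Q = np.random.randn(N_q, d)
K = np.random.randn(N_k, d)

# The "cross product" in Eq. 1: [N_q, d] x [d, N_k] -> [N_q, N_k].
scores = Q @ K.T

# Entry (i, j) is the dot product of query i with key j,
# i.e. the matmul reduces over the feature dimension d.
assert scores.shape == (N_q, N_k)
assert np.allclose(scores[1, 2], np.dot(Q[1], K[2]))
```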
