feature extraction (i3d and optical flow) #7
For video feature extraction, do you use denseflow to extract the RGB frames and optical flow first?
@G-Apple1 No, we didn't. This codebase can extract visual features directly from raw videos.
Could you share more details about the feature extraction?
Thanks for the suggestion. We may release a demo script in the future. For now, if you would like to run the model on your own datasets, we strongly recommend extracting the features again on the public datasets (e.g. QVHighlights, Charades-STA) with your own feature extractor and re-training our model on them (only a single GPU and a few hours are needed), since we also obtained the features from the datasets' authors and don't know the exact feature extraction pipeline ourselves.
OK, thank you for your prompt reply.
In your paper, you say that one feature is computed for every 32 consecutive frames. Is it necessary to first extract features for the whole video (n features for n frames) and then average the features of the frames inside a clip (e.g. frames 0-100)? Or do you compute one feature for each 32-frame segment extracted from the clip (frames 0-100)?
@Lvqin001 What do you mean by segments and clip? In our case, each feature vector encodes 32 consecutive frames; for example, a 160-frame video is represented by 5 feature vectors. But the clips in the annotations may not be temporally aligned with the feature vectors: a clip is 2s long and may not contain exactly 32 frames, depending on the video's frame rate. So a feature vector is regarded as belonging to a clip only when their temporal overlap is more than 50%.
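The 50%-overlap rule above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `vectors_in_clip` and the argument layout are hypothetical, and it assumes features are extracted over non-overlapping 32-frame windows.

```python
def vectors_in_clip(clip_start_s, clip_end_s, num_frames, fps, window=32):
    """Return indices of 32-frame feature vectors whose temporal overlap
    with the clip exceeds 50% of the vector's own duration.
    (Hypothetical helper illustrating the rule described above.)"""
    vec_dur = window / fps           # duration of one feature vector, in seconds
    num_vecs = num_frames // window  # e.g. a 160-frame video -> 5 feature vectors
    indices = []
    for i in range(num_vecs):
        v_start, v_end = i * vec_dur, (i + 1) * vec_dur
        overlap = max(0.0, min(v_end, clip_end_s) - max(v_start, clip_start_s))
        if overlap > 0.5 * vec_dur:  # strictly more than 50% of the vector
            indices.append(i)
    return indices
```

For instance, at 16 fps each vector spans 2s, so a clip covering 0.5s-4.5s overlaps vector 0 by 1.5s (in), vector 1 by 2s (in), and vector 2 by only 0.5s (out).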
For example, if a clip contains 3*32 frames, it contains three feature vectors. Are these three feature vectors averaged?
You are right. All the feature vectors belonging to the same clip are averaged.
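Concretely, the averaging amounts to an element-wise mean over the vectors that fall inside the clip. A minimal numpy sketch, assuming the features are stored as a `(num_vectors, dim)` array with an assumed dimension of 1024:

```python
import numpy as np

features = np.random.rand(5, 1024)  # 5 feature vectors, e.g. a 160-frame video
clip_vector_ids = [1, 2, 3]         # the three vectors overlapping a 3*32-frame clip
clip_feature = features[clip_vector_ids].mean(axis=0)  # clip-level feature, shape (1024,)
```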
OK, thank you for your guidance. |
You said, "the optical flow features are only used in Charades-STA", I3D extracted the optical flow and RGB features in YouTube Highlights and TVSum, the code used torch.cat(video, optic). What is the purpose of splicing here? What does optical flow do here? |
@Lynneyyq Sorry for the mistake and thanks for pointing it out. We've double-checked the code and data. Both YouTube Highlights and TVSum use optical flow features as well. |
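For clarity, the `torch.cat(video, optic)` call in question simply concatenates the RGB and optical-flow feature vectors along the feature dimension, so each clip is represented by one longer vector carrying both appearance and motion information. A numpy sketch of the same operation (the 1024-dim feature size is an assumption):

```python
import numpy as np

rgb  = np.random.rand(5, 1024)  # I3D RGB features,          (num_vectors, dim)
flow = np.random.rand(5, 1024)  # I3D optical-flow features, (num_vectors, dim)

# Mirrors torch.cat((video, optic), dim=-1): feature-wise concatenation
fused = np.concatenate([rgb, flow], axis=-1)  # shape: (5, 2048)
```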
|
Hello, I would like to ask which codebase you use for the I3D and optical flow feature extraction mentioned in the dataset paper? I want to reproduce it and then test my own videos.