
Questions Regarding Baidu Embeddings #51

Closed
yur1xpp opened this issue Jul 25, 2022 · 4 comments

yur1xpp commented Jul 25, 2022

Hi Silvio, I have some questions about the Baidu embeddings and wonder if you have any information about them. I couldn't find answers to these questions in either their GitHub repo or the published paper:

  1. Have the Baidu embeddings already gone through PCA, or are they still "raw" features? I noticed they have dimension T x 8576, which probably means they are still raw. Am I right about this?
  2. In your opinion, if I were to reduce the dimension, would it be better to reduce it with PCA, or to pass the features through a fully connected layer (FCL), as in the implementation of TemporallyAwarePooling?
     ```python
     self.feature_extractor = nn.Linear(self.input_size, 512)
     ```
  3. If they are still raw, do you have any idea what the initial dimension was before they were flattened to 8576? The paper mentions 398x224 video, but the features cannot be reshaped to that. I was thinking I could use them in a video-transformer-based architecture (MViT, etc.) if I were able to reshape them to the original dimension.
  4. Is any of their fine-tuned feature extraction code publicly available? I think not, but I'm asking anyway in case you know of any, since there is very little public information about their embeddings.

Thanks!

SilvioGiancola (Owner) commented Jul 26, 2022

Hi @yur1xpp

  1. AFAIK, those are "raw" features, not PCA'ed.
  2. If you have memory to spare to train your architecture, I would go for an FCL, as it drops the orthogonality constraint of PCA. In contrast, PCA is fully unsupervised, so you can fit it offline before using the PCA'ed features in the rest of your network.
  3. The dimension 8576 comes from the concatenation of 5 or 6 different features from different encoders. You will need to analyze that structure before getting back to a 2D map. Also, I believe not all features are extracted right after the flattening layer; some might have gone through extra FC layers.
  4. Not that I am aware of. You might want to raise that issue on Baidu's GitHub page and put pressure on whoever extracted the custom features to release their code publicly, both for extracting features from new videos and for fine-tuning the video encoder on soccer videos.

I hope that helps!
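Point 2 above, the offline PCA option, can be sketched in a few lines with NumPy's SVD. This is a minimal sketch under stated assumptions, not the SoccerNet or Baidu pipeline. It assumes the per-frame features have already been stacked into a (T, 8576) array. The random matrix is only a placeholder for real Baidu features, and the target of 512 dimensions simply mirrors the `nn.Linear(self.input_size, 512)` layer quoted in the question. The function names `fit_pca` and `apply_pca` are hypothetical. With many games and frames, scikit-learn's `PCA` or `IncrementalPCA` would likely be more practical than a single full SVD.

```python
import numpy as np


def fit_pca(features: np.ndarray, n_components: int = 512):
    """Fit PCA offline (unsupervised) on a (T, D) array of frame-level features.

    Hypothetical helper. Returns the per-dimension mean (D,) and a projection
    matrix (D, n_components) whose columns are orthonormal principal directions.
    """
    mean = features.mean(axis=0)
    centered = features - mean
    # Economy SVD: rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    if n_components > vt.shape[0]:
        raise ValueError("n_components cannot exceed min(T, D)")
    return mean, vt[:n_components].T


def apply_pca(features: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project (T, D) features onto the fitted directions, giving (T, n_components)."""
    return (features - mean) @ components


# Placeholder data standing in for stacked Baidu features (assumption: random, not real).
rng = np.random.default_rng(0)
train_feats = rng.standard_normal((600, 8576)).astype(np.float32)

mean, components = fit_pca(train_feats, n_components=512)
reduced = apply_pca(train_feats, mean, components)
print(reduced.shape)  # (600, 512)
```

The projection is fit once and then frozen, so it adds no training-time memory. The learned FCL alternative would replace `apply_pca` with a trainable linear layer that is not restricted to orthogonal directions.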

yur1xpp (Author) commented Aug 2, 2022

Thanks very much for those useful details and suggestions, Silvio!

1-2. It looks like an FCL does indeed require a lot of memory, so PCA is probably the best strategy for me.
3-4. That's a good idea! I should open an issue on their repo and see how it goes. I was trying to extract my own features using their technique, but the details in their paper are quite vague.

Thanks again for the help!

yur1xpp closed this as completed Aug 2, 2022

yur1xpp (Author) commented Aug 2, 2022

I found a closed issue on their repo that might answer some of these questions, but unfortunately it looks like they're not planning to release the code publicly (at least according to a comment from a year ago).
baidu-research/vidpress-sports#4 (comment)

SilvioGiancola (Owner) commented

That's a pity, but I am sure fine-tuning even simple encoders like ResNet or MViT using a similar trick would lead to a very similar boost in performance. If you have reproducible code that trains a better encoder, I would be happy to advertise it on our SoccerNet GitHub repos and websites, as it could be very useful for future development.
