What was the best score of the size nb_frames and overlap? #14

Closed
circlebig opened this issue Jun 27, 2024 · 8 comments

Comments

@circlebig

Hi, could you tell me which values you used for overlap and nb_frames?

@Ahmed-Telili

Hello, the number of frames was set to 30 and the overlap to 0.2.

@circlebig
Author

Thank you very much. May I ask a few more questions?

First, did you extract features for all of Konvid, train on them, and then test on the features of the test patches?

Second, if I set nb_frames to 30 and overlap to 0.2, the shape of the features is (2880, 2560) (backbone: mnasnet).
I think 2880 is the length of the video (240 frames) times the number of patches.
Were you able to train on features of that size on your machine?
My machine has an i9-12900K, an RTX 3090, and 78 GiB of RAM. 2880 seems too big to train on.
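As a quick sanity check on the guess above, the first dimension can be divided by the frame count to see what multiplier it implies. This is only arithmetic on the numbers quoted in the comment (2880 rows, 240 frames); the helper name is hypothetical and nothing here is taken from the repository's code.

```python
def implied_patches_per_frame(num_rows: int, num_frames: int) -> int:
    """Return num_rows / num_frames if it divides evenly, otherwise raise."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    quotient, remainder = divmod(num_rows, num_frames)
    if remainder != 0:
        raise ValueError(
            f"{num_rows} rows do not split evenly over {num_frames} frames"
        )
    return quotient


# Numbers quoted in the comment: 2880 rows and a 240-frame video.
print(implied_patches_per_frame(2880, 240))  # 12
```

If the guess is right, 2880 rows would correspond to 12 feature vectors per frame; whether that is how the extraction script actually arranges its output is not confirmed in this thread.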

@Ahmed-Telili

Ahmed-Telili commented Jul 9, 2024

Hi @circlebig, we used a pretrained backbone to extract features from the dataset, with the number of patches set to 25, so the features have shape (30, 25, 2560). To avoid running out of resources, you can extract the features one image at a time.
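A minimal sketch of what extracting features one image at a time could look like, so that only a single frame's patches are processed at once. The names `extract_video_features`, `toy_split`, and `toy_backbone` are hypothetical stand-ins, not functions from this repository, and plain Python lists are used in place of real tensors to keep the example self-contained.

```python
from typing import Callable, Iterable, List, Sequence

Vector = List[float]


def extract_video_features(
    frames: Iterable[object],
    split_into_patches: Callable[[object], Sequence[object]],
    backbone: Callable[[object], Vector],
) -> List[List[Vector]]:
    """Build features indexed as [frame][patch][feature], one frame at a time."""
    video_features: List[List[Vector]] = []
    for frame in frames:  # only the current frame's patches are processed
        patches = split_into_patches(frame)
        video_features.append([backbone(patch) for patch in patches])
    return video_features


def nested_shape(features: List[List[Vector]]) -> tuple:
    """Shape of a non-empty, rectangular [frame][patch][feature] nested list."""
    return (len(features), len(features[0]), len(features[0][0]))


def toy_split(frame: object) -> list:
    # Hypothetical stand-in: pretend every frame yields 25 patches.
    return [(frame, index) for index in range(25)]


def toy_backbone(patch: object) -> Vector:
    # Hypothetical stand-in: pretend the backbone returns a 2560-dim vector.
    return [0.0] * 2560


features = extract_video_features(range(30), toy_split, toy_backbone)
print(nested_shape(features))  # (30, 25, 2560)
```

With a real backbone, the two stand-ins would be replaced by the actual patch-cropping routine and a forward pass of the pretrained model, and each video's result could be written to disk before moving on to the next one.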

@circlebig
Author

Thank you. Was your number of features the same as the number of videos?

@Ahmed-Telili

You're welcome ;) The number of features depends solely on the backbone. For MnasNet, each video has features of shape (30, 25, 2560).

@circlebig
Author

Oh, I mean the number of features. For example, there are 1200 videos in the Konvid data, so my number of features is 1200. Is that the same as yours (1200)?

@Ahmed-Telili

It does not depend on the number of videos. For each video, you get a tensor of shape (30, 25, 2560) representing its features. If you compute the features for the Konvid dataset (1200 videos), you get 1200 tensors.
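To put those numbers in perspective, here is a rough size estimate for one tensor of shape (30, 25, 2560) and for 1200 of them. It assumes 4 bytes per value (float32), which is an assumption and not something stated in this thread; `tensor_bytes` is a hypothetical helper.

```python
from math import prod


def tensor_bytes(shape: tuple, bytes_per_value: int = 4) -> int:
    """Size in bytes of a dense tensor; 4 bytes per value assumes float32."""
    return prod(shape) * bytes_per_value


per_video = tensor_bytes((30, 25, 2560))
total = 1200 * per_video  # one tensor per video

print(f"{per_video / 2**20:.2f} MiB per video")    # 7.32 MiB per video
print(f"{total / 2**30:.2f} GiB for 1200 videos")  # 8.58 GiB for 1200 videos
```

Under that assumption, storing one tensor per video on disk and loading them in batches keeps memory use well below the full dataset size.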

@circlebig
Author

1200 tensors, right. Thank you.

@atelili closed this as completed Sep 5, 2024