
Some details about the feature extraction. #2

Closed
sqiangcao99 opened this issue Dec 2, 2021 · 7 comments

Comments

sqiangcao99 commented Dec 2, 2021

Hi, thanks for generously sharing your code. When I tried to extract optical flow features for Kinetics using BNInception, I ran into some problems.

  • I don't know the data preprocessing method used for BNInception. Could you please provide more complete code?
  • I noticed there are several possible configurations for extracting optical flow with denseflow. Which denseflow configuration did you use? For example: denseflow test.avi -b=20 -a=tvl1 -s=1 -v?
@xumingze0308
Hi, we cannot release the code related to the data, but we used the same denseflow configuration as the one you shared.

@sqiangcao99

Besides, I noticed that the input channel count of BNInception is 10. Did you drop the optical flow of the first frame in a video chunk of size 6?

@xumingze0308

The optical flow is computed between pairs of adjacent frames, so a video chunk of 6 frames yields 5 optical flow images, each with x and y channels.
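The arithmetic above (6 frames → 5 adjacent pairs → 10 channels) can be sketched as follows. This is an illustrative mock-up, not the authors' code: the actual TV-L1 flow computation (done externally with denseflow) is replaced by a placeholder function, and the channel ordering shown is one possible layout.

```python
import numpy as np

def flow_placeholder(prev_frame, next_frame):
    """Stand-in for a real TV-L1 flow call; returns an (H, W, 2) x/y flow field."""
    h, w = prev_frame.shape[:2]
    return np.zeros((h, w, 2), dtype=np.float32)

def chunk_to_flow_stack(frames):
    """frames: list of 6 (H, W) grayscale images -> (10, H, W) channel stack."""
    assert len(frames) == 6
    channels = []
    for prev, nxt in zip(frames[:-1], frames[1:]):  # 5 adjacent pairs
        flow = flow_placeholder(prev, nxt)
        channels.append(flow[..., 0])  # x component
        channels.append(flow[..., 1])  # y component
    return np.stack(channels)  # shape (10, H, W)

frames = [np.zeros((180, 320), dtype=np.uint8) for _ in range(6)]
stack = chunk_to_flow_stack(frames)
print(stack.shape)  # (10, 180, 320)
```

No frame is dropped: every frame participates in at least one flow pair, and the first frame is the reference for the first flow image.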

@sqiangcao99

When I try to reproduce the results on THUMOS, there are 314 iterations for training, 354 for testing, and 696 for batch inference at the default settings. Are these the same as yours?

@xumingze0308

Yes.


LukasHedegaard commented Dec 20, 2021

Hi @xumingze0308, thanks for sharing the codebase.
A few follow-up questions regarding the preparation of the flow features:

  1. How did you resize the optical flow frames prior to processing with BNInception? Did you resize the shortest side to 224 and then center-crop to 224x224?

  2. BNInception with (10, 224, 224) inputs produces (1024, 7, 7) outputs. Did you average-pool over the 7x7 spatial dimensions?

  3. As input to the BNInception block, did you stack 5 flow_x then 5 flow_y (i.e. [x,x,x,x,x,y,y,y,y,y]), or did you interleave them (i.e. [x,y,x,y,x,y,x,y,x,y])?


xumingze0308 commented Dec 20, 2021

  1. We didn't resize the images, which should be of size 320x180 in THUMOS, but used 10 crops following https://github.com/yjxiong/action-detection/blob/master/transforms.py.
  2. Yes, we used average pooling.
  3. We used [x,y,x,y,x,y,x,y,x,y].
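The pooling step from point 2 can be sketched in a few lines. This is an assumption-laden illustration, not the repository's code: a random array stands in for the real (1024, 7, 7) BNInception feature map, and the averaging over the 7x7 spatial grid reduces it to the 1024-dimensional feature vector discussed above.

```python
import numpy as np

# Stand-in for a BNInception output on a (10, 224, 224) flow stack.
feature_map = np.random.rand(1024, 7, 7).astype(np.float32)

# Global average pooling: mean over the two spatial axes.
feature_vec = feature_map.mean(axis=(1, 2))
print(feature_vec.shape)  # (1024,)
```

In a PyTorch pipeline the equivalent operation would typically be an adaptive average-pooling layer followed by a flatten, but the numpy form above shows the shapes involved.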
