
Some details about the feature extraction. #2

Closed
sqiangcao99 opened this issue Dec 2, 2021 · 7 comments

Comments

sqiangcao99 commented Dec 2, 2021

Hi, thanks for generously sharing your code. When I tried to extract optical flow features for Kinetics using BNInception, I ran into some problems.

  • I don't know the data preprocessing method used for BNInception. Could you please provide more complete code?
  • I noticed there are several possible configurations for extracting optical flow with denseflow. Which denseflow configuration did you use? For example: denseflow test.avi -b=20 -a=tvl1 -s=1 -v?
@xumingze0308
Hi, we cannot release the code related to the data, but we used the same denseflow configuration as the one you shared.

@sqiangcao99

Besides, I noticed that the input channel count of BNInception is 10. Did you drop the optical flow of the first frame in a video chunk of size 6?

@xumingze0308

The optical flow is computed between pairs of adjacent frames, so a video chunk of 6 frames yields 5 optical flow images, each with x and y channels.
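The arithmetic above (6 frames → 5 adjacent pairs → 10 channels) can be sketched as follows. This is an illustrative mock-up, not the authors' code: the actual TV-L1 flow computation (done externally with denseflow) is replaced by a placeholder function, and the channel ordering shown is one possible layout.

```python
import numpy as np

def flow_placeholder(prev_frame, next_frame):
    """Stand-in for a real TV-L1 flow call; returns an (H, W, 2) x/y flow field."""
    h, w = prev_frame.shape[:2]
    return np.zeros((h, w, 2), dtype=np.float32)

def chunk_to_flow_stack(frames):
    """frames: list of 6 (H, W) grayscale images -> (10, H, W) channel stack."""
    assert len(frames) == 6
    channels = []
    for prev, nxt in zip(frames[:-1], frames[1:]):  # 5 adjacent pairs
        flow = flow_placeholder(prev, nxt)
        channels.append(flow[..., 0])  # x component
        channels.append(flow[..., 1])  # y component
    return np.stack(channels)  # shape (10, H, W)

frames = [np.zeros((180, 320), dtype=np.uint8) for _ in range(6)]
stack = chunk_to_flow_stack(frames)
print(stack.shape)  # (10, 180, 320)
```

No frame is dropped: every frame participates in at least one flow pair, and the first frame is the reference for the first flow image.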

@sqiangcao99

When I try to reproduce the results on THUMOS, there are 314 iterations for training, 354 for testing, and 696 for batch inference at the default settings. Are these the same as yours?

@xumingze0308

Yes.


LukasHedegaard commented Dec 20, 2021

Hi @xumingze0308, thanks for sharing the codebase.
A few follow-up questions regarding the preparation of the flow features:

  1. How did you resize the optical flow frames prior to processing with BNInception? Did you resize the shortest side to 224 and then center-crop to 224x224?

  2. BNInception with (10, 224, 224) inputs produces (1024, 7, 7) outputs. Did you average-pool over the 7x7 spatial dimensions?

  3. As input to the BNInception block, did you stack 5 flow_x then 5 flow_y (i.e. [x,x,x,x,x,y,y,y,y,y]), or did you interleave them (i.e. [x,y,x,y,x,y,x,y,x,y])?


xumingze0308 commented Dec 20, 2021

  1. We didn't resize the images, which should be of size 320x180 in THUMOS, but used 10 crops following https://github.com/yjxiong/action-detection/blob/master/transforms.py.
  2. Yes, we used average pooling.
  3. We used [x,y,x,y,x,y,x,y,x,y].
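The pooling step from point 2 can be sketched in a few lines. This is an assumption-laden illustration, not the repository's code: a random array stands in for the real (1024, 7, 7) BNInception feature map, and the averaging over the 7x7 spatial grid reduces it to the 1024-dimensional feature vector discussed above.

```python
import numpy as np

# Stand-in for a BNInception output on a (10, 224, 224) flow stack.
feature_map = np.random.rand(1024, 7, 7).astype(np.float32)

# Global average pooling: mean over the two spatial axes.
feature_vec = feature_map.mean(axis=(1, 2))
print(feature_vec.shape)  # (1024,)
```

In a PyTorch pipeline the equivalent operation would typically be an adaptive average-pooling layer followed by a flatten, but the numpy form above shows the shapes involved.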
