question about the I3D feature #33

JJBOY · 2020-06-19T13:35:05Z

In your paper, it says "We first uniformly divide each input video into 64-frame segments. We then use a two-stream Inflated 3D ConvNet (I3D) model pre-trained on Kinetics [5] to extract the segment features."
However, your code has:

interval = 8
clip_length = 64
start_unit = int(min(ft_num - 1, np.floor(float(start_ind + off) / interval)))
end_unit = int(min(ft_num - 2, np.ceil(float(end_ind - clip_length) / interval)))

I guess subtracting 64 means you do not use the last few frames that are not divisible by 64, but why is interval = 8?
Does it mean that you divide each input video into 8-frame segments?

By the way, could you provide the I3D features for ActivityNet? They are very time-consuming to extract.


Alvin-Zeng · 2020-07-04T08:24:39Z

"do not use the last few frames not divisible by 64" --- correct
We extract features using sliding windows, with the stride set to 8.
We have uploaded the I3D ActivityNet features; please find them in README.md.
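
To illustrate the reply above, here is a minimal sketch of how 64-frame windows with a stride of 8 could be laid out, and how the snippet quoted in the question maps frame indices to feature indices. It is not the repository's extraction code: the function names `window_starts` and `frames_to_units` are hypothetical, `off` defaults to 0 only for illustration, and `math` is used in place of `np` to keep the example dependency-free.

```python
import math


def window_starts(num_frames, clip_length=64, stride=8):
    """Start frame of every full window (hypothetical helper).

    Assumes windows that would run past the end of the video are dropped,
    so trailing frames that do not fill a complete window are unused.
    """
    if num_frames < clip_length:
        return []
    return list(range(0, num_frames - clip_length + 1, stride))


def frames_to_units(start_ind, end_ind, ft_num, clip_length=64, interval=8, off=0):
    """Map a frame range to feature indices, mirroring the quoted snippet.

    Feature i is assumed to cover frames [i * interval, i * interval + clip_length),
    so the last feature lying inside the range starts near end_ind - clip_length.
    The ft_num - 1 and ft_num - 2 clamps are copied from the quoted code as-is.
    """
    start_unit = int(min(ft_num - 1, math.floor(float(start_ind + off) / interval)))
    end_unit = int(min(ft_num - 2, math.ceil(float(end_ind - clip_length) / interval)))
    return start_unit, end_unit


if __name__ == "__main__":
    starts = window_starts(203)
    print(len(starts), starts[-1])
    print(frames_to_units(16, 120, ft_num=len(starts)))
```

Under these assumptions, consecutive features overlap by 56 frames, which is why the index arithmetic divides by `interval` rather than by `clip_length`.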

JJBOY · 2020-07-04T08:28:25Z

Thanks very much for your reply and your features!

JJBOY closed this as completed Jul 4, 2020

arc144 mentioned this issue Oct 2, 2020

Normalization and image size for I3D feature extraction #39

Open
