You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In your paper, it says "We first uniformly divide each input video into 64-frame segments. We then use a two-stream Inflated 3D ConvNet (I3D) model pre-trained on Kinetics [5] to extract the segment features."
However, in your code
I guess minusing 64 means you do not use the last few frames not divisible by 64, but why should interval=8?
Is it means that you divide each input video into 8-frame?
By the way? Could you offer the I3D feature on ActivityNet? It's so time-comsuming to extrat.
The text was updated successfully, but these errors were encountered:
In your paper, it says "We first uniformly divide each input video into 64-frame segments. We then use a two-stream Inflated 3D ConvNet (I3D) model pre-trained on Kinetics [5] to extract the segment features."
However, in your code
I guess minusing 64 means you do not use the last few frames not divisible by 64, but why should interval=8?
Is it means that you divide each input video into 8-frame?
By the way? Could you offer the I3D feature on ActivityNet? It's so time-comsuming to extrat.
The text was updated successfully, but these errors were encountered: