
regarding hierarchical representation pattern #2

Closed
seyeeet opened this issue Nov 5, 2021 · 4 comments

seyeeet commented Nov 5, 2021

In the paper it is mentioned that:

To demonstrate its effectiveness, we conduct a non-hierarchical version by setting the size of the local window in all attention layers to 512, which is the largest window size in the last encoder/decoder block (out of memory if we directly set the size of the local window in all attention layers to the video length).
I was not able to figure out where in the code the attention layers have a window of 512. Can you please point me in the right direction?

ChinaYi (Owner) commented Nov 6, 2021

In libs/models/tcn.py (L472-L473):

self.layers = nn.ModuleList(
    [AttModule(2 ** i, num_f_maps, num_f_maps, r1, r2, att_type, 'encoder', alpha)  # window size = 2**i
     for i in range(num_layers)])

where num_layers=10, which means the last attention module has a window of 2**9 = 512. If you want to reproduce the ablation study, just replace 2**i with 512, so that every layer has a window of 512.
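For reference, a minimal sketch (not code from this repository) contrasting the two window schedules described above; under this reading, the only change needed in the snippet is swapping 2 ** i for a constant 512:

    num_layers = 10

    # Hierarchical pattern used by default: the window doubles with depth,
    # giving 1, 2, 4, ..., 512 across the 10 layers.
    hierarchical_windows = [2 ** i for i in range(num_layers)]
    assert hierarchical_windows[-1] == 512

    # Non-hierarchical ablation from the paper: every layer uses the largest window.
    non_hierarchical_windows = [512 for _ in range(num_layers)]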

ChinaYi closed this as completed Nov 6, 2021
seyeeet (Author) commented Nov 6, 2021

@ChinaYi Thank you for your answer.

  • Should a window of 512 work better or worse than 2**i, in your opinion?

  • Also, one more thing: should I use the sliding-attention option to achieve the best results in the paper?

  • Finally, can you please let me know the parameters used to achieve the best performance for the encoder and decoder, i.e., the parameters that lead to the best results in the paper? I notice the performance drops slightly when I go with the current default settings.

ChinaYi (Owner) commented Nov 7, 2021

  • A window of 512 is worse than 2**i in our experiments.
  • Yes, the sliding-window approach is slightly better than the block-wise approach.
  • The current setting is what I used. The performance drop is possibly due to the unstable training process of ASRF caused by the boundary prediction. I strongly recommend picking the best model according to the validation set instead of directly using the model from epoch 80 (a generic sketch of this checkpoint selection follows below). By the way, if you want to avoid the tedious parameter search, I recommend the pure ASFormer at https://github.com/ChinaYi/ASFormer , where the training process is very stable and not sensitive to the training epochs.
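A generic sketch of that checkpoint-selection step, assuming checkpoints are saved per epoch; the glob pattern and the evaluate_on_val helper are hypothetical stand-ins, not this repository's actual layout:

    import glob
    import torch

    def select_best_checkpoint(model, evaluate_on_val, ckpt_glob='checkpoints/epoch-*.model'):
        """Load each saved checkpoint, score it on the validation split, and return
        the best one instead of blindly taking the epoch-80 weights.
        evaluate_on_val(model) is any callable returning a metric where higher is better."""
        best_score, best_path = float('-inf'), None
        for path in sorted(glob.glob(ckpt_glob)):
            model.load_state_dict(torch.load(path, map_location='cpu'))
            score = evaluate_on_val(model)
            if score > best_score:
                best_score, best_path = score, path
        return best_path, best_score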

seyeeet (Author) commented Nov 7, 2021

Thanks for the hints, I appreciate it!
