
A question about input dimensions #3

Closed
DaiZhenrong opened this issue Mar 26, 2022 · 5 comments
Labels
question Further information is requested

Comments

@DaiZhenrong

Hi Ian Shih,
Thanks for your amazing work.
As I recall, the input dimensions of a Transformer are [batch_size, seq_len, d_model]. So I wonder why your model's input dimensions are [seq_len, batch_size, d_model]?
I would appreciate your reply.

Thanks.

@atosystem
Owner

@DaiZhenrong

For the PyTorch built-in Transformer module, you can pass input in either dimension order, (batch_size, seq_len, d_model) or (seq_len, batch_size, d_model); it depends on batch_first. The former requires batch_first=True, the latter batch_first=False. I used the default setting in Theme Transformer, which is False.

Here is the doc for torch.nn.Transformer
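To make the difference concrete, here is a minimal sketch (my own illustration, not code from this repo) of how batch_first changes the expected input layout of torch.nn.Transformer:

```python
import torch
import torch.nn as nn

batch_size, seq_len, d_model = 4, 512, 256

# Default: batch_first=False -> inputs are (seq_len, batch_size, d_model)
model = nn.Transformer(d_model=d_model, nhead=8)
src = torch.rand(seq_len, batch_size, d_model)
tgt = torch.rand(seq_len, batch_size, d_model)
out = model(src, tgt)
print(out.shape)  # torch.Size([512, 4, 256])

# batch_first=True -> inputs are (batch_size, seq_len, d_model)
model_bf = nn.Transformer(d_model=d_model, nhead=8, batch_first=True)
src_bf = src.transpose(0, 1)  # swap the seq and batch axes
tgt_bf = tgt.transpose(0, 1)
out_bf = model_bf(src_bf, tgt_bf)
print(out_bf.shape)  # torch.Size([4, 512, 256])
```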

@DaiZhenrong
Author

Oh, I got it. Thanks a lot for your reply!

@DaiZhenrong
Author

I also wonder why you chose a seq_len of 512 instead of 1024 or 256. I would appreciate it if you could reply.

@atosystem
Owner

@DaiZhenrong
Yeah, I started my experiments with 512 (for lower memory usage). But in the ablation studies, I also ran Theme Transformer with a sequence length of 1024, which performs better at capturing the theme gaps. (For more details, please refer to the paper.)

So, in general, the model performs better with a longer sequence length, as long as your GPU memory permits.
One could also replace the vanilla Transformer architecture with Transformer-XL or Longformer, which are designed for modeling long sequences.
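To make the memory trade-off concrete, here is a rough sketch (my own illustration, not code from this repo): self-attention builds a seq_len × seq_len score matrix per head, so that part of the activation memory grows quadratically with sequence length.

```python
import torch

d_model = 256
for seq_len in (256, 512, 1024):
    q = torch.rand(seq_len, d_model)
    k = torch.rand(seq_len, d_model)
    scores = q @ k.T  # (seq_len, seq_len) attention scores for one head
    mib = scores.numel() * scores.element_size() / 2**20
    print(f"seq_len={seq_len}: score matrix {tuple(scores.shape)}, ~{mib:.1f} MiB per head")
```

Doubling seq_len from 512 to 1024 quadruples the score-matrix memory, which is why longer contexts (or architectures like Transformer-XL and Longformer that avoid full quadratic attention) quickly become a GPU-memory question.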

@DaiZhenrong
Author

OK, I see. Thank you for your prompt reply, help, and work!

@atosystem added the question label on Apr 10, 2022