
A question about input dimensions #3

Closed
DaiZhenrong opened this issue Mar 26, 2022 · 5 comments
Labels
question Further information is requested

Comments

@DaiZhenrong

Hi Ian Shih,
Thanks for your amazing work.
As I recall, the input dimensions of a Transformer are [batch_size, seq_len, d_model]. So I wonder why your model's input dimensions are [seq_len, batch_size, d_model]?
I would appreciate your reply.

Thanks.

@atosystem
Owner

@DaiZhenrong

For the PyTorch built-in Transformer module, you can pass input in either dimension order, (batch_size, seq_len, d_model) or (seq_len, batch_size, d_model); it depends on batch_first. The former requires batch_first=True, the latter batch_first=False. I used the default setting in Theme Transformer, which is False.

Here is the doc for torch.nn.Transformer
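To make the difference concrete, here is a minimal sketch (my own illustration, not code from this repo) of how batch_first changes the expected input layout of torch.nn.Transformer:

```python
import torch
import torch.nn as nn

batch_size, seq_len, d_model = 4, 512, 256

# Default: batch_first=False -> inputs are (seq_len, batch_size, d_model)
model = nn.Transformer(d_model=d_model, nhead=8)
src = torch.rand(seq_len, batch_size, d_model)
tgt = torch.rand(seq_len, batch_size, d_model)
out = model(src, tgt)
print(out.shape)  # torch.Size([512, 4, 256])

# batch_first=True -> inputs are (batch_size, seq_len, d_model)
model_bf = nn.Transformer(d_model=d_model, nhead=8, batch_first=True)
src_bf = src.transpose(0, 1)  # swap the seq and batch axes
tgt_bf = tgt.transpose(0, 1)
out_bf = model_bf(src_bf, tgt_bf)
print(out_bf.shape)  # torch.Size([4, 512, 256])
```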

@DaiZhenrong
Author

Oh, I got it. Thanks a lot for your reply!

@DaiZhenrong
Author

I also wonder why you chose a seq_len of 512 instead of 1024 or 256. I would appreciate it if you could reply.

@atosystem
Owner

@DaiZhenrong
Yeah, I started my experiments with 512 (for lower memory usage). But in the ablation studies, I also ran Theme Transformer with a sequence length of 1024, which performs better at capturing the theme gaps. (For more details, please refer to the paper.)

So, in general, the model performs better with a longer sequence length, as long as your GPU memory permits.
One could also replace the vanilla Transformer architecture with Transformer-XL or Longformer, which are designed for modeling long sequences.
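To make the memory trade-off concrete, here is a rough sketch (my own illustration, not code from this repo): self-attention builds a seq_len × seq_len score matrix per head, so that part of the activation memory grows quadratically with sequence length.

```python
import torch

d_model = 256
for seq_len in (256, 512, 1024):
    q = torch.rand(seq_len, d_model)
    k = torch.rand(seq_len, d_model)
    scores = q @ k.T  # (seq_len, seq_len) attention scores for one head
    mib = scores.numel() * scores.element_size() / 2**20
    print(f"seq_len={seq_len}: score matrix {tuple(scores.shape)}, ~{mib:.1f} MiB per head")
```

Doubling seq_len from 512 to 1024 quadruples the score-matrix memory, which is why longer contexts (or architectures like Transformer-XL and Longformer that avoid full quadratic attention) quickly become a GPU-memory question.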

@DaiZhenrong
Author

OK, I see. Thank you for your prompt reply, help, and work!

@atosystem added the question label on Apr 10, 2022