
About the positional encoding #67

Closed
Master-cai opened this issue Sep 25, 2023 · 2 comments

@Master-cai

Hi!
Your ablation experiments demonstrate the excellent performance of relative position encoding; however, I have two questions:

  1. The original RoPE uses a sinusoidal encoding. I don't quite understand why you use "Fourier features" instead of it.
  2. The original RoPE is designed for language, which is 1-dimensional. If I'm not mistaken, you just use the 1-d RoPE to encode the position of keypoints in the code (a sketch of what I mean is below). However, an image is 2-dimensional data, so I think it is not suitable. Or is that the reason you use the "Fourier features"?
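
For concreteness, here is a minimal sketch of the standard 1-d RoPE I am referring to (illustrative code, not taken from this repository): each pair of feature channels is rotated by an angle proportional to a scalar position, using the fixed sinusoidal frequency ladder of the original paper.

```python
import torch

def rope_1d(x: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
    """Apply standard 1-d RoPE.

    x:   (..., n, d) features, with d even
    pos: (..., n)    scalar (1-d) positions
    """
    d = x.shape[-1]
    # Fixed sinusoidal frequency ladder, as in the original RoPE paper.
    freqs = 10000.0 ** (-torch.arange(0, d, 2, dtype=x.dtype) / d)  # (d/2,)
    angles = pos[..., None] * freqs                                 # (..., n, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    # 2x2 rotation applied to each consecutive pair of channels.
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```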

Looking forward to your reply!

@sarlinpe
Member

RoPE is indeed 1D. To adapt it to higher-dimensional data, some works like Lepard partition the feature space and treat each dimension individually, which is similar to a 1D sinusoidal encoding. As shown by Li et al., this introduces a bias along the basis axes. Using a random learnable basis (Fourier features) removes that bias and empirically learns a better, data-dependent encoding.
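
Schematically, the idea looks like this (a minimal sketch under my own naming, not the exact code in this repository): instead of a fixed per-axis frequency ladder, each rotation angle is a learned linear projection of the full 2-d position, so every angle mixes both spatial axes and no axis is privileged.

```python
import torch
import torch.nn as nn

class LearnableFourierRoPE(nn.Module):
    """Rotary encoding whose angles come from a learnable random basis over 2-d positions."""

    def __init__(self, dim: int, pos_dim: int = 2):
        super().__init__()
        # Learnable random basis: each of the d/2 angles is a linear mix
        # of both coordinates, rather than being tied to a single axis.
        self.Wr = nn.Linear(pos_dim, dim // 2, bias=False)
        nn.init.normal_(self.Wr.weight, std=1.0)

    def forward(self, x: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
        """x: (..., n, d) features; pos: (..., n, 2) keypoint positions."""
        angles = self.Wr(pos)                 # (..., n, d/2): learned projections
        cos, sin = angles.cos(), angles.sin()
        x1, x2 = x[..., 0::2], x[..., 1::2]
        out = torch.empty_like(x)
        out[..., 0::2] = x1 * cos - x2 * sin
        out[..., 1::2] = x1 * sin + x2 * cos
        return out
```

Because each angle is linear in the position, the dot product between a rotated query and key still depends only on the relative offset between the two keypoints, so the encoding remains relative.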

@Master-cai
Author

Thank you for your patience! This confirms my thoughts.
