Hi!
Your ablation experiments demonstrate the excellent performance of relative position encoding; however, I have two questions:
The original RoPE uses sinusoidal encoding. I don't quite understand why you use "Fourier features" instead of it.
The original RoPE is designed for language, which is 1-dimensional. If I'm not mistaken, you use the 1D RoPE to encode the positions of keypoints in the code. However, an image is 2-dimensional data, so I think it is not suitable. Or is that the reason you use the "Fourier features"?
Looking forward to your reply!
RoPE is indeed 1D. To adapt it to higher-dimensional data, some works like Lepard partition the feature space and treat each spatial dimension individually, which is similar to a 1D sinusoidal encoding per axis. As shown by Li et al., this introduces a bias along the basis axes. Using a random learnable basis (Fourier features) removes that bias and empirically learns a better data-dependent encoding.
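A minimal sketch of the idea, not the repository's actual code: 2D keypoint positions are projected onto a random basis `B` and passed through sine/cosine, so no single image axis is privileged. In a real model `B` would be a learnable parameter; here it is a fixed random draw, and the function name and arguments are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def fourier_features(pos, num_bands=16, scale=1.0):
    """Encode 2D positions with random Fourier features.

    pos: (N, 2) array of keypoint coordinates.
    Returns an (N, 2 * num_bands) encoding.
    The projection matrix B is fixed random here; in a trained
    model it would be learnable (hence "data-dependent").
    """
    B = rng.normal(scale=scale, size=(pos.shape[1], num_bands))
    proj = 2.0 * np.pi * pos @ B  # (N, num_bands), mixes both axes
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)


pts = rng.uniform(0.0, 1.0, size=(5, 2))  # five 2D keypoints
enc = fourier_features(pts)
print(enc.shape)  # (5, 32)
```

Because every band mixes both coordinates through `B`, the encoding has no preferred direction along the x or y axis, unlike a per-axis sinusoidal split.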