Hi!
Your ablation experiments demonstrate the excellent performance of relative position encoding; however, I have two questions:
The original RoPE uses sinusoidal encoding. I don't quite understand why you use "Fourier features" instead of it.
The original RoPE is designed for language, which is 1-dimensional. If I'm not mistaken, you use the 1D RoPE to encode the positions of keypoints in the code. However, an image is 2-dimensional data, so I think it is not suitable. Or is that the reason you use the "Fourier features"?
Looking forward to your reply!
RoPE is indeed 1D. To adapt it to higher-dimensional data, some works like Lepard partition the feature space and treat each spatial dimension individually, which is similar to a 1D sinusoidal encoding per axis. As shown by Li et al., this introduces a bias along the basis axes. Using a random learnable basis (Fourier features) removes that bias and empirically learns a better data-dependent encoding.
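A minimal sketch of the idea, not the repository's actual code: 2D keypoint positions are projected onto a random basis `B` and passed through sine/cosine, so no single image axis is privileged. In a real model `B` would be a learnable parameter; here it is a fixed random draw, and the function name and arguments are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def fourier_features(pos, num_bands=16, scale=1.0):
    """Encode 2D positions with random Fourier features.

    pos: (N, 2) array of keypoint coordinates.
    Returns an (N, 2 * num_bands) encoding.
    The projection matrix B is fixed random here; in a trained
    model it would be learnable (hence "data-dependent").
    """
    B = rng.normal(scale=scale, size=(pos.shape[1], num_bands))
    proj = 2.0 * np.pi * pos @ B  # (N, num_bands), mixes both axes
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)


pts = rng.uniform(0.0, 1.0, size=(5, 2))  # five 2D keypoints
enc = fourier_features(pts)
print(enc.shape)  # (5, 32)
```

Because every band mixes both coordinates through `B`, the encoding has no preferred direction along the x or y axis, unlike a per-axis sinusoidal split.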