Scale problem #5
The division by
I see, thank you for the fast reply!
SPE assumes that the attention mechanism will normalize QK^T by R^(1/2). In the Performer you never actually compute QK^T, so what you do instead is normalize both Q and K by R^(1/4), which is equivalent. This should normally happen automatically when you plug our modified keys/queries into an existing Performer implementation. So you should check whether your implementation of phi already does this normalization or not.
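For concreteness, the equivalence can be checked numerically. This is a minimal sketch, not code from the repo; the value of R and the tensor shapes are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
R = 64  # number of realizations (hypothetical value)
q = rng.normal(size=(10, R))  # queries, one row per position
k = rng.normal(size=(12, R))  # keys, one row per position

# Normalizing the full attention logits Q K^T by R^(1/2) ...
logits_full = (q @ k.T) / np.sqrt(R)

# ... is equivalent to normalizing Q and K each by R^(1/4),
# which is the only option when Q K^T is never formed explicitly
# (as in the Performer).
logits_split = (q / R**0.25) @ (k / R**0.25).T

assert np.allclose(logits_full, logits_split)
```

Since (R^(1/4))^2 = R^(1/2), the two factors of R^(1/4) recombine into the single R^(1/2) division once the dot products are taken.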
Hey, I am a little bit confused about the scaling.
Inside SineSPE() you already handle the scaling (both d^0.25 and num_realizations^0.25).
On the other hand, in the PyTorch example, after applying the filter you divide by sqrt(num_realizations) again. Why is that?
https://github.com/aliutkus/spe/blob/main/src/pytorch/examples/test_spe.ipynb