Regarding the dimension of query and key #11
Hi,
I observed in the code that the query's and key's dimensions are half of the value's (`out_planes // 2`, `group_planes // 2`). Is there a specific reason for that, apart from making it faster? Thanks.

No, the "half" here is just an architectural hyper-parameter that controls the efficiency-accuracy trade-off. Empirically, we found "half" to be a good trade-off in our cases, but "one" or "quarter" might also work well.

@csrhddlam Thanks. I guess "one" is supposed to give the best accuracy? Do you have an estimate of how much better it is compared to "half"?

Yes, more channels usually lead to better accuracy, but we did not study it much. Personally, I wouldn't expect much improvement from switching from "half" to "one".

Thanks for the answer.
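To make the trade-off in this thread concrete, here is a minimal pure-Python sketch of single-head attention along one axis. In it, the query/key width `d_qk` is a fraction of the value width. The sketch assumes a bias-free 1x1 (linear) q/k/v projection and standard scaled dot-product attention, and it ignores any positional terms or normalization that a real axial-attention layer may add. The names `attend_1d`, `attention_cost`, `rand_matrix`, and `qk_ratio` are hypothetical and do not come from the repository. Under these assumptions, only the q/k projection and the similarity logits depend on `d_qk`. The output keeps `out_planes` channels for every ratio, so "one", "half", and "quarter" are all valid configurations.

```python
import math
import random


def linear(x, w):
    """Project vector x (length n_in) with weight matrix w (n_out rows of length n_in)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_1d(xs, wq, wk, wv):
    """Hypothetical single-head attention along one axis of len(xs) positions.

    wq and wk map inputs to d_qk channels; wv maps them to d_v channels.
    d_qk only enters the q . k dot products; the output width is d_v.
    """
    qs = [linear(x, wq) for x in xs]
    ks = [linear(x, wk) for x in xs]
    vs = [linear(x, wv) for x in xs]
    scale = 1.0 / math.sqrt(len(qs[0]))  # scaled dot product (an assumption here)
    out = []
    for q in qs:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in ks])
        # Weighted sum of the value vectors, channel by channel.
        out.append([sum(w * v[c] for w, v in zip(weights, vs)) for c in range(len(vs[0]))])
    return out


def attention_cost(in_planes, out_planes, length, qk_ratio):
    """Rough, hypothetical cost model for one attention layer along an axis."""
    d_qk = int(out_planes * qk_ratio)
    qkv_params = in_planes * (2 * d_qk + out_planes)  # 1x1 projection, no bias
    logit_macs = length * length * d_qk  # q . k for every pair of positions
    aggregate_macs = length * length * out_planes  # attention weights applied to v
    return {"d_qk": d_qk, "qkv_params": qkv_params,
            "logit_macs": logit_macs, "aggregate_macs": aggregate_macs}


def rand_matrix(rows, cols):
    """Random Gaussian weight matrix with the given shape."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    in_planes, out_planes, length = 8, 8, 5
    xs = rand_matrix(length, in_planes)
    wv = rand_matrix(out_planes, in_planes)
    for ratio in (1, 0.5, 0.25):  # "one", "half", "quarter"
        d_qk = int(out_planes * ratio)
        out = attend_1d(xs, rand_matrix(d_qk, in_planes), rand_matrix(d_qk, in_planes), wv)
        # The output shape is (length, out_planes) for every ratio.
        print(ratio, len(out), len(out[0]), attention_cost(128, 128, 64, ratio))
```

As an example, take `in_planes = out_planes = 128` and an axis of 64 positions. Under this cost model, going from "one" to "half" shrinks the q/k/v projection from 49152 to 32768 weights and the logit computation from 524288 to 262144 multiply-adds. The aggregation over values stays at 524288. These figures count only the terms that this sketch models.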