Why 'ws 1 for stand attention' in your GroupAttention code? #12
Comments

I find that in your implementation of GroupAttention in gvt.py, you comment 'ws 1 for stand attention'. However, ws means the window size; if ws = 1, then self-attention is performed only within a 1x1 window, which is not the standard self-attention.
This is an implementation choice.
But the standard self-attention should use a window size equal to the feature size (ws = 7 in the last stage).
In the detailed implementation, ws = 7 does not work in the last stage. Please check the code.
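For readers checking the code, here is a minimal, self-contained sketch of how that dispatch can be read (a paraphrase with stand-in modules, not the repo's exact classes; the signatures and details in gvt.py differ): ws == 1 acts as a sentinel that selects global attention, while ws > 1 selects windowed, locally-grouped attention.

```python
import torch.nn as nn

class Attention(nn.Module):
    """Stand-in for the repo's global sub-sampled attention (GSA);
    the real module sub-samples keys/values by sr_ratio, omitted here."""
    def __init__(self, dim, num_heads, sr_ratio=1):
        super().__init__()
        self.sr_ratio = sr_ratio
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x, H, W):
        # Global attention: works for any H x W, since there is no window grid.
        out, _ = self.attn(x, x, x)
        return out

class GroupAttention(nn.Module):
    """Stand-in for the repo's locally-grouped self-attention (LSA)."""
    def __init__(self, dim, num_heads, ws=7):
        super().__init__()
        self.ws = ws
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x, H, W):
        B, N, C = x.shape
        ws = self.ws
        # Partition the H x W map into ws x ws windows; this requires ws to
        # divide both H and W, which is why ws = 7 cannot be applied blindly
        # to detection/segmentation feature maps.
        x = x.view(B, H // ws, ws, W // ws, ws, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)
        out, _ = self.attn(x, x, x)
        out = out.view(B, H // ws, W // ws, ws, ws, C)
        out = out.permute(0, 1, 3, 2, 4, 5).reshape(B, N, C)
        return out

class GroupBlock(nn.Module):
    """ws == 1 is a sentinel meaning 'use global attention',
    not 'attend within a 1x1 window'."""
    def __init__(self, dim, num_heads, sr_ratio=1, ws=1):
        super().__init__()
        if ws == 1:
            self.attn = Attention(dim, num_heads, sr_ratio=sr_ratio)  # GSA
        else:
            self.attn = GroupAttention(dim, num_heads, ws=ws)         # LSA
```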
@cxxgtxy In your paper, you only use GSA in the last stage. For 224 classification the last-stage feature size is 7x7, so ws = 7 (LSA) and ws = 1 (GSA) are equivalent; but for detection or segmentation the last-stage feature size may not be 7x7, so ws = 7 (LSA) and ws = 1 (GSA) are not equivalent. Does this mean you use LSA and GSA at the same time in the last stage?
It's a good question. The feature map size for the detection and segmentation tasks is indeed larger than 7x7. As for the implementation, we use ws = 1 (GSA) in the last stage, the same as for classification.
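To make the size constraint concrete, here is a usage sketch built on the stand-in classes above (the 25x38 feature size is a hypothetical detection-scale example, not taken from the thread):

```python
import torch

dim, heads = 64, 4
lsa = GroupAttention(dim, heads, ws=7)
gsa_block = GroupBlock(dim, heads, ws=1)

# Classification: a 7x7 last-stage map divides evenly into 7x7 windows.
x_cls = torch.randn(2, 7 * 7, dim)
print(lsa(x_cls, 7, 7).shape)             # torch.Size([2, 49, 64])

# Detection/segmentation: an arbitrary H x W still works with the ws = 1
# (global) path, while ws = 7 windowing would fail here, since 7 divides
# neither 25 nor 38.
H, W = 25, 38
x_det = torch.randn(2, H * W, dim)
print(gsa_block.attn(x_det, H, W).shape)  # torch.Size([2, 950, 64])
```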