About the implementation #7
Comments
Thanks for your great work! I have two questions regarding the implementation details:

(1) In the case of strided convolution, why is the BlurPool layer placed after the ReLU rather than right next to the convolution? It would be much more flexible if the conv and BlurPool could be coupled. I was considering the implementation in the pre-activation ResNet.

(2) This question might be silly, but why not apply bilinear interpolation to downsample the feature maps? I haven't seen any work use it.
Thanks for the questions. (1) If we do Conv(stride 1)--ReLU--BlurPool(stride 2), the ReLU is evaluated at full resolution and the blur antialiases the signal that is actually being subsampled. Placing the BlurPool right next to the conv (before the ReLU) would change the function, since the blur does not commute with the nonlinearity. So doing it after the ReLU gives the antialiased counterpart of the original Conv(stride 2)--ReLU block. (2) Yes, using [1 2 1] is essentially equivalent to bilinear interpolation.
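(As a rough illustration of the placement described in (1): a minimal plain-PyTorch sketch, not the repo's own BlurPool class; the `blurpool` helper below is hypothetical.)

```python
import torch
import torch.nn as nn

def blurpool(channels, stride=2):
    # Hypothetical helper: a fixed depthwise [1 2 1] binomial blur
    # that also performs the subsampling.
    f = torch.tensor([1., 2., 1.])
    k = (f[:, None] * f[None, :]) / f.sum() ** 2   # 3x3 filter, sums to 1
    conv = nn.Conv2d(channels, channels, 3, stride=stride, padding=1,
                     groups=channels, bias=False)
    with torch.no_grad():
        conv.weight.copy_(k[None, None].expand(channels, 1, 3, 3))
    conv.weight.requires_grad_(False)
    return conv

# Conv(stride 2)--ReLU becomes Conv(stride 1)--ReLU--BlurPool(stride 2):
# the ReLU runs at full resolution, and only the final blur subsamples.
block = nn.Sequential(
    nn.Conv2d(64, 128, 3, stride=1, padding=1),
    nn.ReLU(inplace=True),
    blurpool(128, stride=2),
)
```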
Thanks for your quick reply; I will think these points over more carefully.
I don't think these two are strictly equivalent. Here is a code snippet to verify it:
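(The snippet itself did not survive here; the following is a hypothetical PyTorch reconstruction of a comparison that fails in this way. The names and the exact operations are illustrative, not the original code.)

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 64, 64, dtype=torch.float64)
w = torch.randn(1, 1, 3, 3, dtype=torch.float64)

# 2x2 average pool (stride 2) first, then the 3x3 conv at stride 1 ...
a = F.conv2d(F.avg_pool2d(x, 2), w, padding=1)
# ... versus the 3x3 conv at stride 1 first, then the 2x2 average pool.
b = F.avg_pool2d(F.conv2d(x, w, padding=1), 2)

print((a - b).abs().mean())   # noticeably nonzero
assert torch.allclose(a, b)   # fails: the subsampling is in the wrong place
```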
The assert does not pass. Furthermore, the increased FLOPs might themselves contribute to the increased accuracy.
Thanks for the code snippet. Your implementation differs from my statement in a critical way -- note the striding. The README has a new plot showing accuracy vs run-time.
Thanks for your quick reply and for pointing out my mistake. I updated the code snippet, but the assert still does not pass. That said, the difference is now much smaller.
Maybe you can print the norm of the error vs the norm of the output signal. I suspect the discrepancy is due to numerical issues. In any case, the equivalence should be provable. The avgpool is a convolution with a [1 1; 1 1] filter, and convolutions should be commutable. In fact, you could combine the two operations by applying [1 1; 1 1] to the 3x3 conv kernels, making a single 4x4 conv layer.
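(A minimal sketch of that equivalence, assuming PyTorch, float64, and a single channel; the 2x2 average below stands in for the normalized [1 1; 1 1] filter.)

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 64, 64, dtype=torch.float64)
w = torch.randn(1, 1, 3, 3, dtype=torch.float64)
blur = torch.full((1, 1, 2, 2), 0.25, dtype=torch.float64)  # [1 1; 1 1] / 4

# (a) conv at stride 1, then 2x2 average pool (stride 2)
a = F.avg_pool2d(F.conv2d(x, w), 2)
# (b) blur at stride 1, then the same conv with the stride folded into it
b = F.conv2d(F.conv2d(x, blur), w, stride=2)
# (c) merge both into a single 4x4 kernel (full convolution of the 3x3
#     kernel with the 2x2 box), applied once at stride 2
w4 = F.conv_transpose2d(w, blur)   # 3x3 (*) 2x2 -> 4x4
c = F.conv2d(x, w4, stride=2)

for other in (b, c):
    # relative error is at machine precision: the three forms agree
    print(((a - other).norm() / a.norm()).item())
```

Folding the stride into the last linear operation is what makes the commuted forms line up; subsampling first, as in the failing snippet above, breaks the equivalence.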
The average difference is about 0.01, which cannot be explained by numerical issues.
Maybe you can take some time to go through the proof that convolutions are associative and commutative: https://en.wikipedia.org/wiki/Convolution
The norm of the error vs the norm of the output: (1.284619802805647e-15, 0.05549472197890282)
Thanks a lot. I just figured it out.