
Replacing BN layer with AccumBN layer results in poorer convergence #116

Open · andreped opened this issue Sep 11, 2023 · 0 comments
Labels: bug (Something isn't working)

andreped (Owner) commented Sep 11, 2023

Describe the bug
In the latest release, gradient-accumulator==0.5.2, a method was added to retrofit existing BN layers with gradient accumulation support.

However, when attempting to use it in production, models seem to struggle to converge. We should benchmark this layer to verify that it actually works as expected, and perhaps add unit tests that catch whether the approximation is too poor for production use, before merging the PR into the main branch.
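
As a first sanity check, a test along the following lines could catch gross errors: with accum_steps=1 and identical weights, AccumBN should reduce to vanilla BN at inference time. This is only a sketch; the accum_steps argument and the gamma/beta/moving_mean/moving_variance attribute names on AccumBatchNormalization are assumed to mirror Keras BN and may need adjusting.

```python
import numpy as np
import tensorflow as tf
from gradient_accumulator.layers import AccumBatchNormalization

def test_accum_bn_matches_bn_at_inference():
    """With identical weights and accum_steps=1, AccumBN's
    inference-mode output should match vanilla BN."""
    x = np.random.rand(8, 16).astype("float32")

    bn = tf.keras.layers.BatchNormalization()
    accum_bn = AccumBatchNormalization(accum_steps=1)  # argument name assumed
    bn.build((None, 16))
    accum_bn.build((None, 16))

    # Copy gamma/beta and the moving statistics across
    # (attribute names assumed to mirror Keras BN).
    accum_bn.gamma.assign(bn.gamma)
    accum_bn.beta.assign(bn.beta)
    accum_bn.moving_mean.assign(bn.moving_mean)
    accum_bn.moving_variance.assign(bn.moving_variance)

    # Inference mode normalizes with the moving statistics in both layers,
    # so the outputs should agree up to numerical tolerance.
    np.testing.assert_allclose(
        bn(x, training=False).numpy(),
        accum_bn(x, training=False).numpy(),
        atol=1e-5,
    )
```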

Expected behavior
Swapping the BN layer with AccumBN should be seamless: the old weights should transfer to the new layer, and it should (in general) yield better convergence than regular BN for accum_steps > 1.
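
For reference, a minimal sketch of such a swap, rebuilding the model with tf.keras.models.clone_model and transferring the BN statistics by hand. Again, the accum_steps argument and the attribute names on AccumBatchNormalization are assumptions rather than confirmed API:

```python
import tensorflow as tf
from gradient_accumulator.layers import AccumBatchNormalization

def swap_bn(layer):
    # Replace every BN layer with AccumBN; clone everything else as-is.
    if isinstance(layer, tf.keras.layers.BatchNormalization):
        return AccumBatchNormalization(accum_steps=4)  # argument name assumed
    return layer.__class__.from_config(layer.get_config())

# Toy model with a regular BN layer.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(16,)),
    tf.keras.layers.Dense(32),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation("relu"),
    tf.keras.layers.Dense(1),
])

new_model = tf.keras.models.clone_model(model, clone_function=swap_bn)
new_model.build(input_shape=(None, 16))

# Transfer weights layer by layer. AccumBN may carry extra accumulator
# variables, so copy only the shared statistics instead of set_weights().
for old, new in zip(model.layers, new_model.layers):
    if isinstance(old, tf.keras.layers.BatchNormalization):
        new.gamma.assign(old.gamma)
        new.beta.assign(old.beta)
        new.moving_mean.assign(old.moving_mean)
        new.moving_variance.assign(old.moving_variance)
    else:
        new.set_weights(old.get_weights())
```

Copying the four shared statistics individually, rather than via set_weights, sidesteps any mismatch from extra accumulator variables the AccumBN layer may hold.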

Desktop (please complete the following information):

  • OS: Ubuntu
  • Version: 20.04
  • Python: 3.8.10
  • TensorFlow: 2.11.0