Memory leak during CPU inference #7

Closed
debasish-mihup opened this issue Nov 6, 2021 · 2 comments

@debasish-mihup

I have trained an Efficient Conformer transducer, and during CPU inference in a Flask-based web app I see a memory leak at:

```python
x, attention, hidden = block(x, mask)
```

The memory allocated at this line during an inference is never released, eventually causing an OOM. Using jemalloc reduces the per-iteration memory growth but does not eliminate it.
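For context, a minimal sketch of how this kind of leak arises (placeholder module and shapes, not the repo's code): without `torch.no_grad()`, every forward pass records an autograd graph, and holding on to any output tensor keeps that graph alive.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for one encoder block (not the repo's code).
block = nn.Linear(64, 64)
cached = []

x = torch.randn(1, 10, 64)
for _ in range(1000):
    # Without torch.no_grad(), each forward pass records an autograd
    # graph; holding on to any output tensor (here, via a cache) keeps
    # that whole graph alive, so memory grows on every iteration.
    out = block(x)
    cached.append(out)
```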

@debasish-mihup (Author)

Using the latest release of jemalloc (5.2.1) together with torch.no_grad() fixed the memory leak.
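For anyone hitting the same problem, a minimal sketch of the torch.no_grad() fix (the block signature follows the call quoted above; the names are otherwise placeholders):

```python
import torch

def run_inference(block, x, mask):
    # torch.no_grad() disables gradient tracking, so no autograd graph
    # is built and nothing extra is retained after the call returns.
    with torch.no_grad():
        x, attention, hidden = block(x, mask)
    return x, attention, hidden
```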

@burchim (Owner)

burchim commented Nov 15, 2021

Hi,

Thanks for pointing this out!
The attention weights and hidden states returned by the attention layers aren't detached from the autograd graph.
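For illustration, a simplified sketch of what detaching in an attention layer can look like (hypothetical layer, not the actual repo code):

```python
import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    # Simplified sketch of an attention layer, not the repo's code.
    def __init__(self, dim, heads):
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, mask=None):
        hidden = x  # e.g. state kept around for streaming/caching
        x, attention = self.mha(x, x, x, key_padding_mask=mask)
        # Detach tensors that are returned only for inspection or
        # caching, so they no longer reference the autograd graph.
        return x, attention.detach(), hidden.detach()
```

Note that the main output x stays attached, so training is unaffected; only the auxiliary tensors are detached.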
