Memory leak during CPU inference #7

Closed
debasish-mihup opened this issue Nov 6, 2021 · 2 comments

@debasish-mihup

I have trained an Efficient Conformer transducer, and during CPU inference in a Flask-based web app I see a memory leak at:

```python
x, attention, hidden = block(x, mask)
```

The memory allocated at this line during an inference is never released, eventually causing an OOM. Using jemalloc reduces the per-iteration memory growth but does not eliminate it.
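For context, a minimal sketch of how this kind of leak arises (placeholder module and shapes, not the repo's code): without `torch.no_grad()`, every forward pass records an autograd graph, and holding on to any output tensor keeps that graph alive.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for one encoder block (not the repo's code).
block = nn.Linear(64, 64)
cached = []

x = torch.randn(1, 10, 64)
for _ in range(1000):
    # Without torch.no_grad(), each forward pass records an autograd
    # graph; holding on to any output tensor (here, via a cache) keeps
    # that whole graph alive, so memory grows on every iteration.
    out = block(x)
    cached.append(out)
```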

@debasish-mihup (Author)

Using the latest release of jemalloc (5.2.1) together with torch.no_grad() fixed the memory leak.
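For anyone hitting the same problem, a minimal sketch of the torch.no_grad() fix (the block signature follows the call quoted above; the names are otherwise placeholders):

```python
import torch

def run_inference(block, x, mask):
    # torch.no_grad() disables gradient tracking, so no autograd graph
    # is built and nothing extra is retained after the call returns.
    with torch.no_grad():
        x, attention, hidden = block(x, mask)
    return x, attention, hidden
```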

@burchim (Owner)

burchim commented Nov 15, 2021

Hi,

Thanks for pointing this out!
The attention weights and hidden states returned by the attention layers aren't detached from the autograd graph.
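For illustration, a simplified sketch of what detaching in an attention layer can look like (hypothetical layer, not the actual repo code):

```python
import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    # Simplified sketch of an attention layer, not the repo's code.
    def __init__(self, dim, heads):
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, mask=None):
        hidden = x  # e.g. state kept around for streaming/caching
        x, attention = self.mha(x, x, x, key_padding_mask=mask)
        # Detach tensors that are returned only for inspection or
        # caching, so they no longer reference the autograd graph.
        return x, attention.detach(), hidden.detach()
```

Note that the main output x stays attached, so training is unaffected; only the auxiliary tensors are detached.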
