
Memory leak #344

Closed
zkdfbb opened this issue Dec 5, 2018 · 5 comments
Labels
bug Something isn't working

Comments

@zkdfbb

zkdfbb commented Dec 5, 2018

When I upgraded to 0.5.0 I noticed a huge growth in memory usage. With version 0.2.0 (and also 0.3) the memory grows as:
4826, 5096, 5302
while with 0.5.0 (and also 0.4.1) it grows as:
6754, 8970, 11114

Because the last batch may be smaller when running evaluation, I re-read the dataset (TFRecords) on every evaluation; with that, the result under version 0.2.0 is:
4826, 5294, 5510

How can I solve this problem?
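
For reference, a minimal sketch of how per-epoch figures like the ones above can be collected; it assumes the psutil package and that the numbers are resident-set sizes in MB, neither of which is stated in the report:

    import psutil

    proc = psutil.Process()

    def rss_mb():
        # Resident set size of the current process, in megabytes.
        return proc.memory_info().rss / (1024 * 1024)

    # Hypothetical loop: print memory after every epoch to spot growth.
    for epoch in range(3):
        # run_one_epoch(pipeline)  # placeholder for the actual DALI training/eval step
        print("epoch %d: %.0f MB" % (epoch, rss_mb()))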

@JanuszL JanuszL added bug Something isn't working question Further information is requested labels Dec 5, 2018
@JanuszL
Contributor

JanuszL commented Dec 5, 2018

Hi,
This problem was reported in #328 as well. At the moment we don't have a good, general solution for it.
Could you tell us more about the pipeline you use, which data set, etc.?
Maybe this will help us narrow down the problem you are experiencing.
Tracked as DALI-425.
Br,
Janusz

@zkdfbb
Author

zkdfbb commented Dec 5, 2018

@JanuszL It's my own dataset, not public, read with a TFRecordPipeline. I'm afraid a bug was introduced between 0.3.0 and 0.4.0.
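
For illustration, a rough sketch of what such a TFRecord pipeline looks like in the class-based API that DALI 0.x used; the paths, feature names, and decoder choice are assumptions, not the reporter's actual code:

    import nvidia.dali.ops as ops
    import nvidia.dali.types as types
    import nvidia.dali.tfrecord as tfrec
    from nvidia.dali.pipeline import Pipeline

    class TFRecordPipeline(Pipeline):
        def __init__(self, batch_size, num_threads, device_id):
            super(TFRecordPipeline, self).__init__(batch_size, num_threads, device_id)
            # Placeholder paths and feature names.
            self.input = ops.TFRecordReader(
                path="train.tfrecord",
                index_path="train.idx",
                features={
                    "image/encoded": tfrec.FixedLenFeature((), tfrec.string, ""),
                    "image/class/label": tfrec.FixedLenFeature([1], tfrec.int64, -1),
                })
            # nvJPEGDecoder was the GPU decoder op in DALI 0.x (later renamed ImageDecoder).
            self.decode = ops.nvJPEGDecoder(device="mixed", output_type=types.RGB)

        def define_graph(self):
            inputs = self.input()
            images = self.decode(inputs["image/encoded"])
            return images, inputs["image/class/label"]

    pipe = TFRecordPipeline(batch_size=32, num_threads=2, device_id=0)
    pipe.build()
    images, labels = pipe.run()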

@JanuszL
Contributor

JanuszL commented Dec 5, 2018

Could you check this with some publicly available data set, so that we can fully reproduce it here as well?
Otherwise, even if we fix something, we cannot tell whether it works for you.

@JanuszL JanuszL removed the question Further information is requested label Jan 21, 2020
@mzient
Contributor

mzient commented Feb 4, 2020

Regarding host memory usage:
We're pleased to say that we've changed the allocation strategy for (non-pinned) CPU buffers. It reduces memory consumption in RN50 training in PyTorch by almost 50%. Please check the latest master (or the next successful nightly) and see if your issue is resolved.
Memory is now freed when a requested tensor is smaller than a given percentage of the actual allocation. You can tweak this by setting the environment variable DALI_HOST_BUFFER_SHRINK_THRESHOLD=0.xx. The default value is 0.9. You can also set it in Python using nvidia.dali.backend.SetHostBufferShrinkThreshold(threshold).
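
For reference, both ways of setting the threshold mentioned above, as a small sketch (the 0.5 value here is only an example, not a recommendation):

    import os

    # Option 1: set the environment variable before DALI allocates host buffers.
    os.environ["DALI_HOST_BUFFER_SHRINK_THRESHOLD"] = "0.5"

    # Option 2: set it from Python at runtime (the default is 0.9).
    import nvidia.dali.backend as backend
    backend.SetHostBufferShrinkThreshold(0.5)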

@JanuszL JanuszL added this to the Release_0.19.0 milestone Feb 4, 2020
@JanuszL
Contributor

JanuszL commented Mar 2, 2020

0.19 is out and should address this. Please reopen if it doesn't work for you.

@JanuszL JanuszL closed this as completed Mar 2, 2020