generating indexed_binary files causes kernel OOM to kill process #181

Open
krehm opened this issue Apr 8, 2024 · 0 comments

Contributor

krehm commented Apr 8, 2024

While testing other code, I tried to generate 168 indexed_binary sample files using a single dlio_benchmark process. The process's memory grows as each file is created; by the time it is creating file number 49, its memory has reached 240 GB and the kernel kills the process.

The memory growth occurs in the generate() method in indexed_binary_generator.py. Since only a single process was used (comm_size == 1), the else clause in that routine is what produces the sample files.

This statement causes the memory problem:

binary_data = struct.pack(myfmt, *records[:data_to_write])

Print statements added before it do print, including for file #49, but a print statement added after it does not print on the iteration where the process is killed. Googling, I found that the 'struct' module keeps an internal cache. I couldn't find documentation on the caching policy, or on when or whether evictions are ever done, but there is a function

struct._clearcache()

which, if called immediately after the binary_data has been written to data_file, releases the cache memory and the size of the process then stays reasonably constant as all 168 files are created.
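
A minimal sketch of the workaround described above, not the actual DLIO code. The helper name `write_sample_file`, the file names, and the format string are hypothetical, chosen only for illustration. The Python documentation for `struct` notes that compiled versions of recently used format strings are cached by the module-level functions. `struct._clearcache()` is a private, undocumented function, so the sketch guards the call in case it is unavailable.

```python
import os
import struct
import tempfile


def write_sample_file(path, records):
    """Pack byte-valued records, write them to path, then drop struct's cache."""
    fmt = f"{len(records)}B"  # assumed format; the real myfmt may differ
    binary_data = struct.pack(fmt, *records)
    with open(path, "wb") as data_file:
        data_file.write(binary_data)
    # _clearcache() is private and undocumented; guard the call in case it is
    # missing on another interpreter or in a future release.
    clear = getattr(struct, "_clearcache", None)
    if clear is not None:
        clear()
    return len(binary_data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(3):
            path = os.path.join(tmp, f"sample_{i}.bin")
            written = write_sample_file(path, [i] * 1024)
            print(f"file {i}: wrote {written} bytes")
```

Clearing the cache once per file, right after the write, mirrors the placement that kept the process size stable in the test above. Whether it is the best long-term fix is a separate question, since it relies on a private function.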

krehm added a commit to krehm/dlio_benchmark that referenced this issue Apr 8, 2024
…gonne-lcf#181)

The struct.pack() call in generate() in indexed_binary_generator.py
caches the data that it produces, and apparently doesn't evict the
cache, such that after 49 indexed_binary files have been created the
kernel kills the process due to OOM.  This mod adds a call to
struct._clearcache() after each data file is written to release
the cached data, keeping the process memory size stable.