Huge heap usage with large number of files and RE #16876

Closed
exoson opened this issue Nov 29, 2022 · 3 comments
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug

Comments

exoson (Contributor) commented Nov 29, 2022

Description of the bug:

Huge memory usage is observed when running a large number of tests under remote execution, where the tests have a large number of input files and each test gets its own runfiles tree created in Starlark with ctx.runfiles. With the same number of tests and the same number of input files, but a single shared runfiles tree instead of individual ones, memory usage stays low.

Looking at the Bazel code, I could not find a reason why memory usage should be duplicated in one case but not in the other, so this seems like a bug.

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

I have a repro for the issue at https://github.com/exoson/oom_repro. To reproduce it, you will additionally need a .bazelrc file that configures remote execution of actions; in my setup, I used --config=remote_exec for this.
Inside this repository, running
bazel test --jobs=1000 --nocache_test_results --config=remote_exec //no_oom/...
works fine, with no huge heap usage. When running
bazel test --jobs=1000 --nocache_test_results --config=remote_exec //oom/...
the heap grows to my maximum of 16GB almost instantly once Bazel starts preparing to execute the tests. Without remote execution, both invocations have low memory usage.
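The report does not include the .bazelrc itself. A minimal sketch of what a `remote_exec` config might look like, assuming a gRPC remote execution endpoint; the host name and instance name are hypothetical placeholders, not the reporter's actual setup. Options defined under `build:` are inherited by `bazel test`, so `--config=remote_exec` applies to the commands above.

```
# Hypothetical .bazelrc sketch: endpoint and instance name are placeholders.
# Send actions to a remote execution service over gRPC with TLS.
build:remote_exec --remote_executor=grpcs://remote.example.com:443
# Instance name expected by the (hypothetical) remote execution service.
build:remote_exec --remote_instance_name=default
```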

Which operating system are you running Bazel on?

Linux

What is the output of bazel info release?

release 5.3.2

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

N/A

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

N/A

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

sgowroji added the type: bug, untriaged, and team-Remote-Exec (Issues and PRs for the Execution (Remote) team) labels on Nov 29, 2022
exoson (Contributor, Author) commented Dec 5, 2022

Discussion in #6394 about the memory usage increase with async remote execution is most likely related.

vladmos added the P2 label (We'll consider working on this in future. (Assignee optional)) and removed the untriaged label on Dec 6, 2022
coeuvre (Member) commented Jan 16, 2023

Does --experimental_remote_discard_merkle_trees make it better? #17120

I believe #6394 can be solved with virtual threads, which we are evaluating.

exoson (Contributor, Author) commented Jan 16, 2023

Does --experimental_remote_discard_merkle_trees make it better? #17120

With that flag, the build/test command now runs with a reasonably small heap, although it still uses all of it and spends a good amount of time in garbage collection.
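To apply the flag to every remote invocation instead of passing it on each command line, it could be added to the same config. A sketch, assuming the `remote_exec` config name used in this report and a Bazel release that includes #17120 (the flag is experimental, so its availability depends on the version):

```
# Hypothetical .bazelrc addition; assumes a Bazel release that ships #17120.
# Discard in-memory Merkle trees of action inputs to reduce heap usage
# during remote execution.
build:remote_exec --experimental_remote_discard_merkle_trees
```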

I also realized that, for some odd reason, the no_oom side of the repro did not include the generated data files in its runfiles tree prior to #16654. With the flag from #16654 flipped on, the no_oom version seems to behave the same way as the oom version.

I think this issue can be closed now: the repro wasn't a valid repro after all, and the problem is alleviated by --experimental_remote_discard_merkle_trees anyway.

exoson closed this as completed on Jan 16, 2023

4 participants