
Computed C++ linker RAM is way off reality #17368

Open
jmmv opened this issue Jan 31, 2023 · 16 comments
Assignees
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Local-Exec Issues and PRs for the Execution (Local) team type: bug

Comments

@jmmv
Contributor

jmmv commented Jan 31, 2023

Description of the bug:

In our C++ builds, we have observed Bazel eagerly running more than a few link actions at once. Each of these linker executions can easily consume 8GB, so running just a handful of them together can exceed RAM limits. Using -c dbg bumps the memory consumption to 20GB per linker, which is ridiculous. We are currently using ld.

The puzzling thing was that even using something like --local_ram_resources to limit Bazel to a few GB, say 2GB, did not help at all. Bazel kept scheduling too many linkers at once. This made me think that the RAM computations for the link actions were not working correctly.

Upon investigating, I noticed commit 01c10e0, which mentions that the new limits were set based on data collected at Google. So I'm surprised that the new behavior doesn't work well for us.

I ended up patching CppLinkAction.java to dump the resources computed in estimateResourceConsumptionLocal and found that all of our link actions are assumed to need only 50MB, which is way off from reality. I also noticed that our problematic actions have only about 600 inputs.

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

1. Come up with a link action that, with a small number of inputs, consumes multiple GBs. (This is what we observe but I have no idea at this point why the linker is behaving in this way.)

2. Craft a build file that contains multiple independent such actions (see the sketch after this list).

3. Run the build and see Bazel trigger too many linkers at once, overwhelming the machine.
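
For illustration, the build file in step 2 could be as simple as the following sketch: several independent binaries whose link steps are known to be memory-hungry. All target and file names here are made up.

# Hypothetical BUILD file: several independent binaries whose link actions
# each consume multiple GBs of RAM.
cc_binary(
    name = "server_a",
    srcs = ["server_a.cc"],
    deps = ["//heavy:lib_a"],
)

cc_binary(
    name = "server_b",
    srcs = ["server_b.cc"],
    deps = ["//heavy:lib_b"],
)

cc_binary(
    name = "server_c",
    srcs = ["server_c.cc"],
    deps = ["//heavy:lib_c"],
)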

Which operating system are you running Bazel on?

Linux

What is the output of bazel info release?

release 6.0.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

There is #6477, which would help if we had it, but I would prefer to avoid crafting our own rules just to paper over a deficiency in the existing native rules. A tag or other attribute on the existing rules would work better from a usability perspective.

Any other information, logs, or outputs that you want to share?

I do not think the C++ link rule can provide a local resource set that works in all cases, if only because different linkers have different memory needs, and because even the same linker can exhibit vastly different behavior depending on the compilation mode (see 8GB vs. 20GB when enabling dbg mode).

There should be a way for downstream users to configure the resource sets of the C++ rules without having to modify the Bazel source code. I'd suggest supporting cpu:X and ram:X tags or similar that override whatever the rules compute; this would have been sufficient for us to work around the problem.
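
To illustrate, the kind of override I have in mind might look like this in a BUILD file; the cpu:/ram: tags below are hypothetical and do not exist in Bazel today:

cc_binary(
    name = "big_binary",
    srcs = ["main.cc"],
    deps = [":everything"],  # made-up dependency standing in for the heavy inputs
    # Hypothetical tags overriding whatever the rule computes for this
    # target's link action; not a feature Bazel currently supports.
    tags = [
        "cpu:2",
        "ram:20480",  # MB; roughly what we see for a -c dbg link
    ],
)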

@ShreeM01 ShreeM01 added type: bug untriaged team-Local-Exec Issues and PRs for the Execution (Local) team labels Feb 1, 2023
@wilwell wilwell removed the untriaged label Feb 7, 2023
@wilwell wilwell self-assigned this Feb 7, 2023
@meisterT meisterT added the P2 We'll consider working on this in future. (Assignee optional) label Feb 7, 2023
@lm1baker

I can report the same problem: Bazel will happily start 20+ linker processes on a CI node with 108GB RAM, each consuming 6GB. Setting e.g. --local_ram_resources=24576MB (about 25% of system RAM) does not even help with this.
Manually tuning --jobs (informed by OOM-killer logs) is the only workaround, though this slows down compilation and other tasks.

@jmmv
Contributor Author

jmmv commented May 22, 2023

Another thought that came to mind: maybe it's the toolchain definition that should be able to supply the memory model for the linker and compiler. I'm not sure whether this would be feasible, but maybe there could be a user-supplied function that takes the compilation mode and details about the inputs (like the total count) as parameters and returns the expected memory. Expressing arbitrary formulas via flags is not a great idea, and encoding limits via tags also doesn't scale.

@larsrc-google
Contributor

For Starlark-defined rules, you can indeed supply such a function. But the CppLink estimation function is based on the internal graph, which tends to have a huge number of relatively small targets, whereas your case seems to have relatively few but large inputs. So we'd want to look at the size of the inputs, but that would be a more expensive computation.

@kjteske
Contributor

kjteske commented Jul 7, 2023

We've got a similar use case, but with C++ compilations in addition to links:

Scattered in with thousands of normal-sized compiles, we have a number of large compiles (or links) where gcc takes >2GB of RAM. If enough of those get scheduled to run in parallel, we can OOM pretty easily even on beefy machines.

We generally know which of these compiles are huge, and there aren't that many of them, so we could tag these particular actions in BUILD files or our rules, and deal with the maintenance burden of keeping that up to date.

--experimental_action_resource_set gives Bazel a hook we could use... but I don't see a way to access that through the native cc rules, or even cc_common. If cc_common.compile() took an optional resource_set parameter that was forwarded to ctx.actions, I think that would give us the flexibility we're looking for.
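
For reference, the hook in question looks roughly like the sketch below in a custom Starlark rule; the rule, the tool attribute, and the numbers are made up, and this is exactly what the native cc rules don't let us reach:

def _link_resource_set(os_name, inputs_size):
    # The callback receives the OS name (e.g. "linux") and the number of
    # inputs; "memory" is in MB. The values here are placeholders.
    return {"cpu": 1, "memory": 2048}

def _my_link_impl(ctx):
    out = ctx.actions.declare_file(ctx.label.name + ".out")
    ctx.actions.run(
        outputs = [out],
        inputs = ctx.files.srcs,
        executable = ctx.executable._linker,  # hypothetical tool attribute
        arguments = [out.path] + [f.path for f in ctx.files.srcs],
        resource_set = _link_resource_set,
    )
    return [DefaultInfo(files = depset([out]))]

my_link = rule(
    implementation = _my_link_impl,
    attrs = {
        "srcs": attr.label_list(allow_files = True),
        "_linker": attr.label(
            default = "//tools:my_linker",  # hypothetical
            executable = True,
            cfg = "exec",
        ),
    },
)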

@larsrc-google
Contributor

Here's what I would suggest trying: In CppLinkAction.createSpawn(), pass the inputs instead of merely the count of inputs into estimateResourceConsumptionLocal. In there, compute the size of the inputs and use that as your estimate. Then see how close you get to reality. This will take longer to estimate due to looking at all the inputs, so benchmarking it would be crucial, but I think for linker-style actions it may be necessary.

@jmmv
Contributor Author

jmmv commented Jul 17, 2023

That's a good idea. I modified CppLinkAction to capture the size of the inputs along with the counts, made it dump metrics to the log, and graphed the results. The size does seem to track consumption much better, at least for our builds with lld on Linux. This applies to fastbuild/dbg/opt with and without dynamic linking, as well as amd64 and arm64. Roughly, I'm observing that memory consumption is ~2x the size of the inputs.
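
As a very rough sketch of that observation (the function name is made up; the 50MB base is the current flat estimate mentioned above, and the 2x factor is what we measured with lld on Linux, not a proposed model):

def estimate_link_memory_mb(total_input_bytes):
    # Illustrative only: current ~50MB base plus roughly 2x the aggregate
    # input size.
    return 50 + 2 * total_input_bytes // (1024 * 1024)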

I've left my changes in a custom branch that you can compare against 6.2.1 here (https://github.com/bazelbuild/bazel/compare/6.2.1...Snowflake-Labs:bazel:linker-memory-estimation?expand=1), but I have not created a PR yet because the branch mixes instrumentation with memory tweaks. If you find value in the instrumentation portion alone (computing input sizes and logging them), I can split that into its own PR for inclusion. Then you could rerun whichever analysis you previously did with your data and see what comes up. (For example, I won't be able to do this for either macOS or Windows.)

@larsrc-google
Contributor

Nice that your estimates are stable across the various option variations. Did you run any benchmarks with your resulting memory estimations?

@jmmv
Contributor Author

jmmv commented Aug 7, 2023

@larsrc-google No, I did not. I do not think our build or link actions are sufficiently large to gather any meaningful performance data of this change (other than seeing no OOMs).

What exactly would you like to see measured? A microbenchmark could help quantify the cost, but then again, a microbenchmark wouldn't be realistic. In a real build, link actions are infrequent, so unless their cost is massively different, they won't impact end-to-end build times. And in any case, this change only impacts local link actions, so if the majority of link actions happen remotely (as is also commonly the case for sufficiently large builds), then this change would have no impact...

@larsrc-google
Contributor

I would suggest using the JSON profile data on realistic builds run local-only (or local-for-link-only). That would allow you to measure realistic time differences between different estimates.

Doing this with compile actions like @kjteske describes would be easier, I guess, as you would be able to see whether the machine gets overloaded or too few actions get scheduled.

It's also interesting to know whether getting the file sizes takes a non-trivial amount of time. I hope those values are already in memory, but stat'ing all the inputs all the time might be problematic.

One nice thing about your change is that it defers flattening the nested set, so the estimation implementation has more say in whether to do that.

@jmmv
Contributor Author

jmmv commented Aug 7, 2023

OK, so now I'm confused. I thought you were concerned about the cost of the depset expansion plus stat'ing the files, which I think is going to be near-impossible to measure accurately for the reasons I described above. Furthermore, keep in mind what you saw in the patch: the stat'ing only happens after Bazel has decided that it's going to run the action locally, which means that stat'ing the files is not wasted work. Stat'ing the inputs will bring their metadata into memory if it wasn't already there, and the linker is going to need that metadata at a later stage anyway... so the cost should be negligible.

But your last reply seems to imply that you want me to measure the impact on the scheduling of the actions, and I'm not sure what you'd want to see there. Previous behavior = OOM / memory thrashing; new behavior = no OOM. There is no runtime comparison that makes sense in this case. Also, the model I computed is based on actual memory consumption measurements, so I can see, in a spreadsheet, what the impact of the new predictions will be compared to the current ones or to the real consumption numbers.

In any case: the patch I posted is primarily about instrumenting the code to grab the input file sizes and log real memory consumption, from which to derive a new model. I'm not ready to submit a new memory model, which is why I said I could split the patch into two. And I don't think I even have the means to propose a new model. I think Google is in the best position to obtain metrics for the linker; after all, you have a much bigger and more varied codebase from which to obtain a good signal.

jmmv added a commit to Snowflake-Labs/bazel that referenced this issue Aug 8, 2023
This PR instruments the CppLinkAction to print log lines for every link
operation and dumps the number of inputs, the aggregate size of the
inputs, the predicted memory usage, and the actual consumed memory.

Calculating the aggregate size of the inputs is a costly operation
because we need to expand a nested set and stat all of its files, which
explains why this PR is complex.  I've modified the local memory
estimator to compute the input sizes exactly once per link, as we need
this information both to predict memory consumption _and_ to emit the
log line after the spawn completes.

This is meant to aid in assessing the local resources prediction for
linking with the goal of adjusting the model, but that should be done
in a separate change.

Helps address bazelbuild#17368.
@jmmv
Contributor Author

jmmv commented Aug 8, 2023

I've rebased my temporary code on top of master and removed the changes to the memory model in order to isolate the code that computes input sizes: #19203.

jmmv added a commit to Snowflake-Labs/bazel that referenced this issue Aug 9, 2023
copybara-service bot pushed a commit that referenced this issue Aug 17, 2023

Closes #19203.

PiperOrigin-RevId: 557888150
Change-Id: Icbfc94b4761497ce78a79baccc56ea15c5ca0053
@jmmv
Contributor Author

jmmv commented Aug 18, 2023

Now that the preparatory changes to calculate input sizes are in, it's time to tackle the model update, but I'm not sure about the best way to go about it.

Ideally, the model would be taken out of the native Java code and put into the Starlark toolchain definition. That way, even if the model is inaccurate (which I'm sure it will be for some use cases, because there are way too many variables involved), downstream Bazel users could trivially update it.

The idea I had was to extend the toolchain definition with a linker_resources_model attribute that provides a function; the function would take the OS name, the number of inputs, their total size, etc. Something like this:

def my_model(os = None, input_bytes = None, input_sizes = None):
    # ... calculate something ...
    return {"cpu": 1, "ram": 2}

cc_toolchain(
    ...
    linker_resources_model = my_model,
)

Note that my_model is a Starlark callable that would have to be kept around and its evaluation must be delayed until action execution.

I started prototyping this today and had almost everything wired up... until I hit the roadblock that attributes can only have a narrow set of types, and a callable is not among those supported types. So I think this might be a dead end.

Do you have any other thoughts on this, or suggestions for an alternative approach?

@sinelaw

sinelaw commented Nov 28, 2023

Hi! Any updates? We're encountering OOM due to large linker processes, and this would really help.

@tjgq
Contributor

tjgq commented Dec 6, 2023

Marking untriaged to surface it again at our next bug triage meeting.

@wilwell wilwell removed the untriaged label Jan 9, 2024
@luispadron

Is there any update on this issue? We're seeing ld terminate with a segfault from a signal and believe this could be related.

Have folks found any workarounds in the meantime (that don't require setting --jobs)?

@liam-baker-sm

We frequently run builds where ld can take over 6GB of RAM.
At the moment, the only workaround is to set --jobs or allocate fewer CPUs to the build machines.
If not caught, the OOM killer usually targets Bazel itself:

Server terminated abruptly (error code: 14, error message: 'Socket closed', log file: '~.cache/bazel/[path]/server/jvm.out')
