Allow CppLinkAction to inspect input sizes in resource estimation #19203

jmmv · 2023-08-08T23:02:38Z

This PR instrumets the CppLinkAction to print log lines for every link operation and dumps the number of inputs, the aggregate size of the inputs, the predicted memory usage, and the actual consumed memory.

Calculating the aggregate size of the inputs is a costly operation because we need to expand a nested set and stat all of its files, which explains why this PR is complex. I've modified the local memory estimator to compute the inputs size exactly once per link as we need this information both to predict memory consumption and to emit the log line after the spawn completes.

This is meant to aid in assessing the local resources prediction for linking with the goal of adjusting the model, but that should be done in a separate change.

Helps address #17368.

This PR instrumets the CppLinkAction to print log lines for every link operation and dumps the number of inputs, the aggregate size of the inputs, the predicted memory usage, and the actual consumed memory. Calculating the aggregate size of the inputs is a costly operation because we need to expand a nested set and stat all of its files, which explains why this PR is complex. I've modified the local memory estimator to compute the inputs size exactly once per link as we need this information both to predict memory consumption _and_ to emit the log line after the spawn completes. This is meant to aid in assessing the local resources prediction for linking with the goal of adjusting the model, but that should be done in a separate change. Helps address bazelbuild#17368.

kjteske · 2023-08-09T01:12:14Z

I'm interested / watching #17368, and haven't had a chance to try this out, but wonder if the constants used to estimate resources should be more configurable, rather than hardcoded - resource consumption likely varies greatly based on toolchain/linker, flags used (LTO, in particular, results in much more expensive gcc links), or details about the code being linked.

Perhaps not in this MR, but maybe in the future, we could consider command line flags to modify these constants, or a configurable bit added to cc_toolchain?

jmmv · 2023-08-09T02:09:19Z

@kjteske Oh yes, definitely. I'd prefer to see the toolchain definition extended via a function that allows the dynamic computation of compiler / linker resources because, as you say, there are too many variables to take into account and a single model isn't going to work.

That said, such a function will need to receive details about the action inputs as parameters and it should only be executed if the action is going to run locally. So I think this PR takes us in that direction anyway. Replacing the built-in model with an externally-supplied one should be a separate change.

Plus, having this merged would help us maintain our own internal patch of the model. After this change is in, the model tweaks are a few trivial modifications to just a few lines, which is much easier to maintain than this whole thing.

nikhilkalige · 2023-08-09T03:06:58Z

Would love to see this change too, I am being harassed for the past month about builds being killed :)

src/main/java/com/google/devtools/build/lib/rules/cpp/CppLinkAction.java

lberki · 2023-08-10T12:17:46Z

src/main/java/com/google/devtools/build/lib/rules/cpp/CppLinkAction.java

+            logger.atWarning().log(
+                "Linker metrics: failed to get size of %s: no metadata", input.getExecPath());
+          }
+        } catch (Exception e) {


let's not swallow Exceptions en bloc. Either only swallow IOException (worse) or re-throw it wrapped into ExecException (better)

The reason for swallowing exceptions here was that the input sizes aren't used yet, so it seems too heavy-handed to fail hard if something goes wrong. The idea was to have this in place first to measure linker behavior by looking at the logs, and once we have data and a better model, we can put it in place (and then make this a hard failure once we have observed no failures).

I've changed the code to catch IOException (the broad Exception was a left-over from a previous draft) and added a TODO.

What do you think?

I don't have a strong opinion here; I'd prefer failing hard because if stat() doesn't work on an input file, the action will very likely not succeed anyway (every input file must be checksummed so that the action checksum can be computed).

A mitigating factor is that this IOException only ever happens if I/O is needed to get the metadata, which AFAIU only ever happens during action input discovery, which is not something link actions do so arguably the way this exception is handled is only a theoretical concern.

iancha1992 · 2023-08-11T17:43:18Z

@lberki @jmmv I have retried. But looks like the tests are not passing. Can you please take a look?

cc: @bazelbuild/triage

jmmv · 2023-08-11T23:47:00Z

My bad. Tests should be fixed now.

lberki · 2023-09-11T14:50:12Z

FYI: I have to change the log level to FINE because it's spamming our logs quite a bit, and my professional estimate is that there are way fewer people who are interested in this log line than those who aren't so it's better to impose extra work (setting up logging so that these lines get logged) on the former group.

meisterT · 2023-09-12T14:25:42Z

src/main/java/com/google/devtools/build/lib/rules/cpp/CppLinkAction.java

+        lazyData = doCostlyEstimation();
+      }
+
+      logger.atInfo().log(


Late to the party - do you need this log for every link action or is it ok to get a sample? I am asking because this seems to cause a bit of log spam when looking at large builds.

Oops, sorry for the late reply.

Yes, I kinda needed it for every link action. If I look at our build, there are lots and lots of small link actions, but towards the end of the build, there are only a few large ones, each with a different memory "profile". If I were to sample these, I'd lose valuable information at that stage.

Is this still a problem after @lberki 's change to decrease the logging level?

sluongng · 2023-11-28T09:31:17Z

@lberki @jmmv It seems this PR was missed during cherry-pick to 7.0?

Edit: nvm, it did make it in. Sorry for the noise. 🙇

comius · 2024-01-05T13:29:04Z

Update: this logging was removed in 9291ed0 (at head). Hopefully Bazel 7.0 gives you enough time to collect the necessary statistics, that can then be used on CppLinkAction in Starlark.

Unfortunately I had no easy way to keep supporting the loggint at head.

jmmv · 2024-01-19T04:26:08Z

@comius Thanks for the heads up. I wonder though... what's the plan to be able to measure this long-term? Modeling the behavior of the compiler and linker is not a one-time thing. The compiler and linker evolve over time, and Bazel supports many more than just one pair, so there will always be a need to evaluate their resource consumption in order to tweak Bazel's internal model...

comius · 2024-02-05T14:43:17Z

One possibility is to add logging for all spawn actions or if that’s too much implement a filter to log only a given set of mnemonics. cc @lberki wdyt?

lberki · 2024-02-06T11:38:00Z

I think that's very reasonable. Logging the estimate / consume resource use has value for everything, not just for C++ linking and honestly, the way we logged the data for C++ linking was kind of a wart. I could imagine emitting the same data on (e.g.) the BES so that people don't have to pore over the Java logs.

In general, moving all this to the SpawnAction level would be great.

jmmv requested review from lberki and oquenchil as code owners August 8, 2023 23:02

github-actions bot added awaiting-review PR is awaiting review from an assigned reviewer team-Rules-CPP Issues for C++ rules labels Aug 8, 2023

jmmv mentioned this pull request Aug 8, 2023

Computed C++ linker RAM is way off reality #17368

Open

oquenchil requested review from comius and buildbreaker2021 and removed request for oquenchil August 9, 2023 07:58

lberki reviewed Aug 9, 2023

View reviewed changes

Address review comments

ab31c6b

jmmv force-pushed the linker-inputs-size branch from a0a4874 to ab31c6b Compare August 9, 2023 18:00

lberki reviewed Aug 10, 2023

View reviewed changes

Address review comments

27994c4

lberki approved these changes Aug 11, 2023

View reviewed changes

lberki added awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally and removed awaiting-review PR is awaiting review from an assigned reviewer labels Aug 11, 2023

Fix tests

641c671

copybara-service bot closed this in 0a74086 Aug 17, 2023

github-actions bot removed the awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally label Aug 17, 2023

meisterT reviewed Sep 12, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow CppLinkAction to inspect input sizes in resource estimation #19203

Allow CppLinkAction to inspect input sizes in resource estimation #19203

jmmv commented Aug 8, 2023

kjteske commented Aug 9, 2023

jmmv commented Aug 9, 2023

nikhilkalige commented Aug 9, 2023

lberki Aug 10, 2023

jmmv Aug 10, 2023

lberki Aug 11, 2023

iancha1992 commented Aug 11, 2023

jmmv commented Aug 11, 2023

lberki commented Sep 11, 2023

meisterT Sep 12, 2023

jmmv Oct 4, 2023

sluongng commented Nov 28, 2023 •

edited

Loading

comius commented Jan 5, 2024

jmmv commented Jan 19, 2024

comius commented Feb 5, 2024

lberki commented Feb 6, 2024

Allow CppLinkAction to inspect input sizes in resource estimation #19203

Allow CppLinkAction to inspect input sizes in resource estimation #19203

Conversation

jmmv commented Aug 8, 2023

kjteske commented Aug 9, 2023

jmmv commented Aug 9, 2023

nikhilkalige commented Aug 9, 2023

lberki Aug 10, 2023

Choose a reason for hiding this comment

jmmv Aug 10, 2023

Choose a reason for hiding this comment

lberki Aug 11, 2023

Choose a reason for hiding this comment

iancha1992 commented Aug 11, 2023

jmmv commented Aug 11, 2023

lberki commented Sep 11, 2023

meisterT Sep 12, 2023

Choose a reason for hiding this comment

jmmv Oct 4, 2023

Choose a reason for hiding this comment

sluongng commented Nov 28, 2023 • edited Loading

comius commented Jan 5, 2024

jmmv commented Jan 19, 2024

comius commented Feb 5, 2024

lberki commented Feb 6, 2024

sluongng commented Nov 28, 2023 •

edited

Loading