Allow use scrubbing with remote execution (build actions with scrubbed inputs locally) #20070

bozaro · 2023-11-07T14:06:21Z

Since cannot run remote cache key scrubbing action with remote execution, run this actions only locally. Actions without scrubbed input can still be executed remotely.

Same idea as:

Add experimental_remote_cache_key_ignore_stamping to skip volatile stamping files on compute shared cache key #16240

tjgq · 2023-11-07T18:11:08Z

src/main/java/com/google/devtools/build/lib/remote/RemoteExecutionService.java

@@ -341,7 +341,8 @@ public CachePolicy getWriteCachePolicy(Spawn spawn) {
  public boolean mayBeExecutedRemotely(Spawn spawn) {
    return remoteCache instanceof RemoteExecutionCache
        && remoteExecutor != null
-        && Spawns.mayBeExecutedRemotely(spawn);
+        && Spawns.mayBeExecutedRemotely(spawn)
+        && !hasScrubbedInput(spawn, scrubber);


I think it makes sense to allow remote execution for non-scrubbed actions, but I have some concerns with this implementation:

In addition to inputs, we might also scrub command line arguments and set a scrubbing salt, so this check is incomplete.

Expanding/iterating the inputs is a fairly expensive operation, and with this change we're going to do it twice for every matching action (first time here, second time when computing the actual cache key). I'd prefer to not make the decision to run remotely conditional on the input expansion.

Can we get away with simply scrubber.forSpawn(spawn) == null here? Scrubber#forSpawn is defined to return non-null iff at least one of the configuration rules matches the action; this is, strictly speaking, not a guarantee that scrubbing will occur (applying the respective transform might result in zero changes) but that ought to be the case with a carefully written configuration.

Can we get away with simply scrubber.forSpawn(spawn) == null here?

No, looks like I have non null spawnScrubber for every target with files input with configuration like:

rules { transform { omitted_inputs: "^bazel-out/volatile-status\\.txt$" } }

Looks like I can create rules filter by set label or mnemonic pattern. But collecting these rules will be a very difficult task:

most stamping actions are declared in third-party code;

in case of an error, I get a quiet cache miss.

In the use case for stamping that I'm most familiar with (build practices at Google), there's a relatively small number of targets that perform stamping - typically only one per language/ruleset. Each of these targets creates a single language-specific library embedding the workspace status, which may then get linked into every binary (or other top-level deployable artifact) for that language. This way, only a few, easy to identify targets depend on the workspace status files.

It sounds like your use of stamping is much more widespread, and can affect an unbounded number of targets, as opposed to a relatively small number of relatively "well-known" ones. Could you provide some more information on how your setup uses stamping?

Also - are you looking for a solution specifically for volatile-status.txt, or does this have to generalize to any input file? For example, suppose that action A consumes volatile-status.txt and produces an arbitrary file a.out (incorporating information from volatile-status.txt). Then a later action B consumes a.out. Would you then also need to scrub a.out?

Then a later action B consumes a.out. Would you then also need to scrub a.out?

Nope. Only volatile-status.txt.

We are trying to use stamping only for the top-level artifacts:

for executable files;

for docker images.

There are really few such goals.

The purpose of stamping is to save information in this file about which revision the file was compiled from. To do this, the revision number is transmitted via volatile-status.txt.

The problem is that stamping is enabled not for concrete targets, but globally via the --stamp flag. In some cases, it can affect, for example, on bison binary for generate yacc files in one of the third-party libraries we use. If for this target volatile-status.txt becomes significant, then we will get a cache miss for this and all dependent targets on every build. In this case, we get parasitic pressure on the cache, which is quite difficult to detect.

Maybe, before implementing scrubbing support with remote build, it makes sense to do the following:

use scrubber.forSpawn(spawn) == null cheap check for `mayBeExecutedRemotely' by default;

use an expensive check with iteration over files enabling by the experimental flag like --experimental_scrubbing-remote-build-expensive-check (or create enum like --experimental_scrubbing-with-remote-build with values deny, unmatched-local, unchanged-local).

Alright, let's go through the problems one by one:

Bazel 6 stamping vs remote cache/remote build

I think this problem is easily solvable. If we implement the solution I outlined above, it becomes possible to write a single rule of the form:

rule { matcher { has_volatile_input: true } transform { omitted_inputs: "^bazel-out/volatile-status\\.txt$" } }

and it will apply to every action that has volatile-status.txt in its inputs, without having to list those actions explicitly in the configuration. (There's a little bit of redundancy between has_volatile_input and omitted_inputs, but it's intentional: separating the matcher and the transform brings clarity and is a property I'd like to keep.)

However, the configuration above will not apply to an action that produces an output derived from volatile-status.txt (that's what the next problem is about).

Bazel 6 stamping is broken

It's unclear to me whether this is a bug in how stamping works, or an intended limitation. I don't mean to be dismissive; all I'm saying is that I didn't design the stamping feature and haven't used it all that much, so I can't authoritatively declare that there is a bug, and much less that cache key scrubbing is the correct way to solve it.

My suggestion here is to open a separate issue describing the limitations you see in stamping as it's currently implemented, so that people more knowledgeable about that part of Bazel can help brainstorm potential solutions. If we determine that scrubbing is the appropriate way to solve it, I'm happy to resume the search for a reasonable implementation; but I'd first like to be sure that there aren't simpler solutions.

Bazel 7 scrubbing configuration limitations

I completely agree with your assessment: scrubbing configurations are both hard to write and hard to debug.

The "hard to debug" is somewhat easier to solve: we need good tools to investigate the root cause of cache misses. We're currently trying to make some improvements in that direction in #18643, but there's still work to do.

The "hard to write" problem, as you correctly surmise, doesn't have a clear solution in sight. I share your sentiment that the better design entails having the rules themselves provide the scrubbing configuration to Bazel, so that users aren't forced to understand implementation details just so they can write the configuration. But we don't want rules to unilaterally decide that scrubbing should be used, either: scrubbing trades off correctness for performance, and Bazel should strive for correctness by default; some amount of user input would still be required in that world.

In conclusion

I don't want this discussion to drag on forever. I offer the following: if we agree that the first of these problems (scrubbing triggered by the presence of volatile-status.txt) is worth solving on its own, I volunteer to (re-)write the PR myself and get it submitted. I will tentatively say that there's still time to do this before 7.1.0, but it might slip to 7.2.0 since I have other things on my plate at the moment. At the same time, I encourage you to start a separate issue/discussion regarding stamping of non-toplevel targets. Does that sound good to you?

If you add the matcher has_volatile_input and allow remote assembly for all rules that do not use scrubbing, then this should completely cover the problem of using stamping from Bazel 6:

there will be one rule that allows local assembly of stamping action;

this rule will behave the same as the local Bazel cache.

@tjgq, I have moved the changes about executing scrubbed actions locally to PR #21288 (check only by scrubber.forSpawn(spawn) != null without checking the affected input files).

I tried adding `has_volatile_input' but failed: this requires strong refactoring.

Thanks, I will import #21288 today.

The has_volatile_input implementation is still on me, as promised above; I'm hoping to get to it next week.

I had some time this week to chat with other folks on the team about stamping, and to think about how this feature might be implemented. Unfortunately, the conclusion that I came to is that I don't know how to implement this in a satisfactory way, and I really don't want to rush adding a feature we might later regret. So I'm going to put this on hold at least until 7.2.0. Sorry.

The internal discussions I had also reinforced a point that I made earlier in this thread: if you have so many targets consuming volatile-info.txt that it's difficult to explicitly enumerate all of them in the scrubbing config, you're likely doing something wrong (at least according to the way we originally designed stamping to be used). I'd expect you to have at most one target per language that consumes volatile-info.txt and produces a wrapper library around its contents suitable for that language; your top-level binary targets should access the volatile information through that library, and shouldn't themselves depend on volatile-info.txt directly. Thus, you only have to configure scrubbing for the small number of targets that generate the per-language wrapper libraries, which I don't think is an unreasonable thing to ask users to do.

I still haven't read a convincing explanation for why your build graph can't be organized in this way.

bozaro · 2024-01-11T05:15:25Z

This PR works fine on release-7.0.0rc2 and bozaro:scrubbing-with-remote-execution branches, but on 7.0.0 tag it's requre patch like joomcode@663beb6 (produced by cherry-pick 2b700b5 after bisect between 7.0.0 and bozaro:scrubbing-with-remote-execution).

The check seems to ignore transitive dependencies:

    var inputFiles = spawn.getInputFiles();
    for (ActionInput inputFile : inputFiles.getLeaves()) {
      if (spawnScrubber.shouldOmitInput(inputFile)) {
        return true;
      }
    }

bozaro · 2024-01-11T08:29:11Z

I am sorry... Rebasing to 7.0.0 tag was a bad idea...

…on is enabled. As discussed in #20070, it's sometimes useful to scrub actions in a build that is otherwise remote. Closes #21288. PiperOrigin-RevId: 607723207 Change-Id: I31403de74fb81d07aac765cca68b2991b0230496

…on is enabled. As discussed in bazelbuild#20070, it's sometimes useful to scrub actions in a build that is otherwise remote. Closes bazelbuild#21288. PiperOrigin-RevId: 607723207 Change-Id: I31403de74fb81d07aac765cca68b2991b0230496

… execution is enabled. (#21384) As discussed in #20070, it's sometimes useful to scrub actions in a build that is otherwise remote. Closes #21288. Commit 8e0343c PiperOrigin-RevId: 607723207 Change-Id: I31403de74fb81d07aac765cca68b2991b0230496 Co-authored-by: Artem V. Navrotskiy <bozaro@yandex.ru>

bozaro requested a review from a team as a code owner November 7, 2023 14:06

github-actions bot added awaiting-review PR is awaiting review from an assigned reviewer team-Remote-Exec Issues and PRs for the Execution (Remote) team labels Nov 7, 2023

bozaro changed the title ~~Allow locally build actions with scrubbed inputs~~ Allow use scrubbing with remote execution (build actions with scrubbed inputs locally) Nov 7, 2023

bozaro mentioned this pull request Nov 7, 2023

Add experimental_remote_cache_key_ignore_stamping to skip volatile stamping files on compute shared cache key #16240

Closed

tjgq reviewed Nov 7, 2023

View reviewed changes

Allow locally build actions with scrubbed inputs

981d219

bozaro force-pushed the scrubbing-with-remote-execution branch from c6920d6 to 981d219 Compare November 8, 2023 06:35

fixup! Allow locally build actions with scrubbed inputs

d8e9a3b

bozaro requested a review from tjgq November 9, 2023 06:14

fixup! Allow locally build actions with scrubbed inputs

0dacc65

bozaro force-pushed the scrubbing-with-remote-execution branch from 0dacc65 to 74213f2 Compare January 11, 2024 08:27

bozaro requested review from a team, lberki, brandjon, tetromino, oquenchil, gregestren, fweikert, Wyverald, meteorcloudy, ahumesky, ted-xie and kevin1e100 as code owners January 11, 2024 08:27

bozaro force-pushed the scrubbing-with-remote-execution branch from 74213f2 to 0dacc65 Compare January 11, 2024 08:28

bozaro requested a review from fmeum January 11, 2024 08:34

bozaro mentioned this pull request Jan 15, 2024

Looks like volatile-status.txt content affects the caching key on shared cache #16231

Open

bozaro mentioned this pull request Feb 10, 2024

Allow locally build actions with scrubber #21288

Closed

bazel-io mentioned this pull request Feb 16, 2024

[7.1.0] Let scrubbed actions fall back to local execution when remote execution is enabled. #21384

Merged

tetromino removed their request for review March 6, 2024 14:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow use scrubbing with remote execution (build actions with scrubbed inputs locally) #20070

Allow use scrubbing with remote execution (build actions with scrubbed inputs locally) #20070

bozaro commented Nov 7, 2023

tjgq Nov 7, 2023

bozaro Nov 8, 2023 •

edited

tjgq Nov 9, 2023

bozaro Nov 10, 2023

bozaro Nov 10, 2023 •

edited

tjgq Jan 29, 2024 •

edited

bozaro Jan 31, 2024

bozaro Feb 15, 2024

tjgq Feb 15, 2024

tjgq Feb 22, 2024

bozaro commented Jan 11, 2024 •

edited

bozaro commented Jan 11, 2024 •

edited

Allow use scrubbing with remote execution (build actions with scrubbed inputs locally) #20070

Are you sure you want to change the base?

Allow use scrubbing with remote execution (build actions with scrubbed inputs locally) #20070

Conversation

bozaro commented Nov 7, 2023

tjgq Nov 7, 2023

Choose a reason for hiding this comment

bozaro Nov 8, 2023 • edited

Choose a reason for hiding this comment

tjgq Nov 9, 2023

Choose a reason for hiding this comment

bozaro Nov 10, 2023

Choose a reason for hiding this comment

bozaro Nov 10, 2023 • edited

Choose a reason for hiding this comment

tjgq Jan 29, 2024 • edited

Choose a reason for hiding this comment

bozaro Jan 31, 2024

Choose a reason for hiding this comment

bozaro Feb 15, 2024

Choose a reason for hiding this comment

tjgq Feb 15, 2024

Choose a reason for hiding this comment

tjgq Feb 22, 2024

Choose a reason for hiding this comment

bozaro commented Jan 11, 2024 • edited

bozaro commented Jan 11, 2024 • edited

bozaro Nov 8, 2023 •

edited

bozaro Nov 10, 2023 •

edited

tjgq Jan 29, 2024 •

edited

bozaro commented Jan 11, 2024 •

edited

bozaro commented Jan 11, 2024 •

edited