Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow use scrubbing with remote execution (build actions with scrubbed inputs locally) #20070

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

bozaro
Copy link
Contributor

@bozaro bozaro commented Nov 7, 2023

Since cannot run remote cache key scrubbing action with remote execution, run this actions only locally. Actions without scrubbed input can still be executed remotely.

Same idea as:

@bozaro bozaro requested a review from a team as a code owner November 7, 2023 14:06
@github-actions github-actions bot added awaiting-review PR is awaiting review from an assigned reviewer team-Remote-Exec Issues and PRs for the Execution (Remote) team labels Nov 7, 2023
@bozaro bozaro changed the title Allow locally build actions with scrubbed inputs Allow use scrubbing with remote execution (build actions with scrubbed inputs locally) Nov 7, 2023
@@ -341,7 +341,8 @@ public CachePolicy getWriteCachePolicy(Spawn spawn) {
public boolean mayBeExecutedRemotely(Spawn spawn) {
return remoteCache instanceof RemoteExecutionCache
&& remoteExecutor != null
&& Spawns.mayBeExecutedRemotely(spawn);
&& Spawns.mayBeExecutedRemotely(spawn)
&& !hasScrubbedInput(spawn, scrubber);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to allow remote execution for non-scrubbed actions, but I have some concerns with this implementation:

  1. In addition to inputs, we might also scrub command line arguments and set a scrubbing salt, so this check is incomplete.
  2. Expanding/iterating the inputs is a fairly expensive operation, and with this change we're going to do it twice for every matching action (first time here, second time when computing the actual cache key). I'd prefer to not make the decision to run remotely conditional on the input expansion.

Can we get away with simply scrubber.forSpawn(spawn) == null here? Scrubber#forSpawn is defined to return non-null iff at least one of the configuration rules matches the action; this is, strictly speaking, not a guarantee that scrubbing will occur (applying the respective transform might result in zero changes) but that ought to be the case with a carefully written configuration.

Copy link
Contributor Author

@bozaro bozaro Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get away with simply scrubber.forSpawn(spawn) == null here?

No, looks like I have non null spawnScrubber for every target with files input with configuration like:

rules {
  transform {
    omitted_inputs: "^bazel-out/volatile-status\\.txt$"
  }
}

Looks like I can create rules filter by set label or mnemonic pattern. But collecting these rules will be a very difficult task:

  • most stamping actions are declared in third-party code;
  • in case of an error, I get a quiet cache miss.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the use case for stamping that I'm most familiar with (build practices at Google), there's a relatively small number of targets that perform stamping - typically only one per language/ruleset. Each of these targets creates a single language-specific library embedding the workspace status, which may then get linked into every binary (or other top-level deployable artifact) for that language. This way, only a few, easy to identify targets depend on the workspace status files.

It sounds like your use of stamping is much more widespread, and can affect an unbounded number of targets, as opposed to a relatively small number of relatively "well-known" ones. Could you provide some more information on how your setup uses stamping?

Also - are you looking for a solution specifically for volatile-status.txt, or does this have to generalize to any input file? For example, suppose that action A consumes volatile-status.txt and produces an arbitrary file a.out (incorporating information from volatile-status.txt). Then a later action B consumes a.out. Would you then also need to scrub a.out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then a later action B consumes a.out. Would you then also need to scrub a.out?

Nope. Only volatile-status.txt.

We are trying to use stamping only for the top-level artifacts:

  • for executable files;
  • for docker images.

There are really few such goals.

The purpose of stamping is to save information in this file about which revision the file was compiled from. To do this, the revision number is transmitted via volatile-status.txt.

The problem is that stamping is enabled not for concrete targets, but globally via the --stamp flag. In some cases, it can affect, for example, on bison binary for generate yacc files in one of the third-party libraries we use. If for this target volatile-status.txt becomes significant, then we will get a cache miss for this and all dependent targets on every build. In this case, we get parasitic pressure on the cache, which is quite difficult to detect.

Copy link
Contributor Author

@bozaro bozaro Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, before implementing scrubbing support with remote build, it makes sense to do the following:

  • use scrubber.forSpawn(spawn) == null cheap check for `mayBeExecutedRemotely' by default;
  • use an expensive check with iteration over files enabling by the experimental flag like --experimental_scrubbing-remote-build-expensive-check (or create enum like --experimental_scrubbing-with-remote-build with values deny, unmatched-local, unchanged-local).

Copy link
Contributor

@tjgq tjgq Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, let's go through the problems one by one:

Bazel 6 stamping vs remote cache/remote build

I think this problem is easily solvable. If we implement the solution I outlined above, it becomes possible to write a single rule of the form:

rule {
  matcher {
    has_volatile_input: true
  }
  transform {
    omitted_inputs: "^bazel-out/volatile-status\\.txt$"
  }
}

and it will apply to every action that has volatile-status.txt in its inputs, without having to list those actions explicitly in the configuration. (There's a little bit of redundancy between has_volatile_input and omitted_inputs, but it's intentional: separating the matcher and the transform brings clarity and is a property I'd like to keep.)

However, the configuration above will not apply to an action that produces an output derived from volatile-status.txt (that's what the next problem is about).

Bazel 6 stamping is broken

It's unclear to me whether this is a bug in how stamping works, or an intended limitation. I don't mean to be dismissive; all I'm saying is that I didn't design the stamping feature and haven't used it all that much, so I can't authoritatively declare that there is a bug, and much less that cache key scrubbing is the correct way to solve it.

My suggestion here is to open a separate issue describing the limitations you see in stamping as it's currently implemented, so that people more knowledgeable about that part of Bazel can help brainstorm potential solutions. If we determine that scrubbing is the appropriate way to solve it, I'm happy to resume the search for a reasonable implementation; but I'd first like to be sure that there aren't simpler solutions.

Bazel 7 scrubbing configuration limitations

I completely agree with your assessment: scrubbing configurations are both hard to write and hard to debug.

The "hard to debug" is somewhat easier to solve: we need good tools to investigate the root cause of cache misses. We're currently trying to make some improvements in that direction in #18643, but there's still work to do.

The "hard to write" problem, as you correctly surmise, doesn't have a clear solution in sight. I share your sentiment that the better design entails having the rules themselves provide the scrubbing configuration to Bazel, so that users aren't forced to understand implementation details just so they can write the configuration. But we don't want rules to unilaterally decide that scrubbing should be used, either: scrubbing trades off correctness for performance, and Bazel should strive for correctness by default; some amount of user input would still be required in that world.

In conclusion

I don't want this discussion to drag on forever. I offer the following: if we agree that the first of these problems (scrubbing triggered by the presence of volatile-status.txt) is worth solving on its own, I volunteer to (re-)write the PR myself and get it submitted. I will tentatively say that there's still time to do this before 7.1.0, but it might slip to 7.2.0 since I have other things on my plate at the moment. At the same time, I encourage you to start a separate issue/discussion regarding stamping of non-toplevel targets. Does that sound good to you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you add the matcher has_volatile_input and allow remote assembly for all rules that do not use scrubbing, then this should completely cover the problem of using stamping from Bazel 6:

  • there will be one rule that allows local assembly of stamping action;
  • this rule will behave the same as the local Bazel cache.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tjgq, I have moved the changes about executing scrubbed actions locally to PR #21288 (check only by scrubber.forSpawn(spawn) != null without checking the affected input files).

I tried adding `has_volatile_input' but failed: this requires strong refactoring.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I will import #21288 today.

The has_volatile_input implementation is still on me, as promised above; I'm hoping to get to it next week.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some time this week to chat with other folks on the team about stamping, and to think about how this feature might be implemented. Unfortunately, the conclusion that I came to is that I don't know how to implement this in a satisfactory way, and I really don't want to rush adding a feature we might later regret. So I'm going to put this on hold at least until 7.2.0. Sorry.

The internal discussions I had also reinforced a point that I made earlier in this thread: if you have so many targets consuming volatile-info.txt that it's difficult to explicitly enumerate all of them in the scrubbing config, you're likely doing something wrong (at least according to the way we originally designed stamping to be used). I'd expect you to have at most one target per language that consumes volatile-info.txt and produces a wrapper library around its contents suitable for that language; your top-level binary targets should access the volatile information through that library, and shouldn't themselves depend on volatile-info.txt directly. Thus, you only have to configure scrubbing for the small number of targets that generate the per-language wrapper libraries, which I don't think is an unreasonable thing to ask users to do.

I still haven't read a convincing explanation for why your build graph can't be organized in this way.

@bozaro bozaro force-pushed the scrubbing-with-remote-execution branch from c6920d6 to 981d219 Compare November 8, 2023 06:35
@bozaro bozaro requested a review from tjgq November 9, 2023 06:14
@bozaro
Copy link
Contributor Author

bozaro commented Jan 11, 2024

This PR works fine on release-7.0.0rc2 and bozaro:scrubbing-with-remote-execution branches, but on 7.0.0 tag it's requre patch like joomcode@663beb6 (produced by cherry-pick 2b700b5 after bisect between 7.0.0 and bozaro:scrubbing-with-remote-execution).

The check seems to ignore transitive dependencies:

    var inputFiles = spawn.getInputFiles();
    for (ActionInput inputFile : inputFiles.getLeaves()) {
      if (spawnScrubber.shouldOmitInput(inputFile)) {
        return true;
      }
    }

@bozaro
Copy link
Contributor Author

bozaro commented Jan 11, 2024

I am sorry... Rebasing to 7.0.0 tag was a bad idea...

copybara-service bot pushed a commit that referenced this pull request Feb 16, 2024
…on is enabled.

As discussed in #20070, it's sometimes useful to scrub actions in a build that is otherwise remote.

Closes #21288.

PiperOrigin-RevId: 607723207
Change-Id: I31403de74fb81d07aac765cca68b2991b0230496
bazel-io pushed a commit to bazel-io/bazel that referenced this pull request Feb 16, 2024
…on is enabled.

As discussed in bazelbuild#20070, it's sometimes useful to scrub actions in a build that is otherwise remote.

Closes bazelbuild#21288.

PiperOrigin-RevId: 607723207
Change-Id: I31403de74fb81d07aac765cca68b2991b0230496
github-merge-queue bot pushed a commit that referenced this pull request Feb 20, 2024
… execution is enabled. (#21384)

As discussed in #20070, it's
sometimes useful to scrub actions in a build that is otherwise remote.

Closes #21288.

Commit
8e0343c

PiperOrigin-RevId: 607723207
Change-Id: I31403de74fb81d07aac765cca68b2991b0230496

Co-authored-by: Artem V. Navrotskiy <bozaro@yandex.ru>
@tetromino tetromino removed their request for review March 6, 2024 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-review PR is awaiting review from an assigned reviewer team-Remote-Exec Issues and PRs for the Execution (Remote) team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants