Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore AC if any of the outputs is missing from the CAS when building without the bytes #16656

Closed
wants to merge 1 commit into from

Conversation

coeuvre
Copy link
Member

@coeuvre coeuvre commented Nov 4, 2022

Currently, when checking for the remote cache, we only check the whether an AC exists and its exit_code is 0 to decided whether we hit the cache.

This is fine for the normal mode because we immediately download all the outputs after hit the cache. If any of them is missing, we can detect the case, ignore the AC and continue to run the spawn.

However, for Build without the Bytes, we don't download the outputs until Bazel needs them which could happens later in the build. If the output is missing, we cannot rerun the generating action (unless with action rewinding which isn't available yet).

This PR fixes that by checking whether outputs are missing from CAS using findMissingBlobs when checking the AC for Build without the Bytes. If any of the outputs is missing, we ignore the AC and continue to run the spawn.

Note that, HttpCache currently doesn't know support findMissingBlobs so the check is disabled when using HttpCache.

Working towards #10880.

@coeuvre coeuvre changed the title Ignore AC if any of the outputs is missing in the CAS when building without the bytes Ignore AC if any of the outputs is missing from the CAS when building without the bytes Nov 4, 2022
@brentleyjones
Copy link
Contributor

So all remote caches that support BwtB do this already, on their end where there doesn't need to be a round trip, the lookups can be cached, etc.

So two comments:

  • ideally the remote cache would know that bazel is doing this so they can stop making the duplicated optimization on their end. I wonder if some gRPC metadata can be sent or something?
  • but also maybe we can have a way of disabling this if we know the remote cache will only return AC results that have corresponding CAS items? Maybe the remote cache can return a capability?

Just trying to make sure we can stay as performant as possible here.

@brentleyjones
Copy link
Contributor

Of note, this fixes issues with the disk_cache where people delete old items, but not all items. So even if we went with option 2 above, the disk_cache would need to have FindMissingBlobs called on it.

@coeuvre
Copy link
Member Author

coeuvre commented Nov 4, 2022

So all remote caches that support BwtB do this already, on their end where there doesn't need to be a round trip, the lookups can be cached, etc.

The protocol doesn't force it but suggests to do that.

// Implementations SHOULD ensure that any blobs referenced from the
// [ContentAddressableStorage][build.bazel.remote.execution.v2.ContentAddressableStorage]
// are available at the time of returning the
// [ActionResult][build.bazel.remote.execution.v2.ActionResult] and will be
// for some period of time afterwards. The lifetimes of the referenced blobs SHOULD be increased
// if necessary and applicable.

So if all remote caches do that, I am wondering whether this PR is necessary.

For disk cache, we can just do something similar inside getActionResult.

@brentleyjones
Copy link
Contributor

If a remote cache doesn't do this, and it evicts CAS items, it breaks with BwtB right now. And maybe there are some remote caches out there that are waiting for this change.

So I think Bazel should do it in order to more broadly support remote caches, but ideally the user can disable it, and also have Bazel tell the remote cache if it's doing it or not, so the remote can optimize around it.

@brentleyjones
Copy link
Contributor

and also have Bazel tell the remote cache if it's doing it or not

This is key, because for non-BwtB builds, ideally the remote cache could stop checking CAS items before return an AC result. Right now these caches are doing extra work for non-BwtB builds, which could then be avoided.

@coeuvre
Copy link
Member Author

coeuvre commented Nov 4, 2022

How about we guard this check behind a flag and disable it by default? The idea is Bazel assume the remote cache will do the check (as the protocol suggests). For users who connect to a remote cache that doesn't do the check, they can enable the check at Bazel side manually.

@coeuvre
Copy link
Member Author

coeuvre commented Nov 4, 2022

and also have Bazel tell the remote cache if it's doing it or not

I guess this requires a protocol change?

@brentleyjones
Copy link
Contributor

Possibly. Unless there is a way to signal it without violating the current protocol.

@coeuvre
Copy link
Member Author

coeuvre commented Nov 8, 2022

We agreed in an discussion that the SHOULD is strong enough for Bazel to assume that the server will make sure the referenced blobs are available at the time of returning the AC.

I will create another PR to update disk cache. Closing this now.

@coeuvre coeuvre closed this Nov 8, 2022
@ShreeM01 ShreeM01 added the team-Remote-Exec Issues and PRs for the Execution (Remote) team label Dec 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Remote-Exec Issues and PRs for the Execution (Remote) team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants