Use of --experimental_remote_cache_async results in invalid requests to reapi #15279
Labels
P1
I'll work on this now. (Assignee required)
team-Remote-Exec
Issues and PRs for the Execution (Remote) team
type: bug
Description of the bug:
Looks like when this newer experimental flag is in use the bazel client often writes and attempts to read 'empty' blobs (hash
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
, the shasum of an empty blob) that have a length other than zero in the BS resource identifier, eg.e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/103
which is of course is invalid (there's no way the empty/null hash is correct if the length is non zero).We are able to reliably to reproduce this behavior by building
//src:bazel
inbazelbuild/bazel
against both various deployments of our cache service as well as some of the open-source implementations such as bazel-remote and found the same issue universally.If I recall correctly this seemed to occur more often on linux than on macos, but that may have just been coincidence, I'll try to find some time to run this again on my MBP soon.
One additional insight: a member of my team took a look at the content of some of these invalid blobs and found that they contained a proto encoded message containing targets in the serialized data, for example
cat
ing the blob shows the contentsOf course there could be a valid intermediate action executed that's is outputting a proto encoded blob and this isn't surprising at all, but this doesn't really look like a
blob
we'd expect over the bytestream in most cases, so I thought it might be worth flagging 🤔 If someone recognizes this proto schema based on the above maybe it will point the investigation in the right direction, or maybe it's not a relevant factor at all 🤷What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
bazel build //src:bazel --remote_cache=grpcs://<some-cache> --experimental_remote_cache_async
; if you'd like to test against a local instance of bazel-remote, then something like so:During the first build you'll see output like
but a build against a clean empty cache should succeed.
The next builds against a populated cache will always fail, again typically with validation errors returned by the remote cache, like so:
and these subsequent builds of
//src:bazel
always seem to fail with this final error:Which operating system are you running Bazel on?
linux
What is the output of
bazel info release
?observed in 5.1.1, 5.0.0
If
bazel info release
returnsdevelopment version
or(@non-git)
, tell us how you built Bazel.No response
What's the output of
git remote get-url origin; git rev-parse master; git rev-parse HEAD
?Have you found anything relevant by searching the web?
Nothing out there
Any other information, logs, or outputs that you want to share?
here's a bit more output (some of these server-side validation messages are coming from bazel-remote, but still relevant as this behavior can be observed on any cache backend implementation)
The text was updated successfully, but these errors were encountered: