Use of --experimental_remote_cache_async results in invalid requests to reapi #15279

Open
zachgrayio opened this issue Apr 18, 2022 · 2 comments
Labels
P1 I'll work on this now. (Assignee required) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug

@zachgrayio

Description of the bug:

It looks like when this newer experimental flag is in use, the Bazel client often writes and attempts to read 'empty' blobs (hash e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, the SHA-256 of an empty blob) that have a nonzero length in the ByteStream resource identifier, e.g. e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/103, which is of course invalid (there's no way the empty/null hash is correct if the length is nonzero).
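
To make the inconsistency concrete, here is a minimal Python sketch. It is not Bazel or cache-server code, and `digest_is_consistent` is a hypothetical helper name used only for illustration. It shows why a digest that pairs the empty-blob hash with a nonzero size can never be valid, whatever the blob contains:

```python
import hashlib

# SHA-256 of zero bytes; this is the hash seen in the invalid resource names.
EMPTY_SHA256 = hashlib.sha256(b"").hexdigest()


def digest_is_consistent(hash_hex: str, size_bytes: int) -> bool:
    """Hypothetical check: reject digests that are impossible on their face.

    The empty-blob hash is only valid together with size 0, and size 0 is
    only valid together with the empty-blob hash. Any other combination is
    plausible (it still has to be verified against the actual bytes).
    """
    is_empty_hash = hash_hex.lower() == EMPTY_SHA256
    return is_empty_hash == (size_bytes == 0)


print(EMPTY_SHA256)
print(digest_is_consistent(EMPTY_SHA256, 0))    # a genuinely empty blob
print(digest_is_consistent(EMPTY_SHA256, 103))  # the case reported here
```

A cache can reject the second case without reading any blob data, which presumably is what the validation errors quoted later in this report reflect.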

We are able to reliably reproduce this behavior by building //src:bazel in bazelbuild/bazel against various deployments of our cache service as well as some of the open-source implementations such as bazel-remote, and we found the same issue universally.

If I recall correctly, this seemed to occur more often on Linux than on macOS, but that may have just been coincidence. I'll try to find some time to run this again on my MBP soon.

One additional insight: a member of my team took a look at the content of some of these invalid blobs and found that they contained a proto-encoded message with targets in the serialized data. For example, running cat on the blob shows the contents:

5//src/main/java/com/google/devtools/build/lib/util:os⏎ 

Of course there could be a valid intermediate action that is outputting a proto-encoded blob, and that wouldn't be surprising at all, but this doesn't really look like a blob we'd expect over the bytestream in most cases, so I thought it might be worth flagging 🤔 If someone recognizes this proto schema based on the above, maybe it will point the investigation in the right direction, or maybe it's not a relevant factor at all 🤷
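
One small observation on the contents shown above, offered as a sketch rather than a diagnosis. It assumes the leading "5" is a raw byte in the blob and not part of the label text. ASCII "5" is byte 0x35, i.e. 53, which happens to equal the length of the label that follows it. That is at least consistent with a length-prefixed string, as used by protobuf length-delimited fields, though it does not identify the schema:

```python
# The blob contents as displayed above, without the trailing newline glyph.
# Assumption: the first character is a raw prefix byte, not label text.
shown = "5//src/main/java/com/google/devtools/build/lib/util:os"

prefix_byte = ord(shown[0])               # ASCII "5" is 0x35
label_len = len(shown[1:].encode("utf-8"))

print(prefix_byte, label_len)             # prints "53 53"
print(prefix_byte == label_len)           # prints "True"
```

If any tag byte precedes the length, it may simply be unprintable and so not visible in the cat output; that part is a guess.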

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

bazel build //src:bazel --remote_cache=grpcs://<some-cache> --experimental_remote_cache_async; if you'd like to test against a local instance of bazel-remote, then something like so:

mkdir ~/br; docker run -u 1000:1000 -v ~/br:/data -p 9090:8080 -p 9092:9092 buchgr/bazel-remote-cache &
bazel clean && bazel build //src:bazel --remote_cache=grpc://localhost:9092 --experimental_remote_cache_async
bazel clean && bazel build //src:bazel --remote_cache=grpc://localhost:9092 --experimental_remote_cache_async

During the first build you'll see output like

WARNING: Remote Cache: Error while uploading artifact with digest 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/72'
WARNING: Remote Cache: Error while uploading artifact with digest 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/4096'
WARNING: Remote Cache: INVALID_ARGUMENT: Invalid zero-length SHA256 hash

but a build against a clean empty cache should succeed.

The next builds against a populated cache will always fail, again typically with validation errors returned by the remote cache, like so:

WARNING: Remote Cache: INVALID_ARGUMENT: Invalid zero-length SHA256 hash

and these subsequent builds of //src:bazel always seem to fail with this final error:

ERROR: /home/user/code/repos/bazel/src/main/java/com/google/devtools/build/lib/skyframe/serialization/BUILD:43:13: Building src/main/java/com/google/devtools/build/lib/skyframe/serialization/libconstants.jar (1 source file) failed: (Exit 1): java failed: error executing command external/remotejdk11_linux/bin/java -XX:-CompactStrings '--add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED' '--add-exports=jdk.compiler/com.sun.tools.javac.main=ALL-UNNAMED' ... (remaining 17 arguments skipped)
Target //src:bazel failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 6.487s, Critical Path: 4.48s
INFO: 1703 processes: 1626 remote cache hit, 75 internal, 1 linux-sandbox, 1 local.
FAILED: Build did NOT complete successfully

Which operating system are you running Bazel on?

linux

What is the output of bazel info release?

observed in 5.1.1, 5.0.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

Have you found anything relevant by searching the web?

Nothing out there

Any other information, logs, or outputs that you want to share?

Here's a bit more output (some of these server-side validation messages are coming from bazel-remote, but they are still relevant, as this behavior can be observed on any cache backend implementation):

FAILED: Build did NOT complete successfully
2022/04/18 23:34:35 GRPC BYTESTREAM WRITE FAILED: uploads/227cf6b2-7978-4b71-a0ab-fab4a7e37949/blobs/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/62 Cache Put failed: checksums don't match. Expected e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, found d3e644654dddf4f6488a3a802fe329ee7c81aff66dafa3b0ac3886859303722f
2022/04/18 23:34:35 GRPC BYTESTREAM WRITE FAILED: uploads/5650826d-26e8-44f1-997c-e862f58ec52f/blobs/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/69 Cache Put failed: checksums don't match. Expected e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, found fe873fb0e89ac725381987607bfffde96a25e4890d8492a93f7b551d3ccfde63
2022/04/18 23:34:35 GRPC BYTESTREAM WRITE FAILED: uploads/d84731b2-2f6f-49d1-8ac9-32a7c6f7381d/blobs/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/78 Cache Put failed: checksums don't match. Expected e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, found 407d08d9e790e233d27503024e8eae4adaaeae984b63f8465c4e32b34127976a
2022/04/18 23:34:35 GRPC BYTESTREAM WRITE FAILED: uploads/ab31e52b-c38a-482d-8602-2e3fe9fe1449/blobs/37446575700829a11278ad3a550f244f45d5ae4fe1552778fa4f041f9eaeecf6/103 Cache Put failed: checksums don't match. Expected 37446575700829a11278ad3a550f244f45d5ae4fe1552778fa4f041f9eaeecf6, found ee402977e3caf5
FAILED: Build did NOT complete successfully
Waiting for remote cache: 4 uploads
2022/04/18 23:34:36 GRPC BYTESTREAM WRITE FAILED: uploads/227cf6b2-7978-4b71-a0ab-fab4a7e37949/blobs/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495
WARNING: Remote Cache: Error while uploading artifact with digest 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/62'
FAILED: Build did NOT complete successfully
Waiting for remote cache: 4 uploads
2022/04/18 23:34:36 GRPC BYTESTREAM WRITE FAILED: uploads/5650826d-26e8-44f1-997c-e862f58ec52f/blobs/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495
WARNING: Remote Cache: Error while uploading artifact with digest 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/69'
FAILED: Build did NOT complete successfully
Waiting for remote cache: 3 uploads
2022/04/18 23:34:37 GRPC BYTESTREAM WRITE FAILED: uploads/d84731b2-2f6f-49d1-8ac9-32a7c6f7381d/blobs/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495
WARNING: Remote Cache: Error while uploading artifact with digest 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/78'
FAILED: Build did NOT complete successfully
Waiting for remote cache: 2 uploads
2022/04/18 23:34:37 GRPC BYTESTREAM WRITE FAILED: uploads/ab31e52b-c38a-482d-8602-2e3fe9fe1449/blobs/37446575700829a11278ad3a550f244f45d5ae4fe1552778fa4f
WARNING: Remote Cache: Error while uploading artifact with digest '37446575700829a11278ad3a550f244f45d5ae4fe1552778fa4f041f9eaeecf6/103'
FAILED: Build did NOT complete successfully
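
For anyone triaging similar logs, here is a rough Python sketch for pulling the hash/size pairs out of output like the above and flagging the impossible ones. The function name `find_impossible_digests` and the regular expression are illustrative assumptions, not part of Bazel or bazel-remote. It only flags the empty-hash case, so the 37446575…/103 upload above, which also fails its checksum, would not be reported by it:

```python
import re
from typing import List, Tuple

EMPTY_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

# Matches the "<64 hex chars>/<size>" pair that appears both in the
# ByteStream resource names and in the Bazel warnings.
DIGEST_RE = re.compile(r"\b([0-9a-f]{64})/(\d+)\b")


def find_impossible_digests(log_text: str) -> List[Tuple[str, int]]:
    """Return distinct (hash, size) pairs that combine the empty hash with size > 0."""
    found: List[Tuple[str, int]] = []
    for hash_hex, size_text in DIGEST_RE.findall(log_text):
        pair = (hash_hex, int(size_text))
        if hash_hex == EMPTY_SHA256 and pair[1] > 0 and pair not in found:
            found.append(pair)
    return found


# Two warning lines copied from the output in this report.
sample = """\
WARNING: Remote Cache: Error while uploading artifact with digest 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/62'
WARNING: Remote Cache: Error while uploading artifact with digest '37446575700829a11278ad3a550f244f45d5ae4fe1552778fa4f041f9eaeecf6/103'
"""

for hash_hex, size in find_impossible_digests(sample):
    print(f"impossible digest: {hash_hex}/{size}")
```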
@brentleyjones
Contributor

@bazelbuild/remote-execution

@sgowroji sgowroji added type: bug untriaged team-Remote-Exec Issues and PRs for the Execution (Remote) team labels Apr 19, 2022
@coeuvre coeuvre added P1 I'll work on this now. (Assignee required) and removed untriaged labels Apr 19, 2022
@chiragramani
Contributor

Are we also seeing this issue on Bazel 6.2.1? Just curious to know the current status of this issue.
