
--experimental_remote_cache_ttl=0s causes --remote_download_outputs=toplevel (Build without the Bytes) to rerun actions #22592

Closed
dws opened this issue May 30, 2024 · 2 comments
Labels: team-Remote-Exec (Issues and PRs for the Execution (Remote) team), type: support / not a bug (process), untriaged


dws commented May 30, 2024

Description of the bug:

When using remote execution in conjunction with both --experimental_remote_cache_ttl=0s and --remote_download_outputs=toplevel (Build without the Bytes), Bazel rebuilds certain targets that it ought to consider up to date.
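
For reference, this is the flag combination in question (a sketch; the //:repro target name is a placeholder, and the executor endpoint matches the Buildfarm setup described below):

bazel build //:repro --remote_executor=grpc://localhost:8980 --remote_download_outputs=toplevel --experimental_remote_cache_ttl=0s
# an immediately repeated, unchanged build ought to be a no-op, but reruns actions:
bazel build //:repro --remote_executor=grpc://localhost:8980 --remote_download_outputs=toplevel --experimental_remote_cache_ttl=0s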

Which category does this issue belong to?

Remote Execution

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Instructions

First, download and unpack repro.tar.gz (a sketch of this step follows the file list). This will yield the following files:

.bazelrc
BUILD.bazel
MODULE.bazel
README.adoc
WORKSPACE.bazel
repro-docker.sh
repro.cpp.in
repro.sh
tools/status.sh
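
For example (assuming repro.tar.gz has already been downloaded from this issue; the destination directory name is arbitrary):

mkdir repro && tar -xzf repro.tar.gz -C repro
cd repro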

Here is how to use these files to check whether a given Bazel binary exhibits the excessive-rebuilding problem, where Bazel rebuilds a target that it ought to consider already up to date.

System Requirements

This reproducer has been tried on x86_64 Ubuntu 20.04 with Bazel versions 6.2.0 and up.

This reproducer assumes that you have a Remote Execution service
available at localhost:8980. You can set one up by downloading

https://github.com/bazelbuild/bazel-buildfarm/archive/refs/tags/2.10.0.tar.gz

unpacking it, and in the unpacked directory, running

./examples/bf-run start

This requires Docker to be available on the system. See
https://bazelbuild.github.io/bazel-buildfarm/docs/quick_start/ for more.

Running The Reproducer

In order to run the reproducer with a given Bazel binary, run

./repro.sh <path-to-bazel-binary>

If the reproducer prints PASS and exits with status 0, then the Bazel binary
is good and does not rebuild a target that is already up to date.

If the reproducer prints FAIL and exits with a nonzero status, then the
Bazel binary is bad and rebuilds a target that is already up to date.

Running The Reproducer Under Docker

To be extra careful, you can run the reproducer in a Docker environment
that matches the one the local Buildfarm worker runs in. To do this, run

./repro-docker.sh <path-to-bazel-binary>

This arranges to run ./repro.sh with the given Bazel binary in a
transient Docker container created from the same image used to run the
local Buildfarm worker.

Results

This reproducer identifies Bazel commit 1ebb04b as introducing this
problem. This commit introduced the --experimental_remote_cache_ttl
flag.
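
For anyone re-deriving this result, one way such a bisection can be driven is with repro.sh as the test script (a sketch; //src:bazel-dev is the usual target for building a development Bazel from a checkout, and the repro.sh path is a placeholder):

git bisect start
git bisect bad HEAD      # any commit known to rebuild spuriously
git bisect good 6.2.0    # a release known to pass
git bisect run sh -c 'bazel build //src:bazel-dev && /path/to/repro.sh "$PWD/bazel-bin/src/bazel-dev"'
# repro.sh exits 0 on PASS and nonzero on FAIL, matching git bisect run semantics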

Which operating system are you running Bazel on?

Ubuntu 20.04

What is the output of bazel info release?

release 7.2.0rc2

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

Yes, this is a regression from Bazel 6. Commit 1ebb04b introduced the problem. The problem exists in all Bazel 7 releases to date, from 7.0.0 through 7.2.0rc2.

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

When the reproducer above exhibits the problem, Bazel's --explain output is:

Executing action 'Executing genrule //:repro_cpp': One of the files has changed.
Executing action 'Compiling repro.cpp': One of the files has changed.

These messages would be more helpful if they said which file had changed; in this
case, none of the source files have actually changed, as inspecting the reproducer shows.

If you do the initial build of the reproducer target with --remote_download_outputs=all,
then subsequent builds with --remote_download_outputs=toplevel will see the target
as up to date and will not rebuild it. So it appears that --remote_download_outputs=toplevel
fails to download or update something that Bazel needs in order to tell if the target is up to date.
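
In other words (a sketch; //:repro stands in for the actual top-level target, and the remote flags from above are assumed):

bazel build //:repro --remote_download_outputs=all --experimental_remote_cache_ttl=0s
bazel build //:repro --remote_download_outputs=toplevel --experimental_remote_cache_ttl=0s
# the second build now sees the target as up to date and reruns nothing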

For context, we are using --experimental_remote_cache_ttl=0s in order to avoid a different
remote execution issue, where the Bazel client sees missing digests when the remote execution
service changes. @werkt can provide more detailed information on that. This ticket is not about
that problem -- we simply want a way to defeat the TTL without breaking BwoB.

@github-actions github-actions bot added the team-Remote-Exec Issues and PRs for the Execution (Remote) team label May 30, 2024

dws commented May 30, 2024

@coeuvre FYI since this problem originates with commit 1ebb04b


coeuvre commented Jun 3, 2024

It's WAI (working as intended). With --remote_download_outputs=toplevel, Bazel skips downloading the content of intermediate outputs and only stores metadata about them (e.g. the digest, remote location, TTL). --experimental_remote_cache_ttl=0s tells Bazel that this metadata expires after 0 seconds, i.e. Bazel has to discard its in-memory metadata cache in a follow-up incremental build and rerun the action.

That being said, the rerun should be able to hit the remote cache.

For --remote_download_outputs=all, all outputs are downloaded so --experimental_remote_cache_ttl doesn't take effect.
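
One way to check the rerun-but-cache-hit behavior (a sketch; --execution_log_json_file is a real Bazel flag, the target is a placeholder, and the exact JSON field name is an assumption based on the SpawnExec proto's remote_cache_hit field):

bazel build //:repro --remote_download_outputs=toplevel --experimental_remote_cache_ttl=0s
bazel build //:repro --remote_download_outputs=toplevel --experimental_remote_cache_ttl=0s --execution_log_json_file=/tmp/exec.json
# actions in the second build are re-sent, but should be served from the remote cache:
grep -c '"remoteCacheHit": true' /tmp/exec.json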

@coeuvre coeuvre changed the title --experimental_remote_cache_ttl=0s breaks --remote_download_outputs=toplevel (Build without the Bytes) --experimental_remote_cache_ttl=0s causes --remote_download_outputs=toplevel (Build without the Bytes) to rerun actions Jun 3, 2024
@tjgq tjgq closed this as not planned Jun 18, 2024