Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Builds invalidate cache from test runs and vice versa #13186

Open
smolkaj opened this issue Mar 9, 2021 · 15 comments
Open

Builds invalidate cache from test runs and vice versa #13186

smolkaj opened this issue Mar 9, 2021 · 15 comments
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Configurability Issues for Configurability team type: support / not a bug (process)
Milestone

Comments

@smolkaj
Copy link

smolkaj commented Mar 9, 2021

Description of the problem / feature request:

Starting from an empty bazel cache, when we run

bazel build //...  # empty cache
bazel test //...  # build cache not reused?
bazel build //...  # cache invaldiated by test run?
bazel test //...
...

on our project, every single command seems to be rebuilding the cache; we observe very slow builds (~1h).

The following works:

baze build //...  # first build is slow
bazel build //...  # almost instant
bazel build //...  # almost instant

Similarly for testing:

baze test //...  # first build is slow
bazel test //...  # almost instant
bazel test //...  # almost instant

We have tried --distinct_host_configuration=false but it doesn't solve the issue.

What operating system are you running Bazel on?

Ubuntu 20.04

What's the output of bazel info release?

release 4.0.0

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

git@github.com:pins/pins-infra.git
80a739d0e2f15edbb185cc9061f66202a70fffd8
80a739d0e2f15edbb185cc9061f66202a70fffd8

Any other information, logs, or outputs that you want to share?

@smolkaj smolkaj changed the title Build cache invalidates test cache and vice versa Builds invalidate cache from test runs and vice versa Mar 9, 2021
@aiuto aiuto added team-Configurability Issues for Configurability team untriaged P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Mar 31, 2021
@aiuto
Copy link
Contributor

aiuto commented Mar 31, 2021

@sdtwigg Is this a duplicate of the work you are doing with trim test?

@smolkaj
Copy link
Author

smolkaj commented Mar 31, 2021

Thanks for looking into this, @aiuto.

It may be worth adding that this is a non-trivial issue for us. The usual developer workflow is to edit code, then run bazel build //... && bazel test //.... Due to this issue, that command takes 2h (!) to complete, and we do not know of a workaround. Just running bazel test //... is not good enough because not all build targets are covered by tests.

@zachgrayio
Copy link

doesnt bazel test ... fall back to build for non test targets? we dont really ever rely on this but I thought I read that somewhere once. seems to be the case in quick local test but unsure 🤷

and can you share output or anything at least here? are you seeing <message>, discarding analysis cache. or anything between runs? are build and test commands getting the exact same config settings passed such that you should actually expect to share cache hits? are you using a remote cache/exec service and/or disk cache?

sorry if these are obvious, unsure of your level of experience, but so far this is just sounding similar to different build/test configs maybe via bazelrc; trying the usual --incompatible_strict_action_env might be useful?

@smolkaj
Copy link
Author

smolkaj commented Mar 31, 2021

Thanks for the various pointers, @zachgrayio.

doesnt bazel test ... fall back to build for non test targets?

Not that I know, but will give it a try.

are you using a remote cache/exec service and/or disk cache?

We're using the Bazel defaults, which I assume is a disk cache.

I will share the output of Bazel once I have it.
For now, here is our .bazelrc:

build --cxxopt='-std=c++17'
build --host_cxxopt='-std=c++17'
test --cxxopt='-std=c++17'
test --host_cxxopt='-std=c++17'
run --cxxopt='-std=c++17'
run --host_cxxopt='-std=c++17'

# Required for UPB (libprotobuf_mutator dependency) to compile.
build --copt='-Wno-error=stringop-truncation'
build --host_copt='-Wno-error=stringop-truncation'
test --copt='-Wno-error=stringop-truncation'
test --host_copt='-Wno-error=stringop-truncation'
run --copt='-Wno-error=stringop-truncation'
run --host_copt='-Wno-error=stringop-truncation'

# To allow loops with int and comparison against a .size() that's size_t.
build --copt='-Wno-error=sign-compare'
build --host_copt='-Wno-error=sign-compare'
test --copt='-Wno-error=sign-compare'
test --host_copt='-Wno-error=sign-compare'
run --copt='-Wno-error=sign-compare'
run --host_copt='-Wno-error=sign-compare'

trying the usual --incompatible_strict_action_env might be useful?

Thanks, I'll give that a try.

@bocon13
Copy link

bocon13 commented Nov 20, 2021

@smolkaj I think I found the issue:

test --cxxopts includes build --cxxopts

.bazel rc
------------
build --cxxopt='-std=c++17'
test --cxxopt='-std=c++17'
$ bazel test -s //...
...
gcc ... '-std=c++17' '-std=c++17'

Technically, this means that builds and tests are done with "different" options, so the cache is invalidated and a rebuild occurs. If we avoid specifying test and run opts, caching seems to work.

@smolkaj
Copy link
Author

smolkaj commented Dec 7, 2021

Thanks @bocon13, great catch finding the root cause!

To the Bazel authors: is it expected behavior that the cache gets discarded when the build options change?

@gregestren gregestren added type: support / not a bug (process) untriaged and removed P2 We'll consider working on this in future. (Assignee optional) labels Feb 1, 2022
@gregestren
Copy link
Contributor

CC'ing @aranguyen , who's been looking recently into how Bazel interprets flag and bazelrc parsing.

Yes, it is expected that the cache is discarded when build options change. I'm not sure it's documented well, and I'm happy to support suggestions for better documentation. This is covered in

// Note that clearing the analysis cache is currently required for correctness. It is also
// helpful to save memory.
//
// If we had more memory, fixing the correctness issue (see also b/144932999) would allow us
// to not invalidate the cache, leading to potentially better performance on incremental
// builds.
- it's not just a nicety but actually required for correctness due to subtle issues with Skfyrame.

b/144932999 is a Google bug with no special secrets: it repeats this comment. I believe the issue was it's possible for different flag combos to produce the same action, so when Skyframe is asked to execute an action is can get confused about which variation to use.

Conceptually, '-std=c++17' '-std=c++17' clearly shouldn't matter. Perhaps @aranguyen has some ideas about why that flag replicates and if there's any way to a better practical outcome.

@aranguyen
Copy link
Contributor

For the flag --cxxopt, mutiple uses are accumulated. In the build case, Bazel evaluates this to be cxxopt=[-std=c++17]. In the test case, since test inherits from build and the user is specifying the flag again, Bazel sees this as cxxopt=[-std=c++17, -std=c++17] making it different. Hence, the cache is invalidated.

In this case, I would suggest to not have the test line in the bazelrc file if it is not different from the build line and let inheritance take care of it. @smolkaj, would this work?

There is actually a similar github issue and both can probably be fixed by introducing deduplication for flags that allow multiple values. But this is low priority.

@aranguyen aranguyen added this to the flags cleanup milestone Feb 2, 2022
@aranguyen aranguyen added P3 We're not considering working on this, but happy to review a PR. (No assignee) and removed untriaged labels Feb 2, 2022
@smolkaj
Copy link
Author

smolkaj commented Feb 2, 2022

Yes, thanks, this works for us, though I think it is quite brittle and it would be nice to fix the underlying issue.

@github-actions
Copy link

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

@github-actions github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label Jun 13, 2023
@smolkaj
Copy link
Author

smolkaj commented Jun 17, 2023

This issue is not stale. It would be nice to get a fix.

@github-actions github-actions bot removed the stale Issues or PRs that are stale (no activity for 30 days) label Jun 18, 2023
@j2kun
Copy link

j2kun commented Mar 12, 2024

IMO, what would be nice is a tool that users could run that would determine the diff in the env/flags that would allow us to determine why the cache is being invalidated. Or even just a way to determine "will the cache invalidate if I run it now" but don't actually run it. I often find the cache invalidates for seemingly no reason, and this is one of the biggest issues new users have when onboarding to my bazel-based project.

@brentleyjones
Copy link
Contributor

Does --noallow_analysis_cache_discard help in that regard?

@j2kun
Copy link

j2kun commented Mar 12, 2024

That does help, thanks!

@gregestren
Copy link
Contributor

FYI @susinmotion and @katre are looking at the Skyframe restrictions described at #13186 (comment) and seeing how much we can lift them.

I'm hoping in the near-term we can avoid invalidating the exec-configured parts of the analysis cache (i.e. tools/compilers). They generally don't care about flags set at the command line. And they can be a surprisingly large part of the build graph.

I'm not sure if that would kick in for --copt, etc. It might. And if it doesn't it might be straightforward to adjust it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Configurability Issues for Configurability team type: support / not a bug (process)
Projects
None yet
Development

No branches or pull requests

8 participants