Builds invalidate cache from test runs and vice versa #13186

smolkaj · 2021-03-09T15:23:46Z

Description of the problem / feature request:

Starting from an empty bazel cache, when we run

bazel build //...  # empty cache
bazel test //...  # build cache not reused?
bazel build //...  # cache invalidated by test run?
bazel test //...
...

on our project, every single command seems to be rebuilding the cache; we observe very slow builds (~1h).

The following works:

bazel build //...  # first build is slow
bazel build //...  # almost instant
bazel build //...  # almost instant

Similarly for testing:

bazel test //...  # first run is slow
bazel test //...  # almost instant
bazel test //...  # almost instant

We have tried --distinct_host_configuration=false but it doesn't solve the issue.

What operating system are you running Bazel on?

Ubuntu 20.04

What's the output of `bazel info release`?

release 4.0.0

What's the output of `git remote get-url origin ; git rev-parse master ; git rev-parse HEAD` ?

git@github.com:pins/pins-infra.git
80a739d0e2f15edbb185cc9061f66202a70fffd8
80a739d0e2f15edbb185cc9061f66202a70fffd8

Any other information, logs, or outputs that you want to share?

aiuto · 2021-03-31T04:18:04Z

@sdtwigg Is this a duplicate of the work you are doing with trim test?

smolkaj · 2021-03-31T04:44:37Z

Thanks for looking into this, @aiuto.

It may be worth adding that this is a non-trivial issue for us. The usual developer workflow is to edit code, then run bazel build //... && bazel test //.... Due to this issue, that command takes 2h (!) to complete, and we do not know of a workaround. Just running bazel test //... is not good enough because not all build targets are covered by tests.

zachgrayio · 2021-03-31T04:56:08Z

doesn't bazel test ... fall back to build for non-test targets? we don't really ever rely on this but I thought I read that somewhere once. seems to be the case in a quick local test but unsure 🤷

and can you share output or anything at least here? are you seeing "<message>, discarding analysis cache." or anything similar between runs? are build and test commands getting the exact same config settings passed such that you should actually expect to share cache hits? are you using a remote cache/exec service and/or disk cache?

sorry if these are obvious, unsure of your level of experience, but so far this is just sounding similar to different build/test configs maybe via bazelrc; trying the usual --incompatible_strict_action_env might be useful?

smolkaj · 2021-03-31T05:17:24Z

Thanks for the various pointers, @zachgrayio.

> doesn't bazel test ... fall back to build for non-test targets?

Not that I know, but will give it a try.

> are you using a remote cache/exec service and/or disk cache?

We're using the Bazel defaults, which I assume is a disk cache.

I will share the output of Bazel once I have it.
For now, here is our .bazelrc:

build --cxxopt='-std=c++17'
build --host_cxxopt='-std=c++17'
test --cxxopt='-std=c++17'
test --host_cxxopt='-std=c++17'
run --cxxopt='-std=c++17'
run --host_cxxopt='-std=c++17'

# Required for UPB (libprotobuf_mutator dependency) to compile.
build --copt='-Wno-error=stringop-truncation'
build --host_copt='-Wno-error=stringop-truncation'
test --copt='-Wno-error=stringop-truncation'
test --host_copt='-Wno-error=stringop-truncation'
run --copt='-Wno-error=stringop-truncation'
run --host_copt='-Wno-error=stringop-truncation'

# To allow loops with int and comparison against a .size() that's size_t.
build --copt='-Wno-error=sign-compare'
build --host_copt='-Wno-error=sign-compare'
test --copt='-Wno-error=sign-compare'
test --host_copt='-Wno-error=sign-compare'
run --copt='-Wno-error=sign-compare'
run --host_copt='-Wno-error=sign-compare'

> trying the usual --incompatible_strict_action_env might be useful?

Thanks, I'll give that a try.

bocon13 · 2021-11-20T04:13:23Z

@smolkaj I think I found the issue:

test --cxxopt includes build --cxxopt

.bazelrc
------------
build --cxxopt='-std=c++17'
test --cxxopt='-std=c++17'

$ bazel test -s //...
...
gcc ... '-std=c++17' '-std=c++17'

Technically, this means that builds and tests are done with "different" options, so the cache is invalidated and a rebuild occurs. If we avoid specifying test and run opts, caching seems to work.
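To make the duplication concrete, here is a minimal Python sketch of how an accumulating flag can end up with different values for build and test. It is only a model, not Bazel's actual option parser: the function name `effective_values` and the `INHERITS` table are made up for illustration, and it assumes a flat .bazelrc with no `--config` groups or imports.

```python
import shlex

# Assumed inheritance for this sketch: `test` and `run` also pick up `build` lines.
INHERITS = {"build": ["build"], "test": ["build", "test"], "run": ["build", "run"]}


def effective_values(bazelrc_text, command, flag):
    """Collect every value of an accumulating `flag` that `command` would see."""
    values = []
    for section in INHERITS[command]:
        for line in bazelrc_text.splitlines():
            # shlex strips the shell-style quotes and ignores `#` comments.
            tokens = shlex.split(line, comments=True)
            if not tokens or tokens[0] != section:
                continue
            for token in tokens[1:]:
                name, _, value = token.partition("=")
                if name == flag:
                    values.append(value)
    return values


BAZELRC = """
build --cxxopt='-std=c++17'
test --cxxopt='-std=c++17'
"""

build_opts = effective_values(BAZELRC, "build", "--cxxopt")
test_opts = effective_values(BAZELRC, "test", "--cxxopt")
print(build_opts)  # ['-std=c++17']
print(test_opts)   # ['-std=c++17', '-std=c++17']
print(build_opts == test_opts)  # False
```

Under this model, build sees one value and test sees two, so the two commands do not agree on the option list even though the compiler behaves the same either way.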

smolkaj · 2021-12-07T01:09:07Z

Thanks @bocon13, great catch finding the root cause!

To the Bazel authors: is it expected behavior that the cache gets discarded when the build options change?

gregestren · 2022-02-01T20:42:18Z

CC'ing @aranguyen, who's been looking recently into how Bazel handles flag and bazelrc parsing.

Yes, it is expected that the cache is discarded when build options change. I'm not sure it's documented well, and I'm happy to support suggestions for better documentation. This is covered in

bazel/src/main/java/com/google/devtools/build/lib/skyframe/SkyframeBuildView.java

Lines 334 to 339 in 3cd5f84

    
// Note that clearing the analysis cache is currently required for correctness. It is also
// helpful to save memory.
//
// If we had more memory, fixing the correctness issue (see also b/144932999) would allow us
// to not invalidate the cache, leading to potentially better performance on incremental
// builds.

- it's not just a nicety but actually required for correctness due to subtle issues with Skyframe.

b/144932999 is a Google bug with no special secrets: it repeats this comment. I believe the issue was that it's possible for different flag combos to produce the same action, so when Skyframe is asked to execute an action it can get confused about which variation to use.

Conceptually, '-std=c++17' '-std=c++17' clearly shouldn't matter. Perhaps @aranguyen has some ideas about why that flag replicates and whether there's any way to get a better practical outcome.

aranguyen · 2022-02-02T09:06:15Z

For the flag --cxxopt, multiple uses are accumulated. In the build case, Bazel evaluates this to be cxxopt=[-std=c++17]. In the test case, since test inherits from build and the user is specifying the flag again, Bazel sees this as cxxopt=[-std=c++17, -std=c++17], making it different. Hence, the cache is invalidated.

In this case, I would suggest not having the test line in the bazelrc file if it is not different from the build line, and letting inheritance take care of it. @smolkaj, would this work?
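A rough way to find test or run lines that merely repeat a build line is sketched below in Python. It assumes a flat .bazelrc (no imports, no `--config` groups) and only reports lines whose options exactly match a `build` line; `redundant_lines` is a hypothetical helper, not part of Bazel.

```python
import shlex


def redundant_lines(bazelrc_text):
    """Return (line_number, line) pairs for test/run lines that repeat a build line."""
    build_options = set()
    parsed = []
    for number, line in enumerate(bazelrc_text.splitlines(), start=1):
        # Skip blank lines, comments, and lines without any options.
        tokens = shlex.split(line, comments=True)
        if len(tokens) < 2:
            continue
        command, options = tokens[0], tuple(tokens[1:])
        parsed.append((number, line, command, options))
        if command == "build":
            build_options.add(options)
    return [
        (number, line)
        for number, line, command, options in parsed
        if command in ("test", "run") and options in build_options
    ]


BAZELRC = """\
build --cxxopt='-std=c++17'
build --host_cxxopt='-std=c++17'
test --cxxopt='-std=c++17'
test --test_output=errors
run --host_cxxopt='-std=c++17'
"""

for number, line in redundant_lines(BAZELRC):
    print(f"line {number}: {line}")
```

On this sample input the sketch reports lines 3 and 5, which are the candidates to delete; the `test --test_output=errors` line is kept because no build line sets it.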

There is actually a similar GitHub issue, and both can probably be fixed by introducing deduplication for flags that allow multiple values. But this is low priority.

smolkaj · 2022-02-02T17:24:29Z

Yes, thanks, this works for us, though I think it is quite brittle and it would be nice to fix the underlying issue.

github-actions · 2023-06-13T01:34:21Z

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

smolkaj · 2023-06-17T04:50:43Z

This issue is not stale. It would be nice to get a fix.

j2kun · 2024-03-12T13:53:27Z

IMO, what would be nice is a tool that users could run that would determine the diff in the env/flags that would allow us to determine why the cache is being invalidated. Or even just a way to determine "will the cache invalidate if I run it now" but don't actually run it. I often find the cache invalidates for seemingly no reason, and this is one of the biggest issues new users have when onboarding to my bazel-based project.

brentleyjones · 2024-03-12T19:32:17Z

Does --noallow_analysis_cache_discard help in that regard?

j2kun · 2024-03-12T20:11:37Z

That does help, thanks!

gregestren · 2024-03-15T21:25:05Z

FYI @susinmotion and @katre are looking at the Skyframe restrictions described at #13186 (comment) and seeing how much we can lift them.

I'm hoping in the near-term we can avoid invalidating the exec-configured parts of the analysis cache (i.e. tools/compilers). They generally don't care about flags set at the command line. And they can be a surprisingly large part of the build graph.

I'm not sure if that would kick in for --copt, etc. It might. And if it doesn't it might be straightforward to adjust it.

smolkaj changed the title ~~Build cache invalidates test cache and vice versa~~ Builds invalidate cache from test runs and vice versa Mar 9, 2021

aiuto added team-Configurability Issues for Configurability team untriaged P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Mar 31, 2021

gregestren added type: support / not a bug (process) untriaged and removed P2 We'll consider working on this in future. (Assignee optional) labels Feb 1, 2022

aranguyen added this to the flags cleanup milestone Feb 2, 2022

aranguyen added P3 We're not considering working on this, but happy to review a PR. (No assignee) and removed untriaged labels Feb 2, 2022

gregestren mentioned this issue May 18, 2022

Investigate flag change performance costs #15520

github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label Jun 13, 2023

github-actions bot removed the stale Issues or PRs that are stale (no activity for 30 days) label Jun 18, 2023
