
Duplicate builds due to transitions of targets that don't depend on build settings #13281

Open
aherrmann opened this issue Mar 30, 2021 · 7 comments
Labels
not stale Issues or PRs that are inactive but not considered stale P2 We'll consider working on this in future. (Assignee optional) team-Configurability Issues for Configurability team type: feature request

Comments

@aherrmann
Contributor

Description of the problem:

Targets that don't depend on a user-defined build setting are effectively duplicated when they are depended upon by targets that both depend on the build setting and use a configuration transition.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

  • Check out aherrmann/bazel-configuration-rebuild@2be2a62

  • Follow the steps in the README:
    The target //:cat is independent of the user-defined build flag //:flag.
    The target //:flag_cat, on the other hand, depends on the value of the flag
    as well as on //:cat.

    The target //:cat is built only once, and cached under the same action cache
    key, if the configuration is set via a command-line flag.

    $ bazel clean --expunge; bazel build //:flag_cat --disk_cache=.cache/default --build_event_publish_all_actions --build_event_json_file=default.json
    
    $ bazel clean --expunge; bazel build //:flag_cat --//:flag=different_flag --disk_cache=.cache/different --build_event_publish_all_actions --build_event_json_file=different.json
    
    $ jq 'select(.id.actionCompleted.label == "//:cat")' <default.json | jq -s length
    1
    
    $ jq 'select(.id.actionCompleted.label == "//:cat")' <different.json | jq -s length
    1
    
    $ rg 'bin/cat.txt' .cache/*/ac
    .cache/default/ac/b9/b966d571c54c4aa37fc5ae7629f95caaa01ad79474fdc406c9e9a79341c950f6
    2:"bazel-out/k8-fastbuild/bin/cat.txtD
    
    .cache/different/ac/b9/b966d571c54c4aa37fc5ae7629f95caaa01ad79474fdc406c9e9a79341c950f6
    2:"bazel-out/k8-fastbuild/bin/cat.txtD
    

    The targets //:transition_default and //:transition_different use a
    configuration transition to set the value of the build setting //:flag to
    "default_flag" and "different_flag" respectively.

    The target //:cat is built twice and cached under different action cache keys
    when building these transition targets, even though //:cat does not depend on
    the build setting.

    $ bazel clean --expunge; bazel build //:transition_default //:transition_different --disk_cache=.cache/transition --build_event_publish_all_actions --build_event_json_file=transition.json
    
    $ jq 'select(.id.actionCompleted.label == "//:cat")' <transition.json | jq -s length
    2
    
    $ rg 'bin/cat.txt' .cache/transition/ac
    .cache/transition/ac/b9/b966d571c54c4aa37fc5ae7629f95caaa01ad79474fdc406c9e9a79341c950f6
    2:"bazel-out/k8-fastbuild/bin/cat.txtD
    
    .cache/transition/ac/8a/8a979615135482bd13afbdf6004d5c87c0a5f3427bcedc299f496eeff4946e29
    2:2bazel-out/k8-fastbuild-ST-1d73fa6ab729/bin/cat.txtD
    

    The CAS key is of course the same, as the contents of the output are the same.

  • Preferably, //:cat would only be built once in the transition case.
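
For reference, here is a minimal sketch of the kind of transition setup the linked repository demonstrates. The names and details below are assumptions for illustration, not the repository's actual code; it assumes //:flag is a string flag (e.g. bazel_skylib's string_flag):

```starlark
# defs.bzl -- illustrative sketch, not the linked repo's actual code.

def _set_flag_impl(settings, attr):
    # Pin //:flag to the value carried by this rule instance.
    return {"//:flag": attr.flag_value}

_set_flag = transition(
    implementation = _set_flag_impl,
    inputs = [],
    outputs = ["//:flag"],
)

def _transition_target_impl(ctx):
    # Simply forward the transitioned dependency's outputs.
    return [DefaultInfo(files = depset(ctx.files.dep))]

transition_target = rule(
    implementation = _transition_target_impl,
    attrs = {
        "dep": attr.label(cfg = _set_flag),
        "flag_value": attr.string(),
        "_allowlist_function_transition": attr.label(
            default = "@bazel_tools//tools/allowlists/function_transition_allowlist",
        ),
    },
)
```

Under such a setup, //:transition_default and //:transition_different would be two transition_target instances with flag_value = "default_flag" and "different_flag". Everything below dep, including //:cat, then lands in a distinct configuration, which is what produces the duplicate actions above.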

What operating system are you running Bazel on?

Ubuntu 20.10

What's the output of bazel info release?

release 4.0.0

Have you found anything relevant by searching the web?

This seems to be the same issue as reported on the following mailing list thread: https://groups.google.com/g/bazel-discuss/c/zVEc7gzbyu0/m/5UcZ8aXOBQAJ .
This seems to fall under the following umbrella issue: #6526 .
I decided to create a dedicated issue nonetheless, since I didn't see the specific issue of dependencies that don't depend on the build setting being rebuilt and cached separately on the issue tracker.
As I understand it, #12568 suggests addressing this issue through the setting's visibility attribute. However, that seems like a coarse approximation in general: a setting with public visibility might still affect only a few targets.

Any other information, logs, or outputs that you want to share?

The use-case that prompted this issue is using user-defined build settings to define feature flags. We then want to test different feature-flag settings in the same bazel test //... run, so we define transitions to generate targets that pin the feature flags. This causes unrelated dependencies to be rebuilt, e.g. core dependencies like com_google_protobuf. Ideally, only targets that directly depend on the feature flag, or that transitively depend on a target that does, would be rebuilt.

@aiuto aiuto added Starlark configuration Starlark transitions, build_settings untriaged team-Configurability Issues for Configurability team and removed Starlark configuration Starlark transitions, build_settings labels Mar 31, 2021
@aiuto aiuto added this to the configurability:trimming milestone Apr 7, 2021
@aiuto aiuto added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Apr 7, 2021
@Warchant
Contributor

Exists in 4.1.0

@gregestren
Contributor

I switched this to a feature request, although I understand the experience you're expressing. I did that because this is a Bazel design challenge, which would take focused design improvements to resolve. It's not a bug in the current design: the current behavior is expected for the given design.

What you're describing falls into some form of "trimming". If you're specifically concerned about actions running on remote executors, the work I'm doing at #6526 may address that and I'd encourage you to follow that issue.

@aherrmann
Contributor Author

@gregestren Thank you for the update! Makes sense, I understand the reasoning.

What you're describing falls into some form of "trimming". If you're specifically concerned about actions running on remote executors, the work I'm doing at #6526 may address that and I'd encourage you to follow that issue.

Thanks for the reference. In this issue here I'm not concerned about remote execution but about the configuration aspect.

  • For example the community|enterprise feature switch described here.
  • Another use-case we've considered is the various configurations that targets can be built under in Haskell. E.g. the language runtime can be linked in various combinations of "ways", such as "vanilla", "threaded", "debug", .... Similarly, Haskell libraries can be built in configurations such as static, dynamic, debug, profiling, .... These can have complex implications on the dependency graph: profiling mode only supports static linking, but Template Haskell requires dynamically linked library dependencies (on Unix by default; there are alternatives, but they have downsides). So, to build a library that uses Template Haskell in profiling mode, we need to build its dependencies both "static with profiling" and "dynamic without profiling". Transitions could be an elegant way to express this, but in light of the current issue that doesn't seem feasible. An alternative is to have rules emit actions for all of these combinations and have dependents pick the variants they need in their configuration, but that produces a combinatorial number of actions and is not very elegant.
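
To make the combinatorial cost of that alternative concrete, here is a hypothetical count. The axes are drawn from the ways mentioned above and treated as fully independent, which real GHC constraints (e.g. profiling implying static linking) would actually prune:

```python
from itertools import product

# Hypothetical, illustrative build axes; real constraints rule some
# of these combinations out.
linking = ["static", "dynamic"]
rts_way = ["vanilla", "threaded", "debug"]
profiling = ["no-prof", "prof"]

# Every library would need one set of actions per combination.
variants = list(product(linking, rts_way, profiling))
print(len(variants))  # 2 * 3 * 2 = 12 candidate action sets per library
```

With a handful more axes this multiplies out quickly, which is why emitting all variants eagerly is unattractive compared to transition-driven configuration.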

@gregestren
Contributor

What's your primary concern about this behavior? I.e. build slowness? Memory usage?

The basic challenge is knowing where in the dep graph the dependency on the flag actually ends, and how to encode that in a way that remains correct (doesn't drop the flag prematurely), performant, and not burdensome for the user (manually tagging every edge in the graph that should keep the flag doesn't scale well).

Are there any alternative configurations where the part of the graph that needs the flag is its own distinct subgraph? i.e. isolate the part of the graph to propagate the flag by flag structure? I think that opens up more options.

As an aside, I'm surprised to read about collision at https://github.com/aherrmann/bazel-transitions-demo#gotchas. I didn't look super-thoroughly, but that should be hard to actually do in practice (i.e. collisions shouldn't really happen unless someone is egregiously abusing a rule). Is that specific to pkg_tar?

@aherrmann
Contributor Author

What's your primary concern about this behavior? I.e. build slowness? Memory usage?

Build slowness induced by unnecessary rebuilds of the same target. E.g. external dependencies such as com_google_protobuf are built multiple times even though they are unaffected by the feature flag. (See "Any other information, logs, or outputs that you want to share?" in the issue description)

Are there any alternative configurations where the part of the graph that needs the flag is its own distinct subgraph? i.e. isolate the part of the graph to propagate the flag by flag structure? I think that opens up more options.

I'm not sure I understand what this means. Do you have an example in mind of what this would look like?
Looking at the minimal example, I'd say //:cat is independent of the flag and everything else forms the sub-graph that depends on the flag. Nonetheless, //:cat is also rebuilt under transitions to different flag values.

As an aside, I'm surprised to read about collision at https://github.com/aherrmann/bazel-transitions-demo#gotchas. I didn't look super-thoroughly, but that should be hard to actually do in practice (i.e. collisions shouldn't really happen unless someone is egregiously abusing a rule). Is that specific to pkg_tar?

The issue occurs when one combines the same target under different configurations in rules or other mechanisms that operate on paths relative to the package or $(BINDIR) rather than execRoot. I.e. Bazel might distinguish the outputs as bazel-out/k8-fastbuild/bin/app/bin and bazel-out/k8-fastbuild-ST-391c2674bb99/bin/app/bin, but pkg_tar will map them both to /bin.
Similarly, paths in the runfiles tree are based on paths relative to $(BINDIR). aherrmann/bazel-transitions-demo@0344cb1 illustrates this issue. Running //app:runfiles produces

$ bazel run //app:runfiles
bin-ce
Using community edition
Data file: ENTERPRISE

bin-ee
Using enterprise edition
Data file: ENTERPRISE

bin
Using enterprise edition
Data file: ENTERPRISE

(i.e. there is a collision on both bin and data.txt)
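
The collapse can be sketched outside Bazel in a few lines. This is a hypothetical model of the path mapping, not pkg_tar's or the runfiles machinery's actual implementation:

```python
# Hypothetical model of the collision: tools that key outputs by their
# $(BINDIR)-relative path collapse configuration-distinct outputs.
outputs = [
    "bazel-out/k8-fastbuild/bin/app/bin",
    "bazel-out/k8-fastbuild-ST-391c2674bb99/bin/app/bin",
]

def bindir_relative(path):
    # Drop the "bazel-out/<config>/bin/" prefix, which is roughly what
    # rules operating on package-relative paths effectively do.
    return path.split("/bin/", 1)[1]

# Bazel keeps the two outputs distinct; the mapped set does not.
mapped = {bindir_relative(p) for p in outputs}
print(mapped)
```

Both execroot paths map to the single entry app/bin, so whichever file is written last (or first, depending on the tool) wins, matching the order sensitivity shown below.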

This seems to be sensitive to the order in which dependencies are declared: If I swap the deps

 sh_binary(
     name = "runfiles",
     srcs = ["runfiles.sh"],
-    data = ["bin-ce", "bin-ee"],
+    data = ["bin-ee", "bin-ce"],
     deps = ["@bazel_tools//tools/bash/runfiles"],
 )

then the output becomes

bin-ce
Using community edition
Data file: COMMUNITY

bin-ee
Using enterprise edition
Data file: COMMUNITY

bin
Using community edition
Data file: COMMUNITY

I guess whether this counts as egregious abuse depends on the use-case. We abandoned the transition-based approach because of this issue, so I can't provide a production example. But the following made-up example doesn't seem too absurd: following the community vs. enterprise version example above, we may want to write an integration test in which both enterprise and community clients participate in the same network. In that case both versions would be runfile dependencies of the integration test target, and at that point we would need to be careful not to trigger the issue above.

@github-actions

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

@github-actions github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label Jun 13, 2023
@fmeum
Collaborator

fmeum commented Jun 13, 2023

@bazelbuild/triage not stale

@sgowroji sgowroji removed the stale Issues or PRs that are stale (no activity for 30 days) label Jun 13, 2023
@iancha1992 iancha1992 added the not stale Issues or PRs that are inactive but not considered stale label Jun 13, 2023