Please allow remote execution to have its own Platform, separate from host and target. #5309
Comments
Bazel currently has the execution platform, which is intended to represent both remote and local execution. You can specify the available execution platforms in two ways:
This already is present, and already is working with toolchain resolution, with one important restriction: it only works with rules that support toolchain resolution. This currently includes many Skylark rules, and we are in the process of fixing bugs in the implementation for C++ toolchain resolution using the native rules. Let me know what further questions you have, and where the existing documentation is deficient, so we can clear everything up and make this easier to use. |
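For readers unfamiliar with what "supporting toolchain resolution" means for a rule, here is a minimal Starlark sketch. The label `//my_toolchains:toolchain_type`, the `compiler` field, and the rule name are hypothetical and do not come from this thread; only `rule(toolchains = ...)` and `ctx.toolchains[...]` are Bazel APIs.

```starlark
# my_rules.bzl (hypothetical file name)

def _compile_impl(ctx):
    # The toolchain is resolved during analysis, together with the single
    # execution platform chosen for this configured target.
    toolchain = ctx.toolchains["//my_toolchains:toolchain_type"]

    out = ctx.actions.declare_file(ctx.label.name + ".out")
    ctx.actions.run(
        outputs = [out],
        inputs = ctx.files.srcs,
        # Hypothetical field: assumes the toolchain rule exposes an
        # executable file as `compiler` in its ToolchainInfo.
        executable = toolchain.compiler,
        arguments = [out.path] + [f.path for f in ctx.files.srcs],
    )
    return [DefaultInfo(files = depset([out]))]

compile_rule = rule(
    implementation = _compile_impl,
    attrs = {"srcs": attr.label_list(allow_files = True)},
    # Declaring a toolchain type is what opts the rule into toolchain
    # resolution; rules without it fall back to the host platform.
    toolchains = ["//my_toolchains:toolchain_type"],
)
```

Rules written this way get an execution platform picked by Bazel. Rules without a `toolchains` declaration, such as genrule at the time of this discussion, do not.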
I understand that, but a single execution platform is not sufficient when actions within the same build might execute on different platforms. Case 1: running Bazel with Case 2: Since Bazel makes the host/remote distinction by action instead of rule, it's possible for two actions in the same rule to run on different platforms. In this case, the current rule-level toolchain resolution will get confused and may resolve a toolchain that isn't usable for one of the actions. |
Each configured target has a single execution platform, which is used for all actions generated by that configured target. Bazel as a whole can have any number of registered execution platforms (via the command-line flag, or the WORKSPACE function), and will use toolchain resolution to choose the best one. The logic is documented here: https://docs.bazel.build/versions/master/toolchains.html#toolchain-resolution Case 1: Genrule currently doesn't respect toolchain resolution. This is actually a pretty difficult problem, but one we intend to tackle shortly. Case 2: This is not currently the case. There's no fundamental reason for it, except that the APIs don't support it, but currently every action generated from a single configured target has the same execution platform set. We are open to relaxing that restriction in the future, but it will need a pretty compelling use case. Do you have a specific problem that I can take a look at to help you solve the problem you are having? |
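As an illustration of the two registration routes mentioned above (command-line flag or WORKSPACE function), the following sketch defines a Linux platform and registers it as an execution platform. The label `//some-platforms:linux` is taken from the issue description. The `@platforms//...` constraint labels assume the standard `platforms` repository is available; Bazel versions from the time of this thread used `@bazel_tools//platforms:...` instead.

```starlark
# some-platforms/BUILD
platform(
    name = "linux",
    constraint_values = [
        # Assumed constraint labels from the `platforms` repository.
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
    ],
)

# WORKSPACE
# Makes the platform a candidate during toolchain resolution.
register_execution_platforms("//some-platforms:linux")
```

The command-line equivalent of the WORKSPACE call is `--extra_execution_platforms=//some-platforms:linux`. Either way, only rules that participate in toolchain resolution are affected.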
This is not true in Bazel 0.13 (haven't updated to 0.14 yet) -- actions in the same target can execute on different platforms, depending on execution strategy:
This also appears to be untrue in 0.13 -- when I pass a target in the genrule's
I want to run Bazel on a MacOS laptop, and have certain heavy actions (Java/Scala compilation) run on a distributed buildfarm via the remote execution protocol. The buildfarm workers are running on Linux. |
I see where I was confused. You are right, strategies such as remote, sandboxed, etc can be set per-action. However, the execution platform is set per-target. I can see where this will cause problems when being mixed together. Do you have an error case I can use to try and fix this? I know @ulfjack is planning some changes to the entire strategy system, but I don't know how that will affect remote/local execution. All rules that don't participate in toolchain resolution set the host platform as the execution platform, leading to the cases you saw. For these, a |
I'm not sure that the set of execution platforms is related to this issue. As described in the original post, I think the best way to support the use case of a multi-platform distributed build is to have a |
The core issue is that there isn't a single remote platform: your remote execution system could have several types of workers, and they can all be used in a build. However, I definitely see the value of having a way to specify the default remote platform for legacy cases, and will look into implementing that shortly. That should help fix your issue with genrules, if I understand it correctly. |
resolution is not used. If the flag is not set, the execution platform will be the host platform. Fixes bazelbuild#5309. RELNOTES: Adds the --legacy_fallback_execution_platform flag to specify a fallback execution platform when toolchain resolution is not used. Change-Id: I5e91c209cf5f043e29fb512c0ef81385a44d4817
@katre, I don't think that'll work. Let me elaborate. @jmillikin-stripe, can you confirm that this is what you want: You want to run certain actions remotely, even though the remote execution platform is incompatible with your local machine, e.g., local machine is a Mac, remote machine is a Linux machine. This is at odds with Bazel's current design, although the existence of the --*_strategy flags allows you to effectively override where an individual action is executed. In its current design, Bazel allows rules to introspect the execution platform on which actions will run. Rule analysis generates the action, and it has to generate the action in its final form, taking into account any peculiarities of the underlying platform. For example, an action running on Windows has to use Windows-style paths (this is the biggest immediately visible difference) for all paths referenced in the action (binary to run, inputs, outputs, temporary locations, etc.). There is currently a hard barrier between analyzing a rule and executing an action, and therefore, the execution platform has to be set before we analyze a rule. It also has to be set manually for two reasons - there may be any number of execution platforms, and we also don't want loading+analysis to depend on whether or not a remote execution system is known and whether we can talk to it. At this time, the rules API only provides access to a single execution platform - there used to be no explicit model of the execution platform, but this is what was implicitly the case as part of the BuildConfiguration. As such, a single rule cannot generate actions for different execution platforms. There are cases where the execution platforms are sufficiently similar that things happen to work, and the availability of the --*_strategy flags allows the user to send execution to a strictly speaking incompatible execution platform without the rule knowing. However, this is not a viable long-term strategy.
In the short term, I'd suggest that we provide a way to override an action's associated execution platform. This allows users to get Bazel to do what they say, even if it doesn't make sense from Bazel's point of view. In the long term, I'd suggest that we allow rules to access multiple execution platforms within a single rule. We'll have to carefully think about the right APIs for this, and the right APIs may require 'rule fragments'. The concept behind a rule fragment is that rule authors can identify parts of the action graph that have execution platform consistency requirements, and enforce that all corresponding actions are configured with the same execution platform by declaring smaller 'rule fragments', each of which has one execution platform. By making this explicit, and by making each fragment independent (with Bazel explicitly controlling communication between fragments), we can allow Bazel to defer analysis of such a fragment until execution time, to retrofit it to the actually selected execution platform. Rule fragments would also allow us to solve another issue with configurations related to output paths. Consider a java_binary rule with native code - the java_binary rule has to declare a dependency on the C++ configuration, even though the pure Java compilation part of the rule only requires the Java configuration. Even with configuration subsetting, this means that it's non-trivial to make the Java compilation use different output paths that do not contain the C++ configuration fragment. I hope this all made sense. |
(We have been using some hacks to allow certain rules to generate actions for different execution platforms.) |
I want to run heavy processes, such as compilation, remotely. The remote workers' machines are incompatible with my local machine. I want to retain the ability to run certain rules locally, such as for tests that are not yet fully hermetic. These targets would be specified by tags like
To be clear, while I've documented Bazel's current behavior here, I don't actually want that behavior. I have no use case where executing a target's actions on different platforms is useful. Selecting local/remote strategy by action mnemonic, instead of by rule type, seems confusing and fragile to me.
I think this requirement is broadly compatible with a
This seems like it would be difficult for rule authors to use, and I'm worried that most open-source language rule implementations would not implement it properly. |
What would the --remote_platform flag do, though? We can't make it available to rules because the execution platform is part of the BuildConfiguration - if we made the 'remote_platform' part of the BuildConfiguration, all rules would have access to the 'remote_platform' in addition to the execution platform. What should the rules do with that information? Maybe the suggestion is that Bazel would pick one or the other as execution platform on a per rule basis and only make that one available to the rule? That'd obey the one execution platform per rule constraint. How would that interact with toolchains? (On a related note, please please please do not call it --remote_platform. There is nothing 'remote' about it - it might be compatible with the local host, or we might be running it in a local docker container or in a local VM. If we have to, then call it --per_rule_execution_platform, or something.) |
This morning I sent out #5322, which adds a flag to set the execution platform for rules that don't use toolchain resolution (including all legacy rules). This can't be any more fine-grained, there is no way during analysis to know whether a target/action will be executed remotely or locally. |
If we're using the execution platform to decide where to execute the actions for a rule, then we could have a per-rule selection of execution platforms. |
@ulfjack We're not doing that currently, is that planned to be changed? I think it's a great idea but I am not sure I have time to make the change. |
I would imagine that when Bazel is configuring a target, it would decide whether that target is executed entirely on "host" or entirely on "remote". Depending on this choice, it would use either
It's named |
@katre this issue is currently marked as a feature request, although there may be a regression here as well because of your work? |
I am not completely certain about the --host_platform flag, but the "host" prefix in Bazel usually describes the execution platform, which may or may not be compatible with the local machine. Bazel can run builds without doing any actual work on the local machine whatsoever, in which case the "host" moniker is completely misleading. Adding --remote_platform makes the confusion even worse, because that one may actually refer to the local machine. For example, you might have a 'host' configuration of linux, with all actions run on a remote machine, cross-compiling for a 'target' configuration of mac os, which happens to be the local machine. We really need to get our terminology straight. |
The There is definitely a lot of confusion here, unfortunately, which is why I'm working on new platform and toolchain features to simplify the configuration and make this more straightforward. Also, moving more native rules to use toolchain resolution will help. |
Closing this due to it not being consistent with the current direction of the code. |
This is now supported via exec groups: https://bazel.build/extending/exec-groups
|
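To make the exec-groups pointer concrete, here is a minimal Starlark sketch of a rule whose actions are split across execution platforms. The rule name, the group name `heavy`, and the shell command are hypothetical; `exec_groups`, `exec_group(...)`, and the `exec_group` parameter on actions are the APIs described at the linked page. The `@platforms//os:linux` label assumes the standard `platforms` repository.

```starlark
# exec_group_sketch.bzl (hypothetical file name)

def _sketch_impl(ctx):
    out = ctx.actions.declare_file(ctx.label.name + ".out")

    # This action is assigned to the "heavy" group, so it runs on an
    # execution platform matching that group's constraints (Linux here),
    # independently of the target's default execution platform.
    ctx.actions.run_shell(
        outputs = [out],
        command = "echo built > " + out.path,
        exec_group = "heavy",
    )
    return [DefaultInfo(files = depset([out]))]

sketch_rule = rule(
    implementation = _sketch_impl,
    exec_groups = {
        # Hypothetical group name; each group gets its own execution
        # platform during toolchain resolution.
        "heavy": exec_group(
            exec_compatible_with = ["@platforms//os:linux"],
        ),
    },
)
```

Actions that do not name an `exec_group` keep using the target's default execution platform, which is how one rule can mix platforms, for example a macOS client with Linux remote workers as in this issue.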
Description of the problem / feature request:
Bazel distinguishes between the "target platform", which binaries are being compiled for, and "host platform", which binaries run from build actions are compiled for. The introduction of remote execution means that there's a third machine involved, the remote worker, which may be of a different platform. This confuses Bazel, because it assumes that all actions run on a platform compatible with the host.
Please add a --remote_platform flag (easy), with associated toolchain resolution plumbing (hard + lots of work + messy?), so that I can run bazel build --remote_platform=//some-platforms:linux on a MacOS machine and have Linux binaries be compiled when they're going to be run remotely.
What operating system are you running Bazel on?
The Bazel client is running on MacOS, the Bazel build worker is running on Linux.