Please allow remote execution to have its own Platform, separate from host and target. #5309

jmillikin-stripe · 2018-06-01T03:16:38Z

Description of the problem / feature request:

Bazel distinguishes between the "target platform", which binaries are being compiled for, and "host platform", which binaries run from build actions are compiled for. The introduction of remote execution means that there's a third machine involved, the remote worker, which may be of a different platform. This confuses Bazel, because it assumes that all actions run on a platform compatible with the host.

Please add a --remote_platform flag (easy), with associated toolchain resolution plumbing (hard + lots of work + messy?), so that I can run bazel build --remote_platform=//some-platforms:linux on a MacOS machine and have Linux binaries be compiled when they're going to be run remotely.

What operating system are you running Bazel on?

The Bazel client is running on MacOS, the Bazel build worker is running on Linux.


jin · 2018-06-01T03:32:28Z

cc @katre @buchgr

katre · 2018-06-01T09:02:00Z

Bazel currently has the execution platform, which is intended to represent both remote and local execution.

You can specify the available execution platforms in two ways:

- Use the --extra_execution_platforms flag
- Use the register_execution_platforms function in your WORKSPACE

This is already present and already works with toolchain resolution, with one important restriction: it only works with rules that support toolchain resolution. This currently includes many Skylark rules, and we are in the process of fixing bugs in the implementation of C++ toolchain resolution for the native rules.
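For illustration, here is a minimal sketch of both registration routes. The package `//platforms` and the target name `linux_x86_64` are hypothetical, and the constraint labels assume the `@platforms` repository used by current Bazel releases; releases from around the time of this discussion exposed similar constraints under `@bazel_tools//platforms` instead.

```python
# BUILD file in a hypothetical //platforms package.
platform(
    name = "linux_x86_64",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
    ],
)

# WORKSPACE: make the platform available to toolchain resolution.
register_execution_platforms("//platforms:linux_x86_64")
```

The command-line route would instead pass `--extra_execution_platforms=//platforms:linux_x86_64` to `bazel build`.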

Let me know what further questions you have, and where the existing documentation is deficient, so we can clear everything up and make this easier to use.

jmillikin-stripe · 2018-06-01T17:11:28Z

> Bazel currently has the execution platform, which is intended to represent both remote and local execution.

I understand that, but a single execution platform is not sufficient when actions within the same build might execute on different platforms.

Case 1: When running Bazel with --genrule_strategy=remote, any tools used in a genrule must be compiled for the remote platform. They will fail to run if built for the host platform. Other tools, used in non-genrule actions, must be built for the host platform.

Case 2: Since Bazel makes the host/remote distinction by action instead of rule, it's possible for two actions in the same rule to run on different platforms. In this case, the current rule-level toolchain resolution will get confused and may resolve a toolchain that isn't usable for one of the actions.

katre · 2018-06-01T17:26:19Z

Each configured target has a single execution platform, which is used for all actions generated by that configured target.

Bazel as a whole can have any number of registered execution platforms (via the command-line flag, or the WORKSPACE function), and will use toolchain resolution to choose the best one. The logic is documented here: https://docs.bazel.build/versions/master/toolchains.html#toolchain-resolution
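To make the linked resolution logic concrete, here is a simplified, hypothetical model written in the subset of Python that is also valid Starlark. It walks the registered execution platforms in order and picks the first one for which every required toolchain type has a matching toolchain. Constraint names such as `os:linux` are shorthand, not real labels, and details such as a target's own `exec_compatible_with` are deliberately left out.

```python
# Simplified, hypothetical model of toolchain resolution; not Bazel's code.
# Platforms are plain lists of constraint names.

def _satisfies(platform_constraints, required):
    # A platform satisfies a requirement when it has every required constraint.
    for c in required:
        if c not in platform_constraints:
            return False
    return True

def resolve(exec_platforms, target_platform, toolchains, toolchain_types):
    """Picks the first execution platform for which every toolchain type resolves.

    exec_platforms: list of (name, constraints) pairs, in registration order.
    target_platform: list of constraints of the target platform.
    toolchains: registered toolchains, in order, as dicts with the keys
        "name", "type", "exec_compatible_with" and "target_compatible_with".
    Returns (exec_platform_name, {toolchain_type: toolchain_name}) or None.
    """
    for exec_name, exec_constraints in exec_platforms:
        selected = {}
        for tc_type in toolchain_types:
            for tc in toolchains:
                if (tc["type"] == tc_type and
                        _satisfies(exec_constraints, tc["exec_compatible_with"]) and
                        _satisfies(target_platform, tc["target_compatible_with"])):
                    selected[tc_type] = tc["name"]
                    break  # the first matching registered toolchain wins
        if len(selected) == len(toolchain_types):
            return (exec_name, selected)
    return None

EXEC_PLATFORMS = [
    ("//platforms:macos", ["os:macos", "cpu:x86_64"]),  # registered first
    ("//platforms:linux", ["os:linux", "cpu:x86_64"]),
]
TOOLCHAINS = [
    {"name": "//toolchains:javac_linux", "type": "java",
     "exec_compatible_with": ["os:linux"], "target_compatible_with": []},
]

print(resolve(EXEC_PLATFORMS, ["os:linux", "cpu:x86_64"], TOOLCHAINS, ["java"]))
```

Because the only Java toolchain in this example requires `os:linux` for execution, the macOS platform is skipped even though it was registered first.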

Case 1: Genrule currently doesn't respect toolchain resolution. This is actually a pretty difficult problem, but one we intend to tackle shortly.

Case 2: This is not currently the case. There's no fundamental reason for it, except that the APIs don't support it, but currently every action generated from a single configured target has the same execution platform set.

We are open to relaxing that restriction in the future, but it will need a pretty compelling use case.

Do you have a specific problem that I can take a look at to help you solve the problem you are having?

jmillikin-stripe · 2018-06-01T17:45:44Z

> Each configured target has a single execution platform, which is used for all actions generated by that configured target.

This is not true in Bazel 0.13 (haven't updated to 0.14 yet) -- actions in the same target can execute on different platforms, depending on execution strategy:

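```python
# Example rule: two actions in one target, each recording `uname -a` output.
def _test_rule(ctx):
    out_1 = ctx.actions.declare_file("test_rule_1.txt")
    out_2 = ctx.actions.declare_file("test_rule_2.txt")

    # Mnemonic "TestRule1": runs with the default spawn strategy.
    ctx.actions.run_shell(
        outputs = [out_1],
        command = "uname -a > " + out_1.path,
        mnemonic = "TestRule1",
    )

    # Mnemonic "TestRule2": can be routed elsewhere with --strategy=TestRule2=...
    ctx.actions.run_shell(
        outputs = [out_2],
        command = "uname -a > " + out_2.path,
        mnemonic = "TestRule2",
    )
    return [
        DefaultInfo(
            files = depset([out_1, out_2]),
        ),
    ]

test_rule = rule(implementation = _test_rule)
```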

```
$ bazel build --strategy=TestRule2=remote //:my_test_target
[...]
Target //:toolchain_target up-to-date:
  bazel-bin/test_rule_1.txt
  bazel-bin/test_rule_2.txt
$ cat bazel-bin/test_rule_1.txt
Darwin st-jmillikin1.local 17.4.0 Darwin Kernel Version 17.4.0: Sun Dec 17 09:19:54 PST 2017; root:xnu-4570.41.2~1/RELEASE_X86_64 x86_64
$ cat bazel-bin/test_rule_2.txt
Linux 6a8038c4ed0b 4.9.36-moby #1 SMP Wed Jul 12 15:29:07 UTC 2017 x86_64 GNU/Linux
```
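The transcript shows two decisions made at different times: analysis fixes one execution platform for the whole target, while `--strategy` picks where each action runs, per mnemonic, at execution time. The following is a hypothetical model of that split (the OS strings and helper names are illustrative, not Bazel internals), flagging actions whose actual machine differs from the platform they were analyzed for.

```python
# Hypothetical model: exec platform fixed per target during analysis,
# spawn strategy chosen per action mnemonic during execution.

HOST_OS = "darwin"          # machine running the Bazel client and local actions
REMOTE_WORKER_OS = "linux"  # machines behind the remote executor

def analyze(target_label, mnemonics, exec_os):
    # Every action of one configured target shares that target's exec platform.
    actions = []
    for m in mnemonics:
        actions.append({"target": target_label, "mnemonic": m, "exec_os": exec_os})
    return actions

def execute(actions, strategies):
    # strategies maps mnemonic -> "local" or "remote", like --strategy=M=remote.
    mismatched = []
    for a in actions:
        strategy = strategies.get(a["mnemonic"], "local")
        runs_on = REMOTE_WORKER_OS if strategy == "remote" else HOST_OS
        if runs_on != a["exec_os"]:
            mismatched.append(a["mnemonic"])
    return mismatched

# A rule without toolchain resolution gets the host as its exec platform.
actions = analyze("//:my_test_target", ["TestRule1", "TestRule2"], HOST_OS)
print(execute(actions, {"TestRule2": "remote"}))
```

Here only `TestRule2` is reported: it was analyzed for the macOS host but routed to a Linux worker, which is harmless for `uname` but not for a tool binary built for the host.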

> Genrule currently doesn't respect toolchain resolution. This is actually a pretty difficult problem, but one we intend to tackle shortly.

This also appears to be untrue in 0.13 -- when I pass a target in the genrule's tools attr, that target's rule is called with a resolved toolchain matching the host platform. I've verified this toolchain changes when I manually set --host_platform.

> Do you have a specific problem that I can take a look at to help you solve the problem you are having?

I want to run Bazel on a MacOS laptop, and have certain heavy actions (Java/Scala compilation) run on a distributed buildfarm via the remote execution protocol. The buildfarm workers are running on Linux.

katre · 2018-06-01T17:50:09Z

I see where I was confused. You are right, strategies such as remote, sandboxed, etc. can be set per action. However, the execution platform is set per target. I can see where this will cause problems when the two are mixed together. Do you have an error case I can use to try and fix this? I know @ulfjack is planning some changes to the entire strategy system, but I don't know how that will affect remote/local execution.

All rules that don't participate in toolchain resolution set the host platform as the execution platform, leading to the cases you saw. For these, a --default_execution_platform flag might make sense, to allow specifying the difference. Would that help with the problems you are seeing with genrule?

jmillikin-stripe · 2018-06-01T17:57:40Z

I'm not sure that the set of execution platforms is related to this issue. As described in the original post, I think the best way to support the use case of a multi-platform distributed build is to have a --remote_platform flag that would tell Bazel to resolve toolchains for that platform when running actions remotely.

katre · 2018-06-04T09:07:29Z

The core issue is that there isn't a single remote platform: your remote execution system could have several types of workers, and they can all be used in a build.

However, I definitely see the value of having a way to specify the default remote platform for legacy cases, and will look into implementing that shortly. That should help fix your issue with genrules, if I understand it correctly.

(Commit referencing this issue; message truncated at the start:) ...resolution is not used. If the flag is not set, the execution platform will be the host platform. Fixes bazelbuild#5309. RELNOTES: Adds the --legacy_fallback_execution_platform flag to specify a fallback execution platform when toolchain resolution is not used. Change-Id: I5e91c209cf5f043e29fb512c0ef81385a44d4817
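The selection rule in that commit message can be summarized as a small decision function. This is a sketch of the described behavior only; the function and parameter names are hypothetical and do not correspond to Bazel source code.

```python
# Sketch of the fallback described in the commit message (names hypothetical).

def legacy_exec_platform(uses_toolchain_resolution, resolved_platform,
                         fallback_flag, host_platform):
    # Targets that take part in toolchain resolution keep the platform it chose.
    if uses_toolchain_resolution:
        return resolved_platform
    # Otherwise use the flag value; None or "" stands for "flag not set".
    if fallback_flag:
        return fallback_flag
    # With no flag, the execution platform is the host platform.
    return host_platform
```

Note that the flag is a single, build-wide value: it cannot distinguish actions that will later run locally from those that will run remotely.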

ulfjack · 2018-06-04T11:53:57Z

@katre, I don't think that'll work. Let me elaborate.

@jmillikin-stripe, can you confirm that this is what you want: You want to run certain actions remotely, even though the remote execution platform is incompatible with your local machine, e.g., local machine is a Mac, remote machine is a Linux machine.

This is at odds with Bazel's current design, although the existence of the --*_strategy flags allows you to effectively override where an individual action is executed.

In its current design, Bazel allows rules to introspect the execution platform on which actions will run. Rule analysis generates the action, and it has to generate the action in its final form, taking into account whatever peculiarities of the underlying platform. For example, an action running on Windows has to use windows-style paths (this is the biggest immediately visible difference) for all paths referenced in the action (binary to run, inputs, outputs, temporary locations, etc.).
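As a toy illustration of why the platform must be known during analysis: the command string is assembled before any strategy decides where the action runs, so its path syntax is already baked in. The helper below is hypothetical and only handles the separator difference mentioned above.

```python
# Hypothetical helper: the command line is fixed during analysis, so its path
# syntax must already match the execution platform's OS.

def action_command(exec_os, tool_path, out_path):
    sep = "\\" if exec_os == "windows" else "/"
    return tool_path.replace("/", sep) + " > " + out_path.replace("/", sep)
```

An action analyzed with `exec_os = "linux"` and later shipped to a Windows worker would still carry forward-slash paths, which is the kind of mismatch the strategy flags can silently introduce.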

There is currently a hard barrier between analyzing a rule and executing an action, and therefore, the execution platform has to be set before we analyze a rule. It also has to be set manually for two reasons - there may be any number of execution platforms, and we also don't want loading+analysis to depend on whether or not a remote execution system is known and whether we can talk to it.

At this time, the rules API only provides access to a single execution platform - there used to be no explicit model of the execution platform, but this is what was implicitly the case as part of the BuildConfiguration. As such, a single rule cannot generate actions for different execution platforms.

There are cases where the execution platforms are sufficiently similar that things happen to work, and the availability of the --*_strategy flags allows the user to redirect execution to a (strictly speaking) incompatible execution platform without the rule knowing. However, this is not a viable long-term strategy.

In the short term, I'd suggest that we provide a way to override an action's associated execution platform. This allows users to get Bazel to do what they say, even if it doesn't make sense from Bazel's point of view.

In the long term, I'd suggest that we allow rules to access multiple execution platforms within a single rule. We'll have to carefully think about the right APIs for this, and the right APIs may require 'rule fragments'. The concept behind a rule fragment is that rule authors can identify parts of the action graph that have execution platform consistency requirements, and enforce that all corresponding actions are configured with the same execution platform by declaring smaller 'rule fragments', each of which has one execution platform. By making this explicit, and by making each fragment independent (with Bazel explicitly controlling communication between fragments), we can allow Bazel to defer analysis of such a fragment until execution time, to retrofit it to the actually selected execution platform.

Rule fragments would also allow us to solve another issue with configurations related to output paths. Consider a java_binary rule with native code - the java_binary rule has to declare a dependency on the C++ configuration, even though the pure Java compilation part of the rule only requires the Java configuration. Even with configuration subsetting, this means that it's non-trivial to make the Java compilation use different output paths that do not contain the C++ configuration fragment.

I hope this all made sense.

ulfjack · 2018-06-04T11:55:16Z

(We have been using some hacks to allow certain rules to generate actions for different execution platforms.)

jmillikin-stripe · 2018-06-04T13:22:43Z

> You want to run certain actions remotely, even though the remote execution platform is incompatible with your local machine, e.g., local machine is a Mac, remote machine is a Linux machine.

I want to run heavy processes, such as compilation, remotely. The remote workers' machines are incompatible with my local machine. I want to retain the ability to run certain rules locally, such as for tests that are not yet fully hermetic. These targets would be specified by tags like local.

> This is at odds with Bazel's current design, although the existence of the --*_strategy flags allows you to effectively override where an individual action is executed.

To be clear, while I've documented Bazel's current behavior here, I don't actually want that behavior. I have no use case where executing a target's actions on different platforms is useful. Selecting local/remote strategy by action mnemonic, instead of by rule type, seems confusing and fragile to me.

> It also has to be set manually for two reasons - there may be any number of execution platforms, and we also don't want loading+analysis to depend on whether or not a remote execution system is known and whether we can talk to it.

I think this requirement is broadly compatible with a --remote_platform flag. Or --remote_platforms if you want to have multiple ones. I agree that analysis should not depend on the reachability of remote workers, so Bazel shouldn't try to auto-discover the remote worker's platform. It should obey the flag set by the user.

> In the long term, I'd suggest that we allow rules to access multiple execution platforms within a single rule.

This seems like it would be difficult for rule authors to use, and I'm worried that most open-source language rule implementations would not implement it properly.

ulfjack · 2018-06-04T13:37:03Z

> I think this requirement is broadly compatible with a --remote_platform flag. Or --remote_platforms if you want to have multiple ones. I agree that analysis should not depend on the reachability of remote workers, so Bazel shouldn't try to auto-discover the remote worker's platform. It should obey the flag set by the user.

What would the --remote_platform flag do, though? We can't make it available to rules because the execution platform is part of the BuildConfiguration - if we made the 'remote_platform' part of the BuildConfiguration, all rules would have access to the 'remote_platform' in addition to the execution platform. What should the rules do with that information?

Maybe the suggestion is that Bazel would pick one or the other as execution platform on a per rule basis and only make that one available to the rule? That'd obey the one execution platform per rule constraint. How would that interact with toolchains?

(On a related note, please please please do not call it --remote_platform. There is nothing 'remote' about it - it might be compatible with the local host, or we might be running it in a local docker container or in a local VM. If we have to, then call it --per_rule_execution_platform, or something.)

katre · 2018-06-04T13:40:49Z

This morning I sent out #5322, which adds a flag to set the execution platform for rules that don't use toolchain resolution (including all legacy rules). This can't be any more fine-grained, there is no way during analysis to know whether a target/action will be executed remotely or locally.

ulfjack · 2018-06-04T13:46:05Z

If we're using the execution platform to decide where to execute the actions for a rule, then we could have a per-rule selection of execution platforms.

katre · 2018-06-04T13:47:59Z

@ulfjack We're not doing that currently, is that planned to be changed? I think it's a great idea but I am not sure I have time to make the change.

jmillikin-stripe · 2018-06-04T13:49:21Z

> What would the --remote_platform flag do, though? We can't make it available to rules because the execution platform is part of the BuildConfiguration - if we made the 'remote_platform' part of the BuildConfiguration, all rules would have access to the 'remote_platform' in addition to the execution platform. What should the rules do with that information?

> Maybe the suggestion is that Bazel would pick one or the other as execution platform on a per rule basis and only make that one available to the rule? That'd obey the one execution platform per rule constraint. How would that interact with toolchains?

I would imagine that when Bazel is configuring a target, it would decide whether that target is executed entirely on "host" or entirely on "remote". Depending on this choice, it would use either --host_platform or --remote_platform to resolve toolchains.
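A minimal sketch of the per-target choice proposed here, assuming the `local` tag mentioned earlier in the thread marks targets that must stay on the host. The `--remote_platform` flag was only proposed and never implemented, and the function name is hypothetical; the returned platform is what toolchain resolution would then run against.

```python
# Hypothetical sketch of the proposal: one platform per target, never per action.

def platform_for_target(tags, host_platform, remote_platform):
    # Targets tagged "local", or builds without the proposed flag, stay on host.
    if "local" in tags or not remote_platform:
        return host_platform
    return remote_platform
```

This keeps the one-execution-platform-per-target invariant, but it assumes the choice between "host" and "remote" can be made during analysis, which is the point katre questions below.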

> (On a related note, please please please do not call it --remote_platform. There is nothing 'remote' about it - it might be compatible with the local host, or we might be running it in a local docker container or in a local VM. If we have to, then call it --per_rule_execution_platform, or something.)

It's named --remote_platform to match the existing flag it pairs with, --remote_executor. There are other names that might be used like --executor_platform or --worker_platform, but they have their own issues.

ulfjack · 2018-06-04T13:49:53Z

@katre this issue is currently marked as a feature request, although there may be a regression here as well because of your work?

ulfjack · 2018-06-04T13:55:12Z

I am not completely certain about the --host_platform flag, but the "host" prefix in Bazel usually describes the execution platform, which may or may not be compatible with the local machine. Bazel can run builds without doing any actual work on the local machine whatsoever, in which case the "host" moniker is completely misleading. Adding --remote_platform makes the confusion even worse, because that one may actually refer to the local machine.

For example, you might have a 'host' configuration of Linux, with all actions run on a remote machine, cross-compiling for a 'target' configuration of macOS, which happens to be the local machine. We really need to get our terminology straight.

katre · 2018-06-04T13:59:33Z

The --host_platform flag specifically refers to the host that Bazel is running on, and is separate from the set of execution platforms available to the entire build, or the specific execution platform chosen for a particular configured target (although the host platform is also available for use as an execution platform).

There is definitely a lot of confusion here, unfortunately, which is why I'm working on new platform and toolchain features to simplify the configuration and make this more straightforward. Also, moving more native rules to use toolchain resolution will help.

katre · 2018-11-20T19:42:57Z

Closing this because it is not consistent with the current direction of the code.

uri-canva · 2023-03-14T22:59:15Z

This is now supported via exec groups: https://bazel.build/extending/exec-groups

Execution groups allow for multiple execution platforms within a single target. Each execution group has its own toolchain dependencies and performs its own toolchain resolution.
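For comparison with the `test_rule` example earlier in this thread, here is a sketch of how an exec group could pin only the second action to a Linux execution platform. It assumes such a platform has been registered (via `register_execution_platforms` or `--extra_execution_platforms`); the group name `linux` is hypothetical. The exec group determines which execution platform, and therefore which toolchains, the action is analyzed for; whether it actually runs remotely is still up to the strategy and remote execution configuration.

```python
# Sketch: the two-action rule from earlier, with the second action in an exec group.
def _test_rule(ctx):
    out_1 = ctx.actions.declare_file("test_rule_1.txt")
    out_2 = ctx.actions.declare_file("test_rule_2.txt")

    # Uses the target's default execution platform.
    ctx.actions.run_shell(
        outputs = [out_1],
        command = "uname -a > " + out_1.path,
        mnemonic = "TestRule1",
    )

    # Uses the platform resolved for the "linux" exec group.
    ctx.actions.run_shell(
        outputs = [out_2],
        command = "uname -a > " + out_2.path,
        mnemonic = "TestRule2",
        exec_group = "linux",
    )
    return [DefaultInfo(files = depset([out_1, out_2]))]

test_rule = rule(
    implementation = _test_rule,
    exec_groups = {
        "linux": exec_group(
            exec_compatible_with = ["@platforms//os:linux"],
        ),
    },
)
```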

jin added type: feature request category: extensibility > configurability labels Jun 1, 2018

katre self-assigned this Jun 1, 2018

katre mentioned this issue Jun 4, 2018

Add a new flag to specify the default execution platform #5322

Closed

talya mentioned this issue Jun 14, 2018

can't use docker spawn strategy on mac #5397

Closed

katre closed this as completed Nov 20, 2018

aiuto removed team-Configurability labels Feb 4, 2019

aiuto added team-Configurability and removed z-category: extensibility > configurability labels Feb 4, 2019

solsjo mentioned this issue Sep 8, 2021

Support for simulating precompiled artifacts VUnit/vunit#733

Open

lukts30 mentioned this issue Nov 10, 2023

Document mixed/heterogeneous builds across different operating systems facebook/buck2#487

Open
