ctx.fragments.apple.single_arch_cpu returns wrong cpu of execution platform #14291

Closed
thii opened this issue Nov 17, 2021 · 15 comments
Labels: P2 (We'll consider working on this in future; assignee optional), platform: apple, team-Configurability (Issues for Configurability team), type: bug

Comments

@thii
Member

thii commented Nov 17, 2021

Description of the problem / feature request:

When initiating a build from an M1 Mac with an Intel-based Mac RBE cluster (or vice versa), tools are built for the host platform instead of for the execution platform.

What underlying problem are you trying to solve?

Tools built for the wrong platform can't run (natively) on the executor.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

From an M1 Mac:

git clone https://github.com/bazelbuild/rules_swift.git
cd rules_swift

platform_mappings

platforms:
  //platforms:macos_x86_64
    --cpu=darwin_x86_64

flags:
  --cpu=darwin_x86_64
  --apple_platform_type=macos
    //platforms:macos_x86_64

Add this declaration to the top-level BUILD file:

platform(
    name = "macos_x86_64",
    constraint_values = [
        "@platforms//cpu:x86_64",
        "@platforms//os:macos",
    ],
)

bazel build --spawn_strategy=remote --strategy=SwiftCompile=remote --extra_execution_platforms=//:macos_x86_64 --remote_executor=your.remote.executor //examples/xplatform/hello_world # plus your rbe specific flags
ERROR: /path/rules_swift/examples/xplatform/hello_world/BUILD:5:13: Compiling Swift module examples_xplatform_hello_world_hello_world failed: (Exit 34): Remote Execution Failure:
Internal: Task "/uploads/31921032-458b-4742-8c79-14c59b707d87/blobs/d90fb23d69726ae4bb04ba31e9493c2167b159bf53b4a3d8a6d66c83bc8ab9ba/195" already attempted 5 times. Last failure: rpc error: code = Unavailable desc = fork/exec bazel-out/darwin_arm64-opt-exec-8F99CFCD/bin/tools/worker/worker: bad CPU type in executable

java.io.IOException: com.google.devtools.build.lib.remote.ExecutionStatusException: INTERNAL: Task "/uploads/31921032-458b-4742-8c79-14c59b707d87/blobs/d90fb23d69726ae4bb04ba31e9493c2167b159bf53b4a3d8a6d66c83bc8ab9ba/195" already attempted 5 times. Last failure: rpc error: code = Unavailable desc = fork/exec bazel-out/darwin_arm64-opt-exec-8F99CFCD/bin/tools/worker/worker: bad CPU type in executable
        at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.executeRemotely(GrpcRemoteExecutor.java:226)
        at com.google.devtools.build.lib.remote.RemoteExecutionService.executeRemotely(RemoteExecutionService.java:1218)
        at com.google.devtools.build.lib.remote.RemoteSpawnRunner.lambda$exec$2(RemoteSpawnRunner.java:264)
        at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:244)
        at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:125)
        at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:114)
        at com.google.devtools.build.lib.remote.RemoteSpawnRunner.exec(RemoteSpawnRunner.java:239)
        at com.google.devtools.build.lib.exec.SpawnRunner.execAsync(SpawnRunner.java:245)
        at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:146)
        at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:108)
        at com.google.devtools.build.lib.actions.SpawnStrategy.beginExecution(SpawnStrategy.java:47)
        at com.google.devtools.build.lib.exec.SpawnStrategyResolver.beginExecution(SpawnStrategyResolver.java:68)
        at com.google.devtools.build.lib.analysis.actions.SpawnAction.beginExecution(SpawnAction.java:328)
        at com.google.devtools.build.lib.actions.Action.execute(Action.java:134)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$5.execute(SkyframeActionExecutor.java:909)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.continueAction(SkyframeActionExecutor.java:1078)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.run(SkyframeActionExecutor.java:1033)
        at com.google.devtools.build.lib.skyframe.ActionExecutionState.runStateMachine(ActionExecutionState.java:152)
        at com.google.devtools.build.lib.skyframe.ActionExecutionState.getResultOrDependOnFuture(ActionExecutionState.java:91)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:496)
        at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:856)
        at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.computeInternal(ActionExecutionFunction.java:349)
        at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:169)
        at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:590)
        at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:382)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
Caused by: com.google.devtools.build.lib.remote.ExecutionStatusException: INTERNAL: Task "/uploads/31921032-458b-4742-8c79-14c59b707d87/blobs/d90fb23d69726ae4bb04ba31e9493c2167b159bf53b4a3d8a6d66c83bc8ab9ba/195" already attempted 5 times. Last failure: rpc error: code = Unavailable desc = fork/exec bazel-out/darwin_arm64-opt-exec-8F99CFCD/bin/tools/worker/worker: bad CPU type in executable
        at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.handleStatus(GrpcRemoteExecutor.java:70)
        at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.getOperationResponse(GrpcRemoteExecutor.java:82)
        at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.lambda$executeRemotely$0(GrpcRemoteExecutor.java:185)
        at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:244)
        at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:125)
        at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:114)
        at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.lambda$executeRemotely$1(GrpcRemoteExecutor.java:139)
        at com.google.devtools.build.lib.remote.util.Utils.refreshIfUnauthenticated(Utils.java:519)
        at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.executeRemotely(GrpcRemoteExecutor.java:137)
        ... 27 more
Target //examples/xplatform/hello_world:hello_world failed to build
INFO: Elapsed time: 15.413s, Critical Path: 14.76s
INFO: 2 processes: 2 internal.
FAILED: Build did NOT complete successfully
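
The path in the error already hints at the mismatch: the worker sits under bazel-out/darwin_arm64-opt-exec-8F99CFCD, although the executor is x86_64. Below is a minimal Python sketch for pulling that information out of a log line. It assumes the output directory is named <cpu>-<compilation mode>[-exec-<hash>], which matches the paths in this issue but is a Bazel naming convention rather than a stable API, and it will not handle more elaborate directory names. The name parse_output_dir is made up for illustration.

```python
import re

# Matches the first path component after "bazel-out/".
_OUTPUT_DIR = re.compile(r"bazel-out/([^/\s]+)/")


def parse_output_dir(text):
    """Return (cpu, compilation_mode, is_exec_config) for the first
    bazel-out path found in text, or None if there is none.

    Assumption: the directory looks like <cpu>-<mode>[-exec-<hash>].
    """
    match = _OUTPUT_DIR.search(text)
    if match is None:
        return None
    parts = match.group(1).split("-")
    cpu = parts[0]
    mode = parts[1] if len(parts) > 1 else ""
    return cpu, mode, "exec" in parts[2:]


line = (
    "fork/exec bazel-out/darwin_arm64-opt-exec-8F99CFCD/bin/tools/worker/worker: "
    "bad CPU type in executable"
)
print(parse_output_dir(line))  # ('darwin_arm64', 'opt', True)
```

An exec-configuration tool reported as darwin_arm64 while the remote workers are Intel Macs is the symptom described in this report.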

What operating system are you running Bazel on?

macOS Big Sur 11.6

What's the output of bazel info release?

release 6.0.0-pre.20211101.2

Any other information, logs, or outputs that you want to share?

The exec cpu value is coming from here https://github.com/bazelbuild/rules_swift/blob/5ea8f0a1a0b9372408a9d0a85f2cd81b6c9f510b/swift/internal/xcode_swift_toolchain.bzl#L643; a workaround is to replace it with the cpu from cc_toolchain:

@brentleyjones
Contributor

cc: @kaylathar

@brentleyjones
Contributor

It seems the new apple platforms stuff will fix this, but it would be useful to have this fixed in some way in the 5.0 release.

@brentleyjones
Contributor

@meteorcloudy can you help get this triaged? Thanks!

@meteorcloudy
Member

/cc @susinmotion for apple support
/cc @coeuvre for RBE support

Can you help look into this issue?

@susinmotion
Contributor

Sadly this is a bit out of my field of expertise... @katre, do you know the timeline for this?

@katre
Member

katre commented Jan 24, 2022

I would have expected the platform mapping to work there, and cause the flag to be set properly.

Is it possible to reproduce this without the Apple M1 hardware?

@gregestren gregestren added the team-Configurability Issues for Configurability team label Jan 24, 2022
@gregestren
Contributor

From a quick look, I'm not sure whether this is a platforms issue or an Apple rules issue. But I think configurability can offer more input on the question as it now stands, so tagging team-Configurability for now.

@gregestren gregestren added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Jan 26, 2022
thii added a commit to bazelbuild/rules_swift that referenced this issue Jan 26, 2022
…`ctx.fragments.apple` when building for darwin cpus (#749)

Workaround for bazelbuild/bazel#14291.

`ctx.fragments.apple.single_arch_cpu` is returning the default macos
cpu for the host platform for tools while it should return the cpu of
the execution platform.
@thii
Member Author

thii commented Jan 27, 2022

@katre You can reproduce this from an Intel Mac (and probably Linux too):

Create a Bazel workspace with the following files:

my_binary.bzl

def _impl(ctx):
    ctx.actions.expand_template(
        template = ctx.file._binary_template,
        output = ctx.outputs.out,
        substitutions = {"%CPU%": ctx.fragments.apple.single_arch_cpu},
        is_executable = True,
    )
    return DefaultInfo(executable = ctx.outputs.out)

my_binary = rule(
    _impl,
    attrs = {
        "out": attr.output(),
        "_binary_template": attr.label(
            allow_single_file = True,
            default = Label("//:binary.sh.tpl"),
        ),
    },
    fragments = ["apple"],
)

my_rule.bzl

def _impl(ctx):
    ctx.actions.run(
        executable = ctx.executable.tool,
        arguments = [ctx.outputs.out.path],
        outputs = [ctx.outputs.out],
    )
    return DefaultInfo(files = depset([ctx.outputs.out]))

my_rule = rule(
    _impl,
    attrs = {
        "out": attr.output(),
        "tool": attr.label(
            cfg = "exec",
            executable = True,
        ),
    },
    fragments = ["apple"],
)

binary.sh.tpl

#!/bin/bash
# This is an %CPU% binary
echo %CPU% > "$1"

BUILD

load(":my_binary.bzl", "my_binary")
load(":my_rule.bzl", "my_rule")

my_binary(
    name = "bin",
    out = "bin.exe",
)

my_rule(
    name = "a",
    out = "a.out",
    tool = ":bin",
)

platform(
    name = "macos_arm64",
    constraint_values = [
        "@platforms//cpu:arm64",
        "@platforms//os:macos",
    ],
)

platform_mappings

platforms:
  //:macos_arm64
    --cpu=darwin_arm64

flags:
  --cpu=darwin_arm64
  --apple_platform_type=macos
    //:macos_arm64

Build the //:a target with the extra execution platform:

bazel build -s --extra_execution_platforms=//:macos_arm64 a
INFO: Analyzed target //:a (4 packages loaded, 19 targets configured).
INFO: Found 1 target...
SUBCOMMAND: # //:a [action 'Action a.out', configuration: 5456a5ed8cf67b8b3fed76a2365aa2f629f11b9ca3005aea194fdff9b065d983, execution platform: //:macos_arm64]
(cd /private/var/tmp/_bazel_admin/e5ea37a2992896f774f28a77d604884a/execroot/__main__ && \
  exec env - \
  bazel-out/darwin_arm64-opt-exec-C16863B4/bin/bin.exe bazel-out/darwin-fastbuild/bin/a.out)
# Configuration: 5456a5ed8cf67b8b3fed76a2365aa2f629f11b9ca3005aea194fdff9b065d983
# Execution platform: //:macos_arm64
Target //:a up-to-date:
  bazel-bin/a.out
INFO: Elapsed time: 0.501s, Critical Path: 0.25s
INFO: 5 processes: 4 internal, 1 local.
INFO: Build completed successfully, 5 total actions

Inspect the tool:

cat bazel-out/darwin_arm64-opt-exec-C16863B4/bin/bin.exe
#!/bin/bash
# This is an x86_64 binary
echo x86_64 > "$1"

The expected contents of the bin.exe file are:

#!/bin/bash
# This is an arm64 binary
echo arm64 > "$1"

@katre
Member

katre commented Jan 28, 2022

Solid reproduction. I'll take a look, thanks.

@katre
Member

katre commented Jan 28, 2022

Okay, I've created a repo for the repro and updated it to print more information. This is interesting:

cpu: darwin_arm64
apple.single_arch_cpu: x86_64
platform: //:macos_arm64
host platform: @local_config_platform//:host

So the cpu and platform are as expected, it's just apple.single_arch_cpu that's wrong. Time to debug.

@katre
Member

katre commented Jan 28, 2022

Okay, after far too much debugging it turns out to be fairly simple so far:

  1. The implementation of single_arch_cpu in this case simply returns the value of --ios_cpu
  2. The default value for --ios_cpu is x86_64
  3. The exec transition doesn't do anything with --ios_cpu, so it simply stays as the default.

I tried explicitly setting --ios_cpu in the platform_mappings, and it still sees the default, which is confusing.

@katre
Member

katre commented Jan 28, 2022

Okay, there we go. I misread the code while debugging: AppleConfiguration.getSingleArchitecture is returning the value of --macos_cpus, not --ios_cpu. And when I add --macos_cpus=arm64 to the platform mapping, the value of ctx.fragments.apple.single_arch_cpu is set to arm64.

I don't know if this will fix your original crash, but give it a try.
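
A toy Python model of the behaviour described above may make it easier to follow. It is a sketch, not Bazel's code: the Options class, the exec_transition function and the DEFAULT_MACOS_CPU constant are hypothetical names, taking the first entry of --macos_cpus is an assumption, and the x86_64 default is simply the value observed in this reproduction.

```python
from dataclasses import dataclass, replace

# Value observed in the repro when --macos_cpus is unset (assumed default).
DEFAULT_MACOS_CPU = "x86_64"


@dataclass(frozen=True)
class Options:
    """Hypothetical stand-in for the build options relevant here."""
    cpu: str = "darwin_x86_64"
    macos_cpus: tuple = ()


def single_arch_cpu(options):
    """Mimics the reported behaviour: only --macos_cpus is consulted."""
    if options.macos_cpus:
        return options.macos_cpus[0]  # assumption: first entry wins
    return DEFAULT_MACOS_CPU


def exec_transition(options, mapped_flags):
    """Applies only the flags that the platform mapping lists for the
    execution platform; everything else keeps its previous value."""
    return replace(options, **mapped_flags)


host = Options()

# Mapping from the repro: only --cpu is set for //:macos_arm64.
before = exec_transition(host, {"cpu": "darwin_arm64"})
print(before.cpu, single_arch_cpu(before))  # darwin_arm64 x86_64

# Same mapping with --macos_cpus=arm64 added.
after = exec_transition(host, {"cpu": "darwin_arm64", "macos_cpus": ("arm64",)})
print(after.cpu, single_arch_cpu(after))  # darwin_arm64 arm64
```

In this model --cpu follows the execution platform in both cases, but the architecture only follows it once the mapping also sets --macos_cpus.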

If you think the value of ctx.fragments.apple.single_arch_cpu should default to picking something from --cpu if --cpu starts with darwin_ (similarly to what happens for ios), then update this issue and we'll re-assign it to the Apple rules team.

Or send a PR, the code for this should be very similar.
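
The fallback suggested here could look roughly like the following Python sketch. It only illustrates the idea and is not the code in Bazel or in any PR; the function name, the precedence of --macos_cpus over --cpu, and the x86_64 default are assumptions.

```python
DARWIN_PREFIX = "darwin_"


def single_arch_cpu_with_fallback(cpu, macos_cpus=(), default="x86_64"):
    """Pick an architecture for macOS builds.

    Assumed order: an explicit --macos_cpus value first, then the suffix
    of a darwin_* --cpu value, then a fixed default.
    """
    if macos_cpus:
        return macos_cpus[0]
    if cpu.startswith(DARWIN_PREFIX):
        return cpu[len(DARWIN_PREFIX):]
    return default


print(single_arch_cpu_with_fallback("darwin_arm64"))              # arm64
print(single_arch_cpu_with_fallback("darwin_x86_64", ("arm64",)))  # arm64
print(single_arch_cpu_with_fallback("k8"))                        # x86_64
```

With a rule like this, an exec transition that only changes --cpu to darwin_arm64 would already yield arm64, without an extra --macos_cpus entry in the platform mapping.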

@keith
Member

keith commented Jan 28, 2022

Note that --ios_cpu is publicly unused, but it sounds like it's still used at Google, so we can't remove it (see #13872).

@thii
Member Author

thii commented Jan 31, 2022

I sent a PR here, please take a look:

@meteorcloudy meteorcloudy added this to the 5.1 release blockers milestone Feb 1, 2022
@Wyverald
Member

Wyverald commented Feb 3, 2022

@bazel-io fork 5.1

Wyverald pushed a commit that referenced this issue Feb 9, 2022
… tools when host cpu and exec cpu are different (#14751)

Fixes #14291

Closes #14665.

PiperOrigin-RevId: 425886938
(cherry picked from commit fce7ea8)

Co-authored-by: Thi Doan <t@thi.im>
tymurmustafaiev pushed a commit to tymurmustafaiev/rules_swift that referenced this issue Jul 19, 2023
…`ctx.fragments.apple` when building for darwin cpus (bazelbuild#749)

Workaround for bazelbuild/bazel#14291.

`ctx.fragments.apple.single_arch_cpu` is returning the default macos
cpu for the host platform for tools while it should return the cpu of
the execution platform.
8 participants