Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JDK 20 update broke non-ascii argument encoding on macOS #18792

Closed
tetromino opened this issue Jun 27, 2023 · 8 comments
Closed

JDK 20 update broke non-ascii argument encoding on macOS #18792

tetromino opened this issue Jun 27, 2023 · 8 comments
Labels
P2 We'll consider working on this in future. (Assignee optional) team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website type: bug

Comments

@tetromino
Copy link
Contributor

Description of the bug:

Non-ascii arguments to spawned tools (both in genrule and in ctx.actions.run) are encoded as mojibake garbage with Bazel at HEAD on macOS.

Everything works fine with Bazel 6.2.1 on macOS, and with any Bazel version (including at HEAD) on Linux.

bazelisk bisect identified ecf9b9a as the culprit.

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

BUILD file:

genrule(
    name = "genrule",
    cmd = "echo привет > \"$@\"",
    outs = ["genrule.out"],
)

With Bazel at HEAD on macOS, get garbage:

$ USE_BAZEL_VERSION=last_green bazelisk build //:genrule && cat bazel-bin/genrule.out
2023/06/27 14:55:01 Using unreleased version at commit 044a14cca2747aeff258fc71eaeb153c08cb34d5
INFO: Analyzed target //:genrule (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //:genrule up-to-date:
  bazel-bin/genrule.out
INFO: Elapsed time: 0.152s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
пÑ�ивеÑ

With Bazel 6.2.1 on macOS, get expected result:

$ bazelisk build //:genrule && cat bazel-bin/genrule.out
INFO: Analyzed target //:genrule (5 packages loaded, 7 targets configured).
INFO: Found 1 target...
Target //:genrule up-to-date:
  bazel-bin/genrule.out
INFO: Elapsed time: 3.882s, Critical Path: 0.10s
INFO: 2 processes: 1 internal, 1 darwin-sandbox.
INFO: Build completed successfully, 2 total actions
привет

Which operating system are you running Bazel on?

macOS Ventura 13.4.1

What is the output of bazel info release?

development version

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

Bazelisk with USE_BAZEL_VERSION=last_green

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

ecf9b9a

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@aiuto
Copy link
Contributor

aiuto commented Jun 27, 2023

/sub

@aiuto
Copy link
Contributor

aiuto commented Jun 27, 2023

I think there are ways to make this fail on linux too.
If we spill args to a file instead of the command line I am seeing that a single 'é' will emit as the 4 octal bytes 303 203 302 251.

é = u00E9 =utf-8=> C3 A9 =(utf-8 again)=> C3 83 C2 A9 =%o=> 303 203 302 251 

That's a clear 8859-1 -> UTF-8 mis-encoding.

@iancha1992 iancha1992 added the team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. label Jun 27, 2023
@meisterT
Copy link
Member

cc @zhengwei143 @coeuvre

@aiuto
Copy link
Contributor

aiuto commented Jun 28, 2023

This is a problem for linux also if you do it right.

bazelbuild/rules_pkg#712

  • is a PR that should be good, but fails on a unicode problem only on macos with bazel at head.
  • it creates an action by making an args list of a bunch of strings. When the command line gets built, the args are written just fine.

bazelbuild/rules_pkg#714

  • takes the same code base, but creates the action by doing the proper thing - calling ctx.actions.args() and using arg.append for each arg.
  • it we don't force spliling args to a file. It mostly works just like 712.
  • if we do force writing the params file, it fails in many ways.

@tetromino
Copy link
Contributor Author

I think that the CI failures in bazelbuild/rules_pkg#714 may be a red herring: it's bug (or more likely, several bugs) in the argparse module in certain versions of python, not a bug in Bazel :/

Look at the errors in https://buildkite.com/bazel/rules-pkg/builds/2466#01890405-7614-4d1f-a408-14e3bff80a92: there is no good reason for argparse to decode the param file using ascii given that you have set PYTHONIOENCODING, PYTHONUTF8, LANG, and LC_CTYPE.

I think argparse will finally be properly fixed in python 3.12 (currently in beta) by python/cpython#93277

@tetromino
Copy link
Contributor Author

But bazelbuild/rules_pkg#714 does demonstrate a Bazel failure specific to macOS: the param file (bazel-out/darwin_x86_64-fastbuild/bin/tests/deb/fizzbuzz_4.5.6_all.deb-0.params) gets incorrectly encoded.

@aiuto
Copy link
Contributor

aiuto commented Jun 29, 2023 via email

@tetromino
Copy link
Contributor Author

tetromino commented Jun 29, 2023

I think there are at least two separate problems in Bazel here, with separate code paths:

  • argument vector encoding when spawning a process on macos - fixed by f00439d
  • parameter file encoding on macos

copybara-service bot pushed a commit that referenced this issue Jun 29, 2023
…encoding

Starting with JDK 19, on Unix platforms, the argument vector passed to
ProcessBuilder gets encoded to bytes by java.lang.ProcessImpl using the
sun.jnu.encoding encoding.

This causes a problem on macOS, where
* we switched to JDK 20 as of ecf9b9a
* (as on all platforms) argument strings originating from Starlark are
  in Bazel's pseudo latin1 encoding (i.e. byte arrays stored as strings)
* sun.jnu.encoding is hard-coded as utf-8 by the JVM, regardless of
  what we set for the file.encoding property.

This means we need to recode argv from pseudo latin1 to utf-8 before
passing them to ProcessBuilder, so that ProcessImpl can reverse the
process and pass correctly encoded byte arrays to the OS.

Partially addresses #18792

PiperOrigin-RevId: 544366528
Change-Id: I1acb70e489123e0baa190c569e6625259c39de78
@zhengwei143 zhengwei143 removed their assignment Jul 3, 2023
@Wyverald Wyverald added team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website and removed team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. labels Jul 11, 2023
@meteorcloudy meteorcloudy added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Jul 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 We'll consider working on this in future. (Assignee optional) team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website type: bug
Projects
None yet
Development

No branches or pull requests

9 participants