-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert paths, arguments, and env vars to Unicode for remote execution. #15333
Conversation
9bfa94a
to
ee38ef6
Compare
cc @tjgq |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not entirely obvious to me that Platform
has the same issue (its contents originate from command line arguments, not Starlark). Adding a test would be the way to be sure, but feel free to punt on it, since non-ASCII values are probably not as common there.
FYI, while this PR is most definitely welcome, I suspect it doesn't completely fix #14381, since that bug report also mentions an encoding issue in bazel aquery
. So feel free to mention it, but I'd suggest leaving it open for now.
src/main/java/com/google/devtools/build/lib/remote/util/Utils.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/devtools/build/lib/remote/util/Utils.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/devtools/build/lib/remote/merkletree/MerkleTree.java
Outdated
Show resolved
Hide resolved
I think the bad I've tweaked this PR so it also fixes
While platform properties can be obtained from CLI flags, they also come from Starlark (the For reference, the following property configurations are equivalent:
And result in the same (incorrect)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tweaked this PR so it also fixes
aquery
, using the new helper functions.
Thanks! Do you mind also adding a test for bazel aquery
, probably in src/test/shell/integration/aquery_test.sh?
While platform properties can be obtained from CLI flags, they also come from Starlark (the
platform
rule orexec_properties
common attribute). I agree that non-ASCII values in platform properties are probably rare, so not going to worry about it for now, but it might be worth fixing in the future.
Thanks, I wasn't aware of the exec_properties
attribute. I agree that we don't have to fix this right now.
src/main/java/com/google/devtools/build/lib/remote/util/Utils.java
Outdated
Show resolved
Hide resolved
Done -- I added a test for Fixing the other For those, I've added a |
80c01c2
to
2c7327c
Compare
Note on the Also, apparently, on Windows strings for file paths are Unicode and strings from Starlark are UTF-8. I extended the logic in Here's how the new
|
* to be durable against unusual encoding settings, and does not guarantee | ||
* that the escaping process is reverseable. | ||
* | ||
* Non-printable ASCII characters are formatted as `{U+00XX}`. Characters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's go with the standard-ish \uXXXX
notation to make the output copy-pastable into a script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/main/java/com/google/devtools/build/lib/remote/util/Utils.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'm going to import this now. Apologies for my confusion earlier and the review delay (I was out on Friday).
…utput and remote execution protocol. (#15447) For historical reasons, Bazel internally encodes non-ASCII in BUILD/bzl files by taking individual input bytes (assumed to be UTF-8) and storing them in a String as the corresponding Latin1 characters. This encoding must be undone whenever these strings escape to the outside world. Fixes #14381. Closes #15333. PiperOrigin-RevId: 445941013 Co-authored-by: John Millikin <john@john-millikin.com>
Fixes #14381
I didn't touch the
Platform
construction because it seems to be used in parts of Bazel that aren't specific to remote execution. Maybe it'd be fine to change too? Comments welcomed.