
Issue parsing surrogate pair encoded emoji in pipeline JSON file #1798

Open
jim opened this issue Oct 21, 2022 · 3 comments

@jim

jim commented Oct 21, 2022

We're seeing an issue where the Buildkite agent fails to parse the JSON-formatted pipeline file we pass to `buildkite-agent pipeline upload`:

Pipeline parsing of "tmpjdd2BG" failed (Failed to parse tmpjdd2BG: line 531: found invalid Unicode character escape code)

I was able to reproduce the issue on the Go Playground using the same YAML library that the agent uses. The Go standard library JSON parser handles the input without an issue.
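For comparison, a minimal Go sketch of the standard-library side of that check. The pipeline below is illustrative, not the reporter's file: its label contains 🐛 written as the JSON escape `\ud83d\udc1b`, and `encoding/json` combines the two escapes into a single code point. The `pipeline` type and `decodeLabel` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pipelineJSON is an illustrative pipeline (hypothetical content). The emoji
// U+1F41B is written as a JSON \u escape, which requires a UTF-16 surrogate pair.
const pipelineJSON = `{"steps": [{"label": "\ud83d\udc1b Run tests", "command": "make test"}]}`

// pipeline holds only the fields this sketch needs.
type pipeline struct {
	Steps []struct {
		Label   string `json:"label"`
		Command string `json:"command"`
	} `json:"steps"`
}

// decodeLabel parses the pipeline with encoding/json and returns the first label.
func decodeLabel(data string) (string, error) {
	var p pipeline
	if err := json.Unmarshal([]byte(data), &p); err != nil {
		return "", err
	}
	if len(p.Steps) == 0 {
		return "", fmt.Errorf("pipeline has no steps")
	}
	return p.Steps[0].Label, nil
}

func main() {
	label, err := decodeLabel(pipelineJSON)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The surrogate pair has been combined into one rune.
	fmt.Println(label)
}
```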

I found a related issue on the upstream go-yaml project.

@triarius
Contributor

Hi, we can't commit to fixing this at the moment, but we will accept fixes back-ported from upstream or elsewhere. In the meantime, you could try replacing Unicode emoji with Buildkite's emoji syntax from https://github.com/buildkite/emojis; e.g. `:bug:` will be rendered as some form of 🐛.
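If the pipeline is generated by a program, this workaround could be applied to the label strings before they are serialized to JSON. A minimal Go sketch, assuming a hand-maintained lookup table: only the 🐛 / `:bug:` pair comes from this thread, and `toShortcodes` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// emojiToShortcode is a hypothetical, hand-maintained table. Only the
// 🐛 -> :bug: entry comes from the discussion; further entries would be
// taken from https://github.com/buildkite/emojis.
var emojiToShortcode = strings.NewReplacer(
	"🐛", ":bug:",
)

// toShortcodes rewrites literal emoji into Buildkite shortcodes, so the
// serialized pipeline no longer contains characters above U+FFFF.
func toShortcodes(s string) string {
	return emojiToShortcode.Replace(s)
}

func main() {
	fmt.Println(toShortcodes("🐛 Run tests"))
}
```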

@DrJosh9000
Contributor

I had a play with this to see whether the upgrade to gopkg.in/yaml.v3 (#1930) helps in any way. It does not.

But here's another maybe-workaround for you: yaml.v3 can parse UTF-16-encoded input, little- or big-endian, where surrogate pairs are needed for code points above U+FFFF. (Hmm. Why use surrogate pairs in UTF-8?) There are even tests: https://github.com/go-yaml/yaml/blob/f6f7691b1fdeb513f56608cd2c32c51f8194bf51/decode_test.go#L742

But the parser seems to rely on there being a byte order mark (https://go.dev/play/p/EUgD5iJlI2m), even though it is supposed to be able to detect the correct encoding without one (https://yaml.org/spec/1.2.2/, section 5.2).
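Section 5.2 of the YAML 1.2.2 spec lets a parser take the encoding from a byte order mark when there is one, and otherwise infer it from where the null bytes fall, because a stream without a BOM must begin with an ASCII character. A Go sketch of that detection table; `detectEncoding` is a hypothetical helper, not a go-yaml function.

```go
package main

import "fmt"

// detectEncoding is a hypothetical helper following the table in YAML 1.2.2
// section 5.2: a byte order mark wins if present; otherwise the encoding is
// inferred from the positions of null bytes in the first few bytes.
func detectEncoding(b []byte) string {
	// at returns the byte at index i, or -1 when the input is too short,
	// so a short input never matches a longer pattern.
	at := func(i int) int {
		if i < len(b) {
			return int(b[i])
		}
		return -1
	}
	switch {
	// UTF-32 is checked first: FF FE 00 00 also starts with the UTF-16LE BOM.
	case at(0) == 0x00 && at(1) == 0x00 && at(2) == 0xFE && at(3) == 0xFF:
		return "UTF-32BE" // BOM
	case at(0) == 0x00 && at(1) == 0x00 && at(2) == 0x00 && at(3) >= 0:
		return "UTF-32BE" // no BOM: 00 00 00 <ASCII>
	case at(0) == 0xFF && at(1) == 0xFE && at(2) == 0x00 && at(3) == 0x00:
		return "UTF-32LE" // BOM
	case at(0) >= 0 && at(1) == 0x00 && at(2) == 0x00 && at(3) == 0x00:
		return "UTF-32LE" // no BOM: <ASCII> 00 00 00
	case at(0) == 0xFE && at(1) == 0xFF:
		return "UTF-16BE" // BOM
	case at(0) == 0x00 && at(1) >= 0:
		return "UTF-16BE" // no BOM: 00 <ASCII>
	case at(0) == 0xFF && at(1) == 0xFE:
		return "UTF-16LE" // BOM
	case at(0) >= 0 && at(1) == 0x00:
		return "UTF-16LE" // no BOM: <ASCII> 00
	default:
		// Covers the UTF-8 BOM (EF BB BF) and the BOM-less default.
		return "UTF-8"
	}
}

func main() {
	for _, b := range [][]byte{
		{0xFF, 0xFE, 'a', 0x00}, // UTF-16LE with a BOM
		{'a', 0x00, 'b', 0x00},  // UTF-16LE without a BOM
		[]byte("steps:"),        // plain UTF-8
	} {
		fmt.Println(detectEncoding(b))
	}
}
```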

@123sarahj123
Contributor

A PR has been created to add support for surrogate pairs: go-yaml/yaml#1029
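Conceptually, supporting surrogate pairs in escaped strings means that when a `\uXXXX` escape yields a high surrogate, the parser reads the next `\uXXXX` as the low surrogate and combines the two into one code point. A minimal Go sketch of that idea; `decodeEscapes` and `hex4` are hypothetical helpers, this is not the code in go-yaml/yaml#1029, and it only handles `\u` escapes, reporting unpaired surrogates as errors.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode"
	"unicode/utf16"
)

// decodeEscapes (hypothetical) expands \uXXXX escapes in s. A high surrogate
// must be immediately followed by an escaped low surrogate; the pair is then
// combined into a single code point. Other escapes are copied through as-is.
func decodeEscapes(s string) (string, error) {
	var out strings.Builder
	for i := 0; i < len(s); {
		if !strings.HasPrefix(s[i:], `\u`) {
			out.WriteByte(s[i]) // copy non-escape bytes unchanged
			i++
			continue
		}
		r, err := hex4(s, i+2)
		if err != nil {
			return "", err
		}
		i += 6
		if utf16.IsSurrogate(r) {
			// A low surrogate on its own, or a high surrogate with no
			// following escape, cannot form a valid pair.
			if r >= 0xDC00 || !strings.HasPrefix(s[i:], `\u`) {
				return "", fmt.Errorf("unpaired surrogate %X", r)
			}
			lo, err := hex4(s, i+2)
			if err != nil {
				return "", err
			}
			combined := utf16.DecodeRune(r, lo)
			if combined == unicode.ReplacementChar { // not a valid pair
				return "", fmt.Errorf("invalid surrogate pair %X %X", r, lo)
			}
			r = combined
			i += 6
		}
		out.WriteRune(r)
	}
	return out.String(), nil
}

// hex4 parses the four hex digits starting at s[i].
func hex4(s string, i int) (rune, error) {
	if i+4 > len(s) {
		return 0, fmt.Errorf("truncated \\u escape at offset %d", i-2)
	}
	v, err := strconv.ParseUint(s[i:i+4], 16, 16)
	if err != nil {
		return 0, err
	}
	return rune(v), nil
}

func main() {
	s, err := decodeEscapes(`\ud83d\udc1b Run tests`)
	fmt.Println(s, err)
}
```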
