Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix unicode bug with "format_raw_parse_error" sample #1044

Merged
merged 2 commits into from
Feb 10, 2021
Merged

Fix unicode bug with "format_raw_parse_error" sample #1044

merged 2 commits into from
Feb 10, 2021

Conversation

mattbaker
Copy link
Contributor

@mattbaker mattbaker commented Feb 9, 2021

Bit of a strange one, but I think I tracked it down, so I'm curious what folks think.

We ran into this issue after a user started sending some strange input our way that was (I suspect correctly) failing to parse. The weird part was we were seeing exceptions inside Absinthe.Plug as it attempted to JSON encode an error message to send back to the user, the exceptions looked something like this:

(Jason.EncodeError) invalid byte 0xE2 in <<80, 97, 114, 115, 105, 110, 103, 32, 102, 97, 105, 108, 101, 100, 32, 97, 116, 32, 96, 226, 128, 156, 110, 97, 109, 101, 61, 226, 128, 96>>

(jason 1.2.2) lib/jason.ex:150: Jason.encode!/2
(absinthe_plug 1.5.0) lib/absinthe/plug.ex:507: Absinthe.Plug.encode/4
(endpoint 0.0.1) lib/plug/router.ex:284: Endpoint.Pipeline.dispatch/2
[...]

The bytes 226 (0xE2) and 128 (0x80) are of interest here. If you remove them we have a valid string again:

"Parsing failed at `“name=`"

Which leads us to Absinthe.Phase.Parse.format_raw_parse_error/1 where that message is generated.

  defp format_raw_parse_error({:lexer, rest, {line, column}}) do
    <<sample::binary-size(10), _::binary>> = rest

    message = "Parsing failed at `#{sample}`"
    %Phase.Error{message: message, locations: [%{line: line, column: column}], phase: __MODULE__}
  end

My hypothesis is this:

  • Bad input arrived and failed to parse
  • <<sample::binary-size(10), _::binary>> grabbed the problematic character and the next ten bytes for the sample.
  • The tenth character happened to be a unicode character, represented by multiple bytes. For example, an em-dash ( == <<226, 128, 148>>).
  • The binary pattern match truncated the string in the middle of the UTF8 character, producing a binary that was no longer a valid string (a string ending in <<226, 128>> for example)

In practice no exception occurred until the output of Absinthe.Phase.Parse.format_raw_parse_error/1 made it to the JSON encoding step in the Absinthe.Plug pipeline, which was expecting a valid string. I think the tests here can still demonstrate the edge case though.

Without knowing more, the most straightforward solution I can think of is to use String.slice which is unicode aware unless there's a reason we can't use it, but it did get the tests passing.

The tests demonstrate two things. The first is a repro of the error I believe we're experiencing, the second is an edge case with very short, invalid documents that occurred to me while I was looking at the code.

%Absinthe.Phase.Error{
extra: %{},
locations: [%{column: 3, line: 1}],
message: "Parsing failed at `;bbbbbbbb—`",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're thinking character-oriented and not byte oriented I would expect 10 characters. In practice this test will fail because the message will include a sample of ten bytes.

end

@graphql ";"
test "should provide sample of parsing failure on very short query strings" do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not something we experienced, but something that occurred to me when I was thinking about the binary pattern match.

@binaryseed
Copy link
Contributor

Great find! Seems like your proposed fix is a good one, let's see those tests turn green

@mattbaker
Copy link
Contributor Author

Awesome, updated with the fix @binaryseed !

@binaryseed binaryseed merged commit eb7c307 into absinthe-graphql:master Feb 10, 2021
@binaryseed binaryseed changed the title Demonstrate edge cases with "format_raw_parse_error" sample Fix unicode bug with "format_raw_parse_error" sample Feb 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants