Crashes during escaped Unicode surrogate pairs parsing #855

Open
RazrFalcon opened this issue Jun 4, 2022 · 7 comments

@RazrFalcon

> ruby-parse -v
ruby-parse based on parser version 3.1.2.0

> ruby-parse --32 -E -e '"\\u{D800}"'
Failed on: (fragment:0)
/Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/lexer.rb:17506:in `chr': invalid codepoint 0xD800 in UTF-8 (RangeError)
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/lexer.rb:17506:in `block in advance'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/lexer.rb:17494:in `each'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/lexer.rb:17494:in `advance'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/lexer/explanation.rb:19:in `advance'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/base.rb:252:in `next_token'
	from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/racc/parser.rb:259:in `_racc_do_parse_c'
	from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/racc/parser.rb:259:in `do_parse'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/base.rb:190:in `parse'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/runner/ruby_parse.rb:141:in `process'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/runner.rb:254:in `process_buffer'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/runner.rb:231:in `block in process_fragments'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/runner.rb:225:in `each'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/runner.rb:225:in `each_with_index'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/runner.rb:225:in `process_fragments'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/runner.rb:215:in `block in process_all_input'
	from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/benchmark.rb:293:in `measure'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/runner.rb:214:in `process_all_input'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/runner/ruby_parse.rb:137:in `process_all_input'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/runner.rb:35:in `execute'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/runner.rb:13:in `go'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/bin/ruby-parse:7:in `<top (required)>'
	from /usr/local/bin/ruby-parse:23:in `load'
	from /usr/local/bin/ruby-parse:23:in `<main>'

> ruby -v
ruby 2.6.8p205 (2021-07-07 revision 67951) [universal.arm64e-darwin21]

> ruby -e '"\\u{D800}"'
-e:1: invalid Unicode codepoint
"\u{D800}"

I would assume that U+D800...U+DFFF should be ignored.

@iliabylich
Collaborator

I can't reproduce it locally:

$ ruby -v bin/ruby-parse --32 -E -e '"\\u{D800}"'
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin19]
"\\u{D800}"
^~~~~~~~~~~ tSTRING "\\u{D800}"                 expr_end     [0 <= cond] [0 <= cmdarg]
"\\u{D800}"
           ^ false "$eof"                       expr_end     [0 <= cond] [0 <= cmdarg]
(str "\\u{D800}")

$ ruby -ve 'p "\\u{D800}"'
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin19]
"\\u{D800}"

Is it related to an old version of Ruby? Could you try it on a version of Ruby that is still supported (i.e. at least 2.7)?

My hunch is that old Ruby has old Unicode support that doesn't know about these codepoints.

@RazrFalcon
Author

RazrFalcon commented Jun 4, 2022

This is the default Ruby on macOS. I'm not sure whether you support it.

@iliabylich
Collaborator

No, Ruby 2.6 reached end-of-life on 2022-04-12. We do run tests for 2.6.10 on CI, and at least this version works well. You can use rbenv/RVM or whatever is popular these days to install a newer version of Ruby.

I'm closing it, but feel free to reopen it if the error appears again for you with maintained Ruby versions (>= 2.7)

@RazrFalcon
Author

Am I still doing something wrong?

> ruby -v
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [arm64-darwin21]
> /opt/homebrew/lib/ruby/gems/3.1.0/gems/parser-3.1.2.0/bin/ruby-parse --32 -E -e '"\\u{D800}"'
Failed on: (fragment:0)
/opt/homebrew/lib/ruby/gems/3.1.0/gems/parser-3.1.2.0/lib/parser/lexer.rb:17506:in `chr': invalid codepoint 0xD800 in UTF-8 (RangeError)
...

@RazrFalcon
Author

Same, but using current master:

> ruby -v bin/ruby-parse --32 -E -e '"\\u{D800}"'
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [arm64-darwin21]
Failed on: (fragment:0)
/opt/homebrew/lib/ruby/gems/3.1.0/gems/parser-3.1.2.0/lib/parser/lexer.rb:17506:in `chr': invalid codepoint 0xD800 in UTF-8 (RangeError)

@iliabylich
Collaborator

Sorry, bash escaping issue, I should've checked this code in a separate file. My bad.

$ /bin/cat test.rb
"\u{D800}"

$ ruby -v test.rb
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin19]
test.rb:1: invalid Unicode codepoint
"\u{D800}"

$ ruby -v bin/ruby-parse --32 test.rb
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin19]
Failed on: test.rb
/Users/ilyabylich/Work/parser/lib/parser/lexer.rb:17506:in `chr': invalid codepoint 0xD800 in UTF-8 (RangeError)
...
stacktrace
...

This is a bug and it should be fixed, reopening.

The error comes from the `chr` call in the lexer (lexer.rb:17506 in the stack trace). The codepoint is "D800".to_i(16) == 55296, and Ruby raises an error when converting that codepoint to a character:

=> "D800".to_i(16).chr(Encoding::UTF_8)
RangeError (invalid codepoint 0xD800 in UTF-8)

I'm pretty sure we need to catch the RangeError and emit an :invalid_unicode_escape diagnostic instead (that's what the Ruby parser does).
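
To illustrate the idea of catching the RangeError, here is a minimal sketch. It is not the parser gem's actual code or its eventual fix; `codepoint_to_utf8` is a hypothetical helper that returns nil where a real lexer would presumably emit the diagnostic.

```ruby
# Hypothetical helper (not part of the parser gem): converts a hex codepoint
# string to a UTF-8 character, returning nil when Ruby rejects the codepoint.
def codepoint_to_utf8(hex)
  hex.to_i(16).chr(Encoding::UTF_8)
rescue RangeError
  # A real lexer would report an invalid-escape diagnostic here
  # instead of letting the exception escape.
  nil
end

codepoint_to_utf8("41")    # => "A"
codepoint_to_utf8("D800")  # => nil (surrogate, not a valid UTF-8 codepoint)
```

Rescuing the exception, rather than checking the U+D800..U+DFFF range by hand, also covers any other codepoint that Ruby refuses to encode.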

I'll fix it next week, thanks for reporting.

@iliabylich reopened this Jun 4, 2022
@iliabylich self-assigned this Jun 4, 2022
@RazrFalcon
Author

Sure, no problem. I was running it in Fish and didn't even think about shell escaping differences.
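
For anyone hitting the same confusion, the sketch below shows why the two shells disagreed. It assumes the usual quoting rules: bash passes `\\` inside single quotes through unchanged, while fish collapses it to a single backslash. The variable names and `try_eval` are illustrative only.

```ruby
# Source text Ruby receives from bash: both backslashes survive, so the
# literal is an escaped backslash followed by plain "u{D800}".
bash_source = <<~'SRC'.chomp
  "\\u{D800}"
SRC

# Source text Ruby receives from fish: one backslash, so the literal
# contains a real \u escape for a surrogate codepoint.
fish_source = <<~'SRC'.chomp
  "\u{D800}"
SRC

# Evaluates a snippet, returning :syntax_error if Ruby rejects it.
def try_eval(source)
  eval(source)
rescue SyntaxError
  :syntax_error
end

p try_eval(bash_source)  # an ordinary 8-character string
p try_eval(fish_source)  # :syntax_error
```

Putting the snippet in a file, as done with test.rb above, sidesteps shell quoting entirely.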
