Misses special characters after bare # (e.g. /#(\d+)/) #70

Closed
Tietew opened this issue Sep 28, 2020 · 2 comments

Comments

@Tietew

Tietew commented Sep 28, 2020

In Ruby's regexp syntax, /#(\d+)/ matches a literal # followed by an unnamed capture group (\d+).
But Regexp::Parser.parse treats the whole pattern as a single literal '#(\d+)'.

irb(main):001:0> /#(\d+)/.match('#123')
=> #<MatchData "#123" 1:"123">
irb(main):002:0> Regexp::Parser.parse('#(\d+)')
=> #<Regexp::Expression::Root:0x00005619a4ad5e30 @type=:expression, @token=:root, @text="", @ts=0, @level=nil, @set_level=nil, @conditional_level=nil, @nesting_level=0, @quantifier=nil, @options={}, @expressions=[#<Regexp::Expression::Literal:0x00005619a4b8cb80 @type=:literal, @token=:literal, @text="#(\\d+)", @ts=0, @level=0, @set_level=0, @conditional_level=0, @nesting_level=1, @quantifier=nil, @options={}>]>
irb(main):003:0> Regexp::Parser.parse('\#(\d+)')
=> #<Regexp::Expression::Root:0x00005619a4be42e0 @type=:expression, @token=:root, @text="", @ts=0, @level=nil, @set_level=nil, @conditional_level=nil, @nesting_level=0, @quantifier=nil, @options={}, @expressions=[#<Regexp::Expression::EscapeSequence::Literal:0x00005619a4bf5310 @type=:escape, @token=:literal, @text="\\#", @ts=0, @level=0, @set_level=0, @conditional_level=0, @nesting_level=1, @quantifier=nil, @options={}>, #<Regexp::Expression::Group::Capture:0x00005619a4bf52c0 @type=:group, @token=:capture, @text="(", @ts=2, @level=0, @set_level=0, @conditional_level=0, @nesting_level=1, @quantifier=nil, @options={}, @expressions=[#<Regexp::Expression::CharacterType::Digit:0x00005619a4bf51d0 @type=:type, @token=:digit, @text="\\d", @ts=3, @level=1, @set_level=0, @conditional_level=0, @nesting_level=2, @quantifier=#<Regexp::Expression::Quantifier:0x00005619a4bf51a8 @token=:one_or_more, @text="+", @mode=:greedy, @min=1, @max=-1>, @options={}>], @number=1, @number_at_level=1>]>

Regexp::Parser.parse('#(\d+)') should return the same tree as Regexp::Parser.parse('\#(\d+)'), except for the first node, whose text is "#" instead of "\\#".

@Tietew changed the title from "Misses capture after bare # (e.g. /#(\d+)/)" to "Misses special characters after bare # (e.g. /#(\d+)/)" on Sep 28, 2020
@Tietew
Author

Tietew commented Sep 28, 2020

The same issue occurs with #[abc] (character class), #(?...) (non-capturing group), #\1 (backreference), #@! (invalid instance variable), #@@! (invalid class variable), #^ (start-of-line anchor, useless but legal), #$ (end-of-line anchor), #\z (end-of-string anchor), etc.
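For reference, the constructs listed above can be checked against Ruby's own regexp engine. The sketch below uses only the built-in Regexp class (not regexp_parser); the constant names and the sample inputs are illustrative assumptions, not part of the original report. The #^ case is left out because it compiles but can never match.

```ruby
# Each entry pairs a pattern source with an input it should match in normal
# (non-extended) mode, where "#" is an ordinary literal character.
# Single-quoted strings keep Ruby from attempting any interpolation.
BARE_HASH_CASES = {
  '#(\d+)'   => '#123',   # capture group
  '#[abc]'   => '#b',     # character class
  '#(?:ab)+' => '#abab',  # non-capturing group
  '(x)#\1'   => 'x#x',    # backreference to group 1
  '#@!'      => '#@!',    # not a valid instance variable, so plain literals
  '#@@!'     => '#@@!',   # not a valid class variable, so plain literals
  '#$'       => 'a#',     # end-of-line anchor
  '#\z'      => 'a#'      # end-of-string anchor
}.freeze

# Map each pattern source to whether Ruby matches the sample input.
BARE_HASH_RESULTS = BARE_HASH_CASES.map do |source, input|
  [source, Regexp.new(source).match?(input)]
end.to_h

BARE_HASH_RESULTS.each { |source, ok| puts format('%-10s %s', source, ok) }
```

If every line prints true, Ruby itself treats the bare # as a literal in all of these cases, which is the behaviour the parser is expected to mirror.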

jaynetics added a commit that referenced this issue Sep 28, 2020
Issue #70.

The comment scanner was too greedy. In a way this bug always existed.

Comment-like patterns with a specific shape have always been scanned incorrectly in normal mode, e.g.

```ruby
/foo # \d
/
```

This was just very rare. Prior to the fix of issue #66 via PR #67, the comment scanner only fired for a limited, incomplete subset of valid comments like the one above. With the broadening of the scanner, this bug became much easier to hit upon.
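To illustrate why the pattern above is not a comment in normal mode, here is a minimal sketch using Ruby's built-in Regexp only. It compiles the same source with and without the extended (x) flag; the constant names and the input string are assumptions chosen for the example.

```ruby
# Same pattern source as the example above: "foo # \d" followed by a newline.
COMMENT_LIKE_SOURCE = "foo # \\d\n"

# Normal mode: "#", the spaces, "\d" and the newline are all part of the pattern.
NORMAL_RE = Regexp.new(COMMENT_LIKE_SOURCE)

# Extended mode: whitespace is ignored and "#" starts a comment that runs
# to the end of the line, so only "foo" remains.
EXTENDED_RE = Regexp.new(COMMENT_LIKE_SOURCE, Regexp::EXTENDED)

INPUT = "foo # 5\n"

NORMAL_MATCH   = NORMAL_RE.match(INPUT)
EXTENDED_MATCH = EXTENDED_RE.match(INPUT)

p NORMAL_MATCH[0]          # "foo # 5\n"
p EXTENDED_MATCH[0]        # "foo"
p NORMAL_RE.match?('foo')  # false: normal mode needs the " # <digit>" tail too
```

A scanner that treats # as a comment start regardless of the x flag would effectively parse every pattern the way EXTENDED_RE does here.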
@jaynetics
Collaborator

@Tietew i've just released version 1.8.1 with a fix. thank you very much for the detailed report!
