
Improve set handling #55

Merged: 29 commits, merged on Aug 28, 2018


@jaynetics (Collaborator) commented Apr 30, 2018

This makes CharacterSet a standard Subexpression, as suggested in a comment on #47.

Tokens inside sets that have an equivalent outside of sets now produce the same Scanner and Parser emissions as that equivalent.

New CharacterSet::Range and CharacterSet::Intersection expressions represent range and intersection subtrees, respectively.

Other notable changes:

| Example | Old type, token | New type, token | Old expression | New expression |
|---|---|---|---|---|
| `[\b]` | `:set, :backspace` | `:escape, :backspace` | none/String | `ES::Backspace` |
| `[[:xy:]]` | `:set, :char_xy` | `:posixclass, :xy` | none/String | `PosixClass` |
| `[[:^xy:]]` | `:set, :char_nonxy` | `:nonposixclass, :xy` | none/String | `PosixClass` |
| `\x20` | `:escape, :hex` | `:escape, :hex` | `ES::Literal` | `ES::Hex` |
| `\040` | `:escape, :octal` | `:escape, :octal` | `ES::Literal` | `ES::Octal` |
| `\u1234` | `:escape, :codepoint` | `:escape, :codepoint` | `ES::Literal` | `ES::Codepoint` |
| `\u{12 34}` | `:escape, :codepoint_list` | `:escape, :codepoint_list` | `ES::Literal` | `ES::CodepointList` |

(`ES` = `EscapeSequence`; `xy` stands for any POSIX class name.)
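For context on the `[\b]` row: in Ruby regexps, `\b` means backspace (0x08) only inside a set and is a word-boundary anchor elsewhere, while the hex, octal and codepoint escapes denote the same character in both places. The short check below uses core Ruby only (no regexp_parser calls); the variable names are just for illustration.

```ruby
# The space character (0x20), written with each escape form from the table,
# both outside and inside a character set.
space_forms = [
  /\x20/, /[\x20]/,       # hex
  /\040/, /[\040]/,       # octal
  /\u0020/, /[\u0020]/,   # codepoint
  /\u{20}/, /[\u{20}]/    # codepoint in braces
]
all_match_space = space_forms.all? { |re| re.match?(' ') }

# \b is the exception: inside a set it is a backspace character (0x08) ...
backspace_in_set = /[\b]/.match?("\b")

# ... but outside a set it is a word-boundary anchor.
boundary_on_backspace = /\b/.match?("\b") # no word character on either side
boundary_on_letter    = /\b/.match?('a')

p all_match_space       # => true
p backspace_in_set      # => true
p boundary_on_backspace # => false
p boundary_on_letter    # => true
```

This asymmetry is why `[\b]` can map to a dedicated backspace escape, while the other escapes can share their outside-of-set tokens unchanged.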

@ammar What do you think? The commit messages provide a bit more explanation if you are wondering about some of the changes, but feel free to suggest any other solution.

@jaynetics (Collaborator, Author) commented Apr 30, 2018

Turns out I should have read the docs...
https://github.com/k-takata/Onigmo/blob/79114095/doc/RE#L155-L156

Intersections apply to everything on each side of `&&` within their set, not just to the adjacent expressions.

```ruby
'abc1'.scan(/[a b \d && b c [:digit:]]/x) # => ["b", "1"]
'abc1'.scan(/[^a b \d && b c [:digit:]]/x) # => ["a", "c"]
```
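The documented semantics can be confirmed with core Ruby alone by probing a sample of characters against each side of the `&&` separately and against the combined class. This is only a sketch: `SAMPLE` and `class_members` are illustrative helpers, and the `/x` flag and whitespace from the examples above are dropped to keep each class unambiguous.

```ruby
# Candidate characters to probe; wide enough to cover every member used below.
SAMPLE = ('a'..'z').to_a + ('0'..'9').to_a + [' ', '&']

# Returns the characters from SAMPLE that a single-character class matches.
def class_members(regexp)
  SAMPLE.select { |ch| regexp.match?(ch) }
end

left         = class_members(/[ab\d]/)
right        = class_members(/[bc[:digit:]]/)
intersection = class_members(/[ab\d&&bc[:digit:]]/)
negated      = class_members(/[^ab\d&&bc[:digit:]]/)

# && intersects the entire left side with the entire right side ...
p intersection == (left & right)     # => true
# ... and a leading ^ negates the intersection as a whole.
p negated == (SAMPLE - intersection) # => true
```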

So maybe Intersection parse results need to look somewhat like this:

```ruby
RP.parse(/[a&&b]/).first.first # =>
  #<Intersection @expressions=[
    #<Intersection::Left @expressions=[
      #<Literal @text="a"/>
    ],
    #<Intersection::Right @expressions=[
      #<Literal @text="b"/>
    ]/>
  ]/>
```

Now that would require quite a bit of tree restructuring while parsing.
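To make that restructuring concrete, here is a minimal sketch under explicit assumptions: set members arrive as a flat array of literal strings with a hypothetical `:intersection` marker wherever `&&` appeared, and `Sequence`/`Intersection` are stand-in Structs, not regexp_parser's classes. Splitting at every marker yields one operand per side, which also covers sets with more than two operands.

```ruby
# Hypothetical node types for this sketch only (not regexp_parser's classes).
Sequence     = Struct.new(:expressions) # one operand of an intersection
Intersection = Struct.new(:expressions) # holds one Sequence per operand

# Regroups a flat list of set members, split at each :intersection (&&) marker,
# into a single Intersection node. Sets without && are returned unchanged.
def restructure(members)
  return members unless members.include?(:intersection)

  operands = [[]]
  members.each do |member|
    if member == :intersection
      operands << [] # start collecting the next operand
    else
      operands.last << member
    end
  end
  [Intersection.new(operands.map { |exps| Sequence.new(exps) })]
end

# A character matches an Intersection if it matches every operand,
# and matches a Sequence if it matches any of its members.
def matches?(node, char)
  case node
  when Intersection then node.expressions.all? { |seq| matches?(seq, char) }
  when Sequence     then node.expressions.any? { |exp| matches?(exp, char) }
  else                   node == char # plain literal member
  end
end

# Flat members of /[abc&&ab&&bc]/ with whitespace omitted.
flat = %w[a b c] + [:intersection] + %w[a b] + [:intersection] + %w[b c]
set  = restructure(flat)

p set.first.expressions.size # => 3
p 'abc1&'.chars.select { |ch| set.any? { |node| matches?(node, ch) } } # => ["b"]
```

The trade-off is visible here: every member that precedes the first `&&` has to be moved under a new node after the fact, which is the restructuring cost mentioned above.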

Not to mention that there can be more than one intersection:

```ruby
'abc1&'.scan(/[abc && ab && bc]/x) # => ["b"]
```

Another option could be to treat sets as groups of sequences by default; however, that might make them harder to handle, just to support this somewhat exotic feature.

Hmmm ...

@jaynetics (Collaborator, Author)

I'm reasonably happy with this now ...

... this also fixes the `%>` and `%l` directives of #strfregexp_tree for Alternation (currently broken on master) and for the new Intersection expression.

Maybe #level should be renamed #group_level, and #nesting_level should become #level instead, for clarity's sake?
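To illustrate the distinction behind the proposed rename, here is a sketch assuming `#level` counts only enclosing groups while `#nesting_level` counts every enclosing subexpression, sets included. `Node` and `each_with_levels` are hypothetical and unrelated to regexp_parser's actual implementation.

```ruby
# Hypothetical minimal tree node (not regexp_parser's Expression class).
Node = Struct.new(:kind, :children)

# Walks the tree, yielding each node with two depths:
#   nesting - number of ancestors of any kind (assumed #nesting_level)
#   group   - number of :group ancestors only (assumed #level)
def each_with_levels(node, nesting = 0, group = 0, &block)
  block.call(node, nesting, group)
  child_group = node.kind == :group ? group + 1 : group
  node.children.each { |child| each_with_levels(child, nesting + 1, child_group, &block) }
end

# Tree for /(a[b])/: the set adds nesting but is not a group.
tree = Node.new(:root, [
  Node.new(:group, [
    Node.new(:literal, []),
    Node.new(:set, [Node.new(:literal, [])])
  ])
])

levels = []
each_with_levels(tree) { |node, nesting, group| levels << [node.kind, nesting, group] }

# Indent by nesting depth, roughly the kind of output an indentation directive produces.
levels.each do |kind, nesting, group|
  puts "#{'  ' * nesting}#{kind} (nesting #{nesting}, group #{group})"
end
```

In this model the `b` inside the set sits at nesting depth 3 but group depth 1, so once sets become ordinary subexpressions, a single name like #level no longer says which of the two depths it means.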
@jaynetics merged commit 87beeea into master on Aug 28, 2018
@jaynetics deleted the improve_set_handling branch on August 28, 2018 at 10:47