Inconsistent behaviour for unmatched subpatterns in `Regex::Matchdata` #12806

straight-shoota · 2022-11-30T14:36:18Z

A regex match with unused optional subpatterns leads to inconsistent and apparently unexpected behaviour when querying information about those subpatterns in Regex::Matchdata.

md = /f(?<x>x)?o/.match("foo").not_nil!

md # => Regex::MatchData("fo" 1:nil)

md[1]?   # => nil (OK)
md["x"]? # => nil (OK)
md[1]   # IndexError: Capture group 1 was not matched (OK)
md["x"] # KeyError: Capture group 'x' was not matched (OK)
md.begin(1) # NilAssertionError: Nil assertion failed (😒)
md.end(1)   # NilAssertionError: Nil assertion failed (😒)
md.byte_begin(1) # => -1 (BUG)
md.byte_end(1)   # => -1 (BUG)

The #[]? and #[] methods handle this condition explicitly, returning nil or raising IndexError or KexError with an error message indicating the problem. A minor downside is that the error type does not distinquish between index out of range and an unused subpattern, but that's probably okay. And you can tell from the error message which error it is.

#begin and #end raise an unspecific NilAssertionError. This should be improved. I think it makes sense to raise IndexError like #[] does. I would also suggest non-raising variants #begin? and #end? which return nil if the subpattern was unmatched (or index is out of range).

#byte_begin and #byte_end both return -1 which should be considered a bug. This is clearly incorrect behaviour.
Error handling for these methods should be like #begin and #end raising IndexError. The additional non-raising variants would also make sense.

The text was updated successfully, but these errors were encountered:

straight-shoota · 2022-11-30T17:03:54Z

Actually, instead of non-raising variants of #begin and #end methods, maybe a bit different approach could be better. I'm thinking about a combined method that returns both begin and end as a single Range(Int32, Int32) value. Retrieving both indices doesn't add any considerable overhead, and I think the API might be a bit more concise with less methods overall.

…`#end` (#12810) Resolves #12806

straight-shoota added kind:bug A bug in the code. Does not apply to documentation, specs, etc. status:discussion topic:stdlib:text labels Nov 30, 2022

straight-shoota mentioned this issue Nov 30, 2022

Raise IndexError on unmatched subpattern for MatchData#begin and #end #12810

Merged

straight-shoota closed this as completed in #12810 Dec 2, 2022

straight-shoota added a commit that referenced this issue Dec 2, 2022

Raise IndexError on unmatched subpattern for MatchData#begin and …

49edc0b

…`#end` (#12810) Resolves #12806

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent behaviour for unmatched subpatterns in `Regex::Matchdata` #12806

Inconsistent behaviour for unmatched subpatterns in `Regex::Matchdata` #12806

straight-shoota commented Nov 30, 2022 •

edited

Loading

straight-shoota commented Nov 30, 2022

Inconsistent behaviour for unmatched subpatterns in Regex::Matchdata #12806

Inconsistent behaviour for unmatched subpatterns in Regex::Matchdata #12806

Comments

straight-shoota commented Nov 30, 2022 • edited Loading

straight-shoota commented Nov 30, 2022

Inconsistent behaviour for unmatched subpatterns in `Regex::Matchdata` #12806

Inconsistent behaviour for unmatched subpatterns in `Regex::Matchdata` #12806

straight-shoota commented Nov 30, 2022 •

edited

Loading