Skip to content

RegEx pointless backtracking between non-overlapping sub-patterns #22636

@MasterInQuestion

Description

@MasterInQuestion

    https://github.com/MasterInQuestion/coordTransform/commit/dabc782b3e200d81210032f4dc6222b344e13d0b
    .
https://github.com/Perl/perl5/issues/22636#issuecomment-2387520738
    “Backtracking past "d" for "e" alike, would be entirely pointless: for non-overlapping impossible to match.
    Alike in linked past "-" (or ".") for "\d" etc.”

    Test case:
[[

> perl -e "'deEfficiency' =~ /d?(?:eficiency|[eE]fficiency)/;" -M"re"="debug"
Compiling REx "d?(?:eficiency|[eE]fficiency)"
Final program:
   1: CURLY{0,1} (7)
   5:   EXACT <d> (0)
   7: BRANCH (buf:0/0) (13)
   9:   EXACT <eficiency> (22)
  13: BRANCH (buf:0/0) (FAIL)
  15:   ANYOFM[Ee] (17)
  17:   EXACT <fficiency> (22)
  21: TAIL (22)
  22: END (0)
minlen 9
Matching REx "d?(?:eficiency|[eE]fficiency)" against "deEfficiency"
   0 <> <deEfficien>         |   0| 1:CURLY{0,1}(7)
                             |   0| EXACT <d> can match 1 times out of 1...
   1 <d> <eEfficienc>        |   1|  7:BRANCH (buf:0/0)(13)
   1 <d> <eEfficienc>        |   2|   9:EXACT <eficiency>(22)
                             |   2|   failed...
   1 <d> <eEfficienc>        |   1|  13:BRANCH (buf:0/0)(21)
   1 <d> <eEfficienc>        |   2|   15:ANYOFM[Ee](17)
   2 <de> <Efficiency>       |   2|   17:EXACT <fficiency>(22)
                             |   2|   failed...
                             |   1|  BRANCH failed...
   0 <> <deEfficien>         |   1|  7:BRANCH (buf:0/0)(13)
   0 <> <deEfficien>         |   2|   9:EXACT <eficiency>(22)
                             |   2|   failed...
   0 <> <deEfficien>         |   1|  13:BRANCH (buf:0/0)(21)
   0 <> <deEfficien>         |   2|   15:ANYOFM[Ee](17)
                             |   2|   failed...
                             |   1|  BRANCH failed...
                             |   0| failed...
   1 <d> <eEfficienc>        |   0| 1:CURLY{0,1}(7)
                             |   0| EXACT <d> can match 0 times out of 1...
   1 <d> <eEfficienc>        |   1|  7:BRANCH (buf:0/0)(13)
   1 <d> <eEfficienc>        |   2|   9:EXACT <eficiency>(22)
                             |   2|   failed...
   1 <d> <eEfficienc>        |   1|  13:BRANCH (buf:0/0)(21)
   1 <d> <eEfficienc>        |   2|   15:ANYOFM[Ee](17)
   2 <de> <Efficiency>       |   2|   17:EXACT <fficiency>(22)
                             |   2|   failed...
                             |   1|  BRANCH failed...
                             |   0| failed...
   2 <de> <Efficiency>       |   0| 1:CURLY{0,1}(7)
                             |   0| EXACT <d> can match 0 times out of 1...
   2 <de> <Efficiency>       |   1|  7:BRANCH (buf:0/0)(13)
   2 <de> <Efficiency>       |   2|   9:EXACT <eficiency>(22)
                             |   2|   failed...
   2 <de> <Efficiency>       |   1|  13:BRANCH (buf:0/0)(21)
   2 <de> <Efficiency>       |   2|   15:ANYOFM[Ee](17)
   3 <deE> <fficiency>       |   2|   17:EXACT <fficiency>(22)
  12 <deEfficiency> <>       |   2|   22:END(0)
Match successful!
Freeing REx: "d?(?:eficiency|[eE]fficiency)"

> perl -e "'deEfficiency' =~ /d?+(?:eficiency|[eE]fficiency)/;" -M"re"="debug"
Compiling REx "d?+(?:eficiency|[eE]fficiency)"
Final program:
   1: SUSPEND (11)
   3:   CURLY{0,1} (9)
   7:     EXACT <d> (0)
   9:   SUCCEED (0)
  10: TAIL (11)
  11: BRANCH (buf:0/0) (17)
  13:   EXACT <eficiency> (26)
  17: BRANCH (buf:0/0) (FAIL)
  19:   ANYOFM[Ee] (21)
  21:   EXACT <fficiency> (26)
  25: TAIL (26)
  26: END (0)
minlen 9
Matching REx "d?+(?:eficiency|[eE]fficiency)" against "deEfficiency"
   0 <> <deEfficien>         |   0| 1:SUSPEND(11)
   0 <> <deEfficien>         |   1|  3:CURLY{0,1}(9)
                             |   1|  EXACT <d> can match 1 times out of 1...
   1 <d> <eEfficienc>        |   2|   9:SUCCEED(0)
                             |   2|   SUCCEED: subpattern success...
   1 <d> <eEfficienc>        |   0| 11:BRANCH (buf:0/0)(17)
   1 <d> <eEfficienc>        |   1|  13:EXACT <eficiency>(26)
                             |   1|  failed...
   1 <d> <eEfficienc>        |   0| 17:BRANCH (buf:0/0)(25)
   1 <d> <eEfficienc>        |   1|  19:ANYOFM[Ee](21)
   2 <de> <Efficiency>       |   1|  21:EXACT <fficiency>(26)
                             |   1|  failed...
                             |   0| BRANCH failed...
   1 <d> <eEfficienc>        |   0| 1:SUSPEND(11)
   1 <d> <eEfficienc>        |   1|  3:CURLY{0,1}(9)
                             |   1|  EXACT <d> can match 0 times out of 1...
   1 <d> <eEfficienc>        |   2|   9:SUCCEED(0)
                             |   2|   SUCCEED: subpattern success...
   1 <d> <eEfficienc>        |   0| 11:BRANCH (buf:0/0)(17)
   1 <d> <eEfficienc>        |   1|  13:EXACT <eficiency>(26)
                             |   1|  failed...
   1 <d> <eEfficienc>        |   0| 17:BRANCH (buf:0/0)(25)
   1 <d> <eEfficienc>        |   1|  19:ANYOFM[Ee](21)
   2 <de> <Efficiency>       |   1|  21:EXACT <fficiency>(26)
                             |   1|  failed...
                             |   0| BRANCH failed...
   2 <de> <Efficiency>       |   0| 1:SUSPEND(11)
   2 <de> <Efficiency>       |   1|  3:CURLY{0,1}(9)
                             |   1|  EXACT <d> can match 0 times out of 1...
   2 <de> <Efficiency>       |   2|   9:SUCCEED(0)
                             |   2|   SUCCEED: subpattern success...
   2 <de> <Efficiency>       |   0| 11:BRANCH (buf:0/0)(17)
   2 <de> <Efficiency>       |   1|  13:EXACT <eficiency>(26)
                             |   1|  failed...
   2 <de> <Efficiency>       |   0| 17:BRANCH (buf:0/0)(25)
   2 <de> <Efficiency>       |   1|  19:ANYOFM[Ee](21)
   3 <deE> <fficiency>       |   1|  21:EXACT <fficiency>(26)
  12 <deEfficiency> <>       |   1|  26:END(0)
Match successful!
Freeing REx: "d?+(?:eficiency|[eE]fficiency)"

]]

    Tested against Perl 5.40.3.
    Been like this years ago.

    Another minor caveat is that, /[eE]/:
    Seems unconditionally converted to /[Ee]/.
    (the order may contain frequency hints - deployable enhancement)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions