regex engine - simplify regnode structures and make them consistent
This eliminates the regnode_2L data structure, and merges it with the older
regnode_2 data structure. At the same time it makes each "arg" property of the
various regnode types that have one be consistently structured as an anonymous
union like this:

    union {
        U32 arg1u;
        I32 arg1i;
        struct {
            U16 arg1a;
            U16 arg1b;
        };
    };

We then expose four macros for accessing each slot: ARG1u(), ARG1i(),
ARG1a() and ARG1b(). Code then explicitly designates which view it wants. The
old logic used ARG() to access a U32 arg1, and ARG1() to access an I32 arg1,
which was confusing to say the least. The regnode_2L structure had a U32 arg1
and an I32 arg2, and the regnode_2 data structure had two I32 args. With the
new set of macros we use regnode_2 for both, and use the appropriate macros to
show whether we want signed or unsigned values.
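
As a rough illustration of the slot layout described above, here is a minimal
C11 sketch. The struct name sk_regnode_1, the header fields, the SK_ prefixed
macros and the helper sk_signed_as_unsigned() are all hypothetical stand-ins
chosen for this example; they are not the definitions in regcomp.h.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Stand-ins for perl's integer typedefs; the fixed widths are an assumption. */
typedef uint8_t  U8;
typedef uint16_t U16;
typedef uint32_t U32;
typedef int32_t  I32;

static_assert(sizeof(U32) == 2 * sizeof(U16),
              "two U16 halves exactly fill one 32-bit slot");

/* Hypothetical one-argument node. The header fields are illustrative;
 * only the anonymous union follows the commit message. */
struct sk_regnode_1 {
    U8  flags;
    U8  type;
    U16 next_off;
    union {
        U32 arg1u;          /* unsigned 32-bit view */
        I32 arg1i;          /* signed 32-bit view   */
        struct {
            U16 arg1a;      /* first 16-bit half    */
            U16 arg1b;      /* second 16-bit half   */
        };
    };
};

/* Sketch accessors; the SK_ prefix marks them as not being perl's macros. */
#define SK_ARG1u(p) ((p)->arg1u)
#define SK_ARG1i(p) ((p)->arg1i)
#define SK_ARG1a(p) ((p)->arg1a)
#define SK_ARG1b(p) ((p)->arg1b)

/* Store a signed value, then read the same bits back through the
 * unsigned view of the slot. */
U32 sk_signed_as_unsigned(I32 value)
{
    struct sk_regnode_1 node = {0};
    SK_ARG1i(&node) = value;
    return SK_ARG1u(&node);
}
```

The point of the sketch is that the call site names the view it wants, so a
reader can tell signed from unsigned use without looking up the struct. How
the two U16 halves map onto the 32-bit value depends on byte order, so code
should not mix the a/b view with the u/i view of the same slot.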

This also renames regnode_4 to regnode_3. The 3 stands for "three 32-bit
args". However, as each slot can also store two U16s, a regnode_3 can hold up
to six U16s, or three I32s, or a combination. For instance, the CURLY style
nodes use regnode_3 to store four values: ARG1i() for the min count, ARG2i()
for the max count, and ARG3a() and ARG3b() for the parens before and inside
the quantifier.
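
The CURLY packing described above could be sketched as follows. This is only
an illustration under stated assumptions: sk_regnode_3, the SK_ARG_SLOT()
helper macro, the header fields and sk_fill_curly() are hypothetical names
invented for this example and do not appear in the perl source.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for perl's integer typedefs; the fixed widths are an assumption. */
typedef uint8_t  U8;
typedef uint16_t U16;
typedef uint32_t U32;
typedef int32_t  I32;

/* Hypothetical helper that stamps out one 32-bit slot with all four views:
 * argNu, argNi, argNa and argNb. */
#define SK_ARG_SLOT(n)                      \
    union {                                 \
        U32 arg##n##u;                      \
        I32 arg##n##i;                      \
        struct {                            \
            U16 arg##n##a;                  \
            U16 arg##n##b;                  \
        };                                  \
    }

/* Hypothetical three-slot node; the header fields are illustrative. */
struct sk_regnode_3 {
    U8  flags;
    U8  type;
    U16 next_off;
    SK_ARG_SLOT(1);
    SK_ARG_SLOT(2);
    SK_ARG_SLOT(3);
};

/* Pack the four CURLY values the way the commit message describes:
 * slot 1 signed min, slot 2 signed max, slot 3 split into two U16s. */
void sk_fill_curly(struct sk_regnode_3 *node, I32 min, I32 max,
                   U16 parens_before, U16 parens_inside)
{
    node->arg1i = min;
    node->arg2i = max;
    node->arg3a = parens_before;
    node->arg3b = parens_inside;
}
```

For a quantifier such as {2,5}, a caller would pass 2 and 5 as min and max and
the two paren counts as the last two arguments; all four values then fit in
the three 32-bit slots.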

It also renames the function reganode() to reg1node() and reg2Lanode() to
reg2node(). The 2L thing was just confusing.
demerphq committed Jan 23, 2023
1 parent b094b6a commit c11c61e
Showing 13 changed files with 360 additions and 286 deletions.
5 changes: 2 additions & 3 deletions embed.fnc
@@ -5001,14 +5001,13 @@ ES |regnode_offset|regnode_guts \
 ES |void |change_engine_size \
 |NN RExC_state_t *pRExC_state \
 |const Ptrdiff_t size
-ES |regnode_offset|reganode|NN RExC_state_t *pRExC_state \
+ES |regnode_offset|reg1node|NN RExC_state_t *pRExC_state \
 |U8 op \
 |U32 arg
 ES |regnode_offset|regpnode|NN RExC_state_t *pRExC_state \
 |U8 op \
 |NN SV *arg
-ES |regnode_offset|reg2Lanode \
-|NN RExC_state_t *pRExC_state \
+ES |regnode_offset|reg2node|NN RExC_state_t *pRExC_state \
 |const U8 op \
 |const U32 arg1 \
 |const I32 arg2
4 changes: 2 additions & 2 deletions embed.h
@@ -1842,13 +1842,13 @@
 # define parse_lparen_question_flags(a) S_parse_lparen_question_flags(aTHX_ a)
 # define parse_uniprop_string(a,b,c,d,e,f,g,h,i,j) S_parse_uniprop_string(aTHX_ a,b,c,d,e,f,g,h,i,j)
 # define reg(a,b,c,d) S_reg(aTHX_ a,b,c,d)
-# define reg2Lanode(a,b,c,d) S_reg2Lanode(aTHX_ a,b,c,d)
+# define reg1node(a,b,c) S_reg1node(aTHX_ a,b,c)
+# define reg2node(a,b,c,d) S_reg2node(aTHX_ a,b,c,d)
 # define reg_la_NOTHING(a,b,c) S_reg_la_NOTHING(aTHX_ a,b,c)
 # define reg_la_OPFAIL(a,b,c) S_reg_la_OPFAIL(aTHX_ a,b,c)
 # define reg_node(a,b) S_reg_node(aTHX_ a,b)
 # define reg_scan_name(a,b) S_reg_scan_name(aTHX_ a,b)
 # define reg_skipcomment S_reg_skipcomment
-# define reganode(a,b,c) S_reganode(aTHX_ a,b,c)
 # define regatom(a,b,c) S_regatom(aTHX_ a,b,c)
 # define regbranch(a,b,c,d) S_regbranch(aTHX_ a,b,c,d)
 # define regclass(a,b,c,d,e,f,g,h,i) S_regclass(aTHX_ a,b,c,d,e,f,g,h,i)
16 changes: 8 additions & 8 deletions pod/perldebguts.pod
@@ -752,13 +752,13 @@ will be lost.
 PLUS node Match this (simple) thing 1 or more times:
 /A{1,}B/ where A is width 1 char

-CURLY sv 4 Match this (simple) thing {n,m} times:
+CURLY sv 3 Match this (simple) thing {n,m} times:
 /A{m,n}B/ where A is width 1 char
-CURLYN no 4 Capture next-after-this simple thing:
+CURLYN no 3 Capture next-after-this simple thing:
 /(A){m,n}B/ where A is width 1 char
-CURLYM no 4 Capture this medium-complex thing {n,m}
+CURLYM no 3 Capture this medium-complex thing {n,m}
 times: /(A){m,n}B/ where A is fixed-length
-CURLYX sv 4 Match/Capture this complex thing {n,m}
+CURLYX sv 3 Match/Capture this complex thing {n,m}
 times.

 # This terminator creates a loop structure for CURLYX
@@ -796,7 +796,7 @@ will be lost.

 # Support for long RE
 LONGJMP off 1 1 Jump far away.
-BRANCHJ off 2L 1 BRANCH with long offset.
+BRANCHJ off 2 1 BRANCH with long offset.

 # Special Case Regops
 IFMATCH off 1 1 Succeeds if the following matches; non-zero
@@ -814,7 +814,7 @@ will be lost.
 # The heavy worker

 EVAL evl/flags Execute some Perl code.
-2L
+2

 # Modifiers

@@ -825,7 +825,7 @@ will be lost.
 RENUM off 1 1 Group with independently numbered parens.

 # Regex Subroutines
-GOSUB num/ofs 2L recurse to paren arg1 at (signed) ofs arg2
+GOSUB num/ofs 2 recurse to paren arg1 at (signed) ofs arg2

 # Special conditionals
 GROUPPN no-sv 1 Whether the group matched.
@@ -836,7 +836,7 @@ will be lost.
 ENDLIKE none Used only for the type field of verbs
 OPFAIL no-sv 1 Same as (?!), but with verb arg
 ACCEPT no-sv/num Accepts the current matched string, with
-2L verbar
+2 verbar

 # Verbs With Arguments
 VERB no-sv 1 Used only for the type field of verbs
14 changes: 7 additions & 7 deletions proto.h

Generated file; diff not rendered by default.
