Document formal grammar for regex sets #16779

p5pRT · 2018-12-02T00:34:09Z

Migrated from rt.perl.org#133707 (status was 'new')

Searchable as RT133707$

p5pRT · 2018-12-02T00:34:09Z

From @khwilliamson

This is a bug report for perl from khw@khw-xps-8930.(none),
generated with the help of perlbug 1.41 running under perl 5.29.6.

A message that was part of the security ticket [perl #133649] asked that
a formal grammar be defined for the regex sets feature, and suggested
enhancements. The security ticket has now been close, so this ticket is
being opened so that portion of it does not get lost. The complete text
of the portion starts at

https://rt-archive.perl.org/perl5/Ticket/Display.html?id=131649#txn-1471088

Flags:
category=core
severity=low

Site configuration information for perl 5.29.6:

Configured by khw at Fri Nov 30 08:02:34 MST 2018.

Summary of my perl5 (revision 5 version 29 subversion 6) configuration:
Commit id: 23665de
Platform:
osname=linux
osvers=4.15.0-39-generic
archname=x86_64-linux-thread-multi-ld
uname='linux khw-xps-8930 4.15.0-39-generic #42-ubuntu smp tue oct
23 15:48:01 utc 2018 x86_64 x86_64 x86_64 gnulinux '
config_args='-des -Uversiononly -Dprefix=/home/khw/blead -Dusedevel
-A'optimize=-ggdb3' -A'optimize=-O0' -Accflags='-Wno-deprecated'
-Dman1dir=none -Dman3dir=none -Dcc=g++ -DDEBUGGING -Dusemorebits
-Dusecbacktrace -Dusethreads'
hint=recommended
useposix=true
d_sigaction=define
useithreads=define
usemultiplicity=define
use64bitint=define
use64bitall=define
uselongdouble=define
usemymalloc=n
default_inc_excludes_dot=define
bincompat5005=undef
Compiler:
cc='g++'
ccflags ='-D_REENTRANT -D_GNU_SOURCE -Wno-deprecated -fwrapv
-DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector-strong
-I/usr/local/include -DUSE_C_BACKTRACE -g -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2'
optimize='-O2 -ggdb3 -O0'
cppflags='-D_REENTRANT -D_GNU_SOURCE -Wno-deprecated -fwrapv
-DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector-strong
-I/usr/local/include'
ccversion=''
gccversion='7.3.0'
gccosandvers=''
intsize=4
longsize=8
ptrsize=8
doublesize=8
byteorder=12345678
doublekind=3
d_longlong=define
longlongsize=8
d_longdbl=define
longdblsize=16
longdblkind=3
ivtype='long'
ivsize=8
nvtype='long double'
nvsize=16
Off_t='off_t'
lseeksize=8
alignbytes=16
prototype=define
Linker and Libraries:
ld='g++'
ldflags =' -fstack-protector-strong -L/usr/local/lib'
libpth=/usr/include/c++/7 /usr/include/x86_64-linux-gnu/c++/7
/usr/include/c++/7/backward /usr/local/lib
/usr/lib/gcc/x86_64-linux-gnu/7/include-fixed
/usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib
/usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
libs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
libc=libc-2.27.so
so=so
useshrplib=false
libperl=libperl.a
gnulibc_version='2.27'
Dynamic Linking:
dlsrc=dl_dlopen.xs
dlext=so
d_dlsymun=undef
ccdlflags='-Wl,-E'
cccdlflags='-fPIC'
lddlflags='-shared -O2 -ggdb3 -O0 -L/usr/local/lib
-fstack-protector-strong'

@INC for perl 5.29.6:
/home/khw/blead/lib/perl5/site_perl/5.29.6/x86_64-linux-thread-multi-ld
/home/khw/blead/lib/perl5/site_perl/5.29.6
/home/khw/blead/lib/perl5/5.29.6/x86_64-linux-thread-multi-ld
/home/khw/blead/lib/perl5/5.29.6
/home/khw/blead/lib/perl5/site_perl/5.29.5
/home/khw/blead/lib/perl5/site_perl/5.29.4
/home/khw/blead/lib/perl5/site_perl/5.29.3
/home/khw/blead/lib/perl5/site_perl/5.29.2
/home/khw/blead/lib/perl5/site_perl/5.29.1
/home/khw/blead/lib/perl5/site_perl/5.29.0
/home/khw/blead/lib/perl5/site_perl/5.28.0
/home/khw/blead/lib/perl5/site_perl/5.27.11
/home/khw/blead/lib/perl5/site_perl/5.27.10
/home/khw/blead/lib/perl5/site_perl/5.27.9
/home/khw/blead/lib/perl5/site_perl/5.27.8
/home/khw/blead/lib/perl5/site_perl/5.27.7
/home/khw/blead/lib/perl5/site_perl/5.27.6
/home/khw/blead/lib/perl5/site_perl/5.27.5
/home/khw/blead/lib/perl5/site_perl/5.27.4
/home/khw/blead/lib/perl5/site_perl/5.27.3
/home/khw/blead/lib/perl5/site_perl/5.27.2
/home/khw/blead/lib/perl5/site_perl/5.27.1
/home/khw/blead/lib/perl5/site_perl/5.27.0
/home/khw/blead/lib/perl5/site_perl/5.26.0
/home/khw/blead/lib/perl5/site_perl/5.25.12
/home/khw/blead/lib/perl5/site_perl/5.25.11
/home/khw/blead/lib/perl5/site_perl/5.25.10
/home/khw/blead/lib/perl5/site_perl/5.25.9
/home/khw/blead/lib/perl5/site_perl/5.25.8
/home/khw/blead/lib/perl5/site_perl/5.25.7
/home/khw/blead/lib/perl5/site_perl/5.25.6
/home/khw/blead/lib/perl5/site_perl/5.25.5
/home/khw/blead/lib/perl5/site_perl/5.25.4
/home/khw/blead/lib/perl5/site_perl/5.25.3
/home/khw/blead/lib/perl5/site_perl/5.25.2
/home/khw/blead/lib/perl5/site_perl/5.25.1
/home/khw/blead/lib/perl5/site_perl/5.24.0
/home/khw/blead/lib/perl5/site_perl/5.23.10
/home/khw/blead/lib/perl5/site_perl/5.23.9
/home/khw/blead/lib/perl5/site_perl/5.23.8
/home/khw/blead/lib/perl5/site_perl/5.23.7
/home/khw/blead/lib/perl5/site_perl/5.23.6
/home/khw/blead/lib/perl5/site_perl/5.23.5
/home/khw/blead/lib/perl5/site_perl/5.23.4
/home/khw/blead/lib/perl5/site_perl/5.23.3
/home/khw/blead/lib/perl5/site_perl/5.23.2
/home/khw/blead/lib/perl5/site_perl/5.23.1
/home/khw/blead/lib/perl5/site_perl/5.23.0
/home/khw/blead/lib/perl5/site_perl/5.22.0
/home/khw/blead/lib/perl5/site_perl/5.21.12
/home/khw/blead/lib/perl5/site_perl/5.21.11
/home/khw/blead/lib/perl5/site_perl/5.21.10
/home/khw/blead/lib/perl5/site_perl/5.21.9
/home/khw/blead/lib/perl5/site_perl/5.21.8
/home/khw/blead/lib/perl5/site_perl/5.21.7
/home/khw/blead/lib/perl5/site_perl/5.21.6
/home/khw/blead/lib/perl5/site_perl/5.21.5
/home/khw/blead/lib/perl5/site_perl/5.21.4
/home/khw/blead/lib/perl5/site_perl/5.21.3
/home/khw/blead/lib/perl5/site_perl/5.21.2
/home/khw/blead/lib/perl5/site_perl/5.21.1
/home/khw/blead/lib/perl5/site_perl/5.20.0
/home/khw/blead/lib/perl5/site_perl/5.19.12
/home/khw/blead/lib/perl5/site_perl/5.19.11
/home/khw/blead/lib/perl5/site_perl/5.19.10
/home/khw/blead/lib/perl5/site_perl

Environment for perl 5.29.6:
HOME=/home/khw
LANG=en_US.UTF-8
LANGUAGE=en_US
LD_LIBRARY_PATH (unset)
LOGDIR (unset)

PATH=/usr/lib/ccache:/home/khw/bin:/home/khw/perl5/perlbrew/bin:/home/khw/print/bin:/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/usr/games:/usr/local/games:/snap/bin:/home/khw/iands/www:/home/khw/cxoffice/bin
PERL5OPT=-w
PERL_BADLANG (unset)
PERL_DIFF_TOOL=wgdiff
PERL_POD_PEDANTIC=1
SHELL=/bin/ksh

khwilliamson · 2019-12-09T03:27:42Z

Here is my try at a formal grammar. I haven't done this kind of thing in a long time, so it may be flawed.

primary-expression:
compiled set-expression, which is typically an interpolated scalar
[ perl character class ]
[:posix:], which is one of [:alpha:], ... [:digit:], ... [:word:]
\perl backslash escape sequence like \n, \p{...} \N{...} \x{...}, etc
( expression )

unary-expression:
primary-expression
! primary-expression

multiplicative-expression:
unary-expression
multiplicative-expression & unary-expression

additive-expression:
multiplicative-expression
additive-expression + multiplicative-expression
additive-expression | multiplicative-expression
additive-expression - multiplicative-expression
additive-expression ^ multiplicative-expression

expression:
additive_expression

My sense is that @demerphq was expecting there to be lots of gotchas when he requested this, around the fact that you can interpolate in an already-compiled regex set expression. And in fact it doesn't check that the code came from interpolation of such. Only that it is a syntactically valid compiled expression displayed out '(?^:...)'

I would like to make the regex sets feature non-experimental. And I believe this is the final road block to that.

demerphq · 2019-12-09T12:38:34Z

On Mon, 9 Dec 2019, 04:27 Karl Williamson, ***@***.***> wrote: Here is my try at a formal grammar. I haven't done this kind of thing in a long time, so it may be flawed. primary-expression: compiled set-expression, which is typically an interpolated scalar [ perl character class ] [:posix:], which is one of [:alpha:], ... [:digit:], ... [:word:] \perl backslash escape sequence like \n, \p{...} \N{...} \x{...}, etc ( expression )

Normally a grammar is expressed as a set of rules, with quoted text being literals, and unquoted text being other rules, usually with at least the '|' alternation meta-character similar to regex'es, and sometime with additional support for '+' (one or more), '*' (zero or more) and '?' (zero or one) meta-characters [grammars which support these additional items are MUCH easier to read and write even if they aren't strictly necessary from a logical point of view]. In the following ':' is used to separate a rule name from its definition, and the rule definition is terminated with a ';', I used '...' to mean "not completed yet" as one might expect. any_char_class: traditional_char_class | extended_char_class; traditional_char_class: '[' traditional_cc_spec* ']'; extended_char_class: '(?[' extended_cc_spec* '])'; traditional_cc_spec: trad_cc_range | trad_cc_codepoint | trad_cc_posix | trad_cc_meta | ... ; trad_cc_range: trad_cc_codepoint '-' trad_cc_codepoint; trad_cc_codepoint: 'a' | 'b' | ... | '\n' | '\t' | '\x{' hex-digits '}' | ... ; trad_cc_posix: '[:' posix_negate? posix_name ':]'; posix_negate: '^' posix_name: 'digit' | ... | 'lower'; trad_cc_meta: '\d' | ... | '\s'; extended_cc_spec: ... ; The part that is really interesting IMO/IIRC is the exact rules for set operations and nested set definitions. IIRC they were sufficiently surprising the exact rules weren't intuitive, so they may be difficult to express as a grammar - which I suspect would suggest that the rule is difficult to remember, and thus should be changed to simplify things. Cheers, Yves

…

khwilliamson · 2019-12-09T18:02:07Z

There have been various changes over the years to make things behave properly. I am unaware of anything at this time that would likely be a surprise. It doesn't sound like you have any examples immediately at hand. My guess is that the ones you knew of have been fixed.

demerphq · 2019-12-09T18:11:47Z

On Mon, 9 Dec 2019 at 19:02, Karl Williamson ***@***.***> wrote: There have been various changes over the years to make things behave properly. I am unaware of anything at this time that would likely be a surprise. It doesn't sound like you have any examples immediately at hand. My guess is that the ones you knew of have been fixed.

Maybe. :-) But lets write the rules down properly so we know what you expect, and then we can generate test cases to see if there are any surprises. I remember being very surprised about where (?[]) were required or not. Yves

…

-- perl -Mre=debug -e "/just|another|perl|hacker/"

khwilliamson · 2019-12-09T18:31:16Z

On 12/9/19 11:11 AM, Yves Orton wrote: I remember being very surprised about where (?[]) were required or not.

That doesn't make sense to me. They are always required for this feature to kick in.

demerphq · 2019-12-09T18:32:32Z

On Mon, 9 Dec 2019 at 19:31, Karl Williamson ***@***.***> wrote: On 12/9/19 11:11 AM, Yves Orton wrote: > I remember > being very surprised about where (?[]) were required or not. That doesn't make sense to me. They are always required for this feature to kick in.

IIRC, they are also required sometimes inside of a (?[]) as well. That is the part that is surprising. Yves

…

-- perl -Mre=debug -e "/just|another|perl|hacker/"

demerphq · 2019-12-09T18:42:23Z

On Mon, 9 Dec 2019 at 19:32, demerphq ***@***.***> wrote: On Mon, 9 Dec 2019 at 19:31, Karl Williamson ***@***.***> wrote: > > On 12/9/19 11:11 AM, Yves Orton wrote: > > I remember > > being very surprised about where (?[]) were required or not. > > That doesn't make sense to me. They are always required for this > feature to kick in. IIRC, they are also required sometimes inside of a (?[]) as well. That is the part that is surprising.

FWIW, this discussion and me saying we should write a grammar is related to 19a498a commit 19a498a (yves/fix_131649) Author: Yves Orton <demerphq@gmail.com> Date: Mon Jun 26 13:19:55 2017 +0200 fix #131649 - extended charclass can trigger assert The extended charclass parser makes some assumptions during the first pass which are only true on well structured input, and it does not properly catch various errors. later on the code assumes that things the first pass will let through are valid, when in fact they should trigger errors. While I was debugging that ticket and fixing those issues I noticed the rules about how regexes inside of (?[ ... ]) are not as intuitive as I thought, and that wrappers are required inside sometimes. For instance I think I encountered these issues while leading up to the creation of the following error messages: +=item Unexpected ']' with no following ')' in (?[... in regex; marked by <-- HERE in m/%s/ + +(F) While parsing an extended character class a ']' character was encountered +at a point in the definition where the only legal use of ']' is to close the +character class definition as part of a '])', you may have forgotten the close +paren, or otherwise confused the parser. + +=item Expecting close paren for nested extended charclass in regex; marked by <-- HERE in m/%s/ + +(F) While parsing a nested extended character class like: + + (?[ ... (?flags:(?[ ... ])) ... ]) + ^ + +we expected to see a close paren ')' (marked by ^) but did not. + +=item Expecting close paren for wrapper for nested extended charclass in regex; marked by <-- HERE in m/%s/ + +(F) While parsing a nested extended character class like: + + (?[ ... (?flags:(?[ ... ])) ... ]) + ^ + +we expected to see a close paren ')' (marked by ^) but did not. I dont remember more details, which is why I would like you to write the grammar down. For instance your original does not seem to account for (?[ (?flags:(?[ ... ])) ]) at all. What flags are legal? What things are illegal after a (?flags: ...) that is inside of an extended char-class, etc. Yves

…

-- perl -Mre=debug -e "/just|another|perl|hacker/"

khwilliamson · 2019-12-09T18:52:26Z

On 12/9/19 11:32 AM, Yves Orton wrote: On Mon, 9 Dec 2019 at 19:31, Karl Williamson ***@***.***> wrote: > > On 12/9/19 11:11 AM, Yves Orton wrote: > > I remember > > being very surprised about where (?[]) were required or not. > > That doesn't make sense to me. They are always required for this > feature to kick in. IIRC, they are also required sometimes inside of a (?[]) as well. That is the part that is surprising.

That would be surprising to me too. But looking at the code, and doing a few experiments, I don't see how that could happen.

…

Yves -- perl -Mre=debug -e "/just|another|perl|hacker/" — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#16779?email_source=notifications&email_token=AAA2DH5Y337R66HMS3FNVMTQX2FMDA5CNFSM4JYCTEFKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGKFVCQ#issuecomment-563370634>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAA2DHZQGOYWSO4LCDBL3IDQX2FMDANCNFSM4JYCTEFA>.

khwilliamson · 2019-12-09T19:26:38Z

On 12/9/19 11:42 AM, Yves Orton wrote: On Mon, 9 Dec 2019 at 19:32, demerphq ***@***.***> wrote: > > On Mon, 9 Dec 2019 at 19:31, Karl Williamson ***@***.***> wrote: > > > > On 12/9/19 11:11 AM, Yves Orton wrote: > > > I remember > > > being very surprised about where (?[]) were required or not. > > > > That doesn't make sense to me. They are always required for this > > feature to kick in. > > IIRC, they are also required sometimes inside of a (?[]) as well. That > is the part that is surprising. FWIW, this discussion and me saying we should write a grammar is related to 19a498a commit 19a498a (yves/fix_131649) Author: Yves Orton ***@***.***> Date: Mon Jun 26 13:19:55 2017 +0200 fix #131649 - extended charclass can trigger assert The extended charclass parser makes some assumptions during the first pass which are only true on well structured input, and it does not properly catch various errors. later on the code assumes that things the first pass will let through are valid, when in fact they should trigger errors. While I was debugging that ticket and fixing those issues I noticed the rules about how regexes inside of (?[ ... ]) are not as intuitive as I thought, and that wrappers are required inside sometimes. For instance I think I encountered these issues while leading up to the creation of the following error messages: +=item Unexpected ']' with no following ')' in (?[... in regex; marked by <-- HERE in m/%s/ + +(F) While parsing an extended character class a ']' character was encountered +at a point in the definition where the only legal use of ']' is to close the +character class definition as part of a '])', you may have forgotten the close +paren, or otherwise confused the parser. + +=item Expecting close paren for nested extended charclass in regex; marked by <-- HERE in m/%s/ + +(F) While parsing a nested extended character class like: + + (?[ ... (?flags:(?[ ... ])) ... ]) + ^ + +we expected to see a close paren ')' (marked by ^) but did not. + +=item Expecting close paren for wrapper for nested extended charclass in regex; marked by <-- HERE in m/%s/ + +(F) While parsing a nested extended character class like: + + (?[ ... (?flags:(?[ ... ])) ... ]) + ^ + +we expected to see a close paren ')' (marked by ^) but did not. I dont remember more details, which is why I would like you to write the grammar down. For instance your original does not seem to account for (?[ (?flags:(?[ ... ])) ]) at all. What flags are legal? What things are illegal after a (?flags: ...) that is inside of an extended char-class, etc. Yves

So, the grammar I wrote earlier in this thread has this rule: primary-expression: compiled set-expression, which is typically an interpolated scalar ... By that, I mean that you could say something like my $foo = qr/([ ... ])/; and later include $foo in another set expression qr/([ ... + $foo + ... ])/; But by the time I see the interpolated $foo, it looks like qr/([ ... + (?^flags:(?[...])) + ... ])/; The boiler plate of the expansion of $foo, including the flags, are placed there by regcomp.c during the previous compilation of $foo. It is essential that the pattern accept an interpolated previously compiled set expression, but the programmer should not be constructing such things manually. So the grammar merely refers to the interpolated scalar without specifying the details, such as the flags. Those details have changed before, and its possible they could change again. Since you wrote that ticket, there's been a tightening of the checking that the interpolated code is actually from a previously compiled regex, but AFAIK, there's no way for me to know its genesis for sure. If someone does have a way, I'm all ears. Maybe the diagnostic messages and explanations can be fixed to be more clear. But the bottom line is that the programmer should not be specifying flags, etc inside one of these expressions, and hence the grammar shouldn't include that.

…

-- perl -Mre=debug -e "/just|another|perl|hacker/" — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#16779?email_source=notifications&email_token=AAA2DH5HBGRYZOS6CECZUXTQX2GRBA5CNFSM4JYCTEFKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGKGRYQ#issuecomment-563374306>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAA2DH6JUV6BHLF3RP7DHILQX2GRBANCNFSM4JYCTEFA>.

demerphq · 2019-12-10T11:17:50Z

On Mon, 9 Dec 2019 at 20:26, Karl Williamson <notifications@github.com> wrote:

On 12/9/19 11:42 AM, Yves Orton wrote: > On Mon, 9 Dec 2019 at 19:32, demerphq ***@***.***> wrote: > > > > On Mon, 9 Dec 2019 at 19:31, Karl Williamson > ***@***.***> wrote: > > > > > > On 12/9/19 11:11 AM, Yves Orton wrote: > > > > I remember > > > > being very surprised about where (?[]) were required or not. > > > > > > That doesn't make sense to me. They are always required for this > > > feature to kick in. > > > > IIRC, they are also required sometimes inside of a (?[]) as well. That > > is the part that is surprising. > > FWIW, this discussion and me saying we should write a grammar is related to > > 19a498a > > commit 19a498a (yves/fix_131649) > Author: Yves Orton ***@***.***> > Date: Mon Jun 26 13:19:55 2017 +0200 > > fix #131649 - extended charclass can trigger assert > > The extended charclass parser makes some assumptions during the > first pass which are only true on well structured input, and it > does not properly catch various errors. later on the code assumes > that things the first pass will let through are valid, when in > fact they should trigger errors. > > While I was debugging that ticket and fixing those issues I noticed > the rules about how regexes inside of (?[ ... ]) are not as intuitive > as I thought, and that wrappers are required inside sometimes. For > instance I think I encountered these issues while leading up to the > creation of the following error messages: > > +=item Unexpected ']' with no following ')' in (?[... in regex; marked > by <-- HERE in m/%s/ > + > +(F) While parsing an extended character class a ']' character was > encountered > +at a point in the definition where the only legal use of ']' is to > close the > +character class definition as part of a '])', you may have forgotten > the close > +paren, or otherwise confused the parser. > + > +=item Expecting close paren for nested extended charclass in regex; > marked by <-- HERE in m/%s/ > + > +(F) While parsing a nested extended character class like: > + > + (?[ ... (?flags:(?[ ... ])) ... ]) > + ^ > + > +we expected to see a close paren ')' (marked by ^) but did not. > + > +=item Expecting close paren for wrapper for nested extended charclass > in regex; marked by <-- HERE in m/%s/ > + > +(F) While parsing a nested extended character class like: > + > + (?[ ... (?flags:(?[ ... ])) ... ]) > + ^ > + > +we expected to see a close paren ')' (marked by ^) but did not. > > I dont remember more details, which is why I would like you to write > the grammar down. For instance your original does not seem to account > for (?[ (?flags:(?[ ... ])) ]) at all. > > What flags are legal? What things are illegal after a (?flags: ...) > that is inside of an extended char-class, etc. > > Yves So, the grammar I wrote earlier in this thread has this rule: primary-expression: compiled set-expression, which is typically an interpolated scalar ... By that, I mean that you could say something like my $foo = qr/([ ... ])/; and later include $foo in another set expression qr/([ ... + $foo + ... ])/; But by the time I see the interpolated $foo, it looks like qr/([ ... + (?^flags:(?[...])) + ... ])/; The boiler plate of the expansion of $foo, including the flags, are placed there by regcomp.c during the previous compilation of $foo. It is essential that the pattern accept an interpolated previously compiled set expression, but the programmer should not be constructing such things manually. So the grammar merely refers to the interpolated scalar without specifying the details, such as the flags. Those details have changed before, and its possible they could change again. Since you wrote that ticket, there's been a tightening of the checking that the interpolated code is actually from a previously compiled regex, but AFAIK, there's no way for me to know its genesis for sure. If someone does have a way, I'm all ears. Maybe the diagnostic messages and explanations can be fixed to be more clear. But the bottom line is that the programmer should not be specifying flags, etc inside one of these expressions, and hence the grammar shouldn't include that.

It is allowed, and implied by the fact that concatenation of text to construct a pattern and concatenation of regex objects are and must be indistinguishable. ./perl -Ilib -lE'print "x"=~/(?[ (?^ux:(?[ [x] ])) ])/ ? "yes" : "no";' The regex_sets feature is experimental in regex; marked by <-- HERE in m/(?[ <-- HERE [ x ] ])/ at -e line 1. The regex_sets feature is experimental in regex; marked by <-- HERE in m/(?[ <-- HERE (?^ux:(?[ [x] ])) ])/ at -e line 1. The regex_sets feature is experimental in regex; marked by <-- HERE in m/(?[ (?^ux:(?[ <-- HERE [x] ])) ])/ at -e line 1. yes So IMO it needs to be in the grammar. I have to say I find the rules of the syntax really confusing and the error messages that are produced relatively unhelpful. I actually feel this sufficiently to say we shouldnt change the experimental syntax status until we are sure that we are happy with the rules. The basis of this is great, some of the rough edges are IMO too rough. Do we need to see multiple warnings about the experimental status from a single pattern? ./perl -Ilib -lE'print "x"=~/(?[ (?x:(?[ ])) ])/ ? "yes" : "no";' The regex_sets feature is experimental in regex; marked by <-- HERE in m/(?[ <-- HERE (?x:(?[ ])) ])/ at -e line 1. Unexpected character in regex; marked by <-- HERE in m/(?[ (? <-- HERE x:(?[ ])) ])/ at -e line 1. $ ./perl -le'my $s=qr/(?x:(?[ [ x ] + [ y ] ]))/' The regex_sets feature is experimental in regex; marked by <-- HERE in m/(?x:(?[ <-- HERE [ x ] + [ y ] ]))/ at -e line 1. $ ./perl -le'my $s=qr/(?x:(?[ [ x ] + [ y ] ]))/; print "x"=~/(?[ $s ])/' The regex_sets feature is experimental in regex; marked by <-- HERE in m/(?x:(?[ <-- HERE [ x ] + [ y ] ]))/ at -e line 1. The regex_sets feature is experimental in regex; marked by <-- HERE in m/(?[ <-- HERE (?^:(?x:(?[ [ x ] + [ y ] ]))) ])/ at -e line 1. Expecting '(?flags:(?[...' in regex; marked by <-- HERE in m/(?[ (?^:(?x <-- HERE :(?[ [ x ] + [ y ] ]))) ])/ at -e line 1. ./perl -le'my $s=qr/(?x:(?[ [ x ] ]))/; print "x"=~/(?[ $s ])/' The regex_sets feature is experimental in regex; marked by <-- HERE in m/(?x:(?[ <-- HERE [ x ] ]))/ at -e line 1. The regex_sets feature is experimental in regex; marked by <-- HERE in m/(?[ <-- HERE (?^:(?x:(?[ [ x ] ]))) ])/ at -e line 1. Expecting '(?flags:(?[...' in regex; marked by <-- HERE in m/(?[ (?^:(?x <-- HERE :(?[ [ x ] ]))) ])/ at -e line 1. $ ./perl -le'my $s=qr/(?[ [ x ] ])/x; print "x"=~/(?[ $s ])/' The regex_sets feature is experimental in regex; marked by <-- HERE in m/(?[ <-- HERE [ x ] ])/ at -e line 1. The regex_sets feature is experimental in regex; marked by <-- HERE in m/(?[ <-- HERE (?^x:(?[ [ x ] ])) ])/ at -e line 1. The regex_sets feature is experimental in regex; marked by <-- HERE in m/(?[ (?^x:(?[ <-- HERE [ x ] ])) ])/ at -e line 1. 1 The last two for me are *extremely* problematic. Saying: qr/PAT/x and saying qr/(?x:PAT)/ should produce identical results. Given these issues IMO unfortunately this feature is not ready to be marked un-experimental, and i think writing down a grammar and figuring out what is and should be allowed in a formal sense is required to wrap our heads around it and understand if it is right for us to support long term. Yves

…

-- perl -Mre=debug -e "/just|another|perl|hacker/"

khwilliamson · 2020-01-27T21:56:21Z

Do we need to see multiple warnings about the experimental status
from a single pattern?

That could be fixed, but it will be fixed automatically without extra effort by making it non-experimental.

As to the rest, @demerphq found some real bugs. I have a WIP at smoke-me/khw-sets which I believe addresses his examples. I apologize for the awful error messages. I think this branch cleans them up. The stuff about flags, etc. were extremely misleading.

Before I proceed down that branch, I'm hoping @demerphq will play with it a little to see if he can find some results that are still confusing. Flags should now be irrelevant.

A set operations expression can contain a previously-compiled one interpolated in. Prior to this commit, some heuristics were employed to verify it actually was such a thing, and not a sort of look-alike that wasn't necessarily valid. The heuristics actually forbade legal ones. I don't know of any illegal ones that were let through, but it is certainly possible. Also, the error/warning messages referred to the heuristics, and were unhelpful at best. The technique used instead in this commit is to return a regop only used by this feature for any nested compilations. This guarantees that the caller can determine if the result is valid, and what that result is without having to do any heuristics or inspecting any flags. The error/warning messages are changed to reflect this, and I believe are now helpful. This fixes the bugs in #16779 #16779 (comment)

demerphq · 2022-02-19T13:22:18Z

It seems a bit weird that I cant do this:

my $cc1= qr/[a-c]/;
my $cc2= qr/(?[ $cc1 + [x-z] ])/;

but I can do this:

my $cc3= qr/(?[ [a-c] + [x-z] ])/;

and I can do this:

my $ecc1= qr/(?[ [a-c] ])/;
my $ecc2= qr/(?[ $ecc1 + [x-z] ])/;

Also shouldn't these two produce the same errors?

$ ./perl -Ilib -lE'print "x"=~/(?[ (?^x:(?[ ])) ])/ ? "yes" : "no";'
The regex_sets feature is experimental in regex; marked by <-- HERE in m/(?[ <-- HERE (?^x:(?[ ])) ])/ at -e line 1.
Incomplete expression within '(?[ ])' in regex; marked by <-- HERE in m/(?[ (?^x:(?[ <-- HERE ])) ])/ at -e line 1.

$ ./perl -Ilib -lE'print "x"=~/(?[ (?x:(?[ ])) ])/ ? "yes" : "no";'
The regex_sets feature is experimental in regex; marked by <-- HERE in m/(?[ <-- HERE (?x:(?[ ])) ])/ at -e line 1.
Unexpected character in regex; marked by <-- HERE in m/(?[ (? <-- HERE x:(?[ ])) ])/ at -e line 1.

Why does (?x:(?[ ])) produce an "Unexpected character in regex" and (?^x:(?[ ])) produce a Incomplete expression within.

Similarly why does:

(?x:(?[ [x] ]))

produce an exception but

(?^x:(?[ [x] ]))

does not?

The docs say "(?^x:foo) is equivalent to (?x-imns:foo)" so why is one accepted and one is not?

$ ./perl -Ilib -lE'print "x"=~/(?[ (?^x:(?[ [x] ])) ])/ ? "yes" : "no";'
The regex_sets feature is experimental in regex; marked by <-- HERE in m/(?[ <-- HERE (?^x:(?[ [x] ])) ])/ at -e line 1.
yes

$ ./perl -Ilib -lE'print "x"=~/(?[ (?x-imns:(?[ [x] ])) ])/ ? "yes" : "no";'
The regex_sets feature is experimental in regex; marked by <-- HERE in m/(?[ <-- HERE (?x-imns:(?[ [x] ])) ])/ at -e line 1.
Unexpected character in regex; marked by <-- HERE in m/(?[ (? <-- HERE x-imns:(?[ [x] ])) ])/ at -e line 1.

khwilliamson · 2022-02-24T20:47:40Z

On 2/19/22 06:22, Yves Orton wrote: It seems a bit weird that I cant do this: my $cc1= qr/[a-c]/; my $cc2= qr/(?[ $cc1 + [x-z] ])/; but I can do this: my $cc3= qr/(?[ [a-c] + [x-z] ])/; and I can do this: my $ecc1= qr/(?[ [a-c] ])/; my $ecc2= qr/(?[ $ecc1 + [x-z] ])/;

This has caused me to realize that the pod for this feature (in perlrecharclass) should be rewritten as now in #19459 The idea of assembling a pattern from smaller interpolated components (usually scalars) led to problems in implementation and use. We had to add \g{} beyond the initial \g, as an example, so as to allow the programmer to write something that wouldn't get an entirely different meaning depending on the interpolated string. regex sets are even more vulnerable to this than other pattern constructs. There are various operators, with precedence to worry about. After several false starts, the current implementation only guarantees compilation of interpolated strings inside a regex set if each such string is itself a legal regex set. It makes sense that a larger set would be constructed out of the basic building blocks of smaller sets. Your tests below all violate that (not until now documented) rule. I don't think this constraint actually hinders building up a set to do the complete set of its capabilities. It might be nice to extend things so it also could accept other interpolated strings, but this would be a nicety, not necessary for operation, and doing so would not change the behavior of anything successfully working on the current regime. Thus I don't think that should stand in the way of de-experimentalizing these.

Also shouldn't these two produce the same errors? $ ./perl -Ilib -lE'print "x"=~/(?[ (?^x:(?[ ])) ])/ ? "yes" : "no";' The regex_sets feature is experimental in regex; marked by <-- HERE in m/(?[ <-- HERE (?^x:(?[ ])) ])/ at -e line 1. Incomplete expression within '(?[ ])' in regex; marked by <-- HERE in m/(?[ (?^x:(?[ <-- HERE ])) ])/ at -e line 1. $ ./perl -Ilib -lE'print "x"=~/(?[ (?x:(?[ ])) ])/ ? "yes" : "no";' The regex_sets feature is experimental in regex; marked by <-- HERE in m/(?[ <-- HERE (?x:(?[ ])) ])/ at -e line 1. Unexpected character in regex; marked by <-- HERE in m/(?[ (? <-- HERE x:(?[ ])) ])/ at -e line 1. Why does (?x:(?[ ])) produce an "Unexpected character in regex" and (?^x:(?[ ])) produce a Incomplete expression within. Similarly why does: (?x:(?[ [x] ])) produce an exception but (?^x:(?[ [x] ])) does not?

I don't get an exception with either of those in blead. Both produce Final program: 1: EXACT <x> (3) 3: END (0) anchored "x" at 0..0 (checking anchored isall) minlen 1

The docs say "(?^x:foo) is equivalent to (?x-imns:foo)" so why is one accepted and one is not? $ ./perl -Ilib -lE'print "x"=~/(?[ (?^x:(?[ [x] ])) ])/ ? "yes" : "no";' The regex_sets feature is experimental in regex; marked by <-- HERE in m/(?[ <-- HERE (?^x:(?[ [x] ])) ])/ at -e line 1. yes $ ./perl -Ilib -lE'print "x"=~/(?[ (?x-imns:(?[ [x] ])) ])/ ? "yes" : "no";' The regex_sets feature is experimental in regex; marked by <-- HERE in m/(?[ <-- HERE (?x-imns:(?[ [x] ])) ])/ at -e line 1. Unexpected character in regex; marked by <-- HERE in m/(?[ (? <-- HERE x-imns:(?[ [x] ])) ])/ at -e line 1.

I thought I had sufficiently fixed up the pod, but I haven't. It should say that it is illegal to nest these. But you can achieve the same effect by interpolating one that has already been compiled. In the case you present: my $nested = qr/(?x-imns:(?[ [x] ]))/; print "x"=~/(?[ $nested ])/ ? "yes" : "no";

…

—

demerphq · 2022-02-25T04:36:43Z

On Mon, 9 Dec 2019 at 20:26, Karl Williamson ***@***.***> wrote:

On 12/9/19 11:42 AM, Yves Orton wrote: > On Mon, 9 Dec 2019 at 19:32, demerphq ***@***.***> wrote: > > > > On Mon, 9 Dec 2019 at 19:31, Karl Williamson > ***@***.***> wrote: > > > > > > On 12/9/19 11:11 AM, Yves Orton wrote: > > > > I remember > > > > being very surprised about where (?[]) were required or not. > > > > > > That doesn't make sense to me. They are always required for this > > > feature to kick in. > > > > IIRC, they are also required sometimes inside of a (?[]) as well. That > > is the part that is surprising. > > FWIW, this discussion and me saying we should write a grammar is related to > > 19a498a > > commit 19a498a (yves/fix_131649) > Author: Yves Orton ***@***.***> > Date: Mon Jun 26 13:19:55 2017 +0200 > > fix #131649 - extended charclass can trigger assert > > The extended charclass parser makes some assumptions during the > first pass which are only true on well structured input, and it > does not properly catch various errors. later on the code assumes > that things the first pass will let through are valid, when in > fact they should trigger errors. > > While I was debugging that ticket and fixing those issues I noticed > the rules about how regexes inside of (?[ ... ]) are not as intuitive > as I thought, and that wrappers are required inside sometimes. For > instance I think I encountered these issues while leading up to the > creation of the following error messages: > > +=item Unexpected ']' with no following ')' in (?[... in regex; marked > by <-- HERE in m/%s/ > + > +(F) While parsing an extended character class a ']' character was > encountered > +at a point in the definition where the only legal use of ']' is to > close the > +character class definition as part of a '])', you may have forgotten > the close > +paren, or otherwise confused the parser. > + > +=item Expecting close paren for nested extended charclass in regex; > marked by <-- HERE in m/%s/ > + > +(F) While parsing a nested extended character class like: > + > + (?[ ... (?flags:(?[ ... ])) ... ]) > + ^ > + > +we expected to see a close paren ')' (marked by ^) but did not. > + > +=item Expecting close paren for wrapper for nested extended charclass > in regex; marked by <-- HERE in m/%s/ > + > +(F) While parsing a nested extended character class like: > + > + (?[ ... (?flags:(?[ ... ])) ... ]) > + ^ > + > +we expected to see a close paren ')' (marked by ^) but did not. > > I dont remember more details, which is why I would like you to write > the grammar down. For instance your original does not seem to account > for (?[ (?flags:(?[ ... ])) ]) at all. > > What flags are legal? What things are illegal after a (?flags: ...) > that is inside of an extended char-class, etc. > > Yves So, the grammar I wrote earlier in this thread has this rule: primary-expression: compiled set-expression, which is typically an interpolated scalar ... By that, I mean that you could say something like my $foo = qr/([ ... ])/; and later include $foo in another set expression qr/([ ... + $foo + ... ])/; But by the time I see the interpolated $foo, it looks like qr/([ ... + (?^flags:(?[...])) + ... ])/;

But this assumes that it is coming from a stringified qr// object from the current process. Something like Sereal will extract the modifiers and then store them independently from the pattern so that when it constructs a qr// it is constructed properly using a function like regexp_pattern. use re qw(regexp_pattern); my $qr=qr/[a-z]/ix; my ($pat,$mods)= re::regexp_pattern($qr); my $new="(?$mods:$pat)"; should be a legal way to construct a qr// like but non-object based equivalent of the original. It wont be (?^ix:...) it will be (?ix:...) and I think it should work the same. I really dont understand why you are objecting to this kind of thing. Why should the caret notation be ok but the non-caret is not? That is the core of my objection here.

The boiler plate of the expansion of $foo, including the flags, are placed there by regcomp.c during the previous compilation of $foo. It is essential that the pattern accept an interpolated previously compiled set expression, but the programmer should not be constructing such things manually. So the grammar merely refers to the interpolated scalar without specifying the details, such as the flags. Those details have changed before, and its possible they could change again.

Which is why I think the expectation that they can only contain (?^flags: ... ) is wrong. (?^flags: ) is defined to be equivalent to a specific set of flags. I dont see why the construct that does not use the '^' abbreviation should behave differently from one that explicitly spells out the flags. That is my concern here.

Since you wrote that ticket, there's been a tightening of the checking that the interpolated code is actually from a previously compiled regex, but AFAIK, there's no way for me to know its genesis for sure. If someone does have a way, I'm all ears.

I dont think you should care. If I stringify the pattern to save it on disk, and then bring it back as a string instead of a qr// object it should still work. If I use our apis to compose the string into modifiers and pattern then construct it back as a string it should operate just the same as if it was qr// object.

Maybe the diagnostic messages and explanations can be fixed to be more clear. But the bottom line is that the programmer should not be specifying flags, etc inside one of these expressions, and hence the grammar shouldn't include that.

That just doesn't make sense to me. If we can parse the flags properly, and we can, then we should. Lets say we introduce a new flag in the future that changes how character classes are introduced (lets NOT, but let us say we did for the sake of the argument) then that flag should change how the pattern is processed. Yves

…

-- perl -Mre=debug -e "/just|another|perl|hacker/"

See Perl#16779 (comment) This commit extends the flags accepted by nested calls to regex sets to any legal set. Previously it allowed only '^', as that is what an actual compilation would return. But as pointed out in the conversation in that ticket, it's possible to detach flags from the rest of the pattern, and to write equivalent ones in multiple ways. This commit just changes one line to look for any legal flag, not restricting it to '^'.

khwilliamson · 2022-02-28T23:48:49Z

On 2/24/22 21:36, Yves Orton wrote: On Mon, 9 Dec 2019 at 20:26, Karl Williamson ***@***.***> wrote: > On 12/9/19 11:42 AM, Yves Orton wrote: > > On Mon, 9 Dec 2019 at 19:32, demerphq ***@***.***> wrote: > > > > > > On Mon, 9 Dec 2019 at 19:31, Karl Williamson > > ***@***.***> wrote: > > > > > > > > On 12/9/19 11:11 AM, Yves Orton wrote: > > > > > I remember > > > > > being very surprised about where (?[]) were required or not. > > > > > > > > That doesn't make sense to me. They are always required for this > > > > feature to kick in. > > > > > > IIRC, they are also required sometimes inside of a (?[]) as well. That > > > is the part that is surprising. > > > > FWIW, this discussion and me saying we should write a grammar is related > to > > > > 19a498a > > > > commit 19a498a (yves/fix_131649) > > Author: Yves Orton ***@***.***> > > Date: Mon Jun 26 13:19:55 2017 +0200 > > > > fix #131649 - extended charclass can trigger assert > > > > The extended charclass parser makes some assumptions during the > > first pass which are only true on well structured input, and it > > does not properly catch various errors. later on the code assumes > > that things the first pass will let through are valid, when in > > fact they should trigger errors. > > > > While I was debugging that ticket and fixing those issues I noticed > > the rules about how regexes inside of (?[ ... ]) are not as intuitive > > as I thought, and that wrappers are required inside sometimes. For > > instance I think I encountered these issues while leading up to the > > creation of the following error messages: > > > > +=item Unexpected ']' with no following ')' in (?[... in regex; marked > > by <-- HERE in m/%s/ > > + > > +(F) While parsing an extended character class a ']' character was > > encountered > > +at a point in the definition where the only legal use of ']' is to > > close the > > +character class definition as part of a '])', you may have forgotten > > the close > > +paren, or otherwise confused the parser. > > + > > +=item Expecting close paren for nested extended charclass in regex; > > marked by <-- HERE in m/%s/ > > + > > +(F) While parsing a nested extended character class like: > > + > > + (?[ ... (?flags:(?[ ... ])) ... ]) > > + ^ > > + > > +we expected to see a close paren ')' (marked by ^) but did not. > > + > > +=item Expecting close paren for wrapper for nested extended charclass > > in regex; marked by <-- HERE in m/%s/ > > + > > +(F) While parsing a nested extended character class like: > > + > > + (?[ ... (?flags:(?[ ... ])) ... ]) > > + ^ > > + > > +we expected to see a close paren ')' (marked by ^) but did not. > > > > I dont remember more details, which is why I would like you to write > > the grammar down. For instance your original does not seem to account > > for (?[ (?flags:(?[ ... ])) ]) at all. > > > > What flags are legal? What things are illegal after a (?flags: ...) > > that is inside of an extended char-class, etc. > > > > Yves > > So, the grammar I wrote earlier in this thread has this rule: > > primary-expression: > compiled set-expression, which is typically an interpolated scalar > ... > > By that, I mean that you could say something like > > my $foo = qr/([ ... ])/; > > and later include $foo in another set expression > > qr/([ ... + $foo + ... ])/; > > But by the time I see the interpolated $foo, it looks like > > qr/([ ... + (?^flags:(?[...])) + ... ])/; > But this assumes that it is coming from a stringified qr// object from the current process. Something like Sereal will extract the modifiers and then store them independently from the pattern so that when it constructs a qr// it is constructed properly using a function like regexp_pattern. use re qw(regexp_pattern); my $qr=qr/[a-z]/ix; my ($pat,$mods)= re::regexp_pattern($qr); my $new="(?$mods:$pat)"; should be a legal way to construct a qr// like but non-object based equivalent of the original. It wont be (?^ix:...) it will be (?ix:...) and I think it should work the same. I really dont understand why you are objecting to this kind of thing. Why should the caret notation be ok but the non-caret is not? That is the core of my objection here.

I now think I understand your motivations, and have pushed a new commit to address your issues.

> > The boiler plate of the expansion of $foo, including the flags, are > placed there by regcomp.c during the previous compilation of $foo. It > is essential that the pattern accept an interpolated previously compiled > set expression, but the programmer should not be constructing such > things manually. So the grammar merely refers to the interpolated > scalar without specifying the details, such as the flags. Those details > have changed before, and its possible they could change again. > Which is why I think the expectation that they can only contain (?^flags: ... ) is wrong.

Fixed

(?^flags: ) is defined to be equivalent to a specific set of flags. I dont see why the construct that does not use the '^' abbreviation should behave differently from one that explicitly spells out the flags. That is my concern here.

Fixed

> > Since you wrote that ticket, there's been a tightening of the checking > that the interpolated code is actually from a previously compiled regex, > but AFAIK, there's no way for me to know its genesis for sure. If > someone does have a way, I'm all ears. > I dont think you should care. If I stringify the pattern to save it on disk, and then bring it back as a string instead of a qr// object it should still work. If I use our apis to compose the string into modifiers and pattern then construct it back as a string it should operate just the same as if it was qr// object.

The reason I care is that the sets code builds up larger sets from simple components or other sets. Each such set must be a single character wide. Most compiled patterns don't fit that. Extra work could allow some of that, but I don't see that as necessary to remove experimental status

> > Maybe the diagnostic messages and explanations can be fixed to be more > clear. But the bottom line is that the programmer should not be > specifying flags, etc inside one of these expressions, and hence the > grammar shouldn't include that. > That just doesn't make sense to me. If we can parse the flags properly, and we can, then we should. Lets say we introduce a new flag in the future that changes how character classes are introduced (lets NOT, but let us say we did for the sake of the argument) then that flag should change how the pattern is processed.

You're right; fixed

…

Yves -- perl -Mre=debug -e "/just|another|perl|hacker/" — Reply to this email directly, view it on GitHub <#16779 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAA2DH6ZRCKRG2ZFZ32DJRDU44BONANCNFSM4JYCTEFA>. Triage notifications on the go with GitHub Mobile for iOS <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>. You are receiving this because you were mentioned.Message ID: ***@***.***>

demerphq · 2022-03-01T03:04:26Z

On Tue, 1 Mar 2022 at 00:48, Karl Williamson ***@***.***> wrote:

On 2/24/22 21:36, Yves Orton wrote: > ... Why > should the caret notation be ok but the non-caret is not? That is the core > of my objection here. I now think I understand your motivations, and have pushed a new commit to address your issues.

Yay! Thank you Karl. Sorry I didn't manage to explain myself better earlier.

> Which is why I think the expectation that they can only contain (?^flags: > ... ) is wrong. Fixed

Summary response to all the "Fixed" responses you made: Yay!

> I dont think you should care. If I stringify the pattern to save it on > disk, and then bring it back as a string instead of a qr// object it should > still work. If I use our apis to compose the string into modifiers and > pattern then construct it back as a string it should operate just the same > as if it was qr// object. The reason I care is that the sets code builds up larger sets from simple components or other sets. Each such set must be a single character wide. Most compiled patterns don't fit that. Extra work could allow some of that, but I don't see that as necessary to remove experimental status

Of course I meant *only* such patterns that ONLY contain a charclass definition. At least now. I think it would be nice if we could handle /(a|b|c)/ and realize it is the same as /[abc]/. But I agree such a thing would be a bonus, and is not necessary for removing the experimental status. My general concern is that we don't do anything that breaks the "a string is a regex just as much as qr// object is, provided they are textually logically equivalent". (I include the word logical because of equivalencies like (?^i: ... ) and (?i: ... ), etc, but that is not meant to meant to include the pattern matching equivalence of /(a|b|c)/ and /[abc]/. And of course this comment overlooks such things as (?{ }). I would prefer a better term but I couldn't think of one. ) I believe with your latest patches I have no further objections to removing the experimental status, although I say that without trying the latest patch sequence out. I'll do my best to do so today. But based on your responses to this mail I expect I will have no further objections. Thanks again for your patience on this and apologies again for not coming up with a better way to express my concerns that would have sped the process up. Cheers, Yves

…

-- perl -Mre=debug -e "/just|another|perl|hacker/"

See #16779 (comment) This commit extends the flags accepted by nested calls to regex sets to any legal set. Previously it allowed only '^', as that is what an actual compilation would return. But as pointed out in the conversation in that ticket, it's possible to detach flags from the rest of the pattern, and to write equivalent ones in multiple ways. This commit just changes one line to look for any legal flag, not restricting it to '^'.

p5pRT added Severity Low labels Oct 19, 2019

toddr removed the khw label Oct 25, 2019

khwilliamson mentioned this issue Feb 26, 2020

[EXPERIMENT] extended bracketed character classes #13197

Closed

xenu removed the Severity Low label Dec 29, 2021

khwilliamson mentioned this issue Mar 6, 2022

Make the regex_sets feature non-experimental #19503

Merged

khwilliamson closed this as completed in #19503 Mar 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Document formal grammar for regex sets #16779

Document formal grammar for regex sets #16779

p5pRT commented Dec 2, 2018

p5pRT commented Dec 2, 2018

khwilliamson commented Dec 9, 2019

demerphq commented Dec 9, 2019 via email

khwilliamson commented Dec 9, 2019

demerphq commented Dec 9, 2019 via email

khwilliamson commented Dec 9, 2019 via email

demerphq commented Dec 9, 2019 via email

demerphq commented Dec 9, 2019 via email

khwilliamson commented Dec 9, 2019 via email

khwilliamson commented Dec 9, 2019 via email

demerphq commented Dec 10, 2019 via email

khwilliamson commented Jan 27, 2020

demerphq commented Feb 19, 2022

khwilliamson commented Feb 24, 2022 via email

demerphq commented Feb 25, 2022 via email

khwilliamson commented Feb 28, 2022 via email

demerphq commented Mar 1, 2022 via email

Document formal grammar for regex sets #16779

Document formal grammar for regex sets #16779

Comments

p5pRT commented Dec 2, 2018

p5pRT commented Dec 2, 2018

From @khwilliamson

khwilliamson commented Dec 9, 2019

demerphq commented Dec 9, 2019 via email

khwilliamson commented Dec 9, 2019

demerphq commented Dec 9, 2019 via email

khwilliamson commented Dec 9, 2019 via email

demerphq commented Dec 9, 2019 via email

demerphq commented Dec 9, 2019 via email

khwilliamson commented Dec 9, 2019 via email

khwilliamson commented Dec 9, 2019 via email

demerphq commented Dec 10, 2019 via email

khwilliamson commented Jan 27, 2020

demerphq commented Feb 19, 2022

khwilliamson commented Feb 24, 2022 via email

demerphq commented Feb 25, 2022 via email

khwilliamson commented Feb 28, 2022 via email

demerphq commented Mar 1, 2022 via email