[EXPERIMENT] variable-length look-behind #18756
Comments
I have proposed on p5p that we mark this a success. |
Do we have any criteria for determining whether an experiment has been successful or not (other than "people haven't complained about it")? |
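The main issue in my mind is whether the semantics are sane. If I understand it correctly, when we determine the lookbehind expression has width in the range {m, n}, we attempt to match the whole expression on the anchored substring s[-n:-1]; if that fails, we try again on s[-n+1:-1] and repeat up to s[-m:-1]. That effectively means we prefer the longest match, unlike the rest of the regexp engine. Thus for example a?? will ignore the requested minimality, and ($FOO|$BAR) will prefer the longer rather than the first of the alternates. This won't usually affect whether something matches, but it can affect captures, and could defeat attempts to optimize a pattern using the normal rules. Not sure what other issues it could cause.
% perl -E 'no warnings qw{experimental}; "abfoo" =~ /(?=foo)(?<=(a??b))/ and say $1'
ab
% perl -E 'no warnings qw{experimental}; "abfoo" =~ /(?=foo)(?<=(b|ab))/ and say $1'
ab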
(I had a typo in the above first time I tried it, which led to #19168.) I don't have a solution to offer, it may be that these semantics are ideal, or that they are more nearly ideal than any other. However I don't recall to what extent this was discussed when VLB was first implemented, and it's worth considering before we bless them as final. |
@demerphq has now looked at #19168, which showed that VLB currently has a major bug; that led to PR #19442. I think the PR has a good chance of making it for the upcoming release (needs more tests, and more eyes), but this case further convinces me that VLB as a whole is not ready to be marked a success. I still hope someone will one day comment on the issues mentioned in my previous comment here. |
On Mon, 4 Oct 2021, 06:22 Hugo van der Sanden, ***@***.***> wrote:
> The main issue in my mind is whether the semantics are sane.
> If I understand it correctly, when we determine the lookbehind expression
> has width in the range {m, n}, we attempt to match the whole expression
> on the anchored substring s[-n:-1]; if that fails, we try again on
> s[-n+1:-1] and repeat up to s[-m:-1]. That effectively means we prefer
> the longest match, unlike the rest of the regexp engine.
The rule is "leftmost longest", so this seems fine to me. Am I missing a
subtlety here?
The fact that the range is bounded is more problematic to me in the sense
that it will cause surprise.
> Thus for example a?? will ignore the requested minimality
But again, the rule is leftmost longest. So we will attempt to match at the
leftmost position, we will try to match the empty string, fail because that
can't match at that position, then try "a", assuming we fail we will
advance the cursor and try again. I don't see a problem here, in theory
anyway.
In practice I'm not 100% certain whether (?<=) will match with my
patch (I'm writing this on my phone). I suspect it won't, as we require the
cursor to line up with where it started after matching. We may have to
refine the patch a touch.
> and ($FOO|$BAR) will prefer the longer rather than the first of the
> alternates.
But again, the rule is leftmost longest. So we will try the leftmost
alternation at the leftmost position. So I don't see a problem here.
I get the feeling you are expecting a lookbehind to behave as though we
are matching right to left, and thus with semantics "mirrored" from the
usual left to right. That is one of the reasonable possibilities, I
suppose, especially if you don't think of the regex engine as simulating
a DFA, although it is not the one I would have expected myself, and not
the one we have implemented. If you think of this as a DFA, however, it
doesn't make sense.
Consider the pattern
/[a-z]+(?<!m+)/
I would expect this to be formally equivalent to
/[a-ln-z]+/
In a DFA construction. The mirror interpretation would be
/[a-z]*[a-ln-z]+/
The length restrictions make it something else yet again.
Frankly lookbehind is full of ambiguity no matter how you slice it. It
sounds like it should be well defined in a formal sense when you discuss
simple cases, but IMO it is not at a formal level of the mathematics of
"regular expressions". Never mind that Perl's regex engine strictly
speaking doesn't implement mathematical "regular expressions"; we still
strive to be as close as possible and use it as a model to understand what
is happening.
> This won't usually affect whether something matches, but it can affect
> captures, and could defeat attempts to optimize a pattern using the normal
> rules. Not sure what other issues it could cause.
> % perl -E 'no warnings qw{experimental}; "abfoo" =~ /(?=foo)(?<=(a??b))/ and say $1'
> ab
> % perl -E 'no warnings qw{experimental}; "abfoo" =~ /(?=foo)(?<=(b|ab))/ and say $1'
> ab
> %
Both of these match as I would expect based on my understanding of
"leftmost longest".
> (I had a typo in the above first time I tried it, which led to #19168.)
> I don't have a solution to offer, it may be that these semantics are
> ideal, or that they are more ideal than any other. However I don't recall
> to what extent this was discussed when VLB was first implemented, and it's
> worth considering before we bless them as final.
I think this is not ready to be declared non experimental. Perhaps there
are implementation details I am unaware of at this time that would change
my mind, but based on what I know right now I have serious doubts.
The length restriction is to me hugely problematic. Consider how the length
restriction affects the cases above, and consider something like this:
/\w+(?<!a+b+c+)/
I'd expect that to match a sequence of word characters excluding 'a'
followed by a sequence of word characters excluding 'b', followed by a
sequence of word characters excluding 'c'. But with the length restriction
it will match something quite different. I am pretty sure that with a bit
of thought I could come up with a case where it matched something that was
in direct contradiction to the "correct" interpretation.
Possibly we can save the situation and make it an error to put something in
a lookbehind that might match more than 255 characters.
Cheers
Yves
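The two candidate orderings discussed in this message can be emulated with ordinary anchored matches on substrings. The following is only a minimal sketch under stated assumptions: it is not how regexec.c works, `emulate_lookbehind` and its `$order` argument are hypothetical names introduced for illustration, and the width range is passed in by hand rather than computed from the pattern.

```perl
use strict;
use warnings;

# Hypothetical helper, not perl's internals: emulate a lookbehind whose body
# $re can be $min..$max characters wide, asserted at offset $pos of $str.
# Each candidate must start at $pos - $len and end exactly at $pos.
sub emulate_lookbehind {
    my ($str, $pos, $re, $min, $max, $order) = @_;
    my @lens = ($min .. $max);
    @lens = reverse @lens if $order eq 'longest-first';
    for my $len (@lens) {
        next if $len > $pos;    # candidate would start before the string
        if (substr($str, $pos - $len, $len) =~ /\A(?:$re)\z/) {
            my $capture = $1;   # copy $1 before leaving the match scope
            return $capture;
        }
    }
    return;                     # no candidate width matched
}

my $pos = index('abfoo', 'foo');    # the offset where (?=foo) succeeds
for my $re (qr/(a??b)/, qr/(b|ab)/) {
    for my $order ('longest-first', 'shortest-first') {
        my $got = emulate_lookbehind('abfoo', $pos, $re, 1, 2, $order);
        print "$order with $re captures '$got'\n";
    }
}
```

Under these assumptions, longest-first captures `ab` for both patterns, which agrees with the output quoted above, while shortest-first would capture `b` for both. Neither ordering honours the minimal quantifier and the alternation order at the same time.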
|
Thanks, you two. I have been pursuing this exiting experimental only because it seemed to be not moving and to work. I am not in a rush, and will back off until post-5.36! |
On Mon, 21 Feb 2022, 12:11 Ricardo Signes, ***@***.***> wrote:
Thanks, you two. I have been pursuing this exiting experimental only
because it seemed to be not moving and to work. I am not in a rush, and
will back off until post-5.36!
I have had a chance to review this in more detail and I am very pleased to
say I was very wrong. Yesterday I could have sworn unlimited quantifiers
were allowed in lookbehind, but I guess I tested something else; as I said
in the commit message, my brain was mush by the end of the day. Every case I
expected to be broken seems to be covered. (Yay Karl!) Provided my PR is
applied, we add a bunch more tests, and no further surprises are revealed,
I think we will be good for 5.36, contrary to what I said earlier.
I apologize to Karl for doubting him. He did a great job with the max
length checks.
I will follow up with more tests and maybe a slight optimization but I
think we are good!
Yves
|
On 2/20/22 21:34, Yves Orton wrote:
I have had a chance to review this in more detail and I am very pleased to
say I was very wrong. Yesterday I could have sworn unlimited quantifiers
were allowed in lookbehind, but I guess I tested something else, as I said
in the commit message my brain was mush by the end of the day. Every case I
expected to be broken seems to be covered. (Yay Karl!) Provided my PR is
applied and we add a bunch more tests and no further surprises are revealed
I think we will be good for 5.36 contrary to what I said earlier.
I apologize to Karl for doubting him. He did a great job with the max
length checks.
I will follow up with more tests and maybe a slight optimization but I
think we are good!
Yves
Thanks, but I'm not so sure it's ready to be de-experimentalized.
Are we sure we have the correct semantics? I am now thinking it should
be a mirror of the lookahead assertions, starting at 0, then -1, -2, ...
That would be the most intuitive, and would mean no real performance
penalty for long lookbehinds.
So, some background information.
A few of the world's language scripts have upper/lower case; mostly
those derived from ancient Greek. Of the relatively few characters in
Unicode that have case, about 10% can match under /i a sequence of
characters. In modern Western European languages, this is notably the
German ß character whose traditional upper case is the sequence SS.
Perl did not handle this situation very well, and I started fixing
various areas where it failed. In doing so, this broke innocent code
that was using lookbehind.
Technically, in Unicode password and paßword should match under /i.
This means that if you have a lookbehind assertion that matches 'ss', it
also should match the single character ß. Hence it is variable length.
Before I fixed things, perl simply ignored the ß possibility. But
after I did, it would complain about it being variable length.
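As a hedged illustration of why 'ss' under /i is implicitly variable length, the sketch below checks the fold directly and then asks the running perl whether it accepts such a lookbehind. The variable names are made up for the example, and the lookbehind pattern is compiled from a string at run time so that a perl without this feature reports an error instead of refusing to compile the script.

```perl
use strict;
use warnings;

# U+00DF (sharp s) is one character, but under /iu it can match the two
# characters "ss", so the two spellings differ in length yet still match.
my $sharp_s = "pa\x{DF}word";
my $plain   = "password";

printf "lengths: %d vs %d\n", length($sharp_s), length($plain);
print "sharp s spelling matches /password/iu\n"
    if $sharp_s =~ /\Apassword\z/iu;

# A lookbehind for 'ss' must therefore look back either 1 or 2 characters.
# Compile it at run time and silence warnings, since some perl versions
# flag this construct as experimental and older ones reject it outright.
my $src = '(?<=ss)word';
my $re  = do { no warnings; eval { qr/$src/iu } };
print defined($re) ? "lookbehind accepted\n" : "lookbehind rejected: $@";
```

Whether the last line reports acceptance depends on the perl version in use, so no particular result is assumed here.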
Rather than moving the language backwards, the obvious solution was to
allow some, at least limited, form of variable length lookbehind. As I
wrote the patch, I didn't see a clear way to allow things
like this but forbid more general cases. Besides, ISTR people had
been complaining about the fixed-length restriction anyway.
I did not invent the 255 byte length limit. I inherited that. That
limit has always been the case AFAIK. Almost certainly it stems from a
single byte in the C structure for a regnode being available for use. I
have not seen any field complaints about that number being too small.
But we could create new regnodes which occupy more bytes so as to
increase the limit.
What I did was merely change the limit from a fixed size to a maximum
size. I was trying to be the least disruptive as I could of quite
obtuse code. I think I did find a bug or two along the way in the
existing implementation.
I didn't consider at the time what the semantics should be. But now I'm
thinking that ideally lookbehind should act the same as lookahead but
with the sign of the directionality changed from positive to negative.
That might be too hard to achieve, or maybe turn out to have other
drawbacks, but until we think about it, and make some determination, we
shouldn't de-experimentalize the feature.
|
On Tue, 22 Feb 2022 at 15:05, Karl Williamson ***@***.***>
wrote:
> I did not invent the 255 byte length limit. I inherited that. That
> limit has always been the case AFAIK. Almost certainly it stems from a
> single byte in the C structure for a regnode being available for use. I
> have not seen any field complaints about that number being too small.
> But we could create new regnodes which occupy more bytes so as to
> increase the limit.
Indeed.
> What I did was merely change the limit from a fixed size to a maximum
> size. I was trying to be the least disruptive as I could of quite
> obtuse code. I think I did find a bug or two along the way in the
> existing implementation.
Or five? :-)
> I didn't consider at the time what the semantics should be. But now I'm
> thinking that ideally lookbehind should act the same as lookahead but
> with the sign of the directionality changed from positive to negative.
> That might be too hard to achieve, or maybe turn out to have other
> drawbacks, but until we think about it, and make some determination, we
> shouldn't de-experimentalize the feature
I am curious on what grounds you say "ideally" here? This isn't a
"regular" construct, I do not think you can convert lookbehind into a true
DFA construction (eg, one that moves left to right and inspects each byte
only once), therefore there doesn't seem to be an ideal here at all. This
is demonstrated by the various ways that lookbehind is implemented in other
regex engines. See
https://www.regular-expressions.info/lookaround.html
where there is a pretty good summary of the different implementations and
meanings for variable length lookbehind. Unfortunately there is not a
consensus. Some treat lookbehind as atomic (as far as I can tell we do
not[1]), some match truly right to left, (we do not), some match shortest
to longest, some match longest to shortest like we do. Some match in
"alternation order". PCRE matches in alternation order but atomically. So
whatever we do we are aligned with some other regex engines and not aligned
with others.
It seems to me that whatever choice we make with the current implementation
we violate the expectation that alternatives should match in the order they
are specified.
Consider:
"aafoo"=~/(?=foo)(?<=(a|aa))/
With our current max-left to min-left with left to right semantics model $1
will end up as "aa". This violates the expectation that it matches "a" as
it is the first alternation.
But if we change it as you say to min-left to max-left with left-to-right
semantics we would break this:
"aaafoo"=~/(?=foo)(?<=(aa|a))/
and we would match "a" first, which would violate the expectation that we
match "aa". What this says to me is that as long as we match with "normal
left-to-right" semantics as we currently do, we are going to do the "wrong"
thing sometimes with positive lookbehind (negative lookbehind
doesn't capture, so these questions are irrelevant).
Arguably we should be converting both into an alternation of lookbehinds
(thanks to Hugo for this observation), which would then behave as expected.
This is afaik how PCRE would match (except it would treat the construct as
atomic).
"aafoo"=~/(?=foo)(?<=(aa|a))/
"aafoo"=~/(?=foo)(?|(?<=(aa))|(?<=(a)))/
"aafoo"=~/(?=foo)(?<=(a|aa))/
"aafoo"=~/(?=foo)(?|(?<=(a))|(?<=(aa)))/
You may have noticed that I had to use (?| ... ) in this conversion. That
is because ((?<=a)) does not capture anything, so I could not convert
(?<=(a|aa))
into
((?<=a)|(?<=aa))
as it would not capture the contents. And converting it into
(?:(?<=(a))|(?<=(aa)))
would have meant two capturing buffers not one. (?| ...) resolves that
problem. However it demonstrates the issues and subtleties that come up
with considering alternative implementations.
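The conversion described above can be tried directly, because each lookbehind in the converted form has a fixed width and so does not depend on variable-length support. This is a small sketch, not a proposed implementation; the `%capture` hash and its keys are just labels for the example.

```perl
use strict;
use warnings;

# (?|...) is a branch reset group: every alternative writes to $1.
# Each lookbehind below is fixed width, so alternation order decides
# which capture wins, as it would in an ordinary alternation.
my %capture;
if ('aafoo' =~ /(?=foo)(?|(?<=(aa))|(?<=(a)))/) {
    $capture{'aa|a'} = $1;
}
if ('aafoo' =~ /(?=foo)(?|(?<=(a))|(?<=(aa)))/) {
    $capture{'a|aa'} = $1;
}
print "$_ => $capture{$_}\n" for sort keys %capture;
```

With the alternatives in the order `aa|a` the capture is `aa`, and in the order `a|aa` it is `a`, which is the "first alternative wins" behaviour that a single variable-length lookbehind cannot give for both orderings.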
Notice that the translation for (?<! ...) would be different. In that case
we can ignore capturing buffers (if their contents matches then the pattern
fails, so the capture buffer doesn't get populated) and turn
/(?=foo)(?<!a|aa)/
into
/(?=foo)(?<!a)(?<!aa)/
So really we only care about this fine point of the semantics with
*positive* lookbehind which captures, if there is no capture then it
doesn't matter which we match.
Where this gets interesting, sort of, I think is the scenario that got you
started on this, which is where case-insensitive matches can be implicitly
variable length. So for instance /(?=\xDF)/i should be equivalent to
/(?=\xDF|[sS][sS])/. The reason I said "sort of" is that it seems to me
these questions only matter when the alternatives which would match overlap
(eg could match each other), like (a|aa). Just guessing, I would assume that
with Unicode case folding there aren't any such cases. If that is true then
we can forget those cases; if it isn't, eg if there is some character which
when folded can match "X" and "XX", then we would have to make some decisions.
Anyway, what I am saying here is that unless we are going to leave this
marked experimental until we *totally* change how lookbehind is
implemented, changing it from going to max-left to min-left or min-left to
max-left is really not going to "save the day". It just moves the bugs
around.
But I am of the opinion that it is unlikely that we will do these changes,
and if we do I would suggest that some kind of pragma to opt in to the new
(or old) semantics would suffice. I also feel that going from 'max-left' to
'min-left' is likely to be more efficient on average, especially if you
consider that we could use the AHOCORASICK/TRIE opcode to perform the match
efficiently left to right for many cases.
I also think that the current implementation implies the least surprise;
the normal rule of thumb is "leftmost longest". Consider the principles
laid out in perlretut:
-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----
When a regexp can match a string in several different ways, we can use
the principles above to predict which way the regexp will match:
=over 4
=item *
Principle 0: Taken as a whole, any regexp will be matched at the
earliest possible position in the string.
=item *
Principle 1: In an alternation C<a|b|c...>, the leftmost alternative
that allows a match for the whole regexp will be the one used.
=item *
Principle 2: The maximal matching quantifiers C<'?'>, C<'*'>, C<'+'> and
C<{n,m}> will in general match as much of the string as possible while
still allowing the whole regexp to match.
=item *
Principle 3: If there are two or more elements in a regexp, the
leftmost greedy quantifier, if any, will match as much of the string
as possible while still allowing the whole regexp to match. The next
leftmost greedy quantifier, if any, will try to match as much of the
string remaining available to it as possible, while still allowing the
whole regexp to match. And so on, until all the regexp elements are
satisfied.
=back
As we have seen above, Principle 0 overrides the others. The regexp
will be matched as early as possible, with the other principles
determining how the regexp matches at that earliest character
position.
-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----
My view is that matching lookbehind from max-left to min-left, as we do, is
the most aligned with the principles above. And given we aren't going to
implement right to left DFA style matching any time soon, waiting for these
inconsistencies to be resolved is going to mean that positive lookbehind
stays experimental for a very long time, potentially forever.
Another point I think is relevant, which I mentioned earlier but I would
like to call more attention to is that the only place where we should care
about this *at all* is when there is a positive lookbehind which contains a
capture buffer, either because it would change the content of the capture
buffer, or because that captured text is used later via a backreference or
both. If there is no capture buffer inside of the positive lookbehind it
doesn't matter what it matches. So if you really felt like we had to keep
the door open for changes in the future then I would say we should change
the experimental status on this so the vlb warning is ONLY produced when
you capture inside of a positive lookbehind. But I feel like it is
unnecessary to do so.
Given that there are multiple reasonable interpretations for how lookbehind
should match meaning there is no true "ideal" match order, and given the
current implementation is the most compatible with the base principles of
how matching works I am comfortable with signing this off as it is. If
people cared about these semantic issues they would have raised them in the
last three years. If we ever rewrite the regex engine enough to be able to
offer a different implementation we can give people a way to choose which
they want. Or even introduce new constructs so they can use both at the
same time. Now that you have introduced (*positive_lookbehind:...) we can
easily add (*dfa_positive_lookbehind:...) or something like it for the new
behavior.
cheers,
Yves
[1] This demonstrates that positive lookbehind is not atomic and that we do
backtrack into it:
./perl -Ilib -le'print "aaz"=~/(?<=(a|aa))\1z/ ? "yes:$1:$&" : "no"'
yes:a:az
./perl -Ilib -le'print "aaz"=~/(?<=(aa|a))\1z/ ? "yes:$1:$&" : "no"'
yes:a:az
|
On Wed, 23 Feb 2022 at 04:17, demerphq ***@***.***> wrote:
Correction. Where I said:
for instance /(?=\xDF)/i should be equivalent to /(?=\xDF|[sS][sS])/. The
reason I said "sort of" is that it seems to me these questions only matter
when the
I meant:
for instance /(?<=\xDF)/i should be equivalent to /(?<=\xDF|[sS][sS])/. The
reason I said "sort of" is that it seems to me these questions only matter
when the
Yves
--
perl -Mre=debug -e "/just|another|perl|hacker/"
|
On Mon, 21 Feb 2022 at 03:16, demerphq ***@***.***> wrote:
> Consider the pattern
> /[a-z]+(?<!m+)/
> I would expect this to be formally equivalent to
> /[a-ln-z]+/
> In a DFA construction. The mirror interpretation would be
> /[a-z]*[a-ln-z]+/
I realized after playing with this a bit that I was wrong about these
conversions and I now am inclined to think that there is no clear DFA
construction for lookbehind.
[a-z]+(?<!m+)
would match "mmmmmmma" so it can't be the same as /[a-ln-z]+/, it would be
the same as /[a-z]*[a-ln-z]+/
But I don't think this is a good argument for the mirror interpretation.
Consider that the mirror interpretation, that is matching min-left to
max-left, would produce as many errors as the max-left to min-left. Eg,
with a{1,2} the mirror interpretation would match "a" before it would match
"aa", which would be wrong.
> The length restrictions make it something else yet again.
The length restriction is enforced so this concern is resolved for me.
cheers,
yves
|
On 2/22/22 20:17, Yves Orton wrote:
I am curious on what grounds you say "ideally" here?
I think now I was wrong. I think we have to fill groups L-R, so that $1
corresponds to the group begun by the leftmost left parenthesis; and
that means we can't do a mirror image.
So maybe the semantics are currently fine.
|
On Tue, 22 Feb 2022 at 15:05, Karl Williamson ***@***.***>
wrote:
Thanks, but I'm not so sure it's ready to be de-experimentalized.
Ok, well I have removed the de-experimentalization patch from
#19442
IMO we need to get that merged in time for 5.36.0 whatever we do. The
current implementation is just buggy.
Yves
|
On Wed, 23 Feb 2022 at 04:42, Karl Williamson ***@***.***>
wrote:
On 2/22/22 20:17, Yves Orton wrote:
> I am curious on what grounds you say "ideally" here?
I think now I was wrong. I think we have to fill groups L-R, so that $1
corresponds to the group begun by the leftmost left parenthesis; and
that means we can't do a mirror image.
We will violate the expectations of alternation with our current base
implementation *whatever* we do. Changing the order we try things doesn't
help. With max-left to min-left we will do /(?<=(a|aa))/ and
/(?<=(a{1,2}?))/ wrong. With min-left to max-left we will do /(?<=(aa|a))/
and /(?<=(a{1,2}))/ wrong.
We can't fix this problem by changing the order we try things. We have to
change things entirely.
If you are really concerned we can make the experimental flag trigger only
when there is variable length positive lookbehind that contains a capture
buffer. All the other variants of lookbehind don't care.
So maybe the semantics are currently fine.
I think the current semantics are the least surprising of the options
available to us without forcing a complete rewrite, and the ones that are
most compliant with the base principles of matching, that we match the
leftmost thing first.
So I am fine with us removing the experimental status.
I have removed the de experiment patch from
#19442
and pushed the de experiment patch as:
#19454
so that I can start work on a patch to make the experimental warning
trigger only when lookbehind is variable length AND it includes capturing.
Then we have some options.
Yves
|
On Wed, 23 Feb 2022 at 04:54, demerphq ***@***.***> wrote:
I have removed the de experiment patch from
#19442
Please review and let me know if you have any objections to it being
merged. I have other patches queuing which expect to apply on top of it.
and pushed the de experiment patch as:
#19454
so that I can start work on a patch to make the experimental warning
trigger only when lookbehind is variable length AND it includes capturing.
Then we have some options.
|
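My initial concern here was that it didn't feel like there had been much discussion of the semantics. @demerphq has convinced me at least that the current semantics are coherent enough to be viable (though I'm not as completely convinced that they are necessary or ideal). My newer concern is that a pretty huge issue such as #19168 was not found by people using it in the wild, which suggests to me it has had minimal take-up even for the //i context that motivated @khwilliamson to add it - I'm not sure on what basis we can declare the experiment successful if nobody has used it. @jkeenan asked at the top of this issue:
Do we have any criteria for determining whether an experiment has been successful or not (other than "people haven't complained about it")?
.. which I don't think ever got an answer. |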
On Wed, 23 Feb 2022 at 12:49, Hugo van der Sanden ***@***.***> wrote:
My initial concern here was that it didn't feel like there had been much
discussion of the semantics. @demerphq <https://github.com/demerphq> has
convinced me at least that the current semantics are coherent enough to be
viable (though I'm not as completely convinced that they are necessary or
ideal).
Fair and reasonable.
My newer concern is that a pretty huge issue such as #19168
<#19168> was not found by people
using it in the wild, which suggests to me it has had minimal take-up even
for the //i context that motivated @khwilliamson
<https://github.com/khwilliamson> to add it - I'm not sure on what basis
we can declare the experiment successful if nobody has used it.
I think the /i case is handled differently than the true alternation case.
FWIW, people *did* know, for instance
https://www.regular-expressions.info/lookaround.html
mentions that our implementation is buggy. But they didn't tell us, I guess.
:-(
@jkeenan <https://github.com/jkeenan> asked at the top of this issue:
Do we have any criteria for determining whether an experiment has been
successful or not (other than "people haven't complained about it")?
.. which I don't think ever got an answer.
Fair question. I don't know what to say.
Cheers,
Yves
|
On 2/23/22 05:00, Yves Orton wrote:
On Wed, 23 Feb 2022 at 12:49, Hugo van der Sanden ***@***.***>
wrote:
> My initial concern here was that it didn't feel like there had been much
> discussion of the semantics. @demerphq <https://github.com/demerphq> has
> convinced me at least that the current semantics are coherent enough
to be
> viable (though I'm not as completely convinced that they are necessary or
> ideal).
>
Fair and reasonable.
> My newer concern is that a pretty huge issue such as #19168
> <#19168> was not found by people
> using it in the wild, which suggests to me it has had minimal take-up
even
> for the //i context that motivated @khwilliamson
> <https://github.com/khwilliamson> to add it - I'm not sure on what basis
> we can declare the experiment successful if nobody has used it.
Any time one has a sequence 'ss' (in any combination of case) as part of a
lookbehind assertion under /iu rules, one is implicitly using a
variable length lookbehind. Such sequences are very common. Tickets
became closable upon the introduction of vlb, and no new ones have since
been generated. I argue that this indicates the feature has had
significant field testing in that regard.
>
I think the /i case is handled differently than the true alternation case.
FWIW, people *did* know, for instance
https://www.regular-expressions.info/lookaround.html
mentions that our implementation is buggy. But they didnt tell us I guess.
:-(
I just sent email to the site asking for failing test cases
@jkeenan <https://github.com/jkeenan> asked at the top of this issue:
>
> Do we have any criteria for determining whether an experiment has been
> successful or not (other than "people haven't complained about it")?
>
> .. which I don't think ever got an answer.
>
Fair question. I dont know what to say.
Many people, myself generally included, will avoid using an experimental
feature, to avoid being on the bleeding edge.
My view is that at some point one has to declare it as accepted absent
negative feedback. This means that we commit to supporting it if buggy.
|
Limited variable-length look-behind was first released in perl v5.30.0 as an experimental feature. This issue tracks its progress toward the end of its experimental phase.