New regex interpolation trap section
threadless-screw committed Aug 7, 2019
1 parent 98f809b commit 092c748
Showing 1 changed file with 59 additions and 0 deletions.
doc/Language/traps.pod6
@@ -921,6 +921,65 @@ of the assignment operators instead:
=head1 Regexes

=head2 Interpolation constructs

Perl 6 offers several constructs to generate regexes at runtime through
interpolation (see their detailed description
L<here|/language/regexes#Regex_interpolation>). When a regex generated this way
contains only characters that match themselves, some of these constructs behave
identically, as if they were interchangeable. As soon as the generated regex
contains metacharacters, however, they behave differently, which may come as an
unpleasant and confusing surprise.

The first two constructs that are easily confused with each other are
C«$variable» and C«<$variable>». The former matches the (stringified) variable
literally, while the latter interprets the (stringified) variable as a regex. As
long as the variable contains only characters that match themselves in a regex
(i.e. alphanumeric characters and the underscore), the two constructs give the
same result:

    my $variable = 'camelia';
    say ‘I ♥ camelia’ ~~ / $variable /;   # OUTPUT: ｢camelia｣
    say ‘I ♥ camelia’ ~~ / <$variable> /; # OUTPUT: ｢camelia｣

But when the variable contains regex metacharacters, i.e. characters that are
neither alphanumeric nor the underscore C<_>, the outputs differ:

    my $variable = '#camelia';
    say ‘I ♥ #camelia’ ~~ / $variable /;   # OUTPUT: ｢#camelia｣
    say ‘I ♥ #camelia’ ~~ / <$variable> /; # !! Error: malformed regex

What happens here is that the string C<#camelia> contains the metacharacter
C<#>. In the context of a regex, this character must be quoted or escaped to
match literally; without quoting, the C<#> is parsed as the start of a comment
that runs until the end of the line, which in turn leaves the regex
unterminated, and thus malformed.

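If the string is meant to be interpolated with C«<$variable>» but should still
match the C<#> literally, the metacharacter can be escaped or quoted inside the
string itself. The following is a minimal sketch; the variable names
C<$escaped> and C<$quoted> are only illustrative.

```raku
# Escape the # with a backslash; single quotes keep the backslash as-is,
# so the string handed to the regex engine is \#camelia.
my $escaped = '\#camelia';
say ‘I ♥ #camelia’ ~~ / <$escaped> /;  # OUTPUT: ｢#camelia｣

# Alternatively, wrap the text in single quotes inside the string,
# which the regex engine treats as a quoted literal.
my $quoted = "'#camelia'";
say ‘I ♥ #camelia’ ~~ / <$quoted> /;   # OUTPUT: ｢#camelia｣
```

Either form makes C«<$variable>» behave like C«$variable» for this input.
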
Two other constructs that must similarly be distinguished from one another are
C«$(code)» and C«<{code}>». The former runs user-specified code within the
regex and interpolates its (stringified) return value literally. The latter
also runs user-specified code within the regex, but interpolates the
(stringified) return value as a regex. So, as before, as long as the return
value contains only characters that match themselves in a regex, the two
constructs give the same result:

    my $variable = 'ailemac';
    say ‘I ♥ camelia’ ~~ / $($variable.flip) /;   # OUTPUT: ｢camelia｣
    say ‘I ♥ camelia’ ~~ / <{$variable.flip}> /;  # OUTPUT: ｢camelia｣

But when the return value contains regex metacharacters, the outputs diverge:

    my $variable = 'ailema.';
    say ‘I ♥ camelia’ ~~ / $($variable.flip) /;   # OUTPUT: Nil
    say ‘I ♥ camelia’ ~~ / <{$variable.flip}> /;  # OUTPUT: ｢camelia｣

In this case the code returns the string C<.amelia>, which contains the
metacharacter C<.>. C«$(code)» tries to match the dot literally and fails,
because the target string contains no dot; C«<{code}>» treats the dot as a
regex wildcard that matches any character (here the C<c>) and succeeds. Hence
the different outputs.

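When code has to be run through C«<{code}>» but its result should still match
literally, one option is to quote the returned string for the regex engine. This
is a minimal sketch, not an established idiom: it wraps the result in single
quotes and assumes the string contains no single quote of its own, which would
end the quoted literal early.

```raku
my $variable = 'ailema.';
# Wrap the flipped string in single quotes so that, once <{...}> compiles it
# as a regex, the dot is matched literally instead of as a wildcard.
# Assumption: the returned string itself contains no single quote.
say ‘I ♥ camelia’ ~~ / <{ "'" ~ $variable.flip ~ "'" }> /;  # OUTPUT: Nil
say ‘I ♥ .amelia’ ~~ / <{ "'" ~ $variable.flip ~ "'" }> /;  # OUTPUT: ｢.amelia｣
```

With the quotes in place, C«<{code}>» gives the same result as C«$(code)»
for this input.
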
=head2 C<|> vs C<||>: which branch will win
To match one of several possible alternatives, C<||> or C<|> will be used. But