Basic documentation for the s/// operator.
Jeffrey Goff committed Jan 23, 2016
1 parent 17bcb7d commit a606c27
Showing 1 changed file with 156 additions and 1 deletion.
157 changes: 156 additions & 1 deletion doc/Language/regexes.pod
@@ -650,7 +650,162 @@ all named captures:
}
There is a more convenient way to get named captures which is discussed in
the section on Subrules.
=SUBTITLE Substitution
Regular expressions can also be used to substitute one piece of text for
another. You can use this for anything from correcting a spelling error (for
instance, replacing 'Perl Jam' with 'Pearl Jam') to reformatting an ISO 8601
date from C<yyyy-mm-ddThh:mm:ssZ> to C<mm-dd-yy h:m {AM,PM}> and beyond.
Just like the search-and-replace dialog box in a text editor, the C<s/ / />
operator has two sides. The left-hand side holds the expression to match, and
the right-hand side holds the text to replace it with.
=head1 X<Lexical conventions|quote,s/ / />
Substitutions are written similarly to matches, but the substitution operator
has two areas: one for the pattern to match and one for the replacement text:
s/replace/with/; # a substitution that is applied to $_
$str ~~ s/replace/with/; # a substitution applied to a scalar
The substitution operator allows delimiters other than the slash:
s|replace|with|;
s!replace!with!;
s,replace,with,;
Note that neither the colon C<:> nor balancing delimiters such as C<{}> or
C<()> can be delimiters. Colons clash with adverbs such as C<s:i/Foo/bar/>
and the other delimiters are used for other purposes.
As with the C<m//> operator, whitespace in the pattern is generally ignored.
Comments, as elsewhere in Perl 6, start with the hash character C<#> and run
to the end of the current line.
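Because whitespace in the pattern is ignored, a literal space has to be matched
explicitly. The following is a minimal sketch, using the 'Perl Jam' typo from
the introduction as sample data, of three ways to write the same substitution:
with C<\s>, with a quoted literal, and spread over several lines with comments.

=begin code
$_ = 'Perl Jam';
s/ Perl \s Jam /Pearl Jam/;   # padding is ignored; \s matches the space
.say;                         # Pearl Jam

$_ = 'Perl Jam';
s/ 'Perl Jam' /Pearl Jam/;    # inside quotes the space is literal
.say;                         # Pearl Jam

$_ = 'Perl Jam';
s/ Perl   # the misspelled word
   \s     # one whitespace character
   Jam    # the rest of the band name
 /Pearl Jam/;
.say;                         # Pearl Jam
=end code

The replacement side is not subject to this rule, so the space in
C<Pearl Jam> is kept as written.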
=head1 Replacing literals
The simplest thing to replace is a literal string. The string you want to
replace goes on the left-hand side of the substitution operator, and the string
you want to replace it with goes on the right-hand side, like so:
$_ = 'The Replacements';
s/Replace/Entrap/;
.say; # The Entrapments
Alphanumeric characters and the underscore are literal matches, just as in its
cousin the C<m//> operator. All other characters must be escaped with a
backslash C<\> or included in quotes:
$_ = 'Space: 1999';
s/Space\:/Party like it's/;
.say; # Party like it's 1999
Note that the matching restrictions only apply to the left-hand side of the
substitution expression.
By default, a substitution replaces only the first match in a string, which
helps avoid unintended changes:
$_ = 'There can be twly two';
s/tw/on/; # Replace 'tw' with 'on' once
.say; # There can be only two
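To see the first-match-only behavior in isolation, the sketch below (with a
made-up sample string) applies the same substitution twice. Each application
replaces only the leftmost remaining match:

=begin code
$_ = 'tick tick tick';
s/tick/tock/;   # replaces the first 'tick' only
.say;           # tock tick tick
s/tick/tock/;   # a second run replaces the next one
.say;           # tock tock tick
=end code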
=head1 Wildcards and character classes
Anything that can go into the C<m//> operator can go into the left-hand side
of the substitution operator, including wildcards and character classes. This
is handy when the text you're matching isn't static, such as trying to match
a number in the middle of a string:
$_ = "Blake's 9";
s/\d+/7/; # Replace any sequence of digits with '7'
.say; # Blake's 7
Of course, you can use any of the C<+>, C<*> and C<?> quantifiers, and they'll
behave just as they would in the C<m//> operator's context.
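As a small illustration of a quantifier on the left-hand side, the following
sketch (the sample string is an assumption chosen for the example) uses C<?>
to make one letter optional. Only the first match is replaced:

=begin code
$_ = 'colour and color';
s/colou?r/hue/;   # 'u?' matches zero or one 'u'
.say;             # hue and color
=end code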
=head1 Capturing Groups
Just as in the match operator, capturing groups are allowed on the left-hand
side, and the matched contents populate the C<$0>..C<$n> variables and the
C<$/> object:
$_ = '2016-01-23 18:09:00';
s/ (\d+)\-(\d+)\-(\d+) /today/; # Replace YYYY-MM-DD with 'today'
.say; # today 18:09:00
"$1-$2-$0".say; # 01-23-2016
"$/[1]-$/[2]-$/[0]".say; # 01-23-2016
Any of these variables C<$0>, C<$1>, C<$/> can be used on the right-hand side
of the operator as well, so you can manipulate what you've just matched. This
way you can separate out the C<YYYY>, C<MM> and C<DD> parts of a date and
reformat them into C<MM-DD-YYYY> order:
$_ = '2016-01-23 18:09:00';
s/ (\d+)\-(\d+)\-(\d+) /$1-$2-$0/; # Transform YYYY-MM-DD to MM-DD-YYYY
.say; # 01-23-2016 18:09:00
Since the right-hand side is effectively a regular Perl 6 interpolated string,
you can reformat the time from C<HH:MM> to C<h:MM {AM,PM}> like so:
$_ = '18:38';
s/(\d+)\:(\d+)/{$0 % 12}\:$1 {$0 < 12 ?? 'AM' !! 'PM'}/;
.say; # 6:38 PM
Using the modulo C<%> operator above keeps the sample code under 80 characters,
but is otherwise the same as C«$0 < 12 ?? $0 !! $0 - 12». Combined with the
power of the Parsing Expression Grammars that B<really> underlie what you're
seeing here, "regular expressions" can parse pretty much any language out
there.
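Returning to the time example, one caveat of the modulo shortcut is that the
hours C<00> and C<12> both come out as C<0>. Since the right-hand side can
contain closures, a small helper can handle that case. This is only a sketch:
the C<twelve-hour> sub is a hypothetical name introduced here for illustration,
not something Perl 6 provides.

=begin code
# Hypothetical helper: map a 24-hour value to the 12-hour clock face
sub twelve-hour($hour) { $hour %% 12 ?? 12 !! $hour % 12 }

$_ = '12:05';
s/(\d+)\:(\d+)/{twelve-hour($0)}\:$1 {$0 < 12 ?? 'AM' !! 'PM'}/;
.say;   # 12:05 PM
=end code

Keeping the arithmetic in a named sub also keeps the substitution itself short
enough to read at a glance.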
=head2 Common adverbs
The full list of adverbs that you can apply to regular expressions can be found
elsewhere in this document (L<section Adverbs|#Adverbs>), but the most
common adverbs that you will use are probably C<:g> and C<:i>.
=item Global adverb C<:g>
Ordinarily substitutions are only made once in a given string, but adding the
C<:g> adverb overrides that behavior, so that replacements are made
everywhere possible. Substitutions will never overlap, so for instance:
$_ = q{I can say "banana" but I don't know when to stop};
s:g/na/nana,/; # Substitute 'nana,' for each 'na'
.say; # ... "banana,nana,"
Even though the substitution doubled the number of C<na>'s in the string, the
substitution only took place twice. That is, the substitution only applies to
matches in the original string; the replacement text is not scanned again.
=item Insensitive adverb C<:i>
Substitutions are normally case-sensitive, so that C<s/foo/bar/> will only
match C<'foo'> and not C<'Foo'>. The C<:i> adverb makes matching
case-insensitive:
$_ = 'STAR TREK Into Darkness';
s/Trek \s Into/TREK\: Into/; # No match: 'Trek' is not 'TREK'
.say; # STAR TREK Into Darkness
s:i/Trek \s into/TREK\: Into/; # With :i the case no longer matters
.say; # STAR TREK: Into Darkness
If you want more in-depth descriptions of what these adverbs are actually
doing, see the L<section Adverbs|#Adverbs> of this document.
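The behavior of the two adverbs described above can be sketched with a few
made-up strings. The first two substitutions show that C<:g> replaces
non-overlapping matches and does not rescan text it has just inserted; the
third shows the adverbs stacked on one operator:

=begin code
$_ = 'aaaa';
s:g/aa/b/;          # two non-overlapping matches of 'aa'
.say;               # bb

$_ = 'aaa';
s:g/a/aa/;          # each original 'a' is replaced exactly once
.say;               # aaaaaa

$_ = 'Trek, TREK, trek';
s:g:i/trek/Wars/;   # global and case-insensitive together
.say;               # Wars, Wars, Wars
=end code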
These are just a few of the transformations you can apply with the substitution
operator. Some of the simpler uses in the real world include removing personal
data from log files, editing MySQL timestamps into PostgreSQL format, changing
copyright information in HTML files and sanitizing form fields in a web
application.
As an aside, novices to regular expressions often get overwhelmed and think
that their regular expression needs to match every piece of data in the line,
not just the part they care about. Write just enough to match the data you're
looking for, no more, no less.
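As a sketch of that advice applied to the log-scrubbing use case mentioned
above, the pattern below matches only the field being removed and leaves the
rest of the line alone. The log line format and the C<REDACTED> marker are
assumptions made up for this example:

=begin code
$_ = 'user=alice ip=192.168.0.1 status=200';
s/ip\=\S+/ip=REDACTED/;   # match the one field to scrub, nothing more
.say;                     # user=alice ip=REDACTED status=200
=end code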
=head1 X<Subrules|declarator,regex>