Applied edits from hbm on PerlMonks.

1 parent 319b2a7 commit 83df8b895895a9f1fcc7c5b272cb1a975ff1049c @chromatic committed Sep 24, 2010
Showing with 80 additions and 69 deletions.
  1. +6 −0 CREDITS
  2. +20 −12 sections/operator_characteristics.pod
  3. +54 −57 sections/regular_expressions.pod
6 CREDITS
@@ -173,3 +173,9 @@ E: cstith@gmail.com
N: Mike Huffman
E: mhuffman@aracnet.com
+
+N: E. Choroba
+E: choroba on PerlMonks
+
+N: hbm
+E: hbm on PerlMonks
32 sections/operator_characteristics.pod
@@ -133,15 +133,23 @@ X<fixity; circumfix>
X<postcircumfix>
X<fixity; postcircumfix>
-The I<fixity> of an operator is its position relative to its operands. The
-mathematic operators tend to be I<infix> operators, where they appear between
-their operands. Other operators are I<prefix>, where they appear before their
-operands; these tend to be unary operators, such as the prefix increment
-operator C<++$x> or the mathematical and boolean negation operators (C<-$x> and
-C<!$x>, respectively). I<Postfix> operators appear after their operands (such
-as postfix increment C<$x++>). I<Circumfix> operators surround their operands,
-such as the anonymous hash and anonymous array creation operators or quoting
-operators (C<{ ... }> and C<[ ... ]> or C<qq{ ... }>, for example).
-I<Postcircumfix> operators surround some operands but follow others, as in the
-case of array or hash indices (C<$hash{ ... }> and C<$array[ ... ]>, for
-example).
+An operator's I<fixity> is its position relative to its operands:
+
+=over 4
+
+=item I<Infix> operators appear between their operands. Most mathematical
+operators are infix operators, such as the multiplication operator in C<$length
+* $width>.
+
+=item I<Prefix> operators appear before their operands and I<postfix>
+operators appear after. These operators tend to be unary, such as mathematical
+negation (C<-$x>), boolean negation (C<!$y>), and postfix increment (C<$z++>).
+
+=item I<Circumfix> operators surround their operands. Examples include the
+anonymous hash constructor (C<{ ... }>) and quoting operators (C<qq[ ... ]>).
+
+=item I<Postcircumfix> operators follow certain operands and surround others,
+as in the case of hash or array element access (C<$hash{ ... }> and C<$array[
+... ]>).
+
+=back
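As an illustration of the fixity categories in the revised list, here is a small standalone Perl sketch. It is not part of the commit; the variable names and values are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical sample values, chosen only for illustration.
my ( $length, $width ) = ( 3, 4 );

# Infix: the operator sits between its operands.
my $area = $length * $width;

# Prefix: the operator comes before its single operand.
my $x        = 5;
my $negated  = -$x;    # mathematical negation
my $is_false = !$x;    # boolean negation

# Postfix: the operator comes after its operand.
my $z = 1;
$z++;

# Circumfix: the operator surrounds its operands.
my $hash_ref = { soil => 'loam' };     # anonymous hash constructor
my $quoted   = qq[area is $area];      # quoting operator

# Postcircumfix: follows one operand and surrounds another.
my %hash    = ( soil => 'loam' );
my @array   = ( 10, 20, 30 );
my $value   = $hash{soil};
my $element = $array[1];

print "$area $negated $z $quoted $value $element\n";
```

Each statement is labeled with the category it demonstrates, so the script doubles as a quick reference when reading the list above.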
111 sections/regular_expressions.pod
@@ -197,8 +197,8 @@ X<greedy quantifiers>
X<quantifiers; greedy>
The C<+> and C<*> quantifiers by themselves are I<greedy quantifiers>; they
-match as many times as possible. This is particularly pernicious when using
-the tempting-but-troublesome "match any amount of anything" pattern C<.*>:
+match as much of the input string as possible. This is particularly pernicious
+when matching "any amount of anything" with C<.*>:
=begin programlisting
@@ -211,11 +211,10 @@ the tempting-but-troublesome "match any amount of anything" pattern C<.*>:
=end programlisting
-The problem is more obvious when you expect to match a short portion of a
-string. Greediness always tries to match as much of the input string as
-possible I<first>, backing off only when it's obvious that the match will not
-succeed. Thus you may not be able to fit all of the results into the four
-boxes in 7 Down if you go looking for "loam" with:
+Greedy quantifiers always try to match as much of the input string as possible
+I<first>, backing off only when it's obvious that the match will not succeed.
+You may not be able to fit all of the results into the four boxes in 7 Down if
+you go looking for "loam" with:
=begin programlisting
@@ -227,96 +226,94 @@ You'll get C<Alabama>, C<Belgium>, and C<Bethlehem> for starters. The soil
might be nice there, but they're all too long--and the matches start in the
middle of the words.
-X<regex anchors>
-X<anchors; start of string>
-
-I<Regex anchors> force a match at a specific position in a string. The I<start
-of string anchor> (C<\A>) ensures that any match will start at the beginning of
-the string:
+Turn a greedy quantifier into a non-greedy quantifier by appending the C<?>
+quantifier:
=begin programlisting
- # also matches "lammed", "lawmaker", and "layman"
- my $seven_down = qr/\Al${letters_only}{2}m/;
+ my $minimal_greedy_match = qr/hot.*?meal/;
=end programlisting
-X<anchors; end of string>
-
-Similarly, the I<end of line string anchor> (C<\Z>) ensures that any match will
-I<end> at the end of the string.
+When given a non-greedy quantifier, the regular expression engine will prefer
+the I<shortest> possible potential match, and will increase the number of
+characters identified by the C<.*?> token combination only if the current
+number fails to match. Because C<*> matches zero or more times, the minimal
+potential match for this token combination is zero characters:
=begin programlisting
- # also matches "loom", which is close enough
- my $seven_down = qr/\Al${letters_only}{2}m\Z/;
+ say 'Found a hot meal' if 'ilikeahotmeal' =~ /$minimal_greedy_match/;
=end programlisting
-X<word boundary metacharacter>
-
-If you're not fortunate enough to have a Unix word dictionary file available,
-the I<word boundary metacharacter> (C<\b>) matches only at the boundary between
-a word character (C<\w>) and a non-word character (C<\W>):
+Use the C<+> quantifier to match one or more items:
=begin programlisting
- my $seven_down = qr/\bl${letters_only}{2}m\b/;
+ my $minimal_greedy_at_least_one = qr/hot.+?meal/;
+
+ unlike( 'ilikeahotmeal', $minimal_greedy_at_least_one );
+
+ like( 'i like a hot meal', $minimal_greedy_at_least_one );
=end programlisting
-=begin sidebar
+The C<?> quantifier modifier also applies to the C<?> (zero or one matches)
+quantifier as well as the range quantifiers. In every case, it causes the
+regex to match as little of the input as possible.
-Like Perl, there's more than one way to write a regular expression. Consider
-choosing the most expressive and maintainable one.
+The greedy modifiers C<.+> and C<.*> are tempting but dangerous. If you write
+regular expressions with greedy matches, test them thoroughly with a
+comprehensive and automated test suite with representative data to lessen the
+possibility of unpleasant surprises.
-=end sidebar
+=head1 Regex Anchors
-Sometimes you can't anchor a regular expression. In those cases, you can turn
-a greedy quantifier into a non-greedy quantifier by appending the C<?>
-quantifier:
+X<regex anchors>
+X<anchors; start of string>
+
+I<Regex anchors> force a match at a specific position in a string. The I<start
+of string anchor> (C<\A>) ensures that any match will start at the beginning of
+the string:
=begin programlisting
- my $minimal_greedy_match = qr/hot.*?meal/;
+ # also matches "lammed", "lawmaker", and "layman"
+ my $seven_down = qr/\Al${letters_only}{2}m/;
=end programlisting
-In this case, the regular expression engine will prefer the I<shortest>
-possible potential match, increasing the number of characters identified by the
-C<.*?> token combination only if the current number fails to match. Because
-C<*> matches zero or more times, the minimal potential match for this token
-combination is zero characters:
+X<anchors; end of string>
+
+The I<end of string anchor> (C<\Z>) ensures that any match will I<end> at
+the end of the string.
=begin programlisting
- say 'Found a hot meal' if 'ilikeahotmeal' =~ /$minimal_greedy_match/;
+ # also matches "loom", which is close enough
+ my $seven_down = qr/\Al${letters_only}{2}m\Z/;
=end programlisting
-If this isn't what you want, use the C<+> quantifier to match one or more
-items:
+X<word boundary metacharacter>
+
+The I<word boundary metacharacter> (C<\b>) matches only at the boundary between
+a word character (C<\w>) and a non-word character (C<\W>). Thus to find
+C<loam> but not C<Belgium>, use the anchored regex:
=begin programlisting
- my $minimal_greedy_at_least_one = qr/hot.+?meal/;
+ my $seven_down = qr/\bl${letters_only}{2}m\b/;
- unlike( 'ilikeahotmeal', $minimal_greedy_at_least_one );
+=end programlisting
- like( 'i like a hot meal', $minimal_greedy_at_least_one );
+=begin sidebar
-=end programlisting
+Like Perl, there's more than one way to write a regular expression. Consider
+choosing the most expressive and maintainable one.
-The C<?> quantifier modifier also applies to the C<?> (zero or one matches)
-quantifier as well as the range quantifiers. In every case, it causes the
-regex to match as few times as possible.
-
-In general, the greedy modifiers C<.+> and C<.*> are tempting but dangerous
-tools. For simple programs which need little maintenance, they may be quick
-and easy to write, but non-greedy matching seems to match human expectations
-better. If you find yourself writing a lot of regular expression with greedy
-matches, test them thoroughly with a comprehensive and automated test suite
-with representative data to lessen the possibility of unpleasant surprises.
+=end sidebar
=head1 Metacharacters
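The behavior described in the revised regular expressions text can be checked with a short standalone script. This is a minimal sketch, not part of the commit: the sample strings are invented, and a literal C<[a-z]> class stands in for the book's C<$letters_only> variable, whose definition does not appear in this diff.

```perl
use strict;
use warnings;

# Sample data invented for this sketch.
my $text = 'a hot meal, then another hot meal';

# Greedy: .* grabs as much as it can, so the match ends at the last "meal".
my ($greedy) = $text =~ /(hot.*meal)/;

# Non-greedy: .*? grows only as far as needed, stopping at the first "meal".
my ($minimal) = $text =~ /(hot.*?meal)/;

print "greedy:  $greedy\n";
print "minimal: $minimal\n";

# .+? still requires at least one character between "hot" and "meal".
my $joined = 'ilikeahotmeal'     =~ /hot.+?meal/ ? 1 : 0;
my $spaced = 'i like a hot meal' =~ /hot.+?meal/ ? 1 : 0;
print "joined: $joined, spaced: $spaced\n";

# Anchors. [a-z] is an assumed stand-in for $letters_only.
my $starts_here = qr/\Al[a-z]{2}m/;      # must begin at start of string
my $whole_word  = qr/\Al[a-z]{2}m\Z/;    # must also end at end of string
my $bounded     = qr/\bl[a-z]{2}m\b/;    # must sit between word boundaries

for my $word (qw( loam loom lammed layman )) {
    my $start = $word =~ $starts_here ? 1 : 0;
    my $whole = $word =~ $whole_word  ? 1 : 0;
    printf "%-7s start=%d whole=%d\n", $word, $start, $whole;
}

# The word boundary version finds a four-letter word inside a longer string.
my ($found) = 'rich loam near Belgium' =~ /($bounded)/;
print "found: $found\n";
```

The greedy and non-greedy patterns differ only by the trailing C<?>, which makes the contrast easy to see in the captured text. The loop shows why C<\A> alone still accepts longer words, while adding C<\Z> restricts the match to four-letter candidates.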
