Edited nested data structures section.

1 parent 4fe1470 commit 401ff816adda0b8c9df425a665021ba9c2c4eae1 @chromatic chromatic committed Sep 7, 2011
Showing with 73 additions and 90 deletions.
  1. +73 −90 sections/nested_data_structures.pod
163 sections/nested_data_structures.pod
@@ -5,9 +5,9 @@ X<data structures>
X<nested data structures>
Perl's aggregate data types--arrays and hashes--allow you to store scalars
-indexed by integers or string keys. Perl 5's references (L<references>) allow
-you to access aggregate data types indirectly, through special scalars. Nested
-data structures in Perl, such as an array of arrays or a hash of hashes, are
+indexed by integer or string keys. Perl 5's references (L<references>) allow
+you to access aggregate data types through special scalars. Nested data
+structures in Perl, such as an array of arrays or a hash of hashes, are
possible through the use of references.
=head2 Declaring Nested Data Structures
@@ -17,9 +17,9 @@ A simple declaration of an array of arrays might be:
=begin programlisting
my @famous_triplets = (
- [qw( eenie miney moe )],
- [qw( huey dewey louie )],
- [qw( duck duck goose )],
+ [qw( eenie miney moe )],
+ [qw( huey dewey louie )],
+ [qw( duck duck goose )],
);
=end programlisting
@@ -29,8 +29,8 @@ A simple declaration of an array of arrays might be:
=begin programlisting
my %meals = (
- breakfast => { entree => 'eggs', side => 'hash browns' },
- lunch => { entree => 'panini', side => 'apple' },
+ breakfast => { entree => 'eggs', side => 'hash browns' },
+ lunch => { entree => 'panini', side => 'apple' },
dinner => { entree => 'steak', side => 'avocado salad' },
);
@@ -45,7 +45,7 @@ elements to the list.
=head2 Accessing Nested Data Structures
-Accessing elements in nested data structures uses Perl's reference syntax. The
+Use Perl's reference syntax to access elements in nested data structures. The
sigil denotes the amount of data to retrieve, and the dereferencing arrow
indicates that the value of one portion of the data structure is a reference:
@@ -56,9 +56,8 @@ indicates that the value of one portion of the data structure is a reference:
=end programlisting
-In the case of a nested data structure, the only way to nest a data structure
-is through references, thus the arrow is superfluous. This code is equivalent
-and clearer:
+The only way to nest a multi-level data structure is through references, so the
+arrow is superfluous. You may omit it for clarity:
=begin programlisting
@@ -69,14 +68,13 @@ and clearer:
=begin sidebar
-You can avoid the arrow in every case except invoking a function reference
-stored in a nested data structure, where the arrow invocation syntax is the
-clearest mechanism of invocation.
+The arrow invocation syntax is clearest only in the case of invoking a function
+reference stored in a nested data structure.
=end sidebar
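For illustration only (this sketch is not part of the commit): the arrow and arrow-free access forms discussed in this hunk, using the C<@famous_triplets> and C<%meals> declarations from earlier in the file. The C<%dispatch> table and its C<double> entry are hypothetical names, added to show the one case where the arrow stays.

```perl
use strict;
use warnings;

my @famous_triplets = (
    [qw( eenie miney moe   )],
    [qw( huey  dewey louie )],
    [qw( duck  duck  goose )],
);

my %meals = (
    breakfast => { entree => 'eggs',   side => 'hash browns'   },
    lunch     => { entree => 'panini', side => 'apple'         },
    dinner    => { entree => 'steak',  side => 'avocado salad' },
);

# Explicit dereferencing arrow between subscripts ...
my $nephew_arrow = $famous_triplets[1]->[2];
my $side_arrow   = $meals{breakfast}->{side};

# ... and the equivalent forms with the arrow omitted.
my $nephew = $famous_triplets[1][2];
my $side   = $meals{breakfast}{side};

# Hypothetical dispatch table: invoking a stored function reference
# is where the arrow remains the clearest syntax.
my %dispatch = ( double => sub { return $_[0] * 2 } );
my $doubled  = $dispatch{double}->(21);

print "$nephew / $side / $doubled\n";
```

Both spellings of each subscript chain reach the same element; the arrow is only optional between adjacent subscripts.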
-Accessing components of nested data structures as if they were first-class
-arrays or hashes requires disambiguation blocks:
+Use disambiguation blocks to access components of nested data structures as if
+they were first-class arrays or hashes:
=begin programlisting
@@ -85,16 +83,16 @@ arrays or hashes requires disambiguation blocks:
=end programlisting
-Similarly, slicing a nested data structure requires additional punctuation:
+... or to slice a nested data structure:
=begin programlisting
my ($entree, $side) = @{ $meals{breakfast} }{qw( entree side )};
=end programlisting
-The use of whitespace helps, but it does not entirely eliminate the noise of
-this construct. Sometimes using temporary variables can clarify:
+Whitespace helps, but does not entirely eliminate the noise of this construct.
+Use temporary variables to clarify:
=begin programlisting
@@ -105,7 +103,7 @@ this construct. Sometimes using temporary variables can clarify:
X<aliasing>
-You can also use C<for>'s implicit aliasing to C<$_> to avoid the use of an
+... or use C<for>'s implicit aliasing to C<$_> to avoid the use of an
intermediate reference:
=begin programlisting
@@ -114,20 +112,19 @@ intermediate reference:
=end programlisting
-... though clarity should be a concern.
-
+... though always keep clarity in mind.
C<perldoc perldsc>, the data structures cookbook, gives copious examples of how
-to use the various types of data structures available in Perl.
+to use Perl's various data structures.
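For illustration only (not part of the commit): a minimal sketch that puts the disambiguation block, the slice, the temporary variable, and the C<for> aliasing forms side by side. It reuses C<%meals> and C<@famous_triplets> from the file; the other variable names are assumptions chosen for this example.

```perl
use strict;
use warnings;

my @famous_triplets = (
    [qw( eenie miney moe   )],
    [qw( huey  dewey louie )],
);

my %meals = (
    breakfast => { entree => 'eggs',   side => 'hash browns' },
    lunch     => { entree => 'panini', side => 'apple'       },
);

# Disambiguation blocks: treat a nested reference as a first-class aggregate.
my $nephew_count = @{ $famous_triplets[1] };
my @lunch_keys   = sort keys %{ $meals{lunch} };

# Hash slice through a disambiguation block.
my ($entree, $side) = @{ $meals{breakfast} }{qw( entree side )};

# The same slice through a temporary reference.
my $breakfast_ref = $meals{breakfast};
my ($tmp_entree, $tmp_side) = @$breakfast_ref{qw( entree side )};

# The same slice again, with for aliasing the nested hash reference to $_.
my ($for_entree, $for_side);
for ( $meals{breakfast} ) {
    ($for_entree, $for_side) = @$_{qw( entree side )};
}

print "$nephew_count nephews; lunch has @lunch_keys; breakfast is $entree with $side\n";
```

All three slice spellings read the same two values; which one is clearest depends on how much other code surrounds it.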
=head2 Autovivification
Z<autovivification>
X<autovivification>
-Perl's expressivity extends to nested data structures. When you attempt to
+Perl's expressivity extends to nested data structures. When you attempt to
write to a component of a nested data structure, Perl will create the path
-through the data structure to that piece if it does not exist:
+through the data structure to the destination as necessary:
=begin programlisting
@@ -138,9 +135,9 @@ through the data structure to that piece if it does not exist:
After the second line of code, this array of arrays of arrays of arrays
contains an array reference in an array reference in an array reference in an
-array reference. Each array reference contains one element. Similarly,
-treating an undefined value as if it were a hash reference in a nested data
-structure will create intermediary hashes, keyed appropriately:
+array reference. Each array reference contains one element. Similarly, treating
+an undefined value as if it were a hash reference in a nested data structure
+will create intermediary hashes:
=begin programlisting
@@ -153,49 +150,42 @@ X<autovivification>
X<C<autovivification> pragma>
X<pragmas; C<autovivification>>
-This behavior is I<autovivification>, and it's more often useful than it isn't.
-Its benefit is in reducing the initialization code of nested data structures.
-Its drawback is in its inability to distinguish between the honest intent to
-create missing elements in nested data structures and typos.
-
-The C<autovivification> pragma on the CPAN (L<pragmas>) lets you disable
-autovivification in a lexical scope for specific types of operations; it's
-worth your time to consider this in large projects, or projects with multiple
-developers.
+This useful behavior is I<autovivification>. While it reduces the
+initialization code of nested data structures, it cannot distinguish between
+the honest intent to create missing elements in nested data structures and
+typos. The CPAN's C<autovivification> pragma (L<pragmas>) lets you disable
+autovivification in a lexical scope for specific types of operations.
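For illustration only (not part of the commit): a small sketch of autovivification using core Perl alone. The C<%config> and C<%seen> hashes are hypothetical. The second half shows the typo-like hazard the paragraph describes: merely testing a deep key creates the intermediate level.

```perl
use strict;
use warnings;

# Hypothetical configuration hash, used only for illustration.
my %config;

# Writing to a deep element creates the intermediate hash and array references.
$config{server}{ports}[1] = 8080;

my $ports_type = ref $config{server}{ports};
my $port_count = scalar @{ $config{server}{ports} };   # index 0 exists but is undef

# Testing a deep key also vivifies the level above it.
my %seen;
my $found    = exists $seen{alpha}{beta};
my @vivified = keys %seen;

print "ports: $ports_type with $port_count slots\n";
print "found: ", ( $found ? 'yes' : 'no' ), "; keys now in %seen: @vivified\n";
```

The C<exists> test returns false, yet C<%seen> gains an C<alpha> key holding an empty hash reference. That side effect is the kind of operation the CPAN C<autovivification> pragma is meant to restrict.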
=begin sidebar
-You can also check for the existence of specific hash keys and the number of
-elements in arrays before dereferencing each level of a complex data structure,
-but that can produce tedious, lengthy code which many programmers prefer to
-avoid.
+You I<can> verify your expectations before dereferencing each level of a
+complex data structure, but the resulting code is often lengthy and tedious.
=end sidebar
You may wonder at the contradiction between taking advantage of
-autovivification while enabling C<strict>ures. The question is one of balance.
-is it more convenient to catch errors which change the behavior of your program
-at the expense of disabling those error checks for a few well-encapsulated
-symbolic references? Is it more convenient to allow data structures to grow
-rather than specifying their size and allowed keys?
-
-The answer to the latter question depends on your specific project. When
-initially developing, you can allow yourself the freedom to experiment. When
-testing and deploying, you may want to increase strictness to prevent unwanted
-side effects. Thanks to the lexical scoping of the C<strict> and
-C<autovivification> pragmas, you can enable and disable these behaviors as
-necessary.
+autovivification while enabling C<strict>ures. The question is one of balance.
+Is it more convenient to catch errors which change the behavior of your program
+at the expense of disabling error checks for a few well-encapsulated symbolic
+references? Is it more convenient to allow data structures to grow rather than
+specifying their size and allowed keys?
+
+The answers depend on your project. During early development, allow yourself
+the freedom to experiment. While testing and deploying, consider an increase of
+strictness to prevent unwanted side effects. Thanks to the lexical scoping of
+the C<strict> and C<autovivification> pragmas, you can enable these behaviors
+where and as necessary.
=head2 Debugging Nested Data Structures
The complexity of Perl 5's dereferencing syntax combined with the potential for
confusion with multiple levels of references can make debugging nested data
-structures difficult. Two good options exist for visualizing them.
+structures difficult. Two good visualization tools exist.
X<C<Data::Dumper>>
-The core module C<Data::Dumper> can stringify values of arbitrary complexity
-into Perl 5 code:
+The core module C<Data::Dumper> converts values of arbitrary complexity into
+strings of Perl 5 code:
=begin programlisting
@@ -206,14 +196,13 @@ into Perl 5 code:
=end programlisting
This is useful for identifying what a data structure contains, what you should
-access, and what you accessed instead. C<Data::Dumper> can dump objects as
-well as function references (if you set C<$Data::Dumper::Deparse> to a true
-value).
+access, and what you accessed instead. C<Data::Dumper> can dump objects as well
+as function references (if you set C<$Data::Dumper::Deparse> to a true value).
-While C<Data::Dumper> is a core module and prints Perl 5 code, it also produces
-verbose output. Some developers prefer the use of the C<YAML::XS> or C<JSON>
-modules for debugging. You have to learn a different format to understand
-their outputs, but their outputs can be much clearer to read and to understand.
+While C<Data::Dumper> is a core module and prints Perl 5 code, its output is
+verbose. Some developers prefer the use of the C<YAML::XS> or C<JSON> modules
+for debugging. They do not produce Perl 5 code, but their outputs can be much
+clearer to read and to understand.
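For illustration only (not part of the commit): a sketch comparing the two kinds of output on the C<%meals> hash from earlier in the file. The text names C<YAML::XS> and C<JSON>; this sketch substitutes C<JSON::PP> because it ships with recent Perl 5 releases, which is an assumption about the reader's installation. The C<Sortkeys> and C<Indent> settings are optional choices made here to keep the dump stable and compact.

```perl
use strict;
use warnings;

use Data::Dumper;
use JSON::PP;

my %meals = (
    breakfast => { entree => 'eggs',   side => 'hash browns' },
    lunch     => { entree => 'panini', side => 'apple'       },
);

# Sort hash keys and use a lighter indentation style for repeatable output.
local $Data::Dumper::Sortkeys = 1;
local $Data::Dumper::Indent   = 1;

# Perl 5 code as a string, assigned to $VAR1.
my $dumped = Dumper( \%meals );

# The same structure as pretty-printed JSON with sorted keys.
my $json = JSON::PP->new->canonical->pretty->encode( \%meals );

print $dumped;
print $json;
```

Passing a reference (C<\%meals>) rather than the hash itself keeps the dump as a single structure instead of a flat list of keys and values.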
=head2 Circular References
@@ -224,9 +213,9 @@ X<memory management; circular references>
X<garbage collection>
Perl 5's memory management system of reference counting (L<reference_counts>)
-has one drawback apparent to user code. Two references which end up pointing
+has one drawback apparent to user code. Two references which eventually point
to each other form a I<circular reference> that Perl cannot destroy on its own.
-Consider a biological model, where each entity has two parents and can have
+Consider a biological model, where each entity has two parents and zero or more
children:
=begin programlisting
@@ -240,22 +229,21 @@ children:
=end programlisting
-Because both C<$alice> and C<$robert> contain an array reference which contains
-C<$cianne>, and because C<$cianne> is a hash reference which contains C<$alice>
-and C<$robert>, Perl can never decrease the reference count of any of these
-three people to zero. It doesn't recognize that these circular references
-exist, and it can't manage the lifespan of these entities.
+Both C<$alice> and C<$robert> contain an array reference which contains
+C<$cianne>. Because C<$cianne> is a hash reference which contains C<$alice> and
+C<$robert>, Perl can never decrease the reference count of any of these three
+people to zero. It doesn't recognize that these circular references exist, and
+it can't manage the lifespan of these entities.
X<references; weak>
X<weak references>
X<C<Scalar::Util>>
-You must either break the reference count manually yourself (by clearing the
-children of C<$alice> and C<$robert> or the parents of C<$cianne>), or take
-advantage of a feature called I<weak references>. A weak reference is a
-reference which does not increase the reference count of its referent. Weak
-references are available through the core module C<Scalar::Util>. Export the
-C<weaken()> function and use it on a reference to prevent the reference count
+Either break the reference count manually yourself (by clearing the children of
+C<$alice> and C<$robert> or the parents of C<$cianne>), or use I<weak
+references>. A weak reference is a reference which does not increase the
+reference count of its referent. Weak references are available through the core
+module C<Scalar::Util>. Its C<weaken()> function prevents a reference count
from increasing:
=begin programlisting
@@ -274,20 +262,15 @@ from increasing:
=end programlisting
-With this accomplished, C<$cianne> will retain references to C<$alice> and
-C<$robert>, but those references will not by themselves prevent Perl's garbage
-collector from destroying those data structures. You rarely have to use weak
-references if you design your data structures correctly, but they're useful in
-a few situations.
+Now C<$cianne> will retain references to C<$alice> and C<$robert>, but those
+references will not by themselves prevent Perl's garbage collector from
+destroying those data structures. Most data structures do not need weak
+references, but when they're necessary, they're invaluable.
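For illustration only (not part of the commit, and not necessarily the listing the file contains): a sketch of weakening the child-to-parent links in the family model described above. The hash keys are assumptions; C<weaken()> and C<isweak()> are real C<Scalar::Util> functions.

```perl
use strict;
use warnings;

use Scalar::Util qw( weaken isweak );

# Hypothetical family model: parents hold strong references to children.
my $alice  = { name => 'Alice',  children => [] };
my $robert = { name => 'Robert', children => [] };
my $cianne = { name => 'Cianne', mother => $alice, father => $robert };

push @{ $alice->{children}  }, $cianne;
push @{ $robert->{children} }, $cianne;

# Weaken the links from child to parent so they no longer
# contribute to the parents' reference counts.
weaken( $cianne->{mother} );
weaken( $cianne->{father} );

my $mother_is_weak = isweak( $cianne->{mother} ) ? 1 : 0;

# Dropping the last strong reference to Alice frees her hash;
# the weak reference held by Cianne then becomes undef.
undef $alice;
my $mother_gone = defined $cianne->{mother} ? 0 : 1;

print "mother weak: $mother_is_weak; mother freed: $mother_gone\n";
```

Which direction to weaken is a design choice; this sketch assumes parents own their children, so the back-references are the ones that give way.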
=head2 Alternatives to Nested Data Structures
While Perl is content to process data structures nested as deeply as you can
-imagine, the human cost of understanding these data structures as well as the
-relationship of various pieces, not to mention the syntax required to access
-various portions, can be high. Beyond two or three levels of nesting, consider
-whether modeling various components of your system with classes and objects
-(L<moose>) will allow for a clearer representation of your data.
-
-Sometimes bundling data with behaviors appropriate to that data can clarify
-code.
+imagine, the human cost of understanding these data structures and their
+relationships--to say nothing of the complex syntax--is high. Beyond two or
+three levels of nesting, consider whether modeling various components of your
+system with classes and objects (L<moose>) will allow for clearer code.
