
Nested Data Structures

Perl's aggregate data types--arrays and hashes--allow you to store scalars indexed by integer or string keys. Perl 5's references (see References) allow you to access aggregate data types through special scalars. Nested data structures in Perl, such as an array of arrays or a hash of hashes, are possible through the use of references.

Use the anonymous reference declaration syntax to declare a nested data structure:
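Here is a minimal sketch of that syntax. The variable names and data are illustrative. Square brackets build an anonymous array reference and curly braces build an anonymous hash reference, so a list of them nests naturally inside an outer array or hash.

```perl
use strict;
use warnings;

# An array of arrays: each element is an anonymous array reference.
my @famous_triplets = (
    [qw( eenie miney moe   )],
    [qw( huey  dewey louie )],
    [qw( duck  duck  goose )],
);

# A hash of hashes: each value is an anonymous hash reference.
my %meals = (
    breakfast => { entree => 'eggs',   side => 'hash browns'   },
    lunch     => { entree => 'panini', side => 'apple'         },
    dinner    => { entree => 'steak',  side => 'avocado salad' },
);
```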

Use Perl's reference syntax to access elements in nested data structures. The sigil denotes the amount of data to retrieve, and the dereferencing arrow indicates that the value of one portion of the data structure is a reference:
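The sketch below uses small illustrative versions of the structures, with made-up names and values. The $ sigil asks for a single scalar, and each arrow steps through one reference.

```perl
use strict;
use warnings;

my @famous_triplets = (
    [qw( eenie miney moe   )],
    [qw( huey  dewey louie )],
);
my %meals = (
    breakfast => { entree => 'eggs', side => 'hash browns' },
);

# $ sigil: retrieve one scalar. The arrow says "the element on the left
# holds a reference; index into the thing it refers to."
my $last_nephew = $famous_triplets[1]->[2];   # 'louie'
my $breaky_side = $meals{breakfast}->{side};  # 'hash browns'
```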

Because the only way to nest a multi-level data structure is through references, the arrow between subscripts is superfluous. You may omit it for clarity, except when invoking a function reference:
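This sketch repeats the same lookups without the arrows. It then calls through a function reference stored in a hypothetical %actions dispatch table; the table and its buy_food entry exist only for this example.

```perl
use strict;
use warnings;

my @famous_triplets = ( [qw( eenie miney moe )], [qw( huey dewey louie )] );
my %meals = ( breakfast => { entree => 'eggs', side => 'hash browns' } );

# Between subscripts the arrow may be omitted.
my $nephew = $famous_triplets[1][2];    # same as $famous_triplets[1]->[2]
my $meal   = $meals{breakfast}{side};   # same as $meals{breakfast}->{side}

# Hypothetical dispatch table holding a function reference.
my %actions = (
    financial => {
        buy_food => sub { my ( $who, $what ) = @_; return "$who buys $what" },
    },
);

# Invoking the function reference: keep the arrow before the argument list.
my $receipt = $actions{financial}{buy_food}->( $nephew, $meal );
```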

Use disambiguation blocks to access components of nested data structures as if they were first-class arrays or hashes:
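The following sketch, again with illustrative data, wraps a reference in @{ } or %{ }. That lets you use it anywhere Perl expects a whole array or hash: in scalar context to count elements, or as the argument to keys.

```perl
use strict;
use warnings;

my @famous_triplets = ( [qw( eenie miney moe )], [qw( huey dewey louie )] );
my %meals = ( dinner => { entree => 'steak', side => 'avocado salad' } );

# @{ ... } treats the array reference as an array; assigning it to a
# scalar gives the element count.
my $nephew_count = @{ $famous_triplets[1] };

# %{ ... } treats the hash reference as a hash, so keys() works on it;
# in scalar context keys() returns the number of keys.
my $dinner_courses = keys %{ $meals{dinner} };
```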

... or to slice a nested data structure:
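For instance, here is a hash slice taken through the inner reference of an illustrative %meals hash:

```perl
use strict;
use warnings;

my %meals = ( breakfast => { entree => 'eggs', side => 'hash browns' } );

# Hash slice through a reference: @{ REF }{ LIST } returns several values.
my ( $entree, $side ) = @{ $meals{breakfast} }{qw( entree side )};
```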

Whitespace helps, but does not entirely eliminate the noise of this construct. Use temporary variables to clarify:
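If you first copy the inner reference into a temporary variable, the slice expression becomes shorter. The name $breakfast_ref is only illustrative.

```perl
use strict;
use warnings;

my %meals = ( breakfast => { entree => 'eggs', side => 'hash browns' } );

# Name the inner reference once, then slice through the short name.
my $breakfast_ref     = $meals{breakfast};
my ( $entree, $side ) = @$breakfast_ref{qw( entree side )};
```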

... or use for's implicit aliasing to $_ to avoid the use of an intermediate reference:
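Here is a sketch of that idiom. The for statement modifier runs once and aliases $_ to the inner hash reference, so the slice can use $_ directly.

```perl
use strict;
use warnings;

my %meals = ( breakfast => { entree => 'eggs', side => 'hash browns' } );

# Declare first: perlsyn leaves "my" combined with a statement modifier undefined.
my ( $entree, $side );

# $_ is aliased to $meals{breakfast} (a hash reference) for one iteration.
( $entree, $side ) = @{ $_ }{qw( entree side )} for $meals{breakfast};
```

The variables are declared on a separate line because perlsyn documents the behavior of my combined with a statement modifier (as in `my $x = ... for ...`) as undefined.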

perldoc perldsc, the data structures cookbook, gives copious examples of how to use Perl's various data structures.

Autovivification

When you attempt to write to a component of a nested data structure, Perl will create the path through the data structure to the destination as necessary:
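For example, assigning to a deep element of an empty array creates every intermediate array reference along the way. The variable name is illustrative.

```perl
my @aoaoaoa;
$aoaoaoa[0][0][0][0] = 'nested deeply';   # intermediate levels spring into existence
```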

After the second line of code, this array of arrays of arrays of arrays contains an array reference in an array reference in an array reference. Each array reference contains one element. Similarly, treating an undefined value as if it were a hash reference in a nested data structure will make it so:
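Here is a hash-based sketch of the same behavior, with illustrative keys:

```perl
my %hohoh;
$hohoh{Robot}{Santa}{Claus} = 'mostly harmful';   # creates two hash references
```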

This useful behavior is autovivification. While it reduces the initialization code of nested data structures, it cannot distinguish between the honest intent to create missing elements and a typo. The autovivification pragma (see Pragmas) from the CPAN lets you disable autovivification in a lexical scope for specific types of operations.
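This sketch shows the typo problem using a hypothetical %config hash. Even under strict and warnings, a misspelled intermediate key silently creates a new branch instead of reporting an error.

```perl
use strict;
use warnings;

my %config = ( database => { host => 'localhost' } );

# Typo in the intermediate key: no error and no warning. Perl quietly
# creates $config{databsae} as a brand-new hash reference.
$config{databsae}{port} = 5432;

my @top_level = sort keys %config;   # ('database', 'databsae')
```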

You may wonder at the contradiction between taking advantage of autovivification and enabling strictures. The question is one of balance. Is it more convenient to catch errors which change the behavior of your program at the expense of disabling error checks for a few well-encapsulated symbolic references? Is it more convenient to allow data structures to grow rather than specifying their size and allowed keys?

The answers depend on your project. During early development, allow yourself the freedom to experiment. While testing and deploying, consider increasing strictness to prevent unwanted side effects. Thanks to the lexical scoping of the strict and autovivification pragmas, you can enable these behaviors where and as necessary.

You can verify your expectations before dereferencing each level of a complex data structure, but the resulting code is often lengthy and tedious. It's better to avoid deeply nested data structures by revising your data model to provide better encapsulation.
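The sketch below shows both the hazard and the tedious remedy, using illustrative data. perlfunc's entry for exists notes that testing a deep element does not create that element, but it does autovivify the intervening levels. Checking each level in turn avoids the side effect.

```perl
use strict;
use warnings;

my %meals = ( breakfast => { entree => 'eggs' } );

# A read-only test still autovivifies the intermediate level:
my $has_dessert     = exists $meals{dessert}{entree};   # false...
my $dessert_created = exists $meals{dessert};           # ...but now true

# Checking every level first avoids the side effect, at the cost of verbosity.
my $has_snack = exists( $meals{snack} )
             && ref( $meals{snack} ) eq 'HASH'
             && exists( $meals{snack}{entree} );
my $snack_created = exists $meals{snack};               # still false
```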

Debugging Nested Data Structures

The complexity of Perl 5's dereferencing syntax combined with the potential for confusion with multiple levels of references can make debugging nested data structures difficult. Two good visualization tools exist.

The core module Data::Dumper converts values of arbitrary complexity into strings of Perl 5 code:
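Here is a minimal sketch in which $my_complex_structure is an illustrative stand-in. Setting $Data::Dumper::Sortkeys makes hash keys appear in a stable order. Setting $Data::Dumper::Indent = 1 uses a fixed, more compact indentation.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # stable, alphabetical key order
$Data::Dumper::Indent   = 1;   # fixed, compact indentation

my $my_complex_structure = {
    breakfast => { entree => 'eggs', side => 'hash browns' },
    triplets  => [ [qw( huey dewey louie )] ],
};

# Dumper() returns a string of Perl code that rebuilds the structure.
my $dump = Dumper( $my_complex_structure );
print $dump;
```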

This is useful for identifying what a data structure contains, what you should access, and what you accessed instead. Data::Dumper can dump objects as well as function references (if you set $Data::Dumper::Deparse to a true value).
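This sketch puts an object and a function reference in the same dump; the names are illustrative. Without $Data::Dumper::Deparse, code references appear only as a placeholder (`sub { "DUMMY" }`). With it, Data::Dumper uses B::Deparse to reconstruct the source.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Deparse = 1;   # reconstruct code references as source

my %actions = ( greet => sub { return "hello, $_[0]" } );
my $robot   = bless { name => 'Santa' }, 'Robot';   # an object

my $dump = Dumper( { actions => \%actions, robot => $robot } );
print $dump;   # objects appear as bless( {...}, 'Robot' )
```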

While Data::Dumper is a core module and prints Perl 5 code, its output is verbose. Some developers prefer the use of the YAML::XS or JSON modules for debugging. They do not produce Perl 5 code, but their outputs can be much clearer to read and to understand.
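YAML::XS and JSON are CPAN distributions. As a sketch that needs only the core, JSON::PP (bundled with Perl since 5.14) produces a similarly readable view: pretty adds indentation and canonical sorts the keys. The data is illustrative. By default, JSON::PP will not encode code references or blessed objects.

```perl
use strict;
use warnings;
use JSON::PP;

my %meals = (
    breakfast => { entree => 'eggs',  side => 'hash browns'   },
    dinner    => { entree => 'steak', side => 'avocado salad' },
);

# pretty: indented, multi-line output; canonical: keys in sorted order.
my $json = JSON::PP->new->pretty->canonical;
my $text = $json->encode( \%meals );
print $text;
```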

Circular References

Perl 5's memory management system of reference counting (see Reference Counts) has one drawback apparent to user code. Two references which eventually point to each other form a circular reference that Perl cannot destroy on its own. Consider a biological model, where each entity has two parents and zero or more children:
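Here is a sketch of that model, using the names from the text with hash-based people. The mother, father, and children keys are assumptions of this sketch.

```perl
use strict;
use warnings;

my $alice  = { mother => '', father => '', children => [] };
my $robert = { mother => '', father => '', children => [] };

# The child refers to both parents...
my $cianne = { mother => $alice, father => $robert, children => [] };

# ...and each parent refers to the child, closing two cycles.
push @{ $alice->{children}  }, $cianne;
push @{ $robert->{children} }, $cianne;
```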

Both $alice and $robert contain an array reference which contains $cianne. Because $cianne is a hash reference which contains $alice and $robert, Perl can never decrease the reference count of any of these three people to zero. It doesn't recognize that these circular references exist, and it can't manage the lifespan of these entities.
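One way to observe the leak is to keep a weak "probe" reference to Alice. The probe uses weaken() from the core Scalar::Util module, so it does not itself keep her alive. Let all the lexicals go out of scope, then check whether the probe is still defined. The $alice_probe variable is hypothetical and exists for observation only.

```perl
use strict;
use warnings;
use Scalar::Util qw( weaken );

my $alice_probe;   # hypothetical observer; weakened below

{
    my $alice  = { mother => '', father => '', children => [] };
    my $cianne = { mother => $alice, father => '', children => [] };
    push @{ $alice->{children} }, $cianne;

    $alice_probe = $alice;
    weaken( $alice_probe );   # the probe does not keep Alice alive
}

# Both lexicals are gone, but the cycle keeps each hash's reference
# count above zero, so Alice is never freed.
my $leaked = defined $alice_probe;
```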

Either break the cycle manually (by clearing the children of $alice and $robert or the parents of $cianne), or use weak references. A weak reference is a reference which does not increase the reference count of its referent. Weak references are available through the core module Scalar::Util. Its weaken() function turns an existing reference into a weak one, so that it no longer counts toward its referent's reference count:
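This sketch applies the fix by weakening the child's links to both parents. It then reuses the same hypothetical probe technique to confirm that the family is reclaimed once the lexicals go out of scope.

```perl
use strict;
use warnings;
use Scalar::Util qw( weaken );

my $alice_probe;   # hypothetical observer

{
    my $alice  = { mother => '', father => '', children => [] };
    my $robert = { mother => '', father => '', children => [] };
    my $cianne = { mother => $alice, father => $robert, children => [] };

    push @{ $alice->{children}  }, $cianne;
    push @{ $robert->{children} }, $cianne;

    # The child's links to its parents no longer count toward the
    # parents' reference counts, which breaks both cycles.
    weaken( $cianne->{mother} );
    weaken( $cianne->{father} );

    $alice_probe = $alice;
    weaken( $alice_probe );
}

my $freed = !defined $alice_probe;   # true: the structure was reclaimed
```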

Now $cianne will retain references to $alice and $robert, but those references will not by themselves prevent Perl's garbage collector from destroying those data structures. Most data structures do not need weak references, but when they're necessary, they're invaluable.
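There is a practical consequence, shown in this sketch with illustrative data. When the last strong reference to a parent disappears, Perl frees it and sets every weak reference to it to undef. Code that follows a weak link should therefore check defined first. isweak() from Scalar::Util reports whether a reference has been weakened.

```perl
use strict;
use warnings;
use Scalar::Util qw( weaken isweak );

my $cianne = { mother => undef, children => [] };
my $was_weak;

{
    my $alice = { children => [ $cianne ] };
    $cianne->{mother} = $alice;
    weaken( $cianne->{mother} );
    $was_weak = isweak( $cianne->{mother} );
}

# The lexical $alice was Alice's only strong reference, so she is freed
# at the end of the block and the weak link in $cianne becomes undef.
my $mother_gone = !defined $cianne->{mother};
```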

Alternatives to Nested Data Structures

While Perl is content to process data structures nested as deeply as you can imagine, the human cost of understanding these data structures and their relationships--to say nothing of the complex syntax--is high. Beyond two or three levels of nesting, consider whether modeling various components of your system with classes and objects (see Moose) will allow for clearer code.
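As a rough sketch of that direction, this example uses a plain blessed-hash class instead of Moose, so that it runs with core Perl alone. The Person class and its methods are hypothetical. All the relationship bookkeeping, including the weak back-link to each parent, lives in one method rather than being repeated at every call site.

```perl
use strict;
use warnings;

package Person;
use Scalar::Util qw( weaken );

sub new {
    my ( $class, %args ) = @_;
    return bless { name => $args{name}, parents => [], children => [] }, $class;
}

sub name     { $_[0]{name} }
sub children { @{ $_[0]{children} } }
sub parents  { @{ $_[0]{parents} } }

# Record the relationship in both directions. The child's link back to
# this parent is weakened so the pair does not form a strong cycle.
sub add_child {
    my ( $self, $child ) = @_;
    push @{ $self->{children} }, $child;
    push @{ $child->{parents} }, $self;
    weaken( $child->{parents}[-1] );
    return $self;
}

package main;

my $alice  = Person->new( name => 'Alice' );
my $robert = Person->new( name => 'Robert' );
my $cianne = Person->new( name => 'Cianne' );
$_->add_child( $cianne ) for $alice, $robert;

my @parent_names = map { $_->name } $cianne->parents;   # ('Alice', 'Robert')
```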
