Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
Add initial draft of S07-lists.pod, the new Synopsis 7.
  • Loading branch information
pmichaud committed Jul 25, 2012
1 parent a1a0394 commit 89b9891
Showing 1 changed file with 268 additions and 0 deletions.
268 changes: 268 additions & 0 deletions S07-lists.pod
@@ -0,0 +1,268 @@

=encoding utf8

=head1 TITLE

Synopsis 7: Lists and Iteration

=head1 AUTHORS

Patrick R. Michaud <pmichaud@pobox.com

=head1 VERSION

Created: 12 July 2012

Last Modified: 12 July 2012
Version: 1

=head1 Overview

Lists and arrays have always been one of Perl's fundamental data types,
and Perl 6 is no different. However, lists in Perl 6 have been greatly
extended to accommodate lazy lists, infinite lists, lists of mutable
and immutable elements, typed lists, flattening behaviors, and so on.
So where lists and arrays in Perl 5 tended to be finite sequences of
scalar values, in Perl 6 we have additional dimensions of behavior
that must be addressed.

=head2 The C<List> type

The C<List> class is the base class for dealing with other types of
lists, including C<Array>. To the programmer, a C<List> is a potentially
lazy and infinite sequence of elements. A C<List> is mutable, in that
one can manipulate the sequence via operations such as C<push>, C<pop>,
C<shift>, C<unshift>, C<splice>, etc. A C<List>'s elements may be
either mutable or immutable.

C<List> objects are C<Positional>, meaning they can be bound to
array variables and support the postfix C<.[]> operator.

Lists are also lazy, in that the elements of a C<List> may
come from generator functions (called C<Iterator>s) that produce
elements on demand.

=head2 The C<Array> type

An C<Array> is simply a C<List> in which all of the elements are held
in scalar containers. This allows assignment to the elements of the
array.

=head2 The C<Parcel> type

The comma operator (C<< infix:<,> >>) creates C<Parcel> objects.
These should not be confused with lists; a C<Parcel> represents
a raw syntactic sequence of elements. A C<Parcel> is immutable,
although the elements of a C<Parcel> may be either mutable or
immutable.

The name "Parcel" is derived from the phrase "parenthesis cell",
since many C<Parcel> objects appear inside of parentheses.
However, except for the empty parcel, it's the comma operator
that creates C<Parcel> objects.

() # empty Parcel
(1) # an Int
(1,2) # a Parcel with two Ints
(1,) # a Parcel with one Int

A C<Parcel> is also C<Positional>, and uses flattening context for
list operations such as C<.[]> and C<.elems>. See "Flattening contexts"
below.

=head2 Flattening contexts, the C<Iterable> type

C<List> and C<Parcel> objects can have other container objects
as elements. In some contexts we want to interpolate the values
of container objects into the surrounding C<List> or C<Parcel>,
while in other contexts we want any subcontainers to be preserved.
Such interpolation is known as "flattening".

The C<Iterable> type is performed by container and generator objects
that will interpolate their values in flattening contexts. Both
C<List> and C<Parcel> are C<Iterable>. C<Iterable> implies support
for C<.iterator>, which returns an object capable of producing the
elements of the C<Iterable>.

Objects held in scalar containers are never interpolated in
flattening context, even if the object is C<Iterable>.

my @a = 3,4,5;
for 1,2,@a { ... } # five iterations

my $s = @a;
for 1,2,$s { ... } # three iterations

Here, both C<$s> and C<@a> refer to the same underlying C<Array> object,
but the presence of the scalar container prevents C<$s> from being
flattened into the C<for> loop. The C<.list> or C<.flat> method
may be used to restore the flattening behavior:

for 1,2,$s.list { ... } # five iterations
for 1,2,@($s) { ... } # five iterations

Conversely, the C<.item> or C<$()> contextualizer can be used to
prevent an C<Iterable> from being interpolated:

my @b = 1,2,@a; # @b has five elements
my @c = 1,2,@a.item; # @c has three elements
my @c = 1,2,$(@a); # same thing

=head2 Iterators

An C<Iterator> is an object that is able to generate or produce
elements of a list (called "reification"). Because a single
C<Iterator> may be directly or indirectly referenced by
multiple container objects simultaneously, C<Iterator>s provide
an immutable view of the sequence of elements produced, leaving
it up to containers to provide any mutability.

=head3 The C<.reify> method

The C<.reify($n)> method requests an C<Iterator> to return a C<Parcel>
consisting of at least C<$n> reified elements, followed by any additional
iterators needed for the remaining elements in the sequence. For example:

my $r = 10..50;
say $r.iterator.reify(5).perl; # (10, 11, 12, 13, 14, 15..50)

Iterators are allowed to return more or fewer elements than requested
by C<$n>; it's up to the caller to properly handle any shortage or
excess. (In general an iterator is expected to always reify at
least one element, however.) If C<$n> is C<*>, then the iterator
is asked to generate elements according to whatever is most natural
for the thing being iterated. For a C<Range> this might be a substantial
number (or even all) of the elements of the range; for a filehandle
it might read a single record; for a C<List> it may return only the
non-lazy portion of the list.

Once C<.reify> is called on an iterator, the iterator is expected
to return the same results for all subsequent calls to C<.reify>.
This is true even if the C<$n> argument is different from one call
to the next. The general pattern is often something like:

class SomeIter is Iterator {
has $!reified;

method reify($n = 1) {
unless $!reified.defined {
# generate return Parcel and bind to $!reified
}
$!reified;
}
}

=head3 The C<.infinite> method

Because lists in Perl 6 can be lazy, they can also be infinite.
Each C<Iterator> must provide an C<.infinite> method, which returns
a value indicating the knowable finiteness of the iteration:

.infinite Meaning
----------------------------------------------
True iteration is known to be infinite
False iteration is known to be finite
Mu finiteness isn't currently known

As an example, C<Range> iterators can generally know finiteness simply
by looking at the endpoint of the C<Range>. The iterator for the
C<< infix:<...> >> sequence operator treats any sequence ending in
C<*> as being "known infinite", all other C<...> sequences have unknown
finiteness. In the general case it's not possible for loop iterators
to definitively know if the sequence will be finite or infinite.
(Conjecture: There will be a syntax or mechanism for the programmer
to indicate that such sequences are to be treated as known finite
or known infinite.)

=head2 Levels of laziness

Laziness is one of the primary virtues of a Perl programmer; it is
also one of the virtues of Perl 6. However, it's also possible to
succumb to false laziness, so there are times when Perl 6 chooses
to be eager instead of lazy.

Perl 6 defines four levels of laziness:

=over

=item Strictly Lazy

Does not evaluate anything unless explicitly required by the caller,
including not traversing non-lazy objects. This behavior generally
occurs only by a pragma or by explicit lazy primitives.

=item Mostly Lazy

Try to obtain available elements without causing eager evaluation
of other lazy objects. However, the implementation is allowed to
do batching for efficiency. The programmer must not rely on
circular side effects when the implementation is working ahead
like this, or the results will be indeterminate. (However, this
does not rule out pure I<definitional> circularity; earlier array
values may be used in the definition of subsequent values.)

=item Mostly Eager

Obtain all leading items that are not known to be infinite.
The remainder of the items are left for lazy evaluation.
As with the "mostly lazy" case, the programmer must not depend
on circular side effects in any situation where the implementation
is allowed to work ahead. Note that there are no language-defined
limits on the amount of conjectural evaluation allowed, up to the
size of available memory; however, an implementation may allow
the arbitrary limitation of workahead by use of pragmas.

In any case there should be no profound semantic differences
between "mostly lazy" and "mostly eager". These levels are
primarily just hints to the implementation as to whether you are
likely to want to use all the values in the list. Nothing
transactional should be allowed to depend on the difference.

=item Strictly Eager

Obtain all items, failing immediately with an appropriate message
if asked to evaluate any data structures known to be infinite.
(Data structures that are effectively infinite but not provably
so will likely exhaust memory instead.)

=back

=head2 The laziness level of some common operations

=over

=item List assignment; my @a = @something;

In order to provide p5-like behavior in list assignment, it is performed
at the Mostly Eager level. This means that if you do

my @a = foo();

it will eagerly evaluate the return value from C<foo()> to place
elements into C<@a>, stopping only when encountering something
that is "known infinite" or upon reaching the end of the sequence.

=item C<for>, C<map>, C<grep>, etc.

The C<for> loop and looping routines such as C<map>, C<grep>, etc. are
Mostly Lazy by default, but will be eager when evaluated in sink context.

=item Slurpy parameters

Slurpy parameters are Mostly Lazy.

=item Feed operators

The feed operator is strictly lazy, meaning that no operation
should be performed before the user requests any element. That's how

my @a <== grep { ... } <== map { ... } <== grep { ... } <== 1, 2, 3

is completely lazy, even if 1,2,3 is a fairly small known compact
list.

=back

=head2 Common list operations


0 comments on commit 89b9891

Please sign in to comment.