S02-bits.pod

=encoding utf8

=head1 TITLE

Synopsis 2: Bits and Pieces

=head1 VERSION

    Created: 10 Aug 2004

    Last Modified: 16 Oct 2015
    Version: 296

This document summarizes Apocalypse 2, which covers small-scale lexical
items and typological issues.  (These Synopses also contain updates to
reflect the evolving design of Perl 6 over time, unlike the Apocalypses,
which are frozen in time as "historical documents".  These updates are not
marked--if a Synopsis disagrees with its Apocalypse, assume the Synopsis is
correct.)

=head1 One-pass parsing

To the extent allowed by sublanguages' parsers, Perl is parsed using a
one-pass, predictive parser.  That is, lookahead of more than one "longest
token" is discouraged.  The currently known exceptions to this are where the
parser must:

=over 4

=item *

Locate the end of interpolated expressions that begin with a sigil and might
or might not end with brackets.

=item *

Recognize that a reduce operator is not really beginning a C<[...]>
composer.

=back

One-pass parsing is fundamental to knowing exactly which language you are
dealing with at any moment, which in turn is fundamental to allowing
unambiguous language mutation in any desired direction.  (Generic languages
are allowed, but only if intended; accidentally generic languages lead to
loss of linguistic identity and integrity.  This is the hard lesson of
Perl 5's source filters and other multi-pass parsing mistakes.)

=head1 Lexical Conventions

=head2 Unicode Semantics

In the abstract, Perl is written in Unicode, and has consistent Unicode
semantics regardless of the underlying text representations.  By default
Perl presents Unicode in "NFG" form, where each grapheme counts as one
character.  A grapheme is what the novice user would think of as a character
in their normal everyday life, including any diacritics.

Perl can count Unicode line and paragraph separators as line markers, but
that behavior had better be configurable so that Perl's idea of line numbers
matches what your editor thinks about Unicode lines.

Unicode horizontal whitespace is counted as whitespace, but it's better not
to use thin spaces where they will make adjoining tokens look like a single
token.  On the other hand, Perl doesn't use indentation as syntax, so you
are free to use any amount of whitespace anywhere that whitespace makes
sense. Comments always count as whitespace.

=head2 Bracketing Characters

For some syntactic purposes, Perl distinguishes bracketing characters from
non-bracketing.  Bracketing characters are defined as any Unicode characters
with either bidirectional mirrorings or Ps/Pe/Pi/Pf properties.

In practice, you're safest using matching characters with Ps/Pe/Pi/Pf
properties.  ASCII angle brackets are a notable exception: they have
bidirectional mirrorings but are not in the Ps/Pe/Pi/Pf sets.

Characters with no corresponding closing character do not qualify as opening
brackets.  This includes the second section of the Unicode BidiMirroring
data table.

If a character is already used in Ps/Pe/Pi/Pf mappings, then any entry in
BidiMirroring is ignored (both forward and backward mappings).  For any
given Ps character, the next Pe codepoint (in numerical order) is assumed to
be its matching character even if that is not what you might guess using
left-right symmetry.  Therefore C<U+298D> (C<⦍>) maps to C<U+298E> (C<⦎>), not C<U+2990> (C<⦐>),
and C<U+298F> (C<⦏>) maps to C<U+2990> (C<⦐>), not C<U+298E> (C<⦎>).  Neither C<U+298E> (C<⦎>) nor
C<U+2990> (C<⦐>) is a valid bracket opener, despite both having reverse mappings in the
BidiMirroring table.

The C<U+301D> (C<〝>) codepoint has two closing alternatives, C<U+301E> (C<〞>) and
C<U+301F> (C<〟>); Perl 6 recognizes only the one with the lower code point,
C<U+301E> (C<〞>), as the closing bracket.  This policy also applies to new
one-to-many mappings introduced in the future.

However, many-to-one mappings are fine; multiple opening characters may map
to the same closing character.  For instance, C<U+2018> (C<‘>), C<U+201A> (C<‚>), and
C<U+201B> (C<‛>) may all be used as the opener for the C<U+2019> (C<’>) closer.
Constructs that count openers and closers assume that only the given opener
is special.  That is, if you open with one of the alternatives, all other
alternatives are treated as non-bracketing characters within that construct.

=head2 Multiline Comments

Pod sections may be used reliably as multiline comments in Perl 6.  Unlike
in Perl 5, Pod syntax now lets you use C<=begin comment> and C<=end comment>
to delimit a Pod block correctly without the need for C<=cut>.  (In fact,
C<=cut> is now gone.)  The format name does not have to be C<comment> -- any
unrecognized format name will do to make it a comment.  (However, bare
C<=begin> and C<=end> probably aren't good enough, because all comments in
them will show up in the formatted output.)

We have single paragraph comments with C<=for comment> as well.  That lets
C<=for> keep its meaning as the equivalent of a C<=begin> and C<=end>
combined.  As with C<=begin> and C<=end>, a comment started in code reverts
to code afterwards.

Since there is a newline before the first C<=>, the Pod form of comment
counts as whitespace equivalent to a newline.  See S26 for more on embedded
documentation.

=head2 Single-line Comments

Except within a quote literal, a C<#> character always introduces a comment
in Perl 6.  There are two forms of comment based on C<#>.  Embedded comments
require the C<#> to be followed by a backtick (C<`>) plus one or more
opening bracketing characters.

All other uses of C<#> are interpreted as single-line comments that work
just as in Perl 5, starting with a C<#> character and ending at the
subsequent newline.  They count as whitespace equivalent to newline for
purposes of separation.  Unlike in Perl 5, C<#> may I<not> be used as the
delimiter in quoting constructs.
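A minimal sketch, assuming a current Rakudo compiler (this Synopsis names no implementation), of the rule that C<#> starts a comment everywhere except inside a quote literal.  The variable names are illustrative only.

```raku
# Outside a quote literal, # starts a comment that runs to the newline.
my $tag = "item #3";    # inside the double quotes, # is an ordinary character
my $len = $tag.chars;   # all 7 characters survive: i t e m, space, #, 3
say $len;               # 7
```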

=head2 Embedded Comments

Embedded comments are supported as a variant on quoting syntax, introduced
by C<#`> plus any user-selected bracket characters (as defined in
L</Bracketing Characters> above):

    say #`( embedded comment ) "hello, world!";

    $object\#`{ embedded comments }.say;

    $object\ #`「
        embedded comments
    」.say;

Brackets may be nested, following the same policy as ordinary quote
brackets.

There must be no space between the C<#`> and the opening bracket character.
(There may be the I<visual appearance> of space for some double-wide
characters, however, such as the corner quotes above.)

For multiline comments it is recommended (but not required) to use two or
more brackets, both for visual clarity and to avoid relying too much on
internal bracket counting heuristics when commenting out code whose single
brackets may be unbalanced:

    #`{{
        say "here is an unmatched } character";
    }}

However, it's sometimes better to use Pod comments because they are
implicitly line-oriented.
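A small sketch, assuming a current Rakudo compiler, showing that an embedded comment acts as whitespace even in the middle of an expression, and that doubled brackets let a lone closing bracket sit inside the comment.  The variable names are hypothetical.

```raku
# An embedded comment counts as whitespace, even mid-expression.
my $sum = 1 + #`( an embedded comment ) 2;

# Doubled braces: only }} ends the comment, so a lone } inside is harmless.
my $product = 3 * #`{{ an unmatched } inside }} 4;

say "$sum $product";   # 3 12
```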

=head2 User-selected Brackets

For all quoting constructs that use user-selected brackets, you can open
with multiple identical bracket characters, which must be closed by the same
number of closing brackets.  Counting of nested brackets applies only to
pairs of brackets of the same length as the opening brackets:

    say #`{{
        This comment contains unmatched } and { { { {   (ignored)
        Plus a nested {{ ... }} pair                    (counted)
    }} q<< <<woot>> >>   # says " <<woot>> "

Note however that bare circumfix or postcircumfix C<<< <<...>> >>> is not a
user-selected bracket, but the ASCII variant of the C<< «...» >>
interpolating word list.  Only C<#`> and the C<q>-style quoters (including
C<m>, C<s>, C<tr>, and C<rx>) enable subsequent user-selected brackets.
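The same bracket-length rule applies to C<q>-style quoters.  A minimal sketch, assuming a current Rakudo compiler; C<$text> is an illustrative name:

```raku
# Opening with two braces means only two closing braces end the literal.
my $text = q{{ a } b }};
say "[$text]";   # [ a } b ]
```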

=head2 Unspaces

Some languages such as C allow you to escape newline characters to combine
lines.  Other languages (such as regexes) allow you to backslash a space
character for various reasons.  Perl 6 generalizes this notion to any kind
of whitespace.  Any contiguous whitespace (including comments) may be hidden
from the parser by prefixing it with C<\>.  This is known as the "unspace".
An unspace can suppress any of several whitespace dependencies in Perl.  For
example, since Perl requires an absence of whitespace between a noun and a
postfix operator, using unspace lets you line up postfix operators:

    %hash\  {$key}
    @array\ [$ix]
    $subref\($arg)

As a special case to support the use above, a backslash where a postfix is
expected is considered a degenerate form of unspace.  Note that whitespace
is not allowed before that, hence

    $subref \($arg)

is a syntax error (two terms in a row).  And

    foo \($arg)

will be parsed as a list operator with a C<Capture> argument:

    foo(\($arg))

However, other forms of unspace may usefully be preceded by whitespace.
(Unary uses of backslash may therefore never be followed by whitespace or
they would be taken as an unspace.)
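The aligned subscript and call forms shown at the start of this section can be tried directly.  This is a sketch assuming a current Rakudo compiler; it writes the call as C<$subref\ ($ix)> (with whitespace after the backslash) rather than relying on the degenerate C<\(> form, and all variable names are hypothetical.

```raku
my %hash   = apple => 1, pear => 2;
my @array  = 10, 20, 30;
my $subref = -> $n { $n * 2 };
my $key    = 'pear';
my $ix     = 1;

# Each backslash hides the whitespace after it, so the subscript or
# argument list still binds to the preceding term as a postfix.
my $from-hash  = %hash\  {$key};    # 2
my $from-array = @array\ [$ix];     # 20
my $from-call  = $subref\ ($ix);    # 2
```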

Other postfix operators may also make use of unspace:

    $number\  ++;
    $number\  --;
    1+3\      i;
    $object\  .say();
    $object\#`{ your ad here }.say

Another common use of a you-don't-see-this-space is to put a dotted postfix
on the next line:

    $object\ # comment
    .say

    $object\#`[ comment
    ].say

    $object\
    .say
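A minimal sketch, assuming a current Rakudo compiler, of continuing a method call on the next line; the names C<$word>, C<$shout>, and C<$size> are illustrative:

```raku
my $word = 'hello';

# The unspace swallows the newline, so .uc is called on $word,
# not on the topic $_.
my $shout = $word\
    .uc;

# The unspace may also contain an end-of-line comment.
my $size = $word\   # how long is it?
    .chars;
```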

But unspace is mainly about language extensibility: it lets you continue the
line in any situation where a newline might confuse the parser, regardless
of your currently installed parser.  (Unless, of course, you override the
unspace rule itself...)

Although we say that the unspace hides the whitespace from the parser, it
does not hide whitespace from the lexer.  As a result, unspace is not
allowed within a token.  Additionally, line numbers are still counted if the
unspace contains one or more newlines.  Since Pod chunks count as whitespace
to the language, they are also swallowed up by unspace.  Heredoc boundaries
are suppressed, however, so you can split excessively long lines introducing
heredocs like this:

    ok(q:to'CODE', q:to'OUTPUT', \
    "Here is a long description", \ # --more--
    todo(:parrøt<0.42>, :dötnet<1.2>));
        ...
        CODE
        ...
        OUTPUT

To the heredoc parser that just looks like:

    ok(q:to'CODE', q:to'OUTPUT', "Here is a long description", todo(:parrøt<0.42>, :dötnet<1.2>));
        ...
        CODE
        ...
        OUTPUT

Note that this is one of those cases in which it is fine to have whitespace
before the unspace, since we're only trying to suppress the newline
transition, not all whitespace as in the case of postfix parsing.  (Note
also that the example above is not meant to spec how the test suite works.)

=head2 Comments in Unspaces and vice versa

An unspace may contain a comment, but a comment may not contain an unspace.
In particular, end-of-line comments do not treat backslash as significant.
If you say:

    #`\ (...

or

    #\ `(...

it is an end-of-line comment, not an embedded comment.  Write:

    \ #`(
         ...
        )

to mean the other thing.

=head2 Unspace disallowed within regexes

Within a regex, unspace is disallowed as too ambiguous with customary
backslashing conventions in surrounding cultures.  Hence you must write an
explicit whitespace match some other way, such as with quotes or with a
C<\x20> or C<\c32> escape.  On the other hand, while an unspace can start
with C<\#> in normal code, C<\#> within a regex is specifically allowed, and
is not taken as unspace, but matches a literal C<U+0023> (NUMBER SIGN).  (Within
a character class, you may also escape whitespace with a backslash; the
restriction on unspace applies only at the normal pattern-matching level.)
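A short sketch, assuming a current Rakudo compiler, of matching a literal space and a literal C<#> inside a regex without any unspace; the result variable names are hypothetical:

```raku
# Whitespace inside a regex is not significant, so spell out a literal space.
my $by-escape = ?('a b' ~~ / a \x20 b /);   # hex escape for SPACE
my $by-quotes = ?('a b' ~~ / a ' ' b /);    # quoted literal space

# \# matches a literal NUMBER SIGN instead of starting a comment.
my $by-hash   = ?('a#b' ~~ / a \# b /);
```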

=head2 Optional Whitespace and Exclusions

In general, whitespace is optional in Perl 6 except where it is needed to
separate constructs that would be misconstrued as a single token or other
syntactic unit.  (In other words, Perl 6 follows the standard
I<longest-token> principle or, in the case of large constructs, a I<prefer
shifting to reducing> principle.  See L</Grammatical Categories> below for
more on how a Perl program is analyzed into tokens.)

This is an unchanging deep rule, but the surface ramifications of it change
as various operators and macros are added to or removed from the language,
which we expect to happen because Perl 6 is designed to be a mutable
language.  In particular, there is a natural conflict between postfix
operators and infix operators, either of which may occur after a term.  If a
given token may be interpreted as either a postfix operator or an infix
operator, the infix operator requires space before it.  Postfix operators
may never have intervening space, though they may have an intervening dot.
If further separation is desired, an unspace or embedded comment may be used
as described above, as long as no whitespace occurs outside the unspace or
embedded comment.

For instance, if you were to add your own C<< infix:<++> >> operator, then
it must have space before it. The normal autoincrementing C<< postfix:<++>
>> operator may never have space before it, but may be written in any of
these forms:

    $x++

    $x\++

    $x.++

    $x\ ++

    $x\ .++

    $x\#`( comment ).++
    $x\#`((( comment ))).++

    $x\
    .++

    $x\         # comment
                # inside unspace
    .++

    $x\         # comment
                # inside unspace
    ++          # (but without the optional postfix dot)

    $x\#`『      comment
                more comment
    』.++

    $x\#`[   comment 1
    comment 2
    =begin Podstuff
    whatever (Pod comments ignore current parser state)
    =end Podstuff
    comment 3
    ].++
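Several of these spellings can be combined in one runnable sketch, assuming a current Rakudo compiler.  Each statement increments the same variable once, so five forms leave it at 5:

```raku
my $x = 0;
$x++;       # plain postfix
$x.++;      # the same postfix with the optional dot
$x\ ++;     # unspace between the term and the postfix
$x\ .++;    # unspace followed by the dotted form
$x\         # unspace holding a comment and a newline
    ++;
say $x;     # 5
```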

=head3 Implicit Topical Method Calls

A consequence of the postfix rule is that (except when delimiting a quote or
terminating an unspace) a dot with whitespace in front of it is always
considered a method call on C<$_> where a term is expected.  If a term is
not expected at this point, it is a syntax error.  (Unless, of course, there
is an infix operator of that name beginning with dot.  You could, for
instance, define a Fortranly C<< infix:<.EQ.> >> if the fit took you.  But
you'll have to be sure to always put whitespace in front of it, or it would
be interpreted as a postfix method call instead.)

For example,

    foo .method

and

    foo
    .method

will always be interpreted as

    foo $_.method

but never as

    foo.method

Use some variant of

    foo\
    .method

if you mean the postfix method call.

One consequence of all this is that you may no longer write a number as C<42.>
with just a trailing dot.  You must instead say either C<42> or C<42.0>.  In
other words, a dot following a number can only be a decimal point if the
following character is a digit.  Otherwise the postfix dot will be taken to
be the start of some kind of method call syntax.  (The C<.123> form with a
leading dot is still allowed however when a term is expected, and is
equivalent to C<0.123> rather than C<$_.123>.)
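A minimal sketch, assuming a current Rakudo compiler, contrasting a topical method call with a leading-dot decimal literal; C<@upper> and C<$half> are illustrative names:

```raku
my @upper;
for <a b c> {
    # Whitespace before the dot where a term is expected: this is $_.uc.
    @upper.push: .uc;
}

# A leading dot followed by a digit is a decimal point, not a method call.
my $half = .5;
```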

=head2 Keywords and whitespace

One other spot where whitespace makes a difference is after various
keywords, such as control flow or other statement-oriented keywords.  Such
keywords require whitespace after them.  (Again, this is in the interests of
extensibility.)  So, for instance, if you define a symbol that happens to be
the same as the keyword C<if>, you can still use it as a non-keyword, as
long as you don't put whitespace after it:

    my \if = 42; say (if) if if;   # prints 42

Here only the middle C<if> of the second statement is taken as a keyword,
because it has whitespace after it.  The other mentions of C<if> do not, and
would be illegal were it not that the symbol is defined in this scope.  If
you omit the definition, you'd get a message like this:

    Whitespace required after keyword 'if'
    at myfile:1
    ------> say (if⏏) if if;
    Undeclared routine:
        if used at line 1

=head1 Built-In Data Types

Perl 6 has an optional type system that helps you write safer code that
performs better.  The compiler is free to infer what type information it can
from the types you supply, but it will not complain about missing type
information unless you ask it to.

Perl 6 is an OO engine, but you're not generally required to think in OO
when that's inconvenient.  However, some built-in concepts such as
filehandles are more object-oriented in a user-visible way than in Perl 5.

=head2 The P6opaque Datatype

In support of OO encapsulation, there is a new fundamental data
representation: B<P6opaque>.  External access to opaque objects is always
through method calls, even for attributes.

=head2 Name Equivalence of Types

Types are officially compared using name equivalence rather than structural
equivalence.  However, we're rather liberal in what we consider a name.  For
example, the name includes the version and authority associated with the
module defining the type (even if the type itself is "anonymous").  Beyond
that, when you instantiate a parametric type, the arguments are considered
part of the "long name" of the resulting type, so one C<Array of Int> is
equivalent to another C<Array of Int>.  (Another way to look at it is that
the type instantiation "factory" is memoized.)  Typename aliases are
considered equivalent to the original type.  In particular, the C<Array of
Int> syntax is just sugar for C<Array:of(Int)>, which is the canonical form
of an instantiated generic type.

This name equivalence of parametric types extends only to parameters that
can be considered immutable (or that at least can have an immutable snapshot
taken of them).  Two distinct classes are never considered equivalent even
if they have the same attributes, because classes are not considered
immutable.

=head2 Properties on Objects

Perl 6 supports the notion of B<properties> on various kinds of objects.
Properties are like object attributes, except that they're managed by the
individual object rather than by the object's class.

According to S12, properties are actually implemented by a kind of mixin
mechanism, and such mixins are accomplished by the generation of an
individual anonymous class for the object (unless an identical anonymous
class already exists and can safely be shared).

=head3 Traits

Properties applied to objects constructed at compile-time, such as variables
and classes, are also called B<traits>.  Traits cannot be changed at
run-time.  Changes to run-time properties are done via mixin instead, so
that the compiler can optimize based on declared traits.

=head2 Types as Constraints

A variable's type is a constraint indicating what sorts of values the
variable may contain.  More precisely, it's a promise that the object or
objects contained in the variable are capable of responding to the methods
of the indicated "role".  See S12 for more about roles.

    # $x can contain only Int objects
    my Int $x;
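A sketch of the constraint being enforced, assuming a current Rakudo compiler, where a failed assignment throws C<X::TypeCheck::Assignment> at run time and leaves the old value in place.  The string comes from a variable so that the check happens at run time:

```raku
my Int $x = 42;
my $text = 'forty-two';

# The Int constraint rejects the Str; try catches the exception into $!.
try { $x = $text };

my $rejected = $! ~~ X::TypeCheck::Assignment;
```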

=head2 Container Types

A variable may itself be bound to a container type that specifies how the
container works, without specifying what kinds of things it contains.

    # $x is implemented by the MyScalar class
    my $x is MyScalar;

Constraints and container types can be used together:

    # $x can contain only Int objects,
    # and is implemented by the MyScalar class
    my Int $x is MyScalar;

Note that C<$x> is also initialized to the C<Int> type object.  See below
for more on this.

=head2 Nil

There is a special value named C<Nil>.  It means "there is no value here".
It is a little bit like the empty C<()> list, insofar as both represent an
absence of values, except that C<()> is defined and means "there are 0
arguments here if you're counting that low".  The C<Nil> value represents
the absence of a value where there I<should> be one, so it does not
disappear in list context, but relies on something downstream to catch it or
blow up.  C<Nil> also indicates a failed match.

Since method calls are performed directly on any object, C<Nil> can respond
to certain method calls.  C<Nil.defined> returns C<False> (whereas
C<().defined> returns C<True>).  C<Nil.so> also returns C<False>.
C<Nil.ACCEPTS> always returns C<Nil>.  C<Nil.perl> and C<Nil.gist> return
C<'Nil'>.  C<Nil.Stringy> and C<Nil.Str> throw a resumable warning that
returns a value of C<''> on resumption.  C<Nil.Numeric> likewise throws a
resumable warning that returns 0 on resumption.  Any undefined method call
on C<Nil> returns C<Nil>, so that C<Nil> propagates down method call chains.
Likewise any subscripting operation on C<Nil> returns C<Nil>.

Any attempt to change the C<Nil> value should cause an exception to be
thrown.

Assigning C<Nil> to any scalar container causes the container to throw out
any contents and restore itself to an uninitialized state (after which it
will appear to contain an object appropriate to the declared default of the
container, where C<Any> is the default default; the element may be simply
deleted if that's how the default can be represented in the structure).
Binding of C<Nil> with C<:=> simply puts C<Nil> in the container.  However,
binding C<Nil> to a parameter (C<::=> semantics) works more like assignment;
passing C<Nil> to a parameter with a default causes that parameter to be set
to its default value rather than an undefined value, as if the argument had
not been supplied.

Assigning C<Nil> to any entire composite container (such as an C<Array> or
C<Hash>) empties the container, resetting it back to an uninitialized state.
The container object itself then becomes undefined.  (Assignment of C<()>
leaves it defined.)
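A minimal sketch of assigning C<Nil> to scalar containers, assuming a current Rakudo compiler.  It shows only the scalar behavior described above; the variable names are hypothetical.

```raku
my Int $count = 5;
$count = Nil;                          # throws out the 5
my $back-to-type = $count === Int;     # without "is default", the Int type object returns

my $level is default(7) = 3;
$level = Nil;                          # reverts to the declared default, 7

my $nil-defined = Nil.defined;         # False
```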

=head2 Type Objects

C<my Dog $spot> by itself does not automatically call a C<Dog> constructor.
It merely assigns an undefined C<Dog> prototype object to C<$spot>:

    my Dog $spot;           # $spot is initialized with ::Dog
    my Dog $spot = Dog;     # same thing

    $spot.defined;          # False
    say $spot;              # "(Dog)"

Any type name used as a value is the undefined prototype object of that
type, or I<type object> for short.  See S12 for more on that.

Any type name in rvalue context is parsed as a single type value and expects
no arguments following it.  However, a type object responds to the function
call interface, so you may use the name of a type with parentheses as if it
were a function, and any argument supplied to the call is coerced to the
type indicated by the type object.  If there is no argument in the
parentheses, the type object returns itself:

    my $type = Num;             # type object as a value
    my $num = $type($string);   # coerce to Num

To get a real C<Dog> object, call a constructor method such as C<new>:

    my Dog $spot .= new;
    my Dog $spot = $spot.new;   # .= is rewritten into this

You can pass in arguments to the constructor as well:

    my Dog $cerberus .= new(heads => 3);
    my Dog $cerberus = $cerberus.new(heads => 3);   # same thing

Just like L</Nil>, type objects do not disappear in list context, but rely
on something downstream to catch them or blow up.  This allows type objects
to be assigned to scalars, and keeps them from silently vanishing in
non-scalar contexts.
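A runnable sketch of the C<Dog> examples, assuming a current Rakudo compiler.  The C<Dog> class and its C<heads> attribute are hypothetical stand-ins defined only for this illustration:

```raku
# Dog is a hypothetical class used only for this illustration.
class Dog {
    has $.heads = 1;
}

my Dog $spot;                           # holds the Dog type object
my $was-defined = $spot.defined;        # False

$spot .= new;                           # same as $spot = $spot.new
my Dog $cerberus .= new(heads => 3);
```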

=head2 Coercive type declarations

The parenthesized form of type coercion may be used in declarations where it
makes sense to accept a wider set of types but coerce them to a narrow type.
(This only works for one-way coercion, so you may not declare any C<rw>
parameter with a coercive type.)  The type outside the parens indicates the
desired end result, and subsequent code may depend on it being that type.
The type inside the parens indicates the acceptable set of types that are
allowed to be bound or assigned to this location via coercion.  If the wide
type is omitted, C<Any> is assumed.  In any case, the wide type is only
indicative of permission to coerce; there must still be an available
coercion routine from the wide type to the narrow type to actually perform
the coercion.

    sub foo (Str(Any) $y) {...}
    sub foo (Str()    $y) {...}    # same thing

    my Num(Cool) $x = prompt "Gimme a number";

Coercions may also be specified on the return type:

    sub bar ($x, $y --> Int()) { return 3.5 }  # returns 3
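A sketch of coercive parameter types, assuming a current Rakudo compiler; C<describe> and C<halve> are hypothetical routines.  It covers parameters only, not the variable or return-type forms above.

```raku
# Str() accepts Any and coerces the argument to Str when binding.
sub describe(Str() $y) { $y.^name ~ ': ' ~ $y }

# Num(Cool) accepts only Cool values (such as Str or Int), coercing to Num.
sub halve(Num(Cool) $n) { $n / 2 }

my $d = describe(42);
my $h = halve('3');
```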

=head2 Containers of Native Types

If you say

    my int @array is MyArray;

you are declaring that the elements of C<@array> are native integers, but
that the array itself is implemented by the C<MyArray> class.  Untyped
arrays and hashes are still perfectly acceptable, but have the same
performance issues they have in Perl 5.

=head2 Methods on Arrays

To get the number of elements in an array, use the C<.elems> method.  You
can also ask for the total string length of an array's elements, in
codepoints or graphemes, by calling the C<.codes> or C<.chars> method,
respectively, on the array.  The same methods apply to strings as well.
(Note that C<.codes> is not well-defined unless you know which
canonicalization is in effect.  Hence, it allows an optional argument to
specify the meaning exactly if it cannot be known from context.)

There is no C<.length> method for either arrays or strings, because
C<length> does not specify a unit.
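A minimal sketch, assuming a current Rakudo compiler.  It sums per-element counts explicitly rather than calling C<.chars> on the array itself, and it uses a letter with a combining accent that has no precomposed form, so graphemes and code points differ:

```raku
my @words = 'ab', 'cde';
my $count = @words.elems;               # 2 elements

# "x" plus COMBINING ACUTE ACCENT (U+0301) has no precomposed form,
# so it stays one grapheme made of two code points.
my $accented   = "x\x[301]";
my $graphemes  = $accented.chars;       # 1
my $codepoints = $accented.codes;       # 2

# Summing per-element character counts gives the total string length.
my $total = @words.map(*.chars).sum;    # 5
```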

=head2 Built-in Type Conventions

Built-in object types start with an uppercase letter. This includes
immutable types (e.g. C<Int>, C<Num>, C<Complex>, C<Rat>, C<Str>, C<Bit>,
C<Regex>, C<Set>, C<Block>, C<Iterator>), as well as mutable (container)
types, such as C<Scalar>, C<Array>, C<Hash>, C<Buf>, C<Routine>, C<Module>,
and non-instantiable Roles such as C<Callable> and C<Integral>.

Non-object (native) types are lowercase: C<int>, C<num>, C<complex>, C<rat>,
C<buf>, C<bit>.  Native types are primarily intended for declaring compact
array storage, that is, a sequence of storage locations of the specified
type laid out in memory contiguously without pointer indirection.  However,
Perl will try to make those look like their corresponding uppercase types if
you treat them that way. (In other words, it does autoboxing and
autounboxing as necessary.  Note, however, that repeated autoboxing and
unboxing can make your program much slower, compared to a program that makes
consistent use of either native types or object types.)
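A short sketch of native storage and autoboxing, assuming a current Rakudo compiler; the variable names are illustrative:

```raku
my int @compact = 1, 2, 3;   # native ints stored contiguously
my int $n = 42;              # a native int, not an Int object

# Copying into an untyped variable autoboxes the native value into an Int.
my $boxed = $n;
my $total = @compact.sum;
```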

=head3 The C<.WHICH> Method for Value Types

Some object types can behave as value types.  Every object can produce a
"WHICH" value that uniquely identifies the object for hashing and other
value-based comparisons.  Normal objects use some kind of unique ID as their
identity, but if a class wishes to behave as a value type, it can define a
C<.WHICH> method that makes different objects look like the same object if
they happen to have the same contents.

=head3 The C<ObjAt> Type

When we say that a normal object uses its location as its identity, we do
I<not> mean that it returns its address as a number.  In the first place,
not all objects are in the same memory space (see the literature on NUMA,
for instance), and two objects should not accidentally have the same
identity merely because they were stored at the same offset in two different
memory spaces.  We also do not want to allow accidental identity collisions
with values that really are numbers (or strings, or any other mundane value
type).  Nor should we be encouraging people to think of object locations
that way in any case.  So C<WHICH> still returns a value rather than another
object, but that value must be of a special C<ObjAt> type that prevents
accidental confusion with normal value types, and at least discourages
trivial pointer arithmetic.

Certainly, it is difficult to give a unique name to every possible address
space, let alone every possible address within every such space.  In the
absence of a universal naming scheme, it can only be made improbable that
two addresses from two different spaces will collide.  A sufficiently large
random number may represent the current address space on output of an
C<ObjAt> to a different address space, or if serialized to YAML or XML.
(This extra identity component need not be output for debugging messages
that assume the current address space, since it will be the same big number
consistently, unless your process really is running under a NUMA.)

Alternately, if an object is being serialized to a form that does not
preserve object identity, there is no requirement to preserve uniqueness,
since in this case the object is really being translated to a value type
representation, and reconstituted on the other end as a different unique
object.
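A minimal sketch of identity through C<.WHICH>, assuming a current Rakudo compiler.  The C<Point> class is hypothetical; built-in C<Int> stands in for a value type:

```raku
# Point is a hypothetical class used only for this illustration.
class Point {
    has $.x;
}

# Value types: equal contents produce equal identities.
my $same-value = (42).WHICH eq (40 + 2).WHICH;

# Ordinary objects: equal contents still give two distinct identities.
my $p = Point.new(x => 1);
my $q = Point.new(x => 1);
my $same-object = $p.WHICH eq $q.WHICH;

# WHICH returns an ObjAt rather than a plain number.
my $is-objat = $p.WHICH ~~ ObjAt;
```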

=head2 Variables Containing Undefined Values

A variable with a non-native type constraint may contain an I<undefined>
value such as a type object, provided the undefined value meets the type
constraint.

    my Int $x = Int;  # works
    my Buf $b = Buf8; # works

Variables with native types do not support undefinedness: it is an error to
assign an undefined value to them:

    my int $y = Int;    # dies

Since C<num> can support the value C<NaN> but not the general concept of
undefinedness, you can coerce an undefined value like this:

    my num $n = computation() // NaN;

Variables of non-native types start out containing a type object of the
appropriate type unless explicitly initialized to a defined value.

Any container's default may be overridden by the C<is default(VALUE)> trait.
If the container's contents are deleted, the value is notionally set to the
provided default value; this value may or may not be physically represented
in memory, depending on the implementation of the container.  You should
officially not care about that (much).
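A sketch of these rules, assuming a current Rakudo compiler.  The C<computation> routine from the example above is replaced by a hypothetical stub that returns an undefined C<Num>, and the other names are illustrative:

```raku
my Int $i = Int;                      # the Int type object meets the Int constraint

# A native int cannot hold an undefined value, so this assignment dies.
my $missing = Int;
try { my int $native = $missing };
my $native-died = $!.defined;

# num supports NaN, so an undefined result can be mapped onto it.
sub computation() { Num }             # hypothetical stub returning an undefined Num
my num $n = computation() // NaN;

# A container default comes back when the contents are thrown out.
my $retries is default(3) = 10;
$retries = Nil;
```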

=head2 The C<HOW> Method

Every object supports a C<HOW> function/method that returns the metaclass
instance managing it, regardless of whether the object is defined:

    'x'.HOW.methods('x');   # get available methods for strings
    Str.HOW.methods(Str);   # same thing with the prototype object Str
    HOW(Str).methods(Str);  # same thing as function call

    'x'.methods;        # this is likely an error - not a meta object
    Str.methods;        # same thing

(For a prototype system (a non-class-based object system), all objects are
merely managed by the same meta object.)
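A minimal sketch, assuming a current Rakudo compiler, showing that an instance and its type object share one metaobject, and that metamethods take the described object as an argument.  It does not use the C<HOW(Str)> function form:

```raku
# An instance and its type object share one metaobject.
my $shared = 'x'.HOW =:= Str.HOW;

# Metamethods take the object they describe as an explicit argument.
my $name  = Str.HOW.name(Str);
my $count = 'x'.HOW.methods('x').elems;
```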

=head2 Roles

Perl supports generic types through what are called "roles", which represent
capabilities or interfaces.  These roles are generally not used directly as
object types.  For instance, all the numeric types perform the C<Numeric>
role, and all string types perform the C<Stringy> role, but there's no such
thing as a "Numeric" object, since these are generic types that must be
instantiated with extra arguments to produce normal object types.  Common
roles include:

    Stringy
    Numeric
    Real
    Integral
    Rational
    Callable
    Positional
    Associative
    Buf
    Blob

=head2 C<Numeric> Types

Perl 6 intrinsically supports big integers and rationals through its system
of type declarations.  C<Int> automatically supports promotion to arbitrary
precision, as well as holding C<Inf> and C<NaN> values.  Note that C<Int>
assumes 2's complement arithmetic, so C<+^1 == -2> is guaranteed.  (Native
C<int> operations need not support this on machines that are not natively
2's complement.  You must convert to and from C<Int> to do portable bitops
on such ancient hardware.)

C<Num> must support the largest native floating point format that runs at
full speed.  It may be bound to an arbitrary precision type, but by default
it is the same type as a native C<num>.  See below.

C<Rat> supports extended precision rational arithmetic.  Dividing two
C<Integral> objects using C<< infix:</> >> produces a C<Rat>, which is
generally usable anywhere a C<Num> is usable, but may also be explicitly
cast to C<Num>.  (Also, if either side is C<Num> already, C<< infix:</> >>
gives you a C<Num> instead of a C<Rat>.)

C<Rat> and C<Num> both do the C<Real> role.
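A minimal sketch of these numeric rules, assuming a current Rakudo compiler; the variable names are illustrative:

```raku
my $big = 2 ** 100;              # Int promotes to arbitrary precision
my $complement = +^1;            # two's complement bitwise negation

my $third = 1 / 3;               # Int / Int produces a Rat
my $exact = 0.1 + 0.2 == 0.3;    # Rat arithmetic is exact here

my $float = 1e0 / 3;             # a Num operand produces a Num
```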

Lowercase types like C<int> and C<num> imply the native machine
representation for integers and floating-point numbers, respectively, and do
not promote to arbitrary precision, though larger representations are always
allowed for temporary values.  Unless qualified with a number of bits,
C<int> and C<num> types represent the largest native integer and
floating-point types that run at full speed.

Because temporary values are biased in favor of correct semantics over
compact storage, native numeric operators that might overflow must come in
two variants: one that returns a guaranteed correct boxed value, and one that
returns a guaranteed fast native value.  By default the boxing variant
is selected (probably by virtue of hiding the native variants), but within a
given lexical scope, the C<use native> pragma will allow use of the
dangerous but fast variants instead.  Arguments to the pragma can be more
specific about what types of return values are allowed, e.g. C<use native
'int';> and such.  (The optimizer is also allowed to substitute such
variants when it can determine that the final destination would store
natively in any case, or that the variant could not possibly malfunction
given the arguments.)  [Conjecture: we could allow an 'N' metaoperator to
select the native variant on a case by case basis.]
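
A minimal Python sketch of the two variants, assuming a 64-bit native integer.  The names C<add_boxed> and C<add_native> and the C<use_native> flag are hypothetical stand-ins for the boxing default and the C<use native> pragma, not any real API:

```python
# Hypothetical names throughout; assumes a 64-bit native integer.
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

def add_boxed(a, b):
    """Correct variant: exact result of any size, like boxed Int."""
    return a + b

def add_native(a, b):
    """Fast variant: wraps around like a 64-bit machine add, like native int."""
    r = (a + b) & (2**64 - 1)
    return r - 2**64 if r > INT64_MAX else r

def add(a, b, use_native=False):
    """Boxing by default; the native variant only when explicitly requested."""
    return add_native(a, b) if use_native else add_boxed(a, b)

assert add(INT64_MAX, 1) == 2**63                        # promoted, still correct
assert add(INT64_MAX, 1, use_native=True) == INT64_MIN   # silently wrapped
assert add(2, 3, use_native=True) == add(2, 3) == 5      # same when it fits
```

The last line shows why an optimizer may substitute the native variant when it can prove the arguments cannot overflow: the results are then indistinguishable.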

Numeric values in untyped variables use C<Int> and C<Num> semantics rather
than C<int> and C<num>.  Literals, on the other hand, may default to native
storage formats if they reasonably can.  We rely on the semantics of boxing
temporary values by default (see above) to maintain correct semantics; the
optimizer is of course allowed to box or unbox a literal at compile time (or
cache a boxed/unboxed version of the value) whenever it seems appropriate.
In any case, native literals should be preferred under C<use native>
semantics.

For pragmatic reasons, C<Rat> values are guaranteed to be exact only up to a
certain point.  By default, this is the precision that would be represented
by the C<Rat64> type, which is an alias for C<Rational[Int,Uint64]>, which
has a numerator of C<Int> but is limited to a denominator of C<Uint64>
(which may or may not be implemented as a native C<uint64>, since small
representations may be desirable for small denominators).  A C<Rat64> that
would require more than 64 bits of storage in the denominator is
automatically converted either to a C<Num> or to a lesser-precision C<Rat>,
at the discretion of the implementation.  (Native types such as C<rat64>
limit the size of both numerator and denominator, though not to the same
size.  The numerator should in general be twice the size of the denominator
to support user expectations.  For instance, a C<rat8> actually supports
C<Rational[int16,uint8]>, allowing numbers like C<100.01> to be represented,
and a C<rat64>, defined as C<Rational[int128,uint64]>, can hold the number
of seconds since the Big Bang with attosecond precision.  Though perhaps not
with attosecond accuracy...)
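
The size claims in the parenthetical can be checked with plain integer arithmetic.  This Python sketch assumes an age of the universe of roughly 13.8 billion Julian years; that figure is an approximation supplied for the example:

```python
# rat8 is Rational[int16, uint8]: can it hold 100.01 == 10001/100 exactly?
nu, de = 10001, 100
assert -(2**15) <= nu < 2**15 and 0 < de < 2**8

# rat64 is Rational[int128, uint64].  Assume ~13.8 billion Julian years of
# 31,557,600 s each, counted in attoseconds (denominator 10**18).
age_s = 13_800_000_000 * 31_557_600
att_de = 10**18
att_nu = age_s * att_de
assert 0 < att_de < 2**64          # the denominator fits a uint64
assert att_nu >= 2**63             # an int64 numerator could not hold it ...
assert att_nu * 100 < 2**127       # ... but an int128 can, with room to spare
```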

The limitation on C<Rat> values is intended to be enforced only on
user-visible types.  Intermediate values used in the internal calculations
of C<Rat> operators may exceed this precision, or represent negative
denominators.  That is, the temporaries used in calculating the new
numerator and denominator are (at least in the abstract) of C<Int> type.
After a new numerator and denominator are determined, any sign is forced to
be represented only by the numerator.  Then if the denominator exceeds the
storage size of the unsigned integer used, the fraction is reduced via GCD.
If the resulting denominator is still larger than the storage size, then and
I<only> then may the precision be reduced to fit into a C<Rat> or C<Num>.
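
The order of operations just described can be sketched in Python.  The function name C<rat_result> is hypothetical, and where the text leaves the choice to the implementation, this sketch always falls back to a floating-point value (standing in for C<Num>) rather than a lower-precision C<Rat>:

```python
from math import gcd

DENOM_LIMIT = 2**64   # a Rat64 denominator must fit an unsigned 64-bit integer

def rat_result(nu, de):
    """Hypothetical finishing step for a Rat operation whose temporaries were
    exact (unbounded) integers.  Returns a (nu, de) pair if it fits, else a
    float standing in for Num."""
    if de == 0:
        raise ZeroDivisionError("denominator is zero")
    if de < 0:                    # the sign is carried only by the numerator
        nu, de = -nu, -de
    if de >= DENOM_LIMIT:         # too big: reduce via GCD first ...
        g = gcd(nu, de)
        nu, de = nu // g, de // g
    if de >= DENOM_LIMIT:         # ... and only if that fails, lose precision
        return nu / de
    return nu, de
```

Note that C<rat_result(3, -6)> returns C<(-3, 6)> unreduced: reduction is only attempted when the denominator would not otherwise fit.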

C<Rat> addition and subtraction should attempt to preserve the denominator
of the more precise argument if that denominator is an integral multiple of
the less precise denominator.  That is, in practical terms, adding a column
of dollars and cents should generally end up with a result that has a
denominator of 100, even if values like 42 and 3.5 were added in.  With
other operators, this guarantee cannot be made; in such cases, the user
should probably be explicitly rounding to a particular denominator anyway.
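
A minimal Python sketch of this denominator-preserving addition, representing a rational as a hypothetical C<(numerator, denominator)> pair with a positive denominator:

```python
def rat_add(a, b):
    """Add two (numerator, denominator) pairs with positive denominators,
    keeping the larger denominator when it is a multiple of the smaller."""
    (an, ad), (bn, bd) = a, b
    if ad % bd == 0:                        # bd divides ad: rescale b
        return an + bn * (ad // bd), ad
    if bd % ad == 0:                        # ad divides bd: rescale a
        return an * (bd // ad) + bn, bd
    return an * bd + bn * ad, ad * bd       # otherwise no such guarantee

total = (0, 1)
for amount in [(1999, 100), (42, 1), (7, 2), (5, 100)]:   # 19.99, 42, 3.5, 0.05
    total = rat_add(total, amount)
```

After the loop, C<total> is C<(6554, 100)>, that is, 65.54 with the denominator of 100 intact.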

For applications that really need arbitrary precision denominators as well
as numerators at the cost of performance, C<FatRat> may be used, which is
defined as C<Rational[Int,Int]>, that is, as arbitrary precision in both
parts.  There is no literal form for a C<FatRat>, so it must be constructed
using C<FatRat.new($nu,$de)>.  In general, only math operators with at least
one C<FatRat> argument will return another C<FatRat>, to prevent accidental
promotion of reasonably fast C<Rat> values into arbitrarily slow C<FatRat>
values.

Although most rational implementations normalize or "reduce" fractions to
their smallest representation immediately through a GCD algorithm, Perl
allows a rational datatype to do so lazily at need, such as whenever the
denominator would run out of precision, but avoid the overhead otherwise.
Hence, if you are adding a bunch of C<Rat>s that represent, say, dollars and
cents, the denominator may stay 100 the entire way through.  The C<.nu> and
C<.de> methods will return these unreduced values.  You can use
C<$rat.=norm> to normalize the fraction.  (This also forces the sign on the
denominator to be positive.) The C<.perl> method will produce a decimal
number if the denominator is a power of 10, or normalizable to a power of 10
(that is, having factors of only 2 and 5 (and -1)). Otherwise it will
normalize and return a rational literal of the form C<< <-47/3> >>.
Stringifying a rational via C<.gist> or C<.Str> returns an exact decimal
number if possible, and otherwise rounds off the repeated decimal based on
the size of the denominator.  For full details see the documentation of
C<Rat.gist> in S32.

C<Num.Str> and C<Num.gist> both produce valid C<Num> literals, so they must
include the C<e> for the exponential.

    say 1/5;    # 0.2 exactly
    say 1/3;    # 0.333333

    say <2/6>.perl;
                # <1/3>

    say 3.14159_26535_89793;
                # 3.141592653589793 including last digit

    say 111111111111111111111111111111111111111111111.123;
                # 111111111111111111111111111111111111111111111.123

    say 555555555555555555555555555555555555555555555/5;
                # 111111111111111111111111111111111111111111111

    say <555555555555555555555555555555555555555555555/5>.perl;
                # 111111111111111111111111111111111111111111111.0

    say 2e2;    # 200e0 or 2e2 or 200.0e0 or 2.0e2
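
The C<.perl> rule for choosing between a decimal and a C<< <nu/de> >> literal can be sketched in Python.  The name C<rat_perl> is hypothetical, and the sketch normalizes eagerly before deciding, which is consistent with the examples above:

```python
from math import gcd

def rat_perl(nu, de):
    """Hypothetical model of Rat.perl: a decimal if the reduced denominator has
    no prime factors other than 2 and 5, otherwise a <nu/de> literal."""
    if de == 0:
        raise ZeroDivisionError("denominator is zero")
    g = gcd(nu, de)                       # normalize: reduce the fraction ...
    nu, de = nu // g, de // g
    if de < 0:                            # ... and put the sign on the numerator
        nu, de = -nu, -de
    rest, twos, fives = de, 0, 0
    while rest % 2 == 0:
        rest, twos = rest // 2, twos + 1
    while rest % 5 == 0:
        rest, fives = rest // 5, fives + 1
    if rest != 1:                         # some other prime factor remains
        return f"<{nu}/{de}>"
    places = max(twos, fives)             # de divides 10**places exactly
    scaled = abs(nu) * (10**places // de)
    whole, frac = divmod(scaled, 10**places)
    sign = "-" if nu < 0 else ""
    frac_digits = str(frac).rjust(places, "0") if places else "0"
    return f"{sign}{whole}.{frac_digits}"
```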

=head2 Infinity and C<NaN>

Perl 6 by default makes standard IEEE floating point concepts visible, such
as C<Inf> (infinity) and C<NaN> (not a number).  Within a lexical scope,
pragmas may specify the nature of temporary values, and how floating point
is to behave under various circumstances.  All IEEE modes must be lexically
available via pragma except in cases where that would entail heroic efforts
to bypass a braindead platform.

The default floating-point modes do not throw exceptions but rather
propagate C<Inf> and C<NaN>.  The boxed object types may carry more detailed
information on where overflow or underflow occurred.  Numerics in Perl are
not designed to give the identical answer everywhere.  They are designed to
give the typical programmer the tools to achieve a good enough answer most
of the time.  (Really good programmers may occasionally do even better.)
Mostly this just involves using enough bits that the stupidities of the
algorithm don't matter much.
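
Python floats follow IEEE 754 closely enough to sketch the default propagation behavior.  Note one place where Python departs from the IEEE defaults described above: dividing a float by zero raises C<ZeroDivisionError> instead of returning C<Inf>:

```python
import math

inf, nan = math.inf, math.nan
assert inf + 1 == inf and -inf * 2 == -inf   # infinities propagate
assert math.isnan(inf - inf)                  # undefined results become NaN
assert math.isnan(nan * 0)                    # and NaN propagates onward
assert nan != nan                             # NaN is not even equal to itself

try:
    1.0 / 0.0
except ZeroDivisionError:
    pass          # Python's choice, unlike the default Perl 6 mode above
else:
    raise AssertionError("expected ZeroDivisionError")
```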

=head2 Strings, the C<Str> Type

A C<Str> type is a Unicode string object. It boxes a native C<str> (the
difference being in representation; a C<Str> is a P6opaque and as such you
may mix in to it, but this is not possible with a C<str>). A C<Str> functions
at grapheme level. This means that C<.chars> should give the number of
graphemes, C<.substr> should never cut a combining character in two, and so
forth. Both C<str> and C<Str> are immutable. Their exact representation in
memory is implementation defined, so implementations are free to use ropes
or other data structures internally in order to make concatenation, substring,
and so forth cheaper.
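
The difference between codepoint and grapheme counts can be sketched in Python, whose C<str> works at the codepoint level.  The helper C<naive_graphemes> is hypothetical and deliberately simplified: it only attaches combining marks to the preceding character, whereas real grapheme segmentation follows Unicode Standard Annex #29:

```python
import unicodedata

def naive_graphemes(s):
    """Group each base character with the combining marks that follow it."""
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch            # attach the mark to its base
        else:
            clusters.append(ch)
    return clusters

s = "ne\u0301e"                           # "née" spelled with a combining acute
assert len(s) == 4                        # Python counts codepoints
assert len(naive_graphemes(s)) == 3       # a grapheme-level .chars would say 3
assert naive_graphemes(s)[1] == "e\u0301" # the accent is never split off
```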

Implementation note: since Perl 6 mandates that C<Str> must view graphemes
as the fundamental unit rather than codepoints, this has some implications
regarding efficient implementation. It is suggested that all graphemes be
translated on input to unique grapheme numbers and represented as integers
within some kind of uniform array for fast substr access.  For those
graphemes that have a precomposed form, use of that codepoint is suggested.
(Note that this means Latin-1 can still be represented internally with 8-bit
integers.)

For graphemes that have no precomposed form, a temporary private id should
be assigned that uniquely identifies the grapheme.  If such ids are assigned
consistently throughout the process, comparison of two graphemes is no more
difficult than the comparison of two integers, and comparison of base
characters no more difficult than a direct lookup into the id-to-NFD table.

Obviously, any temporary grapheme ids must be translated back to some
universal form (such as NFD) on output, and normal precomposed graphemes may
turn into either NFC or NFD forms depending on the desired output.
Maintaining a particular grapheme/id mapping over the life of the process
may have some GC implications for long-running processes, but most processes
will likely see a limited number of non-precomposed graphemes.
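
A minimal Python sketch of such a grapheme table.  The class C<GraphemeTable> is hypothetical, and the choice of negative integers for synthetic ids is an assumption of this sketch, made only so that they can never collide with real codepoints:

```python
import unicodedata

class GraphemeTable:
    """Hypothetical grapheme-to-integer table.  Graphemes with a one-codepoint
    NFC form use that codepoint; others get a synthetic id, assumed negative
    here so that it can never collide with a real codepoint."""

    def __init__(self):
        self._id_of = {}      # NFD string -> synthetic id
        self._nfd_of = {}     # synthetic id -> NFD string

    def intern(self, grapheme):
        nfc = unicodedata.normalize("NFC", grapheme)
        if len(nfc) == 1:
            return ord(nfc)                    # a precomposed form exists
        nfd = unicodedata.normalize("NFD", grapheme)
        if nfd not in self._id_of:             # assign consistently, once
            new_id = -(len(self._id_of) + 1)
            self._id_of[nfd] = new_id
            self._nfd_of[new_id] = nfd
        return self._id_of[nfd]

    def to_nfd(self, gid):
        """Translate an id back to a universal form (NFD) for output."""
        if gid >= 0:
            return unicodedata.normalize("NFD", chr(gid))
        return self._nfd_of[gid]
```

Because the same grapheme always maps to the same integer, comparing two graphemes reduces to comparing two integers, as the text suggests.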

Code wishing to work at a codepoint level instead of a grapheme level
should use the C<Uni> type, which has subclasses representing the various
Unicode normalization forms (namely, C<NFC>, C<NFD>, C<NFKC>, and C<NFKD>).
Note that C<ord> is defined as a codepoint level operation. Even though the
C<Str> may contain synthetics internally, these should never be exposed by
C<ord>; instead, the behaviour should be as if the C<Str> had been converted
to an C<NFC> and then the first element accessed (obviously, implementations
are free to do something far more efficient).
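
The C<ord> behavior can be sketched in Python by normalizing to NFC and taking the first codepoint.  C<perl6_ord> is a hypothetical name, and a real implementation would avoid building the whole NFC string:

```python
import unicodedata

def perl6_ord(s):
    """ord as specified above: first codepoint of the NFC form, never a synthetic id."""
    return ord(unicodedata.normalize("NFC", s)[0])

assert perl6_ord("e\u0301") == 0xE9          # e + combining acute composes to U+00E9
assert perl6_ord("q\u0307") == ord("q")      # no precomposed form: the base char
```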

=head2 The C<Buf> Type

A C<Buf> is a stringish view of an array of integers, and has no Unicode or
character properties without explicit conversion to some kind of C<Str>.
(The C<buf8>, C<buf16>, C<buf32>, and C<buf64> types are the native
counterparts; native buf types are required to occupy contiguous memory for
the entire buffer.) Typically a C<Buf> is an array of bytes serving as a
buffer.  Bitwise operations on a C<Buf> treat the entire buffer as a single
large integer.  Bitwise operations on a C<Str> generally fail unless the
C<Str> in question can provide an abstract C<Buf> interface somehow.
Coercion to C<Buf> should generally invalidate the C<Str> interface.  As a
generic role C<Buf> may be instantiated as any of C<buf8>, C<buf16>,
C<buf32>, or C<buf64> (or as any type that provides the appropriate C<Buf> interface),
but when used to create a buffer C<Buf> is punned to a class implementing
C<buf8> (actually C<Buf[uint8]>).
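
Treating a buffer as one large integer can be sketched in Python with C<int.from_bytes> and C<int.to_bytes>.  The helper names are hypothetical, and big-endian byte order and equal-length operands are assumptions of this sketch, not something the text specifies:

```python
def buf_and(a, b):
    """Bitwise AND of two equal-length buffers, each read as one big integer."""
    if len(a) != len(b):
        raise ValueError("this sketch requires equal-length buffers")
    n = int.from_bytes(a, "big") & int.from_bytes(b, "big")
    return n.to_bytes(len(a), "big")

def buf_shl(a, k):
    """Shift a whole buffer left by k bits; bits carry across byte boundaries
    and anything shifted past the front is dropped."""
    width = 8 * len(a)
    n = (int.from_bytes(a, "big") << k) & ((1 << width) - 1)
    return n.to_bytes(len(a), "big")
```

For example, C<buf_shl(b"\x00\x81", 1)> gives C<b"\x01\x02">: the high bit of the second byte moves into the first, which a byte-at-a-time shift would not do.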

Unlike C<Str> types, C<Buf> types prefer to deal with integer string
positions, and map these directly to the underlying compact array as
indices.  That is, these are not necessarily byte positions--an integer
position just counts over the number of underlying positions, where one
position means one cell of the underlying integer type.  Builtin string
operations on C<Buf> types return integers and expect integers when dealing
with positions.  As a limiting case, C<buf8> is just an old-school byte
string, and the positions are byte positions.  Note, though, that if you
remap a section of C<buf32> memory to be C<buf8>, you'll have to multiply
all your positions by 4.
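
The position arithmetic can be sketched in Python with the C<struct> module, viewing three 32-bit cells as twelve bytes.  Little-endian layout and the helper name C<buf8_pos> are assumptions of this sketch:

```python
import struct

words = [0x11223344, 0xAABBCCDD, 0xDEADBEEF]   # a buf32 of three cells
raw = struct.pack("<3I", *words)                # the same memory viewed as buf8

def buf8_pos(buf32_pos, cell_bytes=4):
    """Hypothetical helper: byte position where a given 32-bit cell starts."""
    return buf32_pos * cell_bytes

assert len(raw) == 4 * len(words)
assert struct.unpack_from("<I", raw, buf8_pos(2))[0] == words[2]
```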

=head3 Native C<buf> Types

These native types are defined based on the C<Buf> role, parameterized by
the native integer type each one is composed of:

    Name        Is really
    ====        =========
    buf1        Buf[bit]
    buf8        Buf[uint8]
    buf16       Buf[uint16]
    buf32       Buf[uint32]
    buf64       Buf[uint64]

There are no signed buf types provided as built-ins, but you may say

    Buf[int8]
    Buf[int16]
    Buf[int32]
    Buf[int64]

to get buffers of signed integers.  It is also possible to define a C<Buf>
based on non-integers or on non-native types:

    Buf[complex64]
    Buf[FatRat]
    Buf[Int]

However, no guarantee of memory contiguity can be made for non-native types.

=head2 The C<Whatever> Object

The C<*> character as a standalone term captures the notion of "Whatever",
the meaning of which can be decided lazily by whatever it is an argument to.
Alternately, for those unary and binary operators that don't care to handle
C<*> themselves, it is automatically primed at compile time into a closure
that takes one or two arguments.  (See below.)

When an operator handles C<*> itself, the C<*> can often be thought of
as a "glob" that gives you everything it can in that argument position.  For
instance, here are some operators that choose to handle C<*> and give it
special meaning:

    if $x ~~ 1..* {...}                 # if 1 <= $x <= +Inf
    my ($a,$b,$c) = "foo" xx *;         # an arbitrarily long list of "foo"
    if /foo/ ff * {...}                 # a latching flipflop
    @slice = @x[*;0;*];                 # all indexes for 1st and 3rd dimensions
    @slice = %x{*;'foo'};               # all keys in domain of 1st dimension
    @array[*]                           # list of all values, unlike @array[]
    (*, *, $x) = (1, 2, 3);             # skip first two elements
                                        # (same as lvalue "undef" in Perl 5)

C<Whatever> is an undefined prototype object derived from C<Any>.  As a type
it is abstract, and may not be instantiated as a defined object.  When used
for a particular MMD dispatch, and nothing in the MMD system claims it, it
dispatches as an C<Any> with an undefined value, and (we hope) blows up
constructively.

Since the C<Whatever> object is effectively immutable, the optimizer is free
to recognize C<*> and optimize in the context of what operator it is being
passed to.  An operator can declare that it wants to handle C<*> either by
declaring one or more of its arguments for at least one of its candidates
with an argument of type C<Whatever>, or by marking the proto sub with the
trait, C<is like-Whatever-and-stuff>.  [Conjecture: actually, this is
negotiable--we might shorten it to C<is like(Whatever)> or some such.
C<:-)>]

=head3 Autopriming of Unary and Binary Operators with Whatever

Perl 6 has several ways of performing partial function application.  Since
this is an unwieldy term, we've settled on calling it I<priming>.  (Many
folks call this "currying", but that's not really a correct technical usage
of the term.)  Most generally, priming is performed on a C<Callable> object
by calling its C<.assuming> method, described elsewhere.  This section is
about a convenient syntactic sugar for that.

For any unary or binary operator (specifically, any prefix, postfix, and
infix operator), if the operator has not specifically requested (via
signature matching) to handle C<*> itself, the compiler is required to
translate directly to an appropriately primed closure at compile time.  We
call this I<autopriming>.  Most of the built-in numeric operators fall into
this category.  So:

    * - 1
    '.' x *
    * + *

are autoprimed into closures of one or two arguments:

    { $^x - 1 }
    { '.' x $^y }
    { $^x + $^y }

This rewrite happens after variables are looked up in their lexical scope,
and after declarators install any variables into the lexical scope, with the
result that

    * + (state $s = 0)

is effectively primed into:

    -> $x { $x + (state $OUTER::s = 0) }

rather than:

    -> $x { $x + (state $s = 0) }

In other words, C<*> priming does not create a useful lexical scope.
(Though it does have a dynamic scope when it runs.) This prevents the
semantics from changing drastically if the operator in question suddenly
decides to handle C<Whatever> itself.
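
The basic rewrite of C<* - 1>, C<'.' x *>, and C<* + *> into closures can be modeled in Python with operator overloading, as a rough sketch only: a hypothetical C<Whatever> stand-in makes operators return a C<WhateverCode> closure that records how many arguments it takes.  Python has no C<x> operator, so string repetition uses C<*>.  Because C<_prime> also accepts a C<WhateverCode> operand, the same sketch covers the transitive autopriming described for C<WhateverCode>:

```python
import operator

class WhateverCode:
    """Hypothetical primed closure that remembers how many arguments it takes."""
    def __init__(self, fn, arity):
        self.fn, self.arity = fn, arity

    def __call__(self, *args):
        if len(args) != self.arity:
            raise TypeError(f"expected {self.arity} argument(s), got {len(args)}")
        return self.fn(*args)

class Whatever:
    """Hypothetical stand-in for a bare *."""

def _lift(x):
    """Return (arity, fn) for an operand: * takes one argument, a
    WhateverCode takes its own arity, anything else is a constant."""
    if isinstance(x, Whatever):
        return 1, lambda a: a
    if isinstance(x, WhateverCode):
        return x.arity, x.fn
    return 0, lambda: x

def _prime(op, left, right):
    """Build the closure `left op right` autoprimes into; the left operand's
    arguments come first, as $^x sorts before $^y."""
    la, lf = _lift(left)
    ra, rf = _lift(right)
    return WhateverCode(lambda *args: op(lf(*args[:la]), rf(*args[la:])), la + ra)

def _install(cls):
    for name, op in (("add", operator.add), ("sub", operator.sub), ("mul", operator.mul)):
        setattr(cls, f"__{name}__", lambda self, other, op=op: _prime(op, self, other))
        setattr(cls, f"__r{name}__", lambda self, other, op=op: _prime(op, other, self))

_install(Whatever)
_install(WhateverCode)

W = Whatever()
minus_one = W - 1        # like * - 1    ->  { $^x - 1 }
dots = '.' * W           # like '.' x *  ->  { '.' x $^y }
plus = W + W             # like * + *    ->  { $^x + $^y }
```

Here C<minus_one(5)> returns 4, C<dots(3)> returns C<'...'>, C<plus(2, 3)> returns 5, and C<(W + 2 + W)(1, 10)> returns 13.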

As a postfix operator, a method call is one of those operators that is
automatically primed.  Something like:

    *.meth(1,2,3)

is rewritten as:

    { $^x.meth(1,2,3) }

In addition to priming a method call without an invocant, such primed
methods are handy anywhere a smartmatcher is expected:

    @primes = grep *.is-prime, 2..*;
    subset Duck where *.^can('quack');
    when !*.defined {...}

Metaoperators are treated as normal operators; the autopriming does not
automatically distribute to the inner operator.  For example,

    @array X* *

does not make a list of closures, but is equivalent to

    -> $arg { @array X* $arg }

Postcircumfixes (with or without the dot) are also autoprimed, so we have

    *[$x]       -> @a { @a[$x] }
    *{$x}       -> %h { %h{$x} }
    *<foo>      -> %h { %h<foo> }
    *($x)       -> &c { &c($x) }

=head3 The C<WhateverCode> Types

These returned closures are of type C<WhateverCode:($)> or
C<WhateverCode:($,$)> rather than type C<Whatever>, so constructs that do
want to handle C<*> or its derivative closures can distinguish them by type:

    @array[*]    # subscript is type Whatever, returns all elements
    @array[*-1]  # subscript is type WhateverCode:($), returns last element

    0, 1, *+1 ... *  # counting
    0, 1, *+* ... *  # fibonacci
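
The counting and Fibonacci examples can be mimicked with a lazy Python generator, sketched here under the assumption that the step function's parameter count (read from C<__code__.co_argcount>) says how many previous values it consumes.  The function C<sequence> and its optional C<stop> and C<exclusive> arguments are hypothetical; they imitate a C<WhateverCode> stopper with C<...> and C<...^>.  Unlike Perl, the sketch never deduces the step from the seed values:

```python
from itertools import islice

def sequence(seed, step, stop=None, exclusive=False):
    """Hypothetical lazy model of `seed, step ... stop`: yields the seed values,
    then repeatedly applies step to the last `arity` values produced."""
    arity = step.__code__.co_argcount      # 1 for *+1, 2 for *+*
    history = []
    pending = list(seed)
    while True:
        x = pending.pop(0) if pending else step(*history[-arity:])
        if stop is not None and stop(x):   # the stopper matched
            if not exclusive:              # ... includes it, ...^ does not
                yield x
            return
        history.append(x)
        yield x

counting = list(islice(sequence([0, 1], lambda x: x + 1), 5))
fibonacci = list(islice(sequence([0, 1], lambda a, b: a + b), 10))
up_to_five = list(sequence([0], lambda x: x + 1, stop=lambda x: x > 5, exclusive=True))
```

Here C<counting> is C<[0, 1, 2, 3, 4]>, C<fibonacci> begins C<[0, 1, 1, 2, 3, 5, 8]>, and C<up_to_five> is C<[0, 1, 2, 3, 4, 5]>.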

For any prefix, infix, postfix, or postcircumfix operator that would be
primed by a C<Whatever>, a C<WhateverCode> also autoprimes it, such that any
noun phrase based on C<*> as a head noun autoprimes transitively outward as
far as it makes sense, including outward through metaoperators.  Hence:

    * + 2 + 3   # { $^x + 2 + 3 }
    * + 2 + *   # { $^x + 2 + $^y }
    * + * + *   # { $^x + $^y + $^z }
    (-*.abs)i   # { (-$^x.abs)i }
    @a «+» *    # { @a «+» $^x }

Note in particular that parentheses will autoprime on a C<WhateverCode>, so

    *[0](1,2,3,4,5)

means

    -> @a { @a[0](1,2,3,4,5) }

rather than

    (-> @a { @a.[0] })(1,2,3,4,5)

If you want the latter semantics for some reason, use a temporary:

    my $c = *[0]; $c(1,2,3,4,5);

or just put the autoprime in parens:

    (*[0])(1,2,3,4,5)

Note that only C<*> autoprimes, because it's an instantiated object.
A C<Whatever> type object never autoprimes.

=head3 Operators with idiosyncratic Whatever

The above is only for operators that are not C<Whatever>-aware.  There is no
requirement that a C<Whatever>-aware operator return a C<WhateverCode> when
C<Whatever> is used as an argument; that's just the I<typical> behavior for
functions that have no intrinsic "globbish" meaning for C<*>.  If you want
to prime one of these globbish operators, you'll need to write an explicit
closure or do an explicit priming on the operator with C<.assuming()>.
Operators in this class, such as C<< infix:<..> >> and C<< infix:<xx> >>,
typically I<do> autoprime arguments of type C<WhateverCode> even though they
do not autoprime C<Whatever>, so we have:

    "foo" xx *          # infinite supply of "foo"
    "foo" xx *-1        # { "foo" xx $^a - 1 }
    0 .. *              # half the real number line
    0 .. * - 1          # { 0 .. $^a - 1 }
    * - 3 .. * - 1      # { $^a - 3 .. $^b - 1 }

(If the last is used as a subscript, the subscripter notices there are two
arguments and passes that dimension's size twice.)

The smartmatch operator will autoprime C<*> but not a C<WhateverCode>.

    * ~~ Int            # same as { $_ ~~ Int }
    $x ~~ *             # same as { $x ~~ $_ }

    $x ~~ * == 42       # same as $x ~~ { $_ == 42 }
    * == 42 ~~ Any      # same as { $_ == 42 } ~~ Any

=head3 Non-closure-returning Operators with C<*>

Operators that are known to return non-closure values with C<*> include:

    0 .. *      # means 0 .. Inf
    0 ... *     # means 0 ... Inf
    'a' xx *    # means 'a' xx Inf
    1,*         # means 1,*  :)

    $a = *      # just assigns Whatever
    $a = * + 1  # just assigns WhateverCode

The sequence operators C<< &infix:<...> >> and C<< &infix:<...^> >>
do not autoprime C<WhateverCode>, because we want to allow C<WhateverCode>
closures as the stopper:

    0 ...^ *>5  # means 0, 1, 2, 3, 4, 5

[Conjecture: it is possible that, for most of the above operators that take
C<*> to mean C<Inf>, we could still actually return a closure that defaults
that particular argument to C<Inf>.  However, this would work only if we
provide a "value list context" that forbids closures, in the sense that it
always calls any closure it finds in its list and replaces the closure in
the list with its return value or values, and then rescans from that point
(kinda like a text macro does), in case the closure returned a list
containing a closure.  So for example, the closure returned by C<0..*> would
interpolate a C<Range> object into the list when called.  Alternately, it
could return the C<0>, followed by another closure that does C<1..*>.  Even
the C<...> operator could likely be redefined in terms of a closure that
regenerates itself, as long as we figure out some way of remembering the
last N values each time.]

In any case, array indexes must behave as such a 'value list context', since
you can't directly index an array with anything other than a number.  The
final element of an array is subscripted as C<@a[*-1]>, which means that
when the subscripting operation discovers a C<Code:($)> object for a
subscript, it calls it and supplies an argument indicating the number of
elements in (that dimension of) the array.  See S09.
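
A hypothetical Python model of such a subscript: a callable index is called with the element count, once per parameter, so that a two-argument closure like the C<* - 3 .. * - 1> example gets the dimension size twice.  Since Python ranges exclude their upper bound, the inclusive Perl range is written with a C<+ 1>:

```python
def subscript(array, index):
    """Hypothetical 1-D subscript that understands closures."""
    if callable(index):
        n = index.__code__.co_argcount             # how many * the closure had
        index = index(*([len(array)] * n))        # pass the size that many times
    if isinstance(index, range):
        return [array[i] for i in index]
    return array[index]

a = [10, 20, 30, 40, 50]
last = subscript(a, lambda n: n - 1)                           # @a[*-1]
tail = subscript(a, lambda n, m: range(n - 3, m - 1 + 1))      # @a[*-3 .. *-1]
```

Here C<last> is 50 and C<tail> is C<[30, 40, 50]>.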

=head3 The C<HyperWhatever> Type

A variant of C<*> is the C<**> term, which is of type C<HyperWhatever>.  It
is generally understood to be a multidimensional form of C<*> when that makes
sense.  When modified by an operator that would turn C<*> into a function of
one argument, C<WhateverCode:($)>, C<**> instead turns into a function with
one slurpy argument, C<Code:(*@)>, such that multiple arguments are
distributed to some number of internal whatevers.  That is:

    * - 1    means                -> $x { $x - 1 }
    ** - 1   means   -> *@x { map -> $x { $x - 1 }, @x }

Therefore C<@array[^**]> represents C<< @array[{ map { ^* }, @_ }] >>, that
is to say, every element of the array, no matter how many dimensions.
(However, C<@array[**]> means the same thing because (as with C<...> above),
the subscript operator will interpret bare C<**> as meaning all the
subscripts, not the list of dimension sizes.  The meaning of C<Whatever> is
always controlled by the first context it is bound into.)

Other uses for C<*> and C<**> will doubtless suggest themselves over time.
These can be given meaning via the MMD system, if not the compiler.  In
general a C<Whatever> should be interpreted as maximizing the degrees of
freedom in a dwimmy way, not as a nihilistic "don't care anymore--just shoot
me".

=head2 Native types

Values with these types autobox to their uppercase counterparts when you
treat them as objects:

    bit         single native bit
    int         native signed integer
    uint        native unsigned integer (autoboxes to Int)
    buf         native buffer (finite seq of native ints or uints, no Unicode)
    rat         native rational
    num         native floating point
    complex     native complex number
    bool        native boolean

Since native types cannot represent Perl's concept of undefined values, in
the absence of explicit initialization, native floating-point types default
to C<NaN>, while integer types (including C<bit>) default to 0.  The complex
type defaults to C<NaN + NaN\i>.  A buf type of known size defaults to a
sequence of 0 values.

You can set a different default on any container type by use of a trait such
as C<is default(42)>.  Deleting or undefining such a container sets the
contents back to the default value (or optionally removes it in cases where
the default value can be autovivified on demand).

If you wish for a native declaration to attempt no initialization, but leave
whatever garbage was in memory, you may use the C<is default(*)> trait.
There are several use cases for this: you may know you're going to initialize
the memory by other means, or you may be doing some form of memory mapping.

If a buf type is initialized with a Unicode string value, the string is
decomposed into Unicode codepoints, and each codepoint shoved into an
integer element.  If the size of the buf type is not specified, it takes its
length from the initializing string.  If the size is specified, the
initializing string is truncated or 0-padded as necessary.  If a codepoint
doesn't fit into a buf's integer type, a parse error is issued if this can
be detected at compile time; otherwise a warning is issued at run time and
the overflowed buffer element is filled with an appropriate replacement
character, either C<U+FFFD> (REPLACEMENT CHARACTER) if the element's integer
type is at least 16 bits, or C<U+007F> (DELETE) if the larger value would
not fit.  If any other conversion is desired, it must be specified
explicitly.  In particular, no conversion to UTF-8 or UTF-16 is attempted;
that must be specified explicitly.  (As it happens, conversion to a buf type
based on 32-bit integers produces valid UTF-32 in the native endianness.)
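
A minimal Python sketch of these initialization rules for unsigned cells.  The function C<buf_from_str> is hypothetical, and it models only the run-time path (a warning plus a replacement value), not the compile-time parse error:

```python
import warnings

def buf_from_str(s, bits, size=None):
    """Hypothetical model: fill unsigned `bits`-wide cells from a string."""
    cells = [ord(ch) for ch in s]             # one codepoint per cell, no UTF-8/16
    if size is not None:
        cells = (cells + [0] * size)[:size]   # truncate or 0-pad to the size
    for i, cp in enumerate(cells):
        if cp >= 2**bits:                     # codepoint does not fit the cell
            warnings.warn(f"U+{cp:04X} does not fit in {bits} bits")
            cells[i] = 0xFFFD if bits >= 16 else 0x7F
    return cells
```

For example, C<buf_from_str("hi", 8, size=4)> gives C<[104, 105, 0, 0]>, while an emoji in a 16-bit buffer becomes C<0xFFFD> with a warning.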

=head2 The C<Mu> type

Among other things, C<Mu> is named after the eastern concept of "Mu" or 無
(see L<http://en.wikipedia.org/wiki/MU>, especially the "Mu (negative)"
entry), so in Perl 6 it stands in for Perl 5's concept of "undef" when that
is used as a noun.  However, C<Mu> is also the "nothing" from which
everything else is derived via the undefined type objects, so it stands in
for the concept of "Object" as used in languages like Java.  Or think of it
as a "micro" or µ-object that is the basis for all other objects, something
atomic like a Muon.  Or if acronyms make you happy, there are a variety to
pick from:

    Most Universal
    More Undefined
    Modern Undef
    Master Union
    Meta Ur
    Mega Up
    ...

Or just think of it as a sound a cow makes, which simultaneously means
everything and nothing.

=head2 Undefined types

Perl 6 does not have a single value representing undefinedness.  Instead,
objects of various types can carry type information while nevertheless
remaining undefined themselves.  Whether an object is defined is determined
by whether C<.defined> returns true or not.  These typed objects typically
represent uninitialized values.  Failure objects are also officially
undefined despite carrying exception information; these may be created using
the C<fail> function, or by direct construction of a C<Failure> object of
some sort.  (See S04 for how failures are handled.)

    Mu          Most Undefined
    Failure     Failure (lazy exceptions, thrown if not handled properly)

Whenever you declare any kind of type, class, module, or package, you're
automatically declaring an undefined prototype value with the same name,
known as the I<type object>.  The name itself returns that type object:

    Mu          Perl 6 object (default block parameter type, Any, Junction, or Each)
    Any         Perl 6 object (default routine parameter type, excludes Junction, Nil, Failure)
    Cool        Perl 6 Convenient OO Loopbacks
    Whatever    Wildcard (like Any, but subject to do-what-I-mean via MMD)
    Int         Any Int object
    Widget      Any Widget object

All user-defined classes derive from the C<Any> class by default.

Type objects sometimes stringify to their name in parens, to indicate
undefinedness.  Note that type objects are not classes, but may be used to
name classes when the type's associated meta-object allows it:

    Widget.new()        # create a new Widget

The C<Any> type encompasses all normal value and object types.  It is the
unit type, but includes units that are containers of multiple values.  It is
not the most general type, however.  C<Any> derives from C<Mu>, which is the
top type in Perl 6, and encompasses certain conceptual types that fall
outside the realm of ordinary C<Any> values.  These conceptual types
include:

    Junction    unordered superposition of data with and/or/one/none
    Each        ordered superposition (conjectural)
    Failure     a lazy exception

Conceptual types rely on the failure to match an C<Any> type in order to
trigger various extraordinary behaviors.  The C<Junction> and C<Each> types
trigger an inside-out linguistic distribution of various list behaviors from
inside a scalar expression that pretends a bunch of values are really a
single value.  (These are modeled on similar linguistic behaviors in
English.)  The distributional behavior triggered for these types is known as
I<autothreading>.
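
Autothreading can be sketched in Python with a class whose operators distribute over each of its values and whose truth value collapses the results according to the junction type.  This hypothetical C<Junction> threads over only one junctive operand and supports only a few operators:

```python
import operator

class Junction:
    """Hypothetical model of any/all/one/none superpositions."""
    def __init__(self, kind, values):
        if kind not in ("any", "all", "one", "none"):
            raise ValueError(kind)
        self.kind, self.values = kind, list(values)

    def _thread(self, op, other, reflected=False):
        # Apply the operator to each value, keeping the junction type.
        apply = (lambda v: op(other, v)) if reflected else (lambda v: op(v, other))
        return Junction(self.kind, [apply(v) for v in self.values])

    def __add__(self, other): return self._thread(operator.add, other)
    def __radd__(self, other): return self._thread(operator.add, other, True)
    def __eq__(self, other): return self._thread(operator.eq, other)
    def __gt__(self, other): return self._thread(operator.gt, other)
    def __lt__(self, other): return self._thread(operator.lt, other)

    def __bool__(self):
        # Collapse the threaded results back into a single truth value.
        hits = sum(1 for v in self.values if v)
        if self.kind == "any":
            return hits > 0
        if self.kind == "all":
            return hits == len(self.values)
        if self.kind == "one":
            return hits == 1
        return hits == 0                     # none
```

So C<bool(Junction("any", [1, 2, 3]) == 2)> is true, much like C<if any(1,2,3) == 2 {...}>, while the C<"all"> version is false.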

The C<Failure> type is considered conceptual so that dynamic context can
determine the treatment of failures that in other languages would always
throw exceptions.  This gives Perl 6 programs the flexibility to handle
exceptions either in-band or out-of-band.  It is particularly important to
be able to handle exceptions in-band when you are trying to perform parallel
operations, so that the failure of one computation does not result in
fratricide of all its fellow computations.  (You can think of this as
analogous to the way C<NaN> propagates through floating-point calculations.)

Single dispatch of a C<Failure> invocant to any method not in C<Failure>
returns the same C<Failure>, so that cascaded method calls can be checked
with a single check:

    $object.fee.fie.[$foe].{$foo}.sic // die "Oops: $!";
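
A minimal Python sketch of that absorbing behavior.  The C<Failure> class and C<defined_or> helper here are hypothetical, and C<defined_or> takes a callable fallback to imitate the short-circuiting of C<//>:

```python
class Failure:
    """Hypothetical lazy exception: undefined, and absorbing further method
    calls and subscripts so that one check at the end suffices."""
    def __init__(self, exception):
        self.exception = exception

    def __getattr__(self, name):
        # Any method Failure does not define returns this same Failure.
        return lambda *args, **kwargs: self

    def __getitem__(self, key):
        return self

def defined_or(value, fallback):
    """Rough analogue of //: keep value unless it is undefined."""
    if value is None or isinstance(value, Failure):
        return fallback()
    return value

f = Failure(ValueError("no fee"))
chained = f.fee().fie()[3]["foo"].sic()     # still the very same Failure
```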

Failures may only be passed into functions via parameters that allow C<Mu>
or C<Failure>, and a failure may only be returned from a function whose
return type permits it.

After the failure is returned, any subsequent attempt to use the failure in
an C<Any> context will be subject to further failure analysis, and will
likely throw an exception immediately.  Likewise, discarding the failure in
sink context produces an immediate exception.

Note that a C<Failure> object is undefined, but may contain one or more
defined C<Exception> objects, which are considered normal objects that just
happen to be used in exception throwing and handling.

=head2 Immutable types

Objects with these types behave like values, i.e. C<$x === $y> is true if
and only if their types and contents are identical (that is, if C<$x.WHICH>
eqv C<$y.WHICH>).

    Str         Perl string (finite sequence of Unicode characters)
    Bit         Perl single bit (allows traits, aliasing, undefinedness, etc.)
    Int         Perl integer (allows Inf/NaN, arbitrary precision, etc.)
    Num         Perl number (approximate Real, generally via floating point)
    Rat         Perl rational (exact Real, limited denominator)
    FatRat      Perl rational (unlimited precision in both parts)
    Complex     Perl complex number
    Bool        Perl boolean
    Exception   Perl exception
    Block       Executable objects that have lexical scopes
    Range       A pair of Ordered endpoints
    Set         Unordered collection of values that allows no duplicates
    Bag         Unordered collection of values that allows duplicates
    Mix         Unordered collection of values with weights
    Enum        An immutable Pair
    EnumMap     A mapping of Enums with no duplicate keys
    Signature   Function parameters (left-hand side of a binding)
    LoL         Arguments in a semicolon list
    Capture     Function call arguments (right-hand side of a binding)
    Blob        An undifferentiated mass of ints, an immutable Buf
    Instant     A point on the continuous atomic timeline
    Duration    The difference between two Instants
    HardRoutine A routine that is committed to not changing

C<Set> values may be composed with the C<set> listop or method.
C<Bag> values may be composed with the C<bag> listop or method.
C<Mix> values may be composed with the C<mix> listop or method.

C<Instant>s and C<Duration>s are measured in atomic seconds with fractions.
Notionally they are real numbers which may be implemented in any C<Real>
type of sufficient precision, preferably a C<Rat> or C<FatRat>.
(Implementations that make fixed-point assumptions about the available
subsecond precision are discouraged; the user interface must act like real
numbers in any case.)  Interfaces that take C<Duration> arguments, such as
sleep(), may also take C<Real> arguments, but C<Instant> arguments must be
explicitly created via any of various culturally aware time specification
APIs.  A small number of C<Instant> values that represent common epoch
instant values are also available.

In numeric context a C<Duration> happily returns a C<Rat> or C<FatRat>
representing the number of seconds.  C<Instant> values, on the other hand,
are largely opaque, numerically speaking, and in particular are epoch
agnostic.  (Any epoch is just a particular C<Instant>, and all times related
to that epoch are really C<Instant> ± C<Duration>, which returns a new
C<Instant>.)  In order to facilitate the writing of culturally aware time
modules, the C<Instant> type provides C<Instant> values corresponding to
various commonly used epochs, such as the 1958 TAI epoch, the POSIX epoch,
the Mac epoch, and perhaps the year 2000 epoch as UTC thinks of it.
There's no reason to exclude any useful epoch that is well characterized in
atomic seconds.  All normal times can be calculated from those epoch
instants using addition and subtraction of C<Duration> values.  Note that
the C<Duration> values are still just atomic time without any cultural
deformations; in particular, the C<Duration> formed by subtracting
C<Instant::Epoch::POSIX> from the current instant will contain more seconds
than the current POSIX C<time()> due to POSIX's abysmal ignorance of leap
seconds.  This is not the fault of the universe, which is not fooled
(neglecting relativistic considerations).  C<Instant>s and C<Duration>s are
always linear atomic seconds.  Systems which cannot officially provide a
steady time base, such as POSIX systems, will simply have to make their best
guess as to the correct atomic time when asked to interconvert between
cultural time and atomic time.  Alternatively, they may use some other
less-official time mechanism to achieve steady clock behavior.  Most Unix
systems can count clock ticks, even if POSIX time types get confused.
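
As far as I know, Rakudo does not provide a named constant like C<Instant::Epoch::POSIX>. The sketch below assumes its C<Instant.from-posix> constructor instead, which turns a POSIX time into an C<Instant>, and measures the atomic seconds elapsed since that epoch:

    # Sketch assuming Rakudo's Instant.from-posix constructor.
    my $epoch = Instant.from-posix(0);   # the POSIX epoch as an Instant
    my $since = now - $epoch;            # Instant - Instant gives a Duration
    say $since.WHAT;                     # (Duration)
    # Leap seconds are counted, so this should exceed POSIX time():
    say $since > time;

On an implementation with an up-to-date leap-second table, the last line should print C<True>.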

Although the conceptual type of an C<Instant> resembles C<FatRat>, with
arbitrarily large size in either numerator or denominator, the internal form
may of course be optimized for "nearby" times, so that, if we
know the year as an integer, the instant within the year can just be a
C<Rat> representing the offset from the beginning of the year.  Calculations
that fall within the same year can then be done in C<Rat> rather than
C<FatRat>, or a table of yearly offsets can find the difference in integer
seconds between two years, since (so far) nobody has had the nerve to
propose fractional leap seconds.  Or whatever.  C<Instant> is opaque, so we
can swap implementations in and out without user-visible consequences.

The term C<now> returns the current time as an C<Instant>.  As with the
C<rand> and C<self> terms, it is not a function, so don't put parens after
it.  It also never looks for arguments, so the next token should be an
operator or terminator.

    now + 300   # the instant five minutes from now

Basic math operations are defined for instants and durations such that the
sum of an instant and a duration is always an instant, while the difference
of two instants is always a duration.  Math on instants may only be done
with durations (or numbers that will be taken as durations, as above); you
may not add two instants.

    $instant + $instant      # WRONG
    $instant - $instant      # ok, returns a duration
    $instant + $duration     # ok, returns an instant
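
A short sketch of these rules, assuming Rakudo's C<Instant> and C<Duration> types:

    # Sketch assuming a current Rakudo compiler.
    my $start = now;                # an Instant
    my $later = $start + 300;       # Instant + number: an Instant
    say $later.WHAT;                # (Instant)
    my $gap = $later - $start;      # Instant - Instant: a Duration
    say $gap.WHAT;                  # (Duration)
    # $start + $later               # WRONG: two instants cannot be added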

Numeric operations on durations return C<Duration> where that makes sense
(addition, subtraction, modulus).  The type returned for other numeric
operations is unspecified; they may return normal numeric types or they may
return other dimensional types that attempt to assist in dimensional
analysis.  (The latter approach should likely require explicit declaration
for now, until we can demonstrate that it does not adversely impact the
average programmer, and that it plays well with the concept of gradual
typing.)

The C<Blob> type is like an immutable buffer, and therefore responds both to
array and (some) stringy operations.  Note that, like a C<Buf>, its size is
measured in whatever the base unit is, which is not always bytes.  If you
have a C<my Blob[bit] $blob>, then C<$blob.elems> returns the number of bits
in it.  As with buffers, various native types are automatically derived from
native unsigned int types:

    blob1       Blob[bit], a bit string
    blob2       Blob[uint2], a DNA sequence?
    blob3       Blob[uint[3]], an octal string
    blob4       Blob[uint4], a hex string
    blob8       Blob[uint8], a byte string
    blob16      Blob[uint16]
    blob32      Blob[uint32]
    blob64      Blob[uint64]
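
A sketch of element counting, assuming Rakudo, which provides the C<blob8> and C<blob16> types (the bit-string and C<uint2> forms are not assumed here). C<.elems> counts base units, while C<.bytes> reports the storage size:

    # Sketch assuming Rakudo's blob8 and blob16 types.
    my $bytes = blob8.new(72, 105);   # a byte string
    say $bytes.elems;                 # 2
    say $bytes[1];                    # 105 (array-style access)
    say $bytes.decode('ascii');       # Hi (a stringy operation)
    my $wide = blob16.new(1, 2);      # 16-bit elements
    say $wide.elems;                  # 2 elements...
    say $wide.bytes;                  # 4 ...taking four bytes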

The C<utf8> type is derived from C<blob8>, with the additional constraint
that it may only contain validly encoded UTF-8.  Likewise, C<utf16> is
derived from C<blob16>, and C<utf32> from C<blob32>.
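
In Rakudo, encoding a string as UTF-8 yields an object of this C<utf8> type. A sketch, assuming a current Rakudo:

    # Sketch assuming a current Rakudo compiler.
    my $buf = "héllo".encode('utf-8');   # a utf8 blob
    say $buf.WHAT;                        # (utf8)
    say $buf.elems;                       # 6: "é" takes two bytes
    say $buf.decode('utf-8');             # héllo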

Because these are type names, and the listop form is not allowed for
coercions, parentheses must always be used to call them as coercers.
That is:

    utf8 op $x

is always parsed as

    (utf8) op $x

and never as

    utf8(op $x)

These types do (at least) the following roles:

    Class       Roles
    =====       =====
    Str         Stringy
    Bit         Numeric Boolean Integral
    Int         Numeric Real Integral
    Num         Numeric Real
    Rat         Numeric Real Rational
    FatRat      Numeric Real Rational
    Complex     Numeric
    Bool        Boolean
    Block       Callable
    Range       Iterable
    Set         Setty Iterable
    Bag         Baggy Iterable
    Mix         Mixy Iterable
    Enum        Associative
    EnumMap     Associative Positional Iterable
    Signature
    List        Positional Iterable
    Capture     Positional Associative
    Blob        Stringy Positional
    Instant     Numeric Real
    Duration    Numeric Real
    HardRoutine Routine
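
Role membership can be probed by smartmatching against the role. A sketch covering a few rows of the table, assuming a current Rakudo (not every role named above is assumed to exist there):

    # Sketch assuming a current Rakudo compiler.
    say 42 ~~ Numeric;               # True
    say 42 ~~ Real;                  # True
    say 0.5 ~~ Rational;             # True: 0.5 is a Rat
    say 1e0 ~~ Rational;             # False: a Num is approximate
    say (1..3) ~~ Iterable;          # True
    say set(1, 2) ~~ Setty;          # True
    say blob8.new(1) ~~ Positional;  # True
    say &say ~~ Callable;            # True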

[Conjecture:  C<Stringy> may best be split into 2 roles where both C<Str>
and C<Blob> compose the more general one and just C<Str> composes a less
general one.  The more general of those would apply to what is common to any
dense sequence ("string") that C<Str> and C<Blob> both are (either of
characters or bits or integers, etc.), and the string operators like
concatenation (C<~>) and replication (C<x>, C<xx>) would be part of the more
general role.  The more specific role would apply to C<Str> but not C<Blob>
and would include any operators that are specific to I<characters> and
don't apply to bits or integers etc.  The alternative is to distance
character strings more clearly from bit strings, keeping C<~> etc. for
character strings only and adding analogous operators for bit strings.]

The C<Iterable> role indicates not that you can iterate the type directly,
but that you can request the type to return an iterator.  Iterable types may
have multiple iterators (lists) running across them simultaneously, but an
iterator/list itself has only one thread of consumption.  Every time you do
C<get> on an iterator, a value disappears from its list.

Note that C<Set> iterators return only the keys, not the boolean values.
You must explicitly use C<.pairs> to get key/value pairs.  The C<Bag> and
C<Mix> types, on the other hand, default to returning pairs, as a C<Hash>
does.
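
Asking for keys or pairs explicitly makes the intent clear and does not depend on what a bare iteration yields. A sketch, assuming a current Rakudo:

    # Sketch assuming a current Rakudo compiler.
    my $s = set <x y>;
    say $s.keys.sort;      # (x y)
    say $s.pairs.sort;     # (x => True y => True)
    my $b = bag <x x y>;
    say $b.pairs.sort;     # (x => 2 y => 1)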

=head2 Mutable types

Objects with these types have distinct C<.WHICH> values that do not change
even if the object's contents change.  (Routines are considered mutable
because they can be wrapped in place.)

    Iterator    Perl list
    RangeIter   Iterator over a Range
    Scalar      Perl scalar
    Array       Perl array
    Hash        Perl hash
    SetHash     Setty QuantHash[Bool,False]
    BagHash     Baggy QuantHash[UInt,0]
    MixHash     Mixy  QuantHash[Real,0.0]
    Pair        A single key-to-value association
    Buf         Perl buffer (array of integers with some stringy features)
    IO          Perl filehandle
    Routine     Base class for all wrappable executable objects
    Sub         Perl subroutine
    Method      Perl method
    Submethod   Perl subroutine acting like a method
    Macro       Perl compile-time subroutine
    Regex       Perl pattern
    Match       Perl match, usually produced by applying a pattern
    Stash       A symbol table hash (package, module, class, lexpad, etc)
    SoftRoutine A routine that is committed to staying mutable

The C<QuantHash> role differs from a normal C<Associative> hash in how it
handles default values.  If the value of a C<QuantHash> element is set to
the default value for the C<QuantHash>, the element is deleted.  If
undeclared, the default default for a C<QuantHash> is 0 for numeric types,
C<False> for boolean types, and the null string for string and buffer types.
A C<QuantHash> of an object type defaults to the undefined prototype for
that type.  More generally, the default default is whatever defined value a
C<Nil> would convert to for that value type.  A C<QuantHash> of C<Scalar>
deletes elements that go to either 0 or the null string.  A C<QuantHash>
also autodeletes keys for normal undefined values (that is, those undefined
values that do not contain an unthrown exception).

A C<SetHash> is a C<QuantHash> of booleans with a default of C<False>.  If
you use the C<Hash> interface and increment an element of a C<SetHash> its
value becomes true (creating the element if it doesn't exist already).  If
you decrement the element it becomes false and is automatically deleted.
Decrementing a non-existing value results in a C<False> value.  Incrementing
an existing value results in C<True>.  When not used as a C<Hash> (that is,
when used as an C<Array> or list or C<Set> object) a C<SetHash> behaves as a
C<Set> of its keys.  (Since the only possible value of a C<SetHash> is the
C<True> value, it need not be represented in the actual implementation with
any bits at all.)

A C<BagHash> is a C<QuantHash> of C<UInt> with a default of 0.  If you use
the C<Hash> interface and increment an element of a C<BagHash> its value is
increased by one (creating the element if it doesn't exist already).  If you
decrement the element the value is decreased by one; if the value goes to 0
the element is automatically deleted.  An attempt to decrement a
non-existing value returns an undefined value.  When not used as a C<Hash>
(that is, when used as an C<Array> or list or C<Bag> object) a C<BagHash>
behaves as a C<Bag> of its pairs.

A C<MixHash> is a C<QuantHash> of C<Real> with a default of 0.0.  If the
value goes to 0 the element is automatically deleted.  When not used as a
C<Hash> (that is, when used as an C<Array> or list or C<Mix> object) a
C<MixHash> behaves as a C<Mix> of its pairs.
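
The autodeletion rules can be seen through the C<Hash> interface. A sketch, assuming a current Rakudo; the keys are arbitrary:

    # Sketch assuming a current Rakudo compiler.
    my $seen = SetHash.new;
    $seen<apple> = True;        # creates the element
    say $seen<apple>;           # True
    $seen<apple> = False;       # the default value deletes it
    say $seen<apple>:exists;    # False

    my $count = BagHash.new;
    $count<pear>++ for ^3;      # each increment adds one
    say $count<pear>;           # 3
    $count<pear> -= 3;          # back to the default of 0...
    say $count<pear>:exists;    # False: ...so the key is gone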

As with C<Hash> types, C<Pair> and C<PairSeq> are mutable in their values
but not in their keys.  (A key can be a reference to a mutable object, but
cannot change its C<.WHICH> identity.  In contrast, the value may be rebound
to a different object, just as a hash element may.)

The following roles are supported:

    Iterator    List
    Scalar
    Array       Positional Iterable
    Hash        Associative
    SetHash     Setty QuantHash[Bool]
    BagHash     Baggy QuantHash[UInt]
    MixHash     Mixy  QuantHash[Real]
    Pair        Associative
    PairSeq     Associative Positional Iterable
    Buf         Stringy
    IO
    Routine     Callable
    Sub         Callable
    Method      Callable
    Submethod   Callable
    Macro       Callable
    Regex       Callable
    Match       Positional Associative
    Stash       Associative
    SoftRoutine Routine

Types that do the C<List> role are generally hidden from casual view, since
iteration is typically triggered by context rather than by explicit call to
the iterator's C<.get> method.  Filehandles are a notable exception.

See L<S06/"Wrapping"> for a discussion of soft vs. hard routines.

=head2 Of types

Explicit types are optional. Perl variables have two associated types: their
"of type" and their "container type".  (More generally, any container has a
container type, including subroutines and modules.) The C<of> type is stored
as its C<of> property, while the container type of the container is just the
object type of the container itself.  The word C<returns> is allowed as an
alias for C<of>.

The C<of> type specifies what kinds of values may be stored in the variable.
An C<of> type is given as a prefix or with the C<of> keyword:

    my Dog $spot;
    my $spot of Dog;

In either case this sets the C<of> property of the container to C<Dog>.  You
may not mix these notations; if you do, a compiler error will result.

An C<of> type on an array or hash specifies the type stored by each element:

    my Dog @pound;  # each element of the array stores a Dog

    my Rat %ship;   # the value of each entry stores a Rat
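
A sketch of element type checking on typed containers, assuming a current Rakudo:

    # Sketch assuming a current Rakudo compiler.
    my Int @nums = 1, 2, 3;       # each element must be an Int
    my Rat %ship = gold => 1/2;   # each value must be a Rat
    say @nums.of;                 # (Int)
    say so try { @nums.push('four'); True };   # False: a Str is rejected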

The key type of a hash may be specified as a shape trait--see S09.

Containers enforce type safety on setting, whereas subroutines enforce type
safety on return.  The C<returns> declaration is an alias for the C<of>
type of a subroutine.

    sub get_pet() of Animal {...}       # of type, obviously
    sub get_pet() returns Animal {...}  # of type
    our Animal sub get_pet() {...}      # of type

To coerce your return value, use a coercion type:

    sub get_pet() returns Pet(Animal) {...}  # coerce any Animal to Pet

For a container, however, use of a coercion type as the C<of> coerces upon
setting rather than returning the value.

=head2 Container types

The container type specifies how the variable itself is implemented. It is
given as a trait of the variable:

    my $spot is Scalar;             # this is the default
    my $spot is PersistentScalar;
    my $spot is DataBase;

Defining a container type is the Perl 6 equivalent of tying a variable in
Perl 5.  But Perl 6 variables are tied directly at declaration time, and for
performance reasons may not be tied with a run-time C<tie> statement unless
the variable is explicitly declared with a container type that does the
C<Tieable> role.

However, package variables are always considered C<Tieable> by default.  As
a consequence, all named packages are also C<Tieable> by default.  Classes
and modules may be viewed as differently tied packages.  Looking at it from
the other direction, classes and modules that wish to be bound to a global
package name must be able to do the C<Package> role.

=head2 Hierarchical types

A non-scalar type may be qualified, in order to specify what type of value
each of its elements stores:

    my Egg $cup;                       # the value is an Egg
    my Egg @carton;                    # each elem is an Egg
    my Array of Egg @box;              # each elem is an array of Eggs
    my Array of Array of Egg @crate;   # each elem is an array of arrays of Eggs
    my Hash of Array of Recipe %book;  # each value is a hash of arrays of Recipes

Each successive C<of> makes the type on its right a parameter of the type on
its left. Parametric types are named using square brackets, so:

    my Hash[Array[Recipe]] %book;

actually means:

    my Hash of Array of Recipe %book;

which is:

    my Hash:of(Array:of(Recipe)) %book;

Because the actual variable can be hard to find when complex types are
specified, there is a postfix form as well:

    my Hash of Array of Recipe %book;           # HoHoAoRecipe
    my %book of Hash of Array of Recipe;        # same thing

Alternatively, the return type may be specified within the signature:

    my sub get_book ($key --> Hash of Array of Recipe) {...}

You may also specify the type as the C<of> trait (with C<returns>
allowed as a synonym):

    my Hash of Array of Recipe sub get_book ($key) {...}
    my sub get_book ($key) of Hash of Array of Recipe {...}
    my sub get_book ($key) returns Hash of Array of Recipe {...}
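
The square-bracket spelling of a parametric type is checked on each element. A sketch, assuming a current Rakudo and its C<Array[Int]> type:

    # Sketch assuming a current Rakudo compiler.
    my Array[Int] @box;                       # each element is an Array[Int]
    @box.push: Array[Int].new(1, 2);          # accepted
    say @box[0].of;                           # (Int)
    say so try { @box.push: [1, 2]; True };   # False: a plain Array is not an Array[Int]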

=head2 Parameter types

Parameters may be given types, just like any other variable:

    sub max (int @array is rw) {...}
    sub max (@array of int is rw) {...}
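
A typed array parameter checks the declared element type of the array it is bound to, not just the values inside. A sketch with a hypothetical C<total> sub, assuming a current Rakudo; smartmatching a capture against a signature asks whether it would bind:

    # Sketch assuming a current Rakudo compiler; total is made up.
    sub total(Int @nums) { [+] @nums }
    my Int @typed = 1, 2, 3;
    my @plain     = 1, 2, 3;
    say total(@typed);                    # 6
    say \(@typed) ~~ &total.signature;    # True
    say \(@plain) ~~ &total.signature;    # False: not declared as Int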

=head2 Generic types

Within a declaration, a class variable (either by itself or following an
existing type name) declares a new type name and takes its parametric value
from the actual type of the parameter it is associated with.  It declares
the new type name in the same scope as that of the associated declaration.

    sub max (Num ::X @array) {
        push @array, X.new();
    }

The new type name is introduced immediately, so two such types in the same
signature must unify compatibly if they have the same name:

    sub compare (Any ::T $x, T $y) {
        return $x eqv $y;
    }
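
A sketch of a captured type variable, assuming a current Rakudo; C<compare> and C<type-of> are hypothetical names:

    # Sketch assuming a current Rakudo compiler; names are made up.
    sub compare(::T $x, T $y) { $x eqv $y }
    say compare(1, 1);         # True
    say compare('a', 'b');     # False
    # compare(1, '1') would fail to bind: T is already Int
    sub type-of(::T $x) { T.^name }
    say type-of(42);           # Int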

=head2 The Cool class (and package)

The C<Cool> type is derived from C<Any>, and contains all the methods that
are "cool" (as in, "I'm cool with an argument of that type.").

More specifically, these are the methods that are culturally universal,
insofar as the typical user will expect the name of the method to imply
conversion to a particular built-in type that understands the method in
question.  For instance, C<$x.abs> implies conversion to an appropriate
numeric type if C<$x> is "cool" but doesn't already support a method of that
name.  Conversely, C<$x.substr> implies conversion to a string or buffer
type.

The C<Cool> module also contains all multisubs of last resort; these are
automatically searched if normal multiple dispatch does not find a viable
candidate.  Note that the C<Cool> package is mutable, and both single and
multiple dispatch must take into account changes there for the purposes of
run-time monkey patching.  However, since the multiple dispatcher uses the
C<Cool> package only as a failover, compile-time analysis of such dispatches
is largely unaffected for any arguments with an exact or close match.
Likewise, any single dispatch to a method that is more specific than the
C<Cool> class is not affected by the mutability of C<Cool>.  User-defined classes
don't derive from C<Cool> by default, so such classes are also unaffected by
changes to C<Cool>.
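
A sketch of these conversions, assuming a current Rakudo, where C<Int> and C<Str> are both C<Cool>:

    # Sketch assuming a current Rakudo compiler.
    say "-3".abs;              # 3: the string is treated as a number
    say 12345.substr(1, 3);    # 234: the number is treated as a string
    say 42 ~~ Cool;            # True
    class Point { }
    say Point ~~ Cool;         # False: user classes don't derive from Cool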

=head1 Names and Variables

=head2 Apostrophe separator

The C<$Package'var> syntax is gone.  Use C<$Package::var> instead.  (Note,
however, that identifiers may now contain an apostrophe or hyphen if
followed by a character matching C<< <.alpha> >>.)
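
A sketch of both rules, assuming a current Rakudo:

    # Sketch assuming a current Rakudo compiler.
    package Deep { our $answer = 42 }
    say $Deep::answer;       # 42
    my $can't-stop = 3;      # apostrophe and hyphen, each before a letter
    say $can't-stop;         # 3
    my $n = 5;
    say $n-1;                # 4: "-1" is not part of the name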

=head2 Sigils

Perl 6 includes a system of B<sigils> to mark the fundamental structural
type of a variable:

    $   scalar (object)
    @   ordered array
    %   unordered hash (associative array)
    &   code/rule/token/regex
    ::  package/module/class/role/subset/enum/type/grammar

Within a declaration, the C<&> sigil also declares the visibility of the
subroutine name without the sigil within the scope of the declaration:

    my &func := sub { say "Hi" };
    func;   # calls &func

Within a signature or other declaration, the C<::> pseudo-sigil followed by
an identifier marks a type variable that also declares the visibility of a
package/type name without the sigil within the scope of the declaration.
The first such declaration within a scope is assumed to be an unbound type,
and takes the actual type of its associated argument.  With subsequent
declarations in the same scope the use of the pseudo-sigil is optional,
since the bare type name is also declared.

A declaration nested within must not use the sigil if it wishes to refer to
the same type, since the inner declaration would rebind the type.  (Note
that the signature of a pointy block counts as part of the inner block, not
the outer block.)

=head3 Sigils indicate interface

Sigils indicate overall interface, not the exact type of the bound object.
Different sigils imply different minimal abilities.

C<$x> may be bound to any object, including any object that can be bound to
any other sigil.  Such a scalar variable is always treated as a singular
item in any kind of list context, regardless of whether the object is
essentially composite or unitary.  It will not automatically dereference to
its contents unless placed explicitly in some kind of dereferencing context.
In particular, when interpolating into list context, C<$x> never expands its
object to anything other than the object itself as a single item, even if
the object is a container object containing multiple items.

C<@x> may be bound to an object of the C<Array> class, but it may also be
bound to any object that does the C<Positional> role, such as a C<Range>,
C<Buf>, C<List>, or C<Capture>.  The C<Positional> role implies the
ability to support C<< postcircumfix:<[ ]> >>.

Likewise, C<%x> may be bound to any object that does the C<Associative>
role, such as C<Pair>, C<Set>, C<Bag>, C<Mix>, or C<Capture>.  The
C<Associative> role implies the ability to support
C<< postcircumfix:<{ }> >>.

C<&x> may be bound to any object that does the C<Callable> role, such as any
C<Block> or C<Routine>.  The C<Callable> role implies the ability to support
C<< postcircumfix:<( )> >>.

In any case, the minimal container role implied by the sigil is checked at
binding time at the latest, and may fail earlier (such as at compile time)
if a semantic error can be detected sooner.  If you wish to bind an object
that doesn't yet do the appropriate role, you must either stick with the
generic C<$> sigil, or mix in the appropriate role before binding to a more
specific sigil.

An object is allowed to support both C<Positional> and C<Associative>.  An
object that does not support C<Positional> may not be bound directly to
C<@x>.  However, any construct such as C<%x> that can interpolate the
contents of such an object into list context can automatically construct a
list value that may then be bound to an array variable.  Subscripting such a
list does not imply subscripting back into the original object.
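
A sketch of binding objects that already do the required roles, assuming a current Rakudo:

    # Sketch assuming a current Rakudo compiler.
    my @r := 1..5;                  # a Range does Positional
    say @r[2];                      # 3
    my %p := (a => 1);              # a Pair does Associative
    say %p<a>;                      # 1
    my &double := -> $x { $x * 2 }; # a Block does Callable
    say double(21);                 # 42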

=head3 No intervening whitespace

Unlike in Perl 5, you may no longer put whitespace between a sigil and its
following name or construct.

=head2 Twigils

Ordinary sigils indicate normally scoped variables, either lexical or
package scoped.  Oddly scoped variables include a secondary sigil (a
B<twigil>) that indicates what kind of strange scoping the variable is
subject to:

    $foo        ordinary scoping
    $.foo       object attribute public accessor
    $^foo       self-declared formal positional parameter
    $:foo       self-declared formal named parameter
    $*foo       dynamically overridable global variable
    $?foo       compiler hint variable
    $=foo       Pod variable
    $<foo>      match variable, short for $/{'foo'}
    $!foo       object attribute private storage
    $~foo       the foo sublanguage seen by the parser at this lexical spot

Most variables with twigils are implicitly declared or assumed to be
declared in some other scope, and don't need a "my" or "our".  Attribute
variables are declared with C<has>, though.
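
A sketch showing a few of the twigils in use, assuming a current Rakudo; the class and its members are made up:

    # Sketch assuming a current Rakudo compiler; names are made up.
    class Counter {
        has $.name;                  # $.name: public accessor
        has $!count = 0;             # $!count: private storage
        method bump { ++$!count }
    }
    my $c = Counter.new(name => 'clicks');
    say $c.name;                     # clicks
    $c.bump for ^2;
    say $c.bump;                     # 3

    my &add = { $^a + $^b };         # self-declared positional parameters
    say add(2, 3);                   # 5

    if '2015-10' ~~ / $<year>=(\d+) '-' $<month>=(\d+) / {
        say ~$<year>;                # 2015, short for $/{'year'}
    }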

=head2 Scope declarators

Normal names and variables are declared using a I<scope declarator>:

    my          # introduces lexically scoped names
    our         # introduces package-scoped names
    has         # introduces attribute names
    anon        # introduces names that are private to the construct
    state       # introduces lexically scoped but persistent names
    augment     # adds definitions to an existing name
    supersede   # replaces definitions of an existing name
    unit        # like our, but introduces a compilation-unit scoped name

Names may also be declared in the signature of a function.  These are
equivalent to a C<my> declaration inside the block of the function, except
that such parameters default to readonly.

The C<anon> declarator allows a declaration to provide a name that can be
used in error messages, but that isn't put into any external symbol table:

    my $secret = anon sub marine () {...}
    $secret(42);  # too many arguments to sub marine

However, the name is introduced into the scope of the declaration itself, so
it may be used to call itself recursively:

    my $secret = anon sub tract($n) { say $n; tract($n-1) if $n };
    $secret(5); # 5 4 3 2 1 0
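
A sketch of the C<state> declarator, assuming a current Rakudo; C<fresh-id> is a made-up name:

    # Sketch assuming a current Rakudo compiler; fresh-id is made up.
    sub fresh-id { state $id = 0; ++$id }   # $id persists between calls
    fresh-id() for ^3;
    say fresh-id();                          # 4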

=head2 Invariant sigils

Sigils are now invariant.  C<$> always means a scalar variable, C<@> an
array variable, and C<%> a hash variable, even when subscripting.  In item
context, variables such as C<@array> and C<%hash> simply return themselves
as C<Array> and C<Hash> objects. (Item context was formerly known as scalar
context, but we now reserve the "scalar" notion for talking about variables
rather than contexts, much as arrays are disassociated from list context.)

=head2 List stringification

In string contexts, lists and list-like objects automatically stringify to
appropriate (white-space separated) string values.  In numeric contexts, the
number of elements in the container is returned.  In boolean contexts, a
true value is returned if and only if there are any elements in the
container.
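
A sketch of the three contexts, assuming a current Rakudo:

    # Sketch assuming a current Rakudo compiler.
    my @a = 1, 2, 3;
    say ~@a;       # 1 2 3
    say +@a;       # 3
    say ?@a;       # True
    my @empty;
    say ?@empty;   # False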

=head2 The C<.perl> method

To get a Perlish representation of any object, use the C<.perl> method.
Like the C<Data::Dumper> module in Perl 5, the C<.perl> method will put
quotes around strings, square brackets around list values, curlies around
hash values, constructors around objects, properly handle circular
references etc., so that Perl can evaluate the result back to the same
object.  The C<.perl> method will return a representation of the object on
the assumption that, if the code is reparsed at some point, it will be used
to regenerate the object as a scalar in item context.  If you wish to
interpolate the regenerated object in a list context, it may be necessary to
use C<< prefix:<|> >> to force interpolation.

Note that C<.perl> has a very specific definition, and it is expected that
some modules will rely on the ability to roundtrip values with C<EVAL>.  As
such, overriding C<.perl> with a different format (globally using
C<MONKEY-TYPING>, or for specific classes unless special care is taken to
maintain parsability) is unwise.  Code which does not depend on C<.perl>'s
definition should use C<.gist> instead to allow more control.
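
A sketch of a round trip through C<EVAL>, assuming a current Rakudo (where the same method is also available as C<.raku>):

    # Sketch assuming a current Rakudo compiler.
    use MONKEY-SEE-NO-EVAL;                 # needed to EVAL a non-literal string
    my $orig = [1, 'two', { three => 3 }];
    my $copy = EVAL $orig.perl;             # rebuild from the source text
    say $copy eqv $orig;                    # True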

=head2 The C<.gist> method

C<.gist>, by contrast with C<.perl>, returns a flexible form of an object
intended for human interpretation.  For example, when presented with a very
long list or array, only the first 100 entries will be printed, followed by
C<...> to indicate there are more entries.  If that's not what you want,
stringify the list instead.  This method is only supposed to give you the
gist of the value, not the whole value.

Specific user classes are encouraged to override C<.gist> to do something
appropriate, and it is completely acceptable to monkey patch C<.gist>
methods while doing debugging, without risk of breaking any used module.
C<.gist>, like any method, will accept and ignore unrecognized named
arguments; implementations of C<.gist> are encouraged to standardize on a
set of flags.
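
A sketch contrasting the two methods and overriding C<.gist> for display, assuming a current Rakudo; C<Temperature> is a made-up class:

    # Sketch assuming a current Rakudo compiler; Temperature is made up.
    say 'abc'.perl;                      # "abc" (quoted for reparsing)
    say 'abc'.gist;                      # abc
    class Temperature {
        has $.celsius;
        method gist { "{$!celsius}°C" }  # used by say
    }
    say Temperature.new(celsius => 21);  # 21°C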

[Some conjectural suggestions:

    :oneline        Do not indent or linebreak output
    :width($d)      Wrap output at $d chars
    :charset($obj)  Represent unrecognized characters as escapes
    :ascii          Short for some instantiation of :charset

Conjecturally, C<.gist> on system-defined classes could redispatch to
C<&*PRETTYPRINTER> or some similar system, allowing for a more disciplined
way to change pretty formats.

It may also be desirable to use a richer format for intermediate strings
than simple C<Str>, for instance using an object format that can handle
intelligent line breaking.  However, that's probably overkill.]

=head2 The C<.fmt> method

To get a formatted representation of any scalar value, use the
C<.fmt('%03d')> method to do an implicit C<sprintf> on the value.

To format an array value separated by commas, supply a second argument:
C<.fmt('%03d', ', ')>.  To format a hash value or list of pairs, include
formats for both key and value in the first string:
C<< .fmt('%s: %s', "\n") >>.
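
A sketch of the three forms, assuming a current Rakudo:

    # Sketch assuming a current Rakudo compiler.
    say 42.fmt('%03d');                 # 042
    say (1, 2, 3).fmt('%03d', ', ');    # 001, 002, 003
    my %h = apple => 3;
    say %h.fmt('%s: %s', "\n");         # apple: 3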

=head2 Subscripts

Subscripts now consistently dereference the container produced by whatever
was to their left.  Whitespace is not allowed between a variable name and
its subscript.  However, there are two ways to stretch the construct out
visually.  Since a subscript is a kind of postfix operator, there is a
corresponding B<dot> form of each subscript (C<@foo.[1]> and C<%bar.{'a'}>)
that makes the dereference a little more explicit. Constant string
subscripts may be placed in angles, so C<%bar.{'a'}> may also be written as
C<< %bar<a> >> or C<< %bar.<a> >>.  Additionally, you may insert extra
whitespace using the unspace.

Slicing is specified by the nature of the subscript, not by the sigil.
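
A sketch of the equivalent spellings, assuming a current Rakudo:

    # Sketch assuming a current Rakudo compiler.
    my @foo = <x y z>;
    my %bar = a => 1;
    say @foo[1], @foo.[1];                          # yy
    say %bar{'a'}, %bar.{'a'}, %bar<a>, %bar.<a>;   # 1111
    say @foo\ [1];                                  # y, via the unspace
    say @foo[0, 2];                                 # (x z), a slice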

=head3 Subscripts have list context

The context in which a subscript is evaluated is no longer controlled by the
sigil either.  Subscripts are always evaluated in list context.  (More
specifically, they are evaluated in a variant of list context known as
I<lol> context (List of List), which preserves dimensional information so
that you can do multi-dimensional slices using semicolons.  However, each
slice dimension evaluates its sublist in normal list context, so functions
called as part of a subscript don't see a lol context.  See S09 for more on
slicing.)

If you need to force inner context to item (scalar), we now have convenient
single-character context specifiers such as + for numbers and ~ for strings:

    $x        =  g();       # item context for g()
    @x[f()]   =  g();       # list context for f() and g()
    @x[f()]   = +g();       # list context for f(), numeric item context for g()
    @x[+f()]  =  g();       # numeric item context for f(), list context for g()

    @x[f()]   =  @y[g()];   # list context for f() and g()
    @x[f()]   = +@y[g()];   # list context for f() and g()
    @x[+f()]  =  @y[g()];   # numeric item context for f(), list context for g()
    @x[f()]   =  @y[+g()];  # list context for f(), numeric item context for g()

    %x{~f()}  =  %y{g()};   # string item context for f(), list context for g()
    %x{f()}   =  %y{~g()};  # list context for f(), string item context for g()
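
A sketch of the difference, assuming a current Rakudo and a made-up sub C<f> that returns two values:

    # Sketch assuming a current Rakudo compiler; f is made up.
    sub f { 1, 2 }
    my @x = <a b c>;
    say @x[f()];     # (b c): list context, two indices
    say @x[+f()];    # c: +f() is the number of elements, 2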

Sigils used as functions with parentheses also force context, so these also
work:

    @x[$(g())]         # item context for g()
    %x{$(g())}         # item context for g()

But note that these don't do the same thing:

    @x[$g()]           # call function in $g
    %x{$g()}           # call function in $g

Array and Hash variables can be evaluated in item context by prefixing them
with a single dollar sign:

    $@a               # same as  item @a
    $%h               # same as  item %h
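
A sketch of the difference item context makes to iteration, assuming a current Rakudo; it uses the C<item> spelling:

    # Sketch assuming a current Rakudo compiler.
    my @a = 1, 2, 3;
    my $n = 0;
    $n++ for @a;
    say $n;          # 3: the array supplies three items
    $n = 0;
    $n++ for item @a;
    say $n;          # 1: the itemized array is one item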

=head2 List assignment and binding

There is a need to distinguish list assignment from list binding.  List
assignment works much like it does in Perl 5, copying the values.  There's a
new C<:=> binding operator that lets you bind names to C<Array> and C<Hash>
objects without copying, in the same way as subroutine arguments are bound
to formal parameters.  See S06 for more about binding.
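
A sketch of the difference, assuming a current Rakudo:

    # Sketch assuming a current Rakudo compiler.
    my @a = 1, 2, 3;
    my @copy  = @a;     # assignment copies the values
    my @alias := @a;    # binding shares the same Array
    @a[0] = 99;
    say @copy[0];       # 1
    say @alias[0];      # 99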

=head2 List

Comma-separated values (as well as word-quoting constructs such as
C<< <a b c> >>) form a C<List>:

    (1,2,3,:mice<blind>)

The result is a C<List> object containing three C<Int> objects and a
C<Pair> object, that is, four positional objects.  When, however, you say
something like:

    rhyme(1,2,3,:mice<blind>)

the syntactic list is translated (at compile time, in this case) into a
C<Capture> object with three positionals and one named argument in
preparation for binding.  More generally, a list is transmuted to a
capture any time it is bound to a complete signature.

You may force immediate conversion to a C<Capture> object by prefixing the
list with a backslash:

    $args = \(1,2,3,:mice<blind>)

Individual arguments in an argument list (or capture composer) are parsed as ordinary
expressions, and any functions mentioned are called immediately, with each
function's results placed as an argument within the outer argument list.  Whether any
given argument is flattened will depend on its eventual binding, and in
general cannot be known at composition time.
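
A sketch of inspecting a capture and passing it on with C<|>, assuming a current Rakudo; C<tally-args> is a made-up sub:

    # Sketch assuming a current Rakudo compiler; tally-args is made up.
    my $args = \(1, 2, 3, :mice<blind>);
    say $args.list.elems;      # 3 positional arguments
    say $args<mice>;           # blind, the named argument
    sub tally-args(*@pos, *%named) { @pos.elems, %named.elems }
    say tally-args(|$args);    # (3 1)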

We use "argument" here to mean anything that would be taken as a single
argument if bound to a positional or named parameter:

    rhyme(1,2,3,:mice<blind>)     # rhyme has 4 arguments
    rhyme((1,2),3,:mice<blind>)   # rhyme has 3 arguments
    rhyme((1,2,3),:mice<blind>)   # rhyme has 2 arguments
    rhyme((1,2),(3,:mice<blind>)) # rhyme has 2 arguments
    rhyme((1,2,3,:mice<blind>))   # rhyme has 1 argument

In these examples, the first argument to the function is a list in all but
the first case, where it is simply the literal integer 1.  An argument is
either of:

=over

=item *

A parenthesized list that groups together a sublist, or

=item *

Any other object that can function as a single argument.

=back

Looking at it the other way, all arguments that don't actually need to be
wrapped up in a list are considered degenerate lists in their own right
when it comes to binding.  Note that a capture is not considered a kind of
list, and so does not flatten in flat context.
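
To reproduce the argument counts in the C<rhyme> examples, a hypothetical C<rhyme> can take its whole argument list as a capture (C<|c>) and report what it received.  This is a sketch assuming Rakudo; it counts positional and named arguments separately, so each total matches the corresponding comment in those examples:

```raku

    # Hypothetical stand-in for rhyme: capture everything and count it.
    sub rhyme(|c) { "{c.list.elems} positional, {c.hash.elems} named" }

    say rhyme(1,2,3,:mice<blind>);        # 3 positional, 1 named
    say rhyme((1,2),3,:mice<blind>);      # 2 positional, 1 named
    say rhyme((1,2,3),:mice<blind>);      # 1 positional, 1 named
    say rhyme((1,2),(3,:mice<blind>));    # 2 positional, 0 named
    say rhyme((1,2,3,:mice<blind>));      # 1 positional, 0 named

```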

=head2 Lists, parameters, and Captures

When an argument is bound to a parameter, the behavior depends on whether
the parameter is "flattening" or "argumentative".  Positional parameters and
slice parameters are argumentative and just return the next syntactic
argument without flattening.  (A slice differs from an ordinary positional
parameter in being "slurpy", that is, it is intended to fetch multiple
values from the variadic region of the surrounding capture.  Slurpy
contexts come in flattening (C<*> parameters), slicing (C<**> parameters),
and one-arg (C<+> parameters) forms.)
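
A sketch of the three slurpy forms, assuming Rakudo; the subroutine names are hypothetical:

```raku

    sub flattening(*@args) { @args.elems }   # * flattens sublists
    sub slicing(**@args)   { @args.elems }   # ** keeps each argument intact
    sub one-arg(+@args)    { @args.elems }   # + applies the single-argument rule

    say flattening((1, 2), 3);   # 3
    say slicing((1, 2), 3);      # 2
    say one-arg((1, 2, 3));      # 3: a lone iterable argument supplies the list
    say one-arg((1, 2), 3);      # 2

```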

The fact that a parameter is being bound implies that there is an outer
capture being bound to a signature.  The capture's iterator provides a
C<.get> and a C<.getarg> method to tell the iterator what context to bind
in.  For positional/slice parameters, the C<.getarg> method returns the
entire next argument from the iterator; any object that is not a list is
returned unchanged.
In contrast, flat parameters call C<.get> on the capture's iterator, which
flattens any sublists before pulling out the next item.  In either case,
no bare list object is seen as a normal bound argument.  (There is a way
to bind the underlying list using backslash, however.  This is how
internal routines can deal with lists as real objects.)

In contrast to parameter binding, if a C<List> is bound to an entire
signature (typically as part of a function or method call), it will be
transformed first into a capture object, which is much like a list but has
its arguments divvied up into positional and named subsets for faster
binding.  (Usually this transformation happens at compile time.) If the
first positional is followed by a colon instead of a comma, it is marked as
the invocant in case it finds itself in a context that cares.  It's illegal
to use the colon in place of the comma anywhere except after the first
argument.

Explicit binding to an individual variable is considered a form of signature
binding, which is to say a declarator puts implicit signature parens around
the unparenthesized form:

    my (*@x) := foo(); # signature binding
    my *@x := foo();   # same thing

The parens are, of course, required if there is more than one parameter.

C<Capture> objects are immutable in the abstract, but evaluate their
arguments lazily.  Before everything inside a C<Capture> is fully evaluated
(which happens at compile time when all the arguments are constants), the
eventual value may well be unknown.  All we know is that we have the promise
to make the bits of it immutable as they become known.

C<Capture> objects may contain multiple unresolved iterators such as feeds
or lazy lists.  How these are resolved depends on what they
are eventually bound to.  Some bindings are sensitive to multiple dimensions
while others are not.  Binding to a list of lists is often known as
"slicing", because it's commonly used to index "slices" of a potentially
multi-dimensional array.

You may retrieve parts from a C<Capture> object with a prefix sigil operator:

    $args = \3;     # same as "$args = \(3)"
    @$args;         # same as "Array($args)"
    %$args;         # same as "Hash($args)"

When cast into an array, you can access all the positional arguments; into a
hash, all named arguments.
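
For example, assuming Rakudo:

```raku

    my $args = \(1, 2, :mice<blind>);

    my @positional = @$args;     # all positional arguments
    my %named      = %$args;     # all named arguments

    say @positional;             # [1 2]
    say %named<mice>;            # blind

```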

All prefix sigil operators accept one positional argument, evaluated in item
context as an rvalue.  They can interpolate in strings if called with
parentheses.  The special syntax form C<$()> translates into C<$( $/.made //
Str($/) )> to operate on the current match object; similarly C<@()> and
C<%()> can extract positional and named submatches.

C<Capture> objects fill the ecological niche of references in
Perl 6.  You can think of them as "fat" references, that is, references that
can capture not only the current identity of a single object, but also the
relative identities of several related objects.  Conversely, you can think
of Perl 5 references as a degenerate form of C<Capture> when you want to
refer only to a single item.

The C<sink> statement prefix will eagerly evaluate any block or statement,
throw away the results, and instead return the empty C<List> value, C<()>.
This can be useful to peg some behavior to an empty list while still
returning an empty list:

    # Check that incoming argument list isn't null
    @inclist = map { $_ + 1 }, @list || sink warn 'Nil input!';

    @inclist = do for @list || sink { warn 'Nil input!'; $warnings++; } {
        $_ + 1;
    }

    # Check that outgoing result list isn't null
    @inclist = do map { $_ + 1 }, @list or sink warn 'Nil result!';

    @inclist = do for @list {
        $_ + 1;
    } or sink { warn 'Nil result'; $warnings++; }

Given C<sink>, there's no need for an "else" clause on Perl 6's loops, and
the C<sink> construct works in any list, not just C<for> loops.

=head2 CaptureCursors

A C<CaptureCursor> object is a view into another capture with an associated
start position.  Such a cursor is essentially a pattern-matching state.
Capture cursors are used for operations like C<grep> and C<map> and C<for>
loops that need to apply a short signature multiple times to a longer list
of values supplied by the base capture.  When we say "capture" we sometimes
mean either C<Capture> or C<CaptureCursor>.  C<CaptureCursor> objects are also
immutable.  When pattern matching a signature against a cursor, you get a
new cursor back which tells you the new position in the base capture.

=head2 Signature objects

A signature object (C<Signature>) may be created with colon-prefixed parens:

    my ::MySig ::= :(Int, Num, Complex, Status)

Expressions inside the signature are parsed as parameter declarations rather
than ordinary expressions.  See S06 for more details on the syntax for
parameters.
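
A signature object can also be smartmatched against a capture to test whether that capture would bind.  The sketch below assumes Rakudo, where smartmatching a capture against a signature performs exactly this check:

```raku

    my $sig = :(Int, Num);

    say \(42, 2.5e0) ~~ $sig;     # True
    say \(42, 'text') ~~ $sig;    # False: second argument is not a Num
    say \(42) ~~ $sig;            # False: too few arguments

```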

Declarators generally make the colon optional:

    my ($a,$b,$c);      # parsed as signature

Signature objects bound to type variables (as in the example above) may be
used within other signatures to apply additional type constraints.  When
applied to a capture argument, the signature allows you to take the types of
the capture's arguments from C<MySig>, but declare the (untyped) variable
names yourself via an additional signature in parentheses:

    sub foo (Num  $num, MySig $a ($i,$j,$k,$mousestatus)) {...}
    foo($mynum, \(1, 2.7182818, 1.0i, statmouse()));

=head2 Ampersand and invocation

Unlike in Perl 5, the notation C<&foo> merely stands for the C<foo> function
as a C<Routine> object without calling it.  You may call any Code object by
dereferencing it with parens (which may, of course, contain arguments):

    &foo($arg1, $arg2);

Whitespace is not allowed before the parens because it is parsed as a
postfix.  As with any postfix, there is also a corresponding C<.()>
operator, and you may use the "unspace" form to insert optional whitespace
and comments between the backslash and either of the postfix forms:

    &foo\   ($arg1, $arg2);
    &foo\   .($arg1, $arg2);
    &foo\#`[
        embedded comment
    ].($arg1, $arg2);
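
A small sketch, assuming Rakudo, with a hypothetical two-argument C<foo>:

```raku

    # Hypothetical routine used only for illustration.
    sub foo($x, $y) { $x + $y }

    my &f = &foo;               # names the routine without calling it
    say &foo ~~ Routine;        # True

    say &foo(1, 2);             # 3
    say &foo.(1, 2);            # 3, the dotted postfix form
    say &foo\   (1, 2);         # 3, unspace before the parens
    say f(3, 4);                # 7

```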

Note however that the parentheses around arguments in the "normal" named
forms of function and method calls are not postfix operators, and so do not
allow the C<.()> form, because the dot is indicative of an actual
dereferencing operation, which the named forms aren't doing.  You may,
however, use "unspace" to install extra space before the parens in these
forms:

    foo()       # okay
    foo\ ()     # okay
    foo.()      # means foo().()

    .foo()      # okay
    .foo\ ()    # okay
    .foo.()     # means .foo().()

    $.foo()     # okay
    $.foo\ ()   # okay
    $.foo.()    # means $.foo().()

If you I<do> use the dotty form on these special forms, it will assume you
wanted to call the named form without arguments, and then dereference the
result of that.

=head2 Specifying a dispatch candidate

With multiple dispatch, C<&foo> is actually the name of a C<dispatch>
routine (instantiated from a C<proto>) controlling a set of candidate
functions (which you can use as if it were an ordinary function, because a
C<dispatch> is really an C<only> function with pretensions to management of
a dispatcher).  However, in that case C<&foo> by itself is not sufficient to
uniquely name a specific function.  To do that, the type may be refined by
using a signature literal as a postfix operator:

    &foo:(Int,Num)

Use of a signature that does not unambiguously select a single multi results
in failure.

It still just returns a C<Routine> object.  A call may also be partially
applied (primed) by using the C<.assuming> method:

    &foo.assuming(1,2,3,:mice<blind>)
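
For instance, assuming Rakudo and a hypothetical C<rhyme> with one named parameter:

```raku

    # Hypothetical routine with two positionals and one named parameter.
    sub rhyme($first, $second, :$mice = 'three') { "$first $second $mice mice" }

    my &primed = &rhyme.assuming(1, :mice<blind>);

    say primed(2);              # 1 2 blind mice

```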

=head2 Multidimensional slices and lists

Slicing syntax is covered in S09.  A multidimensional slice will be done
with semicolons between individual slice sublists.  The semicolons imply one
extra level of tree-ness.  So when you say

    @matrix[1..*; 0]

it really means

    @matrix[List.new( (1..*), 0 )]

Each such slice sub-list is evaluated lazily.
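
A sketch assuming Rakudo, using an array of arrays as a stand-in for a true multidimensional array:

```raku

    my @matrix = [1, 2], [3, 4], [5, 6];

    say @matrix[1..*; 0];        # (3 5): column 0 of rows 1 onward
    say @matrix[0; *];           # (1 2): every column of row 0

```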

Just as parens and brackets can be used to compose lists and arrays, if
you put any semicolons into either form, it becomes a multi-dimensional
composer:

    (1..*; 0)   # same as (lol (1..*), 0), that is LoL.new($(1..*), 0)
    [1..*; 0]   # same as [lol (1..*), 0], that is Array.new($(1..*), 0)

A consequence of this is that you may not put more than one statement inside
parens or brackets expecting sequence semantics, that is, the way a normal
block evaluates all but the final statement for declarations or side
effects, then returns the value of the final statement.  In order to do that
in Perl 6, you need to use one of these constructs:

    do { my $x = 42; $x }
    $( my $x = 42; $x )
    @( my @x = 42,43; @x )
    %( my %x = a => 42; %x )

Note that the first one limits the scope of the declaration to the block,
while the parenthesized forms are parasitic on the outer lexical scope.

=head2 Subscript adverbs

To make a slice subscript return something other than values, append an
appropriate adverb to the subscript.

    @array = <A B>;
    @array[0,1,2];      # returns 'A', 'B', (Any)
    @array[0,1,2] :p;   # returns 0 => 'A', 1 => 'B'
    @array[0,1,2] :kv;  # returns 0, 'A', 1, 'B'
    @array[0,1,2] :k;   # returns 0, 1
    @array[0,1,2] :v;   # returns 'A', 'B'

    %hash = (:a<A>, :b<B>);
    %hash<a b c>;       # returns 'A', 'B', (Any)
    %hash<a b c> :p;    # returns a => 'A', b => 'B'
    %hash<a b c> :kv;   # returns 'a', 'A', 'b', 'B'
    %hash<a b c> :k;    # returns 'a', 'b'
    %hash<a b c> :v;    # returns 'A', 'B'

These adverbial forms all weed out non-existing entries if the adverb is
true; if not, they leave them in, just as an ordinary slice would.  So:

    @array[0,1,2] :!p;  # returns 0 => 'A', 1 => 'B', 2 => (Any)
    %hash<a b c>  :!kv; # returns 'a', 'A', 'b', 'B', 'c', (Any)
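
A runnable version of several of the examples above, assuming Rakudo (which displays a missing element as C<(Any)>):

```raku

    my @array = <A B>;
    my %hash  = :a<A>, :b<B>;

    say @array[0,1,2]:p;     # (0 => A 1 => B)
    say @array[0,1,2]:kv;    # (0 A 1 B)
    say @array[0,1,2]:k;     # (0 1)
    say %hash<a b c>:v;      # (A B)
    say %hash<a b c>:!p;     # (a => A b => B c => (Any))

```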

Likewise,

    my ($a,$b,$c) = %hash<a b c> :delete;

deletes the entries I<en passant> while returning them.  (Of course, any of
these forms also work in the degenerate case of a slice containing a single
index.)  Note that these forms work by virtue of the fact that the subscript
is the topmost previous operator.  You may have to parenthesize or force
list context if some other operator that is tighter than comma would appear
to be topmost:

    1 + (%hash{$x} :delete);
    $x = (%hash{$x} :delete);
    ($x) = %hash{$x} :delete;

(The situation does not often arise for the slice modifiers above because
they are usually used in list context, which operates at comma precedence.)

The element is deleted only if the adverb is true.  C<:!delete> is
essentially a no-op, but you can conditionally delete entries I<en passant>
by passing a flag, as in C<:delete($kill'em)>.  In either case, the values
are returned.
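
A sketch of both behaviors, assuming Rakudo; C<$kill> stands in for the flag in C<$kill'em>:

```raku

    my %hash = :a<A>, :b<B>;

    # Fetch and delete in one step; 'c' does not exist, so $z stays undefined.
    my ($x, $y, $z) = %hash<a b c>:delete;
    say $x, $y;                       # AB
    say %hash.elems;                  # 0

    # A false flag turns :delete into an ordinary fetch.
    my %flags = keep => 1;
    my $kill  = False;
    say %flags<keep>:delete($kill);   # 1
    say %flags<keep>:exists;          # True

```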

You may also perform an existence test, either on a single item or a
junction of items:

    if %hash<foo> :exists {...}
    if %hash{any <a b c>} :exists {...}
    if %hash{all <a b c>} :exists {...}
    if %hash{one <a b c>} :exists {...}
    if %hash{none <a b c>} :exists {...}

Using the C<:exists> adverb together with a list slice results in a
C<List> of C<Bool>, which you could also put in a junction with similar
semantics:

    if any %hash<a b c> :exists {...}
    if all %hash<a b c> :exists {...}
    if one %hash<a b c> :exists {...}
    if none %hash<a b c> :exists {...}

although with different optimization options for the compiler.

You may use C<:!exists> to test for non-existence.  This is especially
handy because precedence rules make C<< !%hash<a> :exists >> apply the
C<:exists> adverb to the prefix C<!> rather than to the subscript.
C<< %hash<a> :!exists >> does not have that problem.
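
For example, assuming Rakudo:

```raku

    my %hash = :a<A>, :b<B>;

    say %hash<a b c>:exists;            # (True True False)
    say so any(%hash<a b c>:exists);    # True
    say so all(%hash<a b c>:exists);    # False
    say %hash<c>:!exists;               # True

```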

=head2 Combining subscript adverbs

As with named arguments in a call, the order in which multiple adverbs are
given on a subscript does not matter.  Some combinations make sense, such as:

  %a = %b{@keys-to-extract} :delete :p; # same as :p :delete

would move the pairs for the given keys out of one hash and into another,
whereas

  @actually-deleted = %h{@keys-to-extract} :delete :k; # same as :k :delete

would return the I<keys> that were actually deleted from the hash.
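
A sketch of both combinations, assuming Rakudo; the hash contents are made up for illustration:

```raku

    my %b = a => 1, b => 2, c => 3;

    my %a = %b<a c x>:delete:p;               # move existing pairs from %b to %a
    say %a.keys.sort;                         # (a c)
    say %b.keys;                              # (b)

    my %h = x => 10, y => 20;
    my @actually-deleted = %h<x z>:delete:k;  # keys that really were deleted
    say @actually-deleted;                    # [x]

```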

Adverbs that only specify a return type can B<not> be combined with each
other, because combinations such as C<:kv :p> or C<:v :k> simply do not make
sense.

These combinations are considered legal and mean the following:

  :delete :kv            delete, return key/values of actually deleted keys
  :delete :!kv           delete, return key/values of all keys attempted
  :delete :p             delete, return pairs of actually deleted keys
  :delete :!p            delete, return pairs of all keys attempted
  :delete :k             delete, return actually deleted keys
  :delete :!k            delete, return all keys attempted to delete
  :delete :v             delete, return values of actually deleted keys
  :delete :!v            delete, return values of all keys attempted
  :delete :exists        delete, return Bools indicating keys existed
  :delete :!exists       delete, return Bools indicating keys did not exist
  :delete :exists :kv    delete, return list with key,True for key existed
  :delete :!exists :kv   delete, return list with key,False for key existed
  :delete :exists :!kv   delete, return list with key,Bool whether key existed
  :delete :!exists :!kv  delete, return list with key,!Bool whether key existed
  :delete :exists :p     delete, return pairs with key/True for key existed
  :delete :!exists :p    delete, return pairs with key/False for key existed
  :delete :exists :!p    delete, return pairs with key/Bool whether key existed
  :delete :!exists :!p   delete, return pairs with key/!Bool whether key existed
  :exists :kv            return list with key,True for key exists
  :!exists :kv           return list with key,False for key exists
  :exists :!kv           return list with key,Bool whether key exists
  :!exists :!kv          return list with key,!Bool whether key exists
  :exists :p             return pairs with key/True for key exists
  :!exists :p            return pairs with key/False for key exists
  :exists :!p            return pairs with key/Bool whether key exists
  :!exists :!p           return pairs with key/!Bool whether key exists

An implementation is free to silently ignore any other combinations, or to
silently give one of the adverbs given above precedence over the others.

=head2 Numeric and boolean context of hashes

In numeric context (i.e. when cast into C<Int> or C<Num>), a C<Hash> object
becomes the number of pairs contained in the hash.  In a boolean context, a
Hash object is true if there are any pairs in the hash.
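
A quick check, assuming Rakudo:

```raku

    my %full  = a => 1, b => 2;
    my %empty;

    say +%full;      # 2
    say ?%full;      # True
    say ?%empty;     # False

```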

=head2 List sorting

Sorting a list of pairs should sort on their keys by default, then on their
values.  Sorting a list of lists should sort on the first elements, then the
second elements, etc.  For more on C<sort> see S29.
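
Assuming Rakudo's default C<cmp> ordering, which compares pairs by key and then by value, and compares lists element by element:

```raku

    my @pairs = (b => 1, a => 2, a => 1).sort;
    say @pairs;                      # [a => 1 a => 2 b => 1]

    my @lists = ([2, 1], [1, 3], [1, 2]).sort;
    say @lists;                      # [[1 2] [1 3] [2 1]]

```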

=head2 Special variables

Many of the special variables of Perl 5 are going away.  Those that apply to
some object such as a filehandle will instead be attributes of the
appropriate object.  Those that are truly global will have global alphabetic
names, such as C<$*PID> or C<@*ARGS>.

Any remaining special variables will be lexically scoped.  This includes
C<$_> and C<@_>, as well as the new C<$/>, which is the return value of the
last regex match.  C<$0>, C<$1>, C<$2>, etc., are aliases into the C<$/>
object.

=head2 Array end index

The C<$#foo> notation is dead.  Use C<@foo.end> or C<@foo[*-1]> instead.
(Or C<@foo.shape[$dimension]> for multidimensional arrays.)
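
For example, assuming Rakudo:

```raku

    my @foo = <a b c>;

    say @foo.end;      # 2, the last valid index
    say @foo[*-1];     # c, the last element
    say @foo.elems;    # 3

```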

=head1 Names

An I<identifier> is composed of an alphabetic character followed by any
sequence of alphanumeric characters.  The definitions of alphabetic and
numeric include appropriate Unicode characters.  Underscore is always
considered alphabetic.  An identifier may also contain isolated apostrophes
or hyphens provided the next character is alphabetic.
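
A couple of legal identifiers, assuming Rakudo.  Because a hyphen followed by a letter is part of the name, C<$foo-bar> is one variable rather than a subtraction:

```raku

    my $kill'em = True;     # apostrophe followed by a letter: part of the name
    my $foo-bar = 42;       # hyphen followed by a letter: part of the name

    say $foo-bar - 1;       # 41

```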

A I<name> is anything that is a legal part of a variable name (not counting
the sigil).  This includes

    $foo                # simple identifiers
    $Foo::Bar::baz      # compound identifiers separated by ::
    $Foo::($bar)::baz   # compound identifiers that perform interpolations
    $42                 # numeric names
    $!                  # certain punctuational variables

When not used as a sigil, the semantic function of C<::> within a name is to
force the preceding portion of the name to be considered a package through
which the subsequent portion of the name is to be located.  If the preceding
portion is null, it means the package is unspecified and must be searched
for according to the nature of what follows.  Generally this means that an
initial C<::> following the main sigil is a no-op on names that are known at
compile time, though C<::()> can also be used to introduce an interpolation
(see below).  Also, in the absence of another sigil, C<::> can serve as its
own sigil indicating intentional use of a not-yet-declared package name.

Unlike in Perl 5, if a sigil is followed by comma, semicolon, a colon not
followed by an identifier, or any kind of bracket or whitespace (including
Unicode brackets and whitespace), it will be taken to be a sigil without a
name rather than a punctuational variable.  This allows you to use sigils as
coercion operators:

    print $( foo() )    # foo called in item context
    print %( foo() )   # foo called in hash context

Bare sigils may be used as placeholders for anonymous variables:

    my ($a, $, $c) = 1..3;
    print unless (state $)++;

Outside of declarative constructs you may also use C<*> for a placeholder:

    ($a, *, $c) = 1..3;

This is the same as:

    ($a, $, $c) = 1..3;
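
A sketch of both uses of an anonymous variable, assuming Rakudo; C<tally> and C<$calls> are hypothetical names:

```raku

    my ($first, $, $third) = 1..3;    # the 2 goes to an anonymous $
    say $first, $third;               # 13

    my $calls = 0;
    # Hypothetical sub: the anonymous state variable is false only on the first call.
    sub tally { $calls++ unless (state $)++ }
    tally() for ^3;
    say $calls;                       # 1

```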

=head2 Package-qualified names

Ordinary package-qualified names look like they do in Perl 5:

    $Foo::Bar::baz      # the $baz variable in package Foo::Bar

Sometimes it's clearer to keep the sigil with the variable name, so an
alternate way to write this is:

    Foo::Bar::<$baz>

This is resolved at compile time because the variable name is a constant.
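
For example, assuming Rakudo and a hypothetical package variable:

```raku

    package Foo::Bar {
        our $baz = 42;       # hypothetical package variable
    }

    say $Foo::Bar::baz;      # 42
    say Foo::Bar::<$baz>;    # 42, with the sigil kept next to the name

```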

=head2 Pseudo-packages

The following pseudo-package names are reserved at the front of a name:

    MY          # Symbols in the current lexical scope (aka $?SCOPE)
    OUR         # Symbols in the current package (aka $?PACKAGE)
    CORE        # Outermost lexical scope, definition of standard Perl
    GLOBAL      # Interpreter-wide package symbols, really UNIT::GLOBAL
    PROCESS     # Process-related globals (superglobals)
    COMPILING   # Lexical symbols in the scope being compiled

The following relative names are also reserved but may be used
anywhere in a name:

    CALLER      # Contextual symbols in the immediate caller's lexical scope
    CALLERS     # Contextual symbols in any caller's lexical scope
    DYNAMIC     # Contextual symbols in my or any caller's lexical scope
    OUTER       # Symbols in the next outer lexical scope
    OUTERS      # Symbols in any outer lexical scope
    LEXICAL     # Contextual symbols in my or any outer's lexical scope
    UNIT        # Symbols in the outermost lexical scope of compilation unit
    SETTING     # Lexical symbols in the unit's DSL (usually CORE)
    PARENT      # Symbols in this package's parent package (or lexical scope)
    CLIENT      # The nearest CALLER that comes from a different package

Other all-caps names are semi-reserved.  We may add more of them in the
future, so you can protect yourself from future collisions by using mixed
case on your top-level packages.  (We promise not to break any existing
top-level CPAN package, of course.  Except maybe C<ACME>, and then only for
coyotes.)

The file's scope is known as C<UNIT>, but there are one or more lexical
scopes outside of that corresponding to the linguistic setting (often known
as the prelude in other cultures).  Hence, the C<SETTING> scope is
equivalent to C<UNIT::OUTERS>.  For a standard Perl program C<SETTING> is the
same as C<CORE>, but various startup options (such as C<-n> or C<-p>) can
put you into a domain specific language, in which case C<CORE> remains the
scope of the standard language, while C<SETTING> represents the scope
defining the DSL that functions as the setting of the current file.  When
used as a search term in the middle of a name, C<SETTING> includes all its
outer scopes up to C<CORE>.  To get I<only> the setting's outermost scope,
use C<UNIT::OUTER> instead.  See also the C<-L>/C<--language> switch
described in L<S19>.  If a setting
wishes to gain control of the main execution, it merely needs to declare a
C<MAIN> routine as documented in S06.  In this case the ordinary execution
of the user's code is suppressed; instead, execution of the user's code is
entirely delegated to the setting's C<MAIN> routine, which calls back to the
user's lexically embedded code with C<{YOU_ARE_HERE}>.

The C<{YOU_ARE_HERE}> functions within the setting as a proxy for the user's
C<UNIT> block, so C<-n> and C<-p> may be implemented in a setting with:

    for $*ARGFILES.lines {YOU_ARE_HERE}                 # -n
    map *.say, do for $*ARGFILES.lines {YOU_ARE_HERE}   # -p

or

    map {YOU_ARE_HERE}, $*ARGFILES.lines;               # -n
    map *.say, map {YOU_ARE_HERE}, $*ARGFILES.lines;    # -p

and the user may use loop control phasers as if they were directly in the
loop block.  Any C<OUTER> in the user's code refers to the block outside of
C<{YOU_ARE_HERE}>.  If used as a standalone statement, C<{YOU_ARE_HERE}>
runs as if it were a bare block.

Note that, since the C<UNIT> of an C<EVAL> is the evaluated string itself,
the C<SETTING> of an C<EVAL> is the language in effect at the point of the
C<EVAL>, not the language in effect at the top of the file.  (You may,
however, use C<OUTER::SETTING> to get the setting of the code that is
executing the C<EVAL>.)  In more traditional terms, the normal program is
functioning as the "prelude" of the C<EVAL>.

So the outermost lexical scopes nest like this, traversed via C<OUTER>:

    CORE <= SETTING < UNIT < (your_block_here)

The outermost package scopes nest like this, traversed via C<PARENT>:

    GLOBAL <  (your_package_here)

Your main program starts up in the C<GLOBAL> package and the C<UNIT> lexical
scope.  Whenever anything is declared with "our" semantics, it inserts a
name into both the current package and the current lexical scope.  (And "my"
semantics only insert into the current lexical scope.)  Note that the
standard setting, C<CORE>, is a lexical scope, not a package; the various
items that are defined within (or imported into) C<CORE> are I<not> in
C<GLOBAL>, which is pretty much empty when your program starts compiling,
and mostly only contains things you either put there yourself, or some other
module put there because you used that module.  In general things defined
within (or imported into) C<CORE> should only be declared or imported with
"my" semantics.  All Perl code can see C<CORE> anyway as the outermost
lexical scope, so there's no need to also put such things into C<GLOBAL>.

The C<GLOBAL> package itself is accessible via C<UNIT::GLOBAL>.  The
C<PROCESS> package is accessible via C<UNIT::PROCESS>.  The C<PROCESS>
package is not the parent of C<GLOBAL>.  However, searching up the dynamic
stack for dynamic variables will look in all nested dynamic scopes (mapped
automatically to each call's lexical scope, not package scope) out to the
main dynamic scope; once all the dynamic scopes are exhausted, it also looks
in the C<GLOBAL> package and then in the C<PROCESS> package, so C<$*OUT>
typically finds the process's standard output handle.  Hence, C<PROCESS> and
C<GLOBAL> serve as extra outer dynamic scopes, much like C<CORE> and
C<SETTING> function as extra outer lexical scopes.

Extra C<SETTING> scopes keep their identity and their nesting within
C<CORE>, so you may have to go to C<OUTER> several times from C<UNIT> before
you get to C<CORE>.  Normally, however, there is only the core setting, in
which case C<UNIT::OUTER> ends up meaning the same as C<SETTING> which is
the same as C<CORE>.

Extra C<GLOBAL> scopes are treated differently.  Every compilation unit has
its own associated C<UNIT::GLOBAL> package.  As the currently compiling
compilation unit expresses the need for various other compilation units, the
global names known to those other units must be merged into the new unit's
C<UNIT::GLOBAL>.  (This includes the names in all the packages within the
global package.)  If two different units use the same global name, they must
generally be taken to refer to the same item, but only if the type
signatures can be meshed (and augmentation rules followed, in the case of
package names).  If two units provide package names with incompatible type
signatures, the compilation of the unit fails.  In other words, you may not
use incompatible global types to provide a union type.  However, if one or
the other unit underspecifies the type in a compatible way, the
underspecified type just takes on the extra type information as it learns
it.  (Presumably some combination of Liskov substitution, duck-typing, and
run-time checking will prevent tragedy in the unit that was compiled with
the underspecified type.  Alternatively, the compiler is allowed to recompile
or re-examine the unit with the new type constraints to see if any issues
are certain to arise at run time, in which case the compiler is free to
complain.)

Any dynamic variable declared with C<our> in the user's main program
(specifically, the part compiled with C<GLOBAL> as the current package) is
accessible (by virtue of being in C<GLOBAL>) as a dynamic variable even if
not directly in the dynamic call chain.  Note that dynamic variables do I<not>
look in C<CORE> for anything.  (They I<might> look in C<SETTING> if you're
running under a setting distinct from C<CORE>, if that setting defines a
dynamic scope outside your main program, such as for the C<-n> or C<-p>
switch.)  Context variables declared with C<our> in the C<GLOBAL> or
C<PROCESS> packages do not need to use the C<*> twigil, since the twigil is
stripped before searching those packages.  Hence, your environment variables
are effectively declared without the twigil:

    augment package GLOBAL { our %ENV; }

=head2 Interpolating into names

You may interpolate a string into a package or variable name using
C<::($expr)> where you'd ordinarily put a package or variable name.  The
string is allowed to contain additional instances of C<::>, which will be
interpreted as package nesting.  You may only interpolate entire names,
since the construct starts with C<::>, and either ends immediately or is
continued with another C<::> outside the parens.  Most symbolic references
are done with this notation:

    $foo = "Bar";
    $foobar = "Foo::Bar";
    $::($foo)           # lexically-scoped $Bar
    $::("MY::$foo")     # lexically-scoped $Bar
    $::("OUR::$foo")    # package-scoped $Bar
    $::("GLOBAL::$foo") # global $Bar
    $::("PROCESS::$foo") # process $Bar
    $::("PARENT::$foo") # current package's parent's $Bar
    $::($foobar)        # $Foo::Bar
    $::($foobar)::baz   # $Foo::Bar::baz
    $::($foo)::Bar::baz # $Bar::Bar::baz
    $::($foobar)baz     # ILLEGAL at compile time (no operator baz)

Note that unlike in Perl 5, initial C<::> doesn't imply global.  Here as
part of the interpolation syntax it doesn't even imply package.  After the
interpolation of the C<::()> component, the indirect name is looked up
exactly as if it had been there in the original source code, with priority
given first to leading pseudo-package names, then to names in the lexical
scope (searching scopes outwards, ending at C<CORE>). The current package is
searched last.

Use the C<MY> pseudo-package to limit the lookup to the current lexical
scope, and C<OUR> to limit it to the current package scope.
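
A sketch of run-time name interpolation, assuming Rakudo; the package and variable are hypothetical:

```raku

    package Foo::Bar {
        our $baz = 42;             # hypothetical package variable
    }

    my $pkg = 'Foo::Bar';
    say $::("{$pkg}::baz");        # 42: the name is assembled at run time
    say ::('$Foo::Bar::baz');      # 42: the sigil may also go inside the string

```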

=head2 Strict lookup

When "strict" is in effect (which is the default except for one-liners),
non-qualified variables (such as C<$x> and C<@y>) are only looked up from
lexical scopes, but never from package scopes.

To bind package variables into a lexical scope, simply say C<our ($x, @y)>.
To bind global variables into a lexical scope, predeclare them with C<use>:

    use PROCESS <$IN $OUT>;

Or just refer to them as C<$*IN> and C<$*OUT>.

=head2 Direct lookup

To do direct lookup in a package's symbol table without scanning, treat the
package name as a hash:

    Foo::Bar::{'&baz'}  # same as &Foo::Bar::baz
    PROCESS::<$IN>      # Same as $*IN
    Foo::<::Bar><::Baz> # same as Foo::Bar::Baz

The C<::> before the subscript is required here, because the
C<Foo::Bar{...}> syntax is reserved for attaching a "WHENCE" initialization
closure to an autovivifiable type object.  (See S12.)

Unlike C<::()> symbolic references, this does not parse the argument for
C<::>, nor does it initiate a namespace scan from that initial point.  In
addition, for constant subscripts, it is guaranteed to resolve the symbol at
compile time.
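
A sketch of direct lookup, assuming Rakudo's stash subscripting; C<Foo::Bar> and C<baz> are hypothetical:

```raku

    package Foo::Bar {
        our sub baz { 'baz called' }           # hypothetical package-scoped sub
    }

    my &b = Foo::Bar::<&baz>;                  # direct symbol-table lookup
    say b();                                   # baz called
    say Foo::Bar::<&baz> === &Foo::Bar::baz;   # True, the same Sub object

```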

The null pseudo-package is reserved to mean the same search list as an
ordinary name search.  That is, the following are all identical in meaning:

    $foo
    $::{'foo'}
    ::{'$foo'}
    $::<foo>
    ::<$foo>

That is, each of them scans lexical scopes outward, and then the current
package scope (though the package scope is then disallowed when "strict" is
in effect).

As a result of these rules, you can write any arbitrary variable name as
either of:

    $::{'!@#$#@'}
    ::{'$!@#$#@'}

You can also use the C<< ::<> >> form as long as there are no spaces in the
name.

=head2 Symbol tables

The current lexical symbol table is now accessible through the
pseudo-package C<MY>.  The current package symbol table is visible as
pseudo-package C<OUR>.  The C<OUTER> name refers to the C<MY> symbol table
immediately surrounding the current C<MY>, and C<OUTER::OUTER> is the one
surrounding that one.

    our $foo = 41;
    say $::foo;         # prints 41, :: is no-op
    {
        my $foo = 42;
        say MY::<$foo>;         # prints "42"
        say $MY::foo;           # same thing
        say $::foo;             # same thing, :: is no-op here

        say OUR::<$foo>;        # prints "41"
        say $OUR::foo;          # same thing

        say OUTER::<$foo>;      # prints "41" (our $foo is also lexical)
        say $OUTER::foo;        # same thing
    }

You may not use any lexically scoped symbol table, either by name or by
reference, to add symbols to a lexical scope that is done compiling.  (We
reserve the right to relax this if it turns out to be useful though.)

=head2 Dynamic lookup

The C<CALLER> package refers to the lexical scope of the (dynamically
scoped) caller.  The caller's lexical scope is allowed to hide any
user-defined variable from you.  In fact, that's the default, and a lexical
variable must have the trait "C<is dynamic>" to be visible via C<CALLER>.
(C<$_>, C<$!> and C<$/> are always dynamic, as are any variables whose
declared names contain a C<*> twigil.) If the variable is not visible in the
caller, it returns C<Failure>.  Variables whose names are visible at the
point of the call but that come from outside that lexical scope are
controlled by the scope in which they were originally declared as dynamic.
Hence the visibility of C<< CALLER::<$*foo> >> is determined where C<$*foo>
is actually declared, not by the caller's scope (unless that's where it
happens to be declared).  Likewise C<< CALLER::CALLER::<$x> >> depends only
on the declaration of C<$x> visible in your caller's caller.
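
A minimal sketch of the C<is dynamic> rule, assuming current Rakudo behavior; the subs C<peek-at-caller> and C<run-with-setting> and the variable C<$setting> are hypothetical:

```raku
# Hypothetical callee: reads a lexical from its caller's scope.
sub peek-at-caller {
    CALLER::<$setting>
}

# Hypothetical caller: the lexical must be marked "is dynamic"
# or CALLER is not allowed to see it.
sub run-with-setting {
    my $setting is dynamic = 'verbose';
    peek-at-caller();
}

say run-with-setting();   # verbose
```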

User-defined dynamic variables should generally be initialized with C<::=>
unless the variable actually needs to be modified.  (Marking dynamic
variables as readonly is very helpful in terms of sharing the same value
among competing threads, since a readonly variable need not be locked.)

Empty C<proto> definitions (whose body is just C<{*}>) are considered invisible to
C<CALLER>, so a C<multi> may refer directly to the caller of the C<proto>
using a single C<CALLER> lookup.  Autogenerated C<proto> entries follow the
same rule.

=head2 C<DYNAMIC>

The C<DYNAMIC> pseudo-package is just like C<CALLER> except that it starts
in the current dynamic scope and from there scans outward through all
dynamic scopes (frames) until it finds a dynamic variable of that name in
that dynamic frame's associated lexical pad.  (This search is implied for
variables with the C<*> twigil; hence C<$*FOO> is equivalent to C<<
DYNAMIC::<$*FOO> >>.)  If, after scanning outward through all those dynamic
scopes, there is no variable of that name in any immediately associated
lexical pad, it strips the C<*> twigil out of the name and looks in the
C<GLOBAL> package followed by the C<PROCESS> package.  If the value is not
found, it returns C<Failure>.
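
The outward search through dynamic scopes can be sketched as follows, assuming current Rakudo semantics.  The subs and the C<$*NAME> variable are hypothetical; the point is that C<greeting> sees whichever C<$*NAME> its callers declared:

```raku
# Hypothetical helper: $*NAME is resolved at run time by scanning
# outward through the calling (dynamic) scopes.
sub greeting { "Hello, $*NAME" }

sub greet-world {
    my $*NAME = 'world';
    greeting();
}

sub greet-user {
    my $*NAME = 'user';
    greeting();
}

say greet-world();   # Hello, world
say greet-user();    # Hello, user
```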

Unlike C<CALLER>, C<DYNAMIC> will see a dynamic variable that is declared in
the current scope, since it starts searching 0 scopes up the stack rather
than 1.  You may, however, use C<< CALLER::<$*foo> >> to bypass a dynamic
definition of C<$*foo> in your current scope, such as to initialize it with
the outer dynamic value:

    my $*foo ::= CALLER::<$*foo>;

The C<temp> declarator may be used (without an initializer) on a dynamic
variable to perform a similar operation:

    temp $*foo;

The main difference is that by default it initializes the new C<$*foo> with
its current value, rather than the caller's value.  Also, it is allowed only
on read/write dynamic variables, since the only reason to make a copy of the
outer value would be because you'd want to override it later and then forget
the changes at the end of the current dynamic scope.
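
A minimal sketch of C<temp> on a read/write dynamic variable, assuming current Rakudo semantics; C<$*indent>, C<show>, and C<nested> are hypothetical names:

```raku
my $*indent = 0;

# Hypothetical helper that depends on the dynamic $*indent.
sub show { ' ' x $*indent ~ 'line' }

sub nested {
    temp $*indent;      # keep the current value; restore it when nested() exits
    $*indent += 4;
    show();
}

say nested().raku;      # "    line"
say $*indent;           # 0
```

The override is visible to everything C<nested> calls, but is forgotten once C<nested> returns.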

You may also use C<< OUTER::<$*foo> >> to mean you want to start the search
in your outer lexical scope, but this will succeed only if that outer
lexical scope also happens to be one of your current I<dynamic> scopes.
That is, the same search is done as with the bare C<$*foo>, but any "hits"
are ignored until we've got to the C<OUTER> scope in our traversal.

=head2 Package lookup

There is no longer any special package hash such as C<%Foo::>.  Just
subscript the package object itself as a hash object, the key of which is
the variable name, including any sigil.  The package object can be derived
from a type name by use of the C<::> postfix:

    MyType::<$foo>

(Directly subscripting the type with either square brackets or curlies is
reserved for various generic type-theoretic operations.  In most other
matters type names and package names are interchangeable.)

Typeglobs are gone.  Use binding (C<:=> or C<::=>) to do aliasing.
Individual variable objects are still accessible through the hash
representing each symbol table, but you have to include the sigil in the
variable name now: C<MyPackage::{'$foo'}> or the equivalent C<<
MyPackage::<$foo> >>.
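
A short sketch, assuming current Rakudo behavior, of sigiled symbol-table keys and of binding used in place of typeglob aliasing.  The package C<MyPackage> and the lexical C<$alias> are hypothetical:

```raku
# Hypothetical package with one package-scoped variable.
package MyPackage {
    our $foo = 'original';
}

# The sigil is part of the key in the package's symbol table.
say MyPackage::{'$foo'};    # original
say MyPackage::<$foo>;      # original

# No typeglobs: alias by binding to the same container instead.
my $alias := $MyPackage::foo;
$alias = 'changed';
say $MyPackage::foo;        # changed
```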

=head2 Globals

Interpreter globals live in the C<GLOBAL> package.  The user's program
starts in the C<GLOBAL> package, so "our" declarations in the mainline code
go into that package by default.  Process-wide variables live in the
C<PROCESS> package.  Most predefined globals such as C<$*UID> and C<$*PID>
are actually process globals.

=head2 The C<PROCESS> package

There is only ever a single C<PROCESS> package.  For an ordinary Perl
program running by itself, there is only one C<GLOBAL> package as well.
However, in certain situations (such as shared hosting under a webserver),
the actual process may contain multiple virtual processes or interpreters,
each running its own "main" code.  In this case, the C<GLOBAL> namespace
holds variables that properly belong to the individual virtual process,
while the C<PROCESS> namespace holds variables that properly belong to the
actual process as a whole.  From the viewpoint of the program there is
little difference as long as all global variables are accessed as if they
were dynamic variables (by using the C<*> twigil).  The process as a whole
may place restrictions on the mutability of process variables as seen by the
individual subprocesses.  Also, individual subprocesses may not create new
process variables.  If the process wishes to grant subprocesses the ability
to communicate via the C<PROCESS> namespace, it must supply a writeable
dynamic variable to all the subprocesses granted that privilege.

=head2 Dynamic variable creation

It is illegal to assign or bind a dynamic variable that does not already
exist.  It will not be created in C<GLOBAL> (or C<PROCESS>) automatically,
nor is it created in any lexical scope.  Instead, you must assign directly
using the package name to get that to work:

    GLOBAL::<$mynewvar> = $val;
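
As a sketch, assuming Rakudo's fallback from a failed dynamic search to C<GLOBAL> (twigil stripped), as described under C<DYNAMIC>; the value C<'created'> is purely illustrative:

```raku
# Create the variable explicitly in GLOBAL first.
GLOBAL::<$mynewvar> = 'created';

# No dynamic scope declares $*mynewvar, so the lookup falls back to
# GLOBAL::<$mynewvar> with the * twigil stripped.
say $*mynewvar;     # created
```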

=head2 The magic input handle

The magic command-line input handle is C<$*ARGFILES>.  The arguments
themselves come in C<@*ARGS>.  See also "Declaring a MAIN subroutine" in
S06.

=head2 Magical access to documentation

The Pod documentation in a file is accessible from code in the same file via
variables with a C<=> secondary sigil. C<$=data> is the accessor for your
C<=data> section(s), for instance. All Pod structures are available as a
hierarchical data structure, through C<$=pod>. As with C<*>, the C<=> may
also be used as a package name: C<$=::data>.

=head2 Magical lexically scoped values

Magical lexically scoped values live in variables with a C<?> secondary
sigil.  These are all values that are known to the compiler, and may in fact
be dynamically scoped within the compiler itself, and only appear to be
lexically scoped because dynamic scopes of the compiler resolve to lexical
scopes of the program.  All C<$?> variables are considered constants, and
may not be modified after being compiled in.  The user is also allowed to
define (or redefine) such constants:

    constant $?TABSTOP = 4;     # assume heredoc tabs mean 4 spaces

(Note that the constant declarator always evaluates its initialization
expression at compile time.)

C<$?FILE> and C<$?LINE> are your current file and line number, for instance.
Instead of C<$?OUTER::FOO> you probably want to write C<< OUTER::<$?FOO> >>.
Within code that is being run during the compile, such as C<BEGIN> blocks,
or macro bodies, or constant initializers, the compiler variables must be
referred to as (for instance) C<< COMPILING::<$?LINE> >> if the bare
C<$?LINE> would be taken to be the value during the compilation of the
currently running code rather than the eventual code of the user's
compilation unit.  For instance, within a macro body C<$?LINE> is the line
within the macro body, but C<< COMPILING::<$?LINE> >> is the line where the
macro was invoked.  See below for more about the C<COMPILING>
pseudopackage.

Here are some possibilities:

    $?FILE      Which file am I in?
    $?LINE      Which line am I at?
    &?ROUTINE   Which routine am I in?
    &?BLOCK     Which block am I in?
    %?LANG      What is the current set of interwoven languages?
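
A small sketch of a few of these compile-time values, assuming current Rakudo; the sub C<whoami> is hypothetical.  The exact file path and line number depend on where the code lives:

```raku
# Hypothetical sub that reports its own name via &?ROUTINE.
sub whoami { &?ROUTINE.name }

say $?FILE;          # path of the current source file
say $?LINE;          # line number of this statement
say whoami();        # whoami
```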

The following return objects that contain all pertinent info:

    $?KERNEL    Which kernel am I compiled for?
    $?DISTRO    Which OS distribution am I compiling under?
    $?VM        Which virtual machine am I compiling under?
    $?XVM       Which virtual machine am I cross-compiling for?
    $?PERL      Which Perl am I compiled for?
    $?SCOPE     Which lexical scope am I in?
    $?PACKAGE   Which package am I in?
    $?MODULE    Which module am I in?
    $?CLASS     Which class am I in? (as variable)
    $?ROLE      Which role am I in? (as variable)
    $?GRAMMAR   Which grammar am I in?
    %?META      The META6.json data associated with the module
    %?RESOURCE  Associated resource files, shortcut for %?META<resource>

It is relatively easy to smartmatch these constant objects against pairs to
check various attributes such as name, version, or authority:

    given $?VM {
        when :name<Parrot> :ver(v2) { ... }
        when :name<CLOS>            { ... }
        when :name<SpiderMonkey>    { ... }
        when :name<JVM> :ver(v6.*)  { ... }
    }

Matches of constant pairs on constant objects may all be resolved at compile
time, so dead code can be eliminated by the optimizer.

Note that some of these things have parallels in the C<*> space at run time:

    $*KERNEL    Which kernel I'm running under
    $*DISTRO    Which OS distribution I'm running under
    $*VM        Which VM I'm running under
    $*PERL      Which Perl I'm running under

You should not assume that these will have the same value as their
compile-time cousins.
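
A sketch of inspecting the run-time objects, assuming current Rakudo where each provides a C<.name> method.  The printed names depend on the machine and VM that run it:

```raku
# Run-time counterparts of $?KERNEL, $?DISTRO and $?VM.
for $*KERNEL, $*DISTRO, $*VM -> $info {
    say $info.^name, ': ', $info.name;   # e.g. "VM: moar"
}
```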

=head2 The C<COMPILING> pseudopackage

While C<$?> variables are constant to the run time, the compiler has to have
a way of changing these values at compile time without getting confused
about its own C<$?> variables (which were frozen in when the compile-time
code was itself compiled).  The compiler can talk about these
compiler-dynamic values using the C<COMPILING> pseudopackage.

References to C<COMPILING> variables are automatically hoisted into the
lexical scope currently being compiled.  Setting or temporizing a
C<COMPILING> variable sets or temporizes the incipient C<$?> variable in the
surrounding lexical scope that is being compiled.  If nothing in the context
is being compiled, an exception is thrown.

    BEGIN { COMPILING::<$?FOO> = 42 }
    say $?FOO;                  # prints 42
    {
        say $?FOO;              # prints 42
    }
    {
        BEGIN { temp COMPILING::<$?FOO> = 43 } # temporizes to *compiling* block
        say $?FOO;              # prints 43
    }
    {
        BEGIN {
            COMPILING::<$?FOO> = 44;
            say COMPILING::<$?FOO>; # prints 44, but $?FOO probably undefined
        }
        say $?FOO;              # prints 44
    }
    say $?FOO;                  # prints 42 (left scope of temp above)
    $?FOO = 45;                 # always an error
    COMPILING::<$?FOO> = 45;    # an error unless we are compiling something

Note that C<< CALLER::<$?FOO> >> might discover the same variable as
C<< COMPILING::<$?FOO> >>, but only if the compiling scope is the immediate
caller.  Likewise C<< OUTER::<$?FOO> >> might or might not get you to the
right place.  In the abstract, C<< COMPILING::<$?FOO> >> goes outwards
dynamically until it finds a compiling scope, and so is guaranteed to find
the "right" C<$?FOO>.  (In practice, the compiler hopefully keeps track of
its current compiling scope anyway, so no scan is needed.)

Perceptive readers will note that this subsumes various "compiler hints"
proposals.  Crazy readers will wonder whether this means you could set an
initial value for other lexicals in the compiling scope.  The answer is yes.
In fact, this mechanism is probably used by the exporter to bind names into
the importer's namespace.

=head2 Switching parsers

The currently compiling Perl parser is switched by modifying one of the
braided languages in C<< COMPILING::<%?LANG> >>.  Lexically scoped parser
changes should temporize the modification.  Changes from here to
end-of-compilation unit can just assign or bind it.  In general, most parser
changes involve deriving a new grammar and then pointing one of the C<<
COMPILING::<%?LANG> >> entries at that new grammar.  Alternatively, the tables
driving the current parser can be modified without derivation, but at least
one level of anonymous derivation must intervene from the preceding Perl
grammar, or you might be messing up someone else's grammar.  Basically, the
current set of grammars in C<%?LANG> has to belong only to the current
compiling scope.  It may not be shared, at least not without explicit
consent of all parties.  No magical syntax at a distance.  Consent of the
governed, and all that.

=head2 Slangs

Individual sublanguages ("slangs") may be referred to using the C<~> twigil.
The following are useful:

    $~MAIN       the current main language (e.g. Perl statements)
    $~Quote      the current root of quoting language
    $~Quasi      the current root of quasiquoting language
    $~Regex      the current root of regex language
    $~Trans      the current root of transliteration language
    $~P5Regex    the current root of the Perl 5 regex language

Hence, when you are defining a normal Perl macro, you're replacing C<$~MAIN>
with a derived language, but when you define a new regex backslash sequence,
you're replacing C<$~Regex> with a derived language.  (There may or may not
be a syntax in the main language to do this.)  Note that such changes are
automatically scoped to the lexical scope; as with real slang, the
definitions are temporary and embedded in a larger language inherited from
the surrounding culture.

Instead of defining macros directly you may also mix in one or more grammar
rules by lexically scoped declaration of a new sublanguage:

    augment slang Regex {  # derive from $~Regex and then modify $~Regex
        token backslash:std<\Y> { YY };
    }

This tends to be more efficient since it only has to do one mixin at the end
of the block.  Note that the slang declaration has nothing to do with
package C<Regex>, but only with C<$~Regex>.  Sublanguages are in their own
namespace (inside the current value of C<%?LANG>, in fact).  Hence
C<augment> is modifying one of the local strands of a braided language, not
a package somewhere else.

You may also supersede a sublang entirely if, for example, you just want to
disable that sublanguage in the current lexical scope:

    supersede slang P5Regex {}
    m:P5/./;             # kaboom

If you supersede C<MAIN> then you're replacing the Perl parser entirely.
This might be done by, say, the "use COBOL" declaration. C<:-)>

=head2 Extended identifiers

It is often convenient to have names that contain arbitrary characters or
other data structures.  Typically these uses involve situations where a set
of entities shares a common "short" name, but each element of the set still
needs to be identifiable individually.  For example, you might use a
module whose short name is C<ThatModule>, but the complete long name of a
module includes its version, naming authority, and perhaps even its source
language.  Similarly, sets of operators work together in various syntactic
categories with names like C<prefix>, C<infix>, C<postfix>, etc.  The long
names of these operators, however, often contain characters that are
excluded from ordinary identifiers.

For all such uses, an identifier followed by a subscript-like adverbial form
(see below) is considered an I<extended identifier>:

    infix:<+>    # the official name of the operator in $a + $b
    infix:<*>    # the official name of the operator in $a * $b
    infix:«<=»   # the official name of the operator in $a <= $b
    prefix:<+>   # the official name of the operator in +$a
    postfix:<--> # the official name of the operator in $a--

This name is to be thought of semantically, not syntactically.  That is, the
bracketing characters used do not count as part of the name; only the quoted
data matters.  These are all the same name:

    infix:<+>
    infix:<<+>>
    infix:«+»
    infix:['+']

Despite the appearance as a subscripting form, these names are resolved not
at run time but at compile time.  The pseudo-subscripts need not be simple
scalars.  These are extended with the same two-element list:

    circumfix:«<< >>»
    circumfix:['<<','>>']
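
A minimal sketch, assuming current Rakudo, of defining an operator and then calling it through its extended name with different bracketing.  The operator C<infix:<±>> is hypothetical:

```raku
# Hypothetical operator; its official name is infix:<±>.
sub infix:<±>($center, $delta) { ($center - $delta, $center + $delta) }

say 10 ± 2;                  # (8 12)

# The same routine called by its extended name; only the quoted
# data matters, not the bracketing characters.
say &infix:<±>(10, 2);       # (8 12)
say &infix:«±»(10, 2);       # (8 12)
say &infix:<+>(1, 2);        # 3
```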

An identifier may be extended with multiple named identifier extensions, in
which case the names matter but their order does not.  These name the same
module:

    use ThatModule:auth<Somebody>:ver<2.7.18.28.18>
    use ThatModule:ver<2.7.18.28.18>:auth<Somebody>

Adverbial syntax is described in L</Adverbial Pair forms>.

=head1 Literals

Perl 6 has a rich set of literal forms, many of which can be used for
textual input as well.  For those forms simple enough to be allowed, the
C<val()> function treats such a string value as if it were a literal in the
program.  In some cases the C<val()> function will be applied on your
behalf, and in other cases you must do so explicitly.  The rationale for
this function is that there are many cases where the programmer or user is
forced to use a string type to represent a value that is intended to become
a numeric type internally.  Committing pre-emptively to either a string type
or a numeric type is likely to be wrongish, so Perl 6 instead provides the
concept of I<allomorphic> literals.  How these work is described below in
L</Allomorphic value semantics>.

When used as literals in a program, most of these forms produce an exact
type, and are not subject to C<val()> processing.  The exceptions will be
noted as we go.

=head2 Underscores

A single underscore is allowed only between any two digits in a literal
number, where the definition of digit depends on the radix.  (A single
underscore is also allowed between a radix prefix and a following digit, as
explained in the next section.) Underscores are not allowed anywhere else in
any numeric literal, including next to the radix point or exponentiator, or
at the beginning or end.

=head2 Radix markers

Initial C<0> no longer indicates octal numbers by itself.  You must use an
explicit radix marker for that.  Pre-defined radix prefixes include:

    0b          base 2, digits 0..1
    0o          base 8, digits 0..7
    0d          base 10, digits 0..9
    0x          base 16, digits 0..9,a..f (case insensitive)

Each of these allows an optional underscore after the radix prefix but
before the first digit.  These all mean the same thing:

    0xbadcafe
    0xbad_cafe
    0x_bad_cafe
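
A quick check of the radix prefixes, assuming current Rakudo; the values in the comments are the decimal equivalents:

```raku
say 0b1010;                    # 10
say 0o177777;                  # 65535
say 0d42;                      # 42
say 0xbadcafe;                 # 195939070
say 0xbadcafe == 0xbad_cafe;   # True: underscores are ignored
```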

=head2 General radices

The general radix form of a number involves prefixing with the radix in
adverbial form:

    :10<42>             same as 0d42 or 42
    :16<DEAD_BEEF>      same as 0xDEADBEEF
    :8<177777>          same as 0o177777 (65535)
    :2<1.1>             same as 0b1.1 (0d1.5)

Extra digits are assumed to be represented by C<a>..C<z> and C<A>..C<Z>, so
you can go up to base 36.  (Use C<A> and C<B> for base twelve, not C<T> and
C<E>.) Alternatively, you can use a list of digit values written in decimal,
which also lets you go beyond base 36:

    :60[12,34,56]       # 12 * 3600 + 34 * 60 + 56
    :100[3,'.',14,16]   # pi

All numbers representing digits must be less than the radix, or an error
will result (at compile time if constant-folding can catch it, or at run
time otherwise).
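
The list form is ordinary positional notation.  The sketch below, assuming current Rakudo, reproduces the C<:60[12,34,56]> value with Horner's rule using plain arithmetic (it does not rely on the list literal itself), then uses C<polymod> to split it back into base-60 digits:

```raku
# Horner's rule: ((12 * 60) + 34) * 60 + 56
my @digits = 12, 34, 56;
my $value  = @digits.reduce({ $^acc * 60 + $^digit });
say $value;                   # 45296
say 12 * 3600 + 34 * 60 + 56; # 45296

# polymod goes the other way, least significant digit first.
say $value.polymod(60, 60);   # (56 34 12)

# The angle-bracket form for radices up to 36:
say :16<DEAD_BEEF> == 0xDEADBEEF;  # True
say :2<1.1>;                       # 1.5
```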

Any radix may include a fractional part.  A dot is never ambiguous because
you have to tell it where the number ends:

    :16<dead_beef.face> # fraction
    :16<dead_beef>.face # method call

=head2 Exponentials

Only base 10 (in any form) allows an additional exponentiator starting with
'e' or 'E'.  All other radixes must either rely on the constant folding
properties of ordinary multiplication and exponentiation, or supply the
equivalent two numbers as part of the string, which will be interpreted as
they would outside the string, that is, as decimal numbers by default:

    :16<dead_beef> * 16**8
    :16<dead_beef*16**8>

It's true that only radixes that define C<e> as a digit are ambiguous that
way, but with any radix it's not clear whether the exponentiator should be
10 or the radix, and this makes it explicit:

    0b1.1e10                    ILLEGAL, could be read as any of:

    :2<1.1> * 2 ** 10           1536
    :2<1.1> * 10 ** 10          15,000,000,000
    :2<1.1> * :2<10> ** :2<10>  6

So we write those as

    :2<1.1*2**10>               1536
    :2<1.1*10**10>              15,000,000,000
    :2«1.1*:2<10>**:2<10>»      6

The generic string-to-number converter will recognize all of these forms
(including the * form, since constant folding is not available to the run
time).  Also allowed in strings are leading plus or minus, and maybe a
trailing Units type for an implied scaling.  Leading and trailing whitespace
is ignored.  Note also that leading C<0> by itself I<never> implies octal in
Perl 6.
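
The three readings of the illegal C<0b1.1e10> can be checked with ordinary arithmetic.  A sketch, assuming current Rakudo; the name C<$mantissa> is illustrative only:

```raku
my $mantissa = :2<1.1>;             # 1.5
say $mantissa * 2 ** 10;            # 1536
say $mantissa * 10 ** 10;           # 15000000000
say $mantissa * :2<10> ** :2<10>;   # 6, i.e. 1.5 * 2 ** 2
```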

In all these cases, the type produced will be the narrowest of C<Int>,
C<Rat>, or C<Num> that can accurately represent the number.  If no type can
represent it exactly, it should be returned as either a C<Rat> or a C<Num>,
whichever is more accurate.  (C<Rat64> will tend to be more accurate for
numbers of normal or large magnitude, while C<Num64> may be more accurate
for numbers of very small magnitude, since the mismatch in size between a
small numerator and a huge denominator will eventually cost the C<Rat> more
accuracy than the C<Num> loses to its exponent overhead.  As a limiting
case, a C<Rat64> cannot represent any nonzero number smaller in magnitude
than C<< :10<1*2**-64> >>.)

A consequence of the preceding is that you cannot make a C<FatRat> using
colon notation.  You must rely on constructors and constant folding:

    FatRat.new(1,2) ** 128
    FatRat.new(1, 2 ** 128)     # same thing

=head2 Conversion functions

Any of the adverbial forms may be used as a function:

    :2($x)      # "bin2num"
    :8($x)      # "oct2num"
    :10($x)     # "dec2num"
    :16($x)     # "hex2num"

Think of these as setting the default radix, not forcing it.  Like Perl 5's
old C<oct()> function, any of these will recognize a number starting with a
different radix marker and switch to the other radix.  However, note that
the C<:16()> converter function will interpret leading C<0b> or C<0d> as hex
digits, not radix switchers.

Use of the functional form on anything that is not a string will throw an
exception explaining that the user has confused a number with the textual
representation of a number.  This is to catch errors such as a C<:8(777)>
that should have been C<< :8<777> >>, or the attempt to use the function in
reverse to produce a textual representation from a number.
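
A minimal sketch of the conversion functions on strings, assuming current Rakudo.  For the reverse direction it uses the C<.base> method rather than the adverbial form; C<$hex-text> is a hypothetical variable:

```raku
my $hex-text = 'ff';
say :16($hex-text);      # 255
say :2('1010');          # 10
say :8('777');           # 511

# Number to text goes the other way, via a method:
say 255.base(16);        # FF
say 'ff'.parse-base(16); # 255
```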

=head2 Rational literals

Rational literals are indicated by separating two integer literals (in any
radix) with a slash, and enclosing the whole in angles:

    <1/2>       # one half literal Rat

Whitespace is not allowed on either side of the slash or it will be split
under normal quote-words semantics:

    < 1 / 2 >   # (IntStr('1'), '/', IntStr('2'))
    < 1/2 >     # okay, returns RatStr('1/2') rather than Rat

Because of constant folding, you may often get away with leaving out the
angles:

    1/2         # 1 divided by 2

However, in that case you have to pay attention to precedence and
associativity.  The following does I<not> cube C<2/3>:

    2/3**3      # 2/(3**3), not (2/3)**3

Decimal fractions not using "e" notation are also treated as literal C<Rat>
values:

    6.02e23.WHAT     # Num
    1.23456.WHAT     # Rat
    0.11 == 11/100   # True

Literals specified without spaces in angle brackets are exempt from C<val()>
processing, so C<< <1/2> >> produces a value that is C<Rat>, while C<< < 1/2
> >> produces a value that is both a C<Rat> and a C<Str>.  See
L</Allomorphic value semantics> below.
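
A short sketch of these rules, assuming current Rakudo, including the precedence trap with C<2/3**3>:

```raku
say <1/2>.WHAT;              # (Rat)
say < 1/2 >.WHAT;            # (RatStr)
say 1.23456.WHAT;            # (Rat)
say 6.02e23.WHAT;            # (Num)
say 0.1 + 0.2 == 0.3;        # True: exact Rat arithmetic
say (2/3**3).nude;           # (2 27), i.e. 2/27
say ((2/3)**3).nude;         # (8 27), i.e. 8/27
```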

=head2 Complex literals

Complex literals are similarly indicated by writing an addition or
subtraction of two real numbers (again, without spaces around the operators)
inside angles:

    <5.2+1e42i>
    <-3-1i>

As with rational literals, constant folding would produce the same complex
number, but this form parses as a single term, ignoring surrounding
precedence.

(Note that these are not actually special syntactic forms: both rational and
complex literal forms fall out naturally from the semantic rules of qw
quotes described below.)

Literals specified without spaces in angle brackets are exempt from C<val()>
processing, so C<< <1+2i> >> produces a value that is a C<Complex> while C<<
< 1+2i > >> produces a value that is both a C<Complex> and a C<Str>.  See
L</Allomorphic value semantics> below.
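
A brief sketch, assuming current Rakudo; C<$z> is an illustrative name:

```raku
my $z = <-3-1i>;
say $z.WHAT;             # (Complex)
say $z.re, ' ', $z.im;   # -3 -1
say < 1+2i >.WHAT;       # (ComplexStr)
say <1+2i> * 2;          # 2+4i
```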

=head2 C<Blob> literals

C<Blob> literals look similar to integer literals with radix markers, but
use curlies instead of angles:

    :2{0010_1110_1000_10}   a blob1, base 2, 1 bit per column
    :4{}                    a blob2, 2 bits per column
    :8{5235 0437 6}         a blob3, 3 bits per column
    :16{A705E}              a blob4, 4 bits per column

Whitespace and underscores are allowed but ignored.

=head2 Radix interpolation

Characters indexed by hex numbers can be interpolated into strings by
introducing them with C<"\x">, followed by either a bare hex number
(C<"\x263a">) or a hex number in square brackets (C<"\x[263a]">).
Similarly, C<"\o12"> and C<"\o[12]"> interpolate octals, but generally you
should be using hex in the world of Unicode.  Multiple characters may be
specified within any of the bracketed forms by separating the numbers with
commas: C<"\x[41,42,43]">.
You must use the bracketed form to disambiguate if the unbracketed form
would "eat" too many characters, because all of the unbracketed forms eat as
many characters as they think look like digits in the radix specified.  None
of these notations work in normal Perl code.  They work only in
interpolations and regexes and the like.

Note that the inside of the brackets is not an expression, and you may not
interpolate there, since that would be a double interpolation.  Use curlies
to interpolate the values of expressions.

The old C<\123> form is now illegal, as is the C<\0123> form.  Only C<\0>
remains, and then only if the next character is not in the range
C<'0'..'7'>.  Octal characters must use C<\o> notation.  Note also that
backreferences are no longer represented by C<\1> and the like--see S05.
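
A few interpolation cases, assuming current Rakudo; the last two lines show why the bracketed form is needed when digits follow:

```raku
say "\x263a";         # ☺ (U+263A)
say "\x[41,42,43]";   # ABC
say "\o101";          # A: octal 101 is 65
say "\x[41]1";        # A1: the brackets end the number
say "\x411";          # Б: the bare form eats all three hex digits (U+0411)
```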

=head2 Angle quotes (quote words)

The C<qw/foo bar/> quote operator now has a bracketed form: C<< <foo bar>
>>.  When used as a subscript it performs a slice equivalent to
C<{'foo','bar'}>.  Elsewhere it is equivalent to a parenthesized list of
strings: C<< ('foo','bar') >>.  Since parentheses are generally reserved
just for precedence grouping, they merely autointerpolate in flat list
context.  Therefore

    @a = 1, < x y >, 2;

is equivalent to:

    @a = 1, ('x', 'y'), 2;

which is the same as:

    @a = 1, 'x', 'y', 2;

In item context, the implied grouping parentheses are still there, so

    $a = < a b >;

is equivalent to:

    $a = ('a', 'b');

which assigns a C<List> to the variable.  On the other hand, if you
backslash the list:

    $a = \<a b>;

it is like:

    $a = \('a', 'b');

and ends up storing a C<Capture> object (which weeds out any named arguments
into a separate structure, in contrast to a C<List>, which keeps
everything in its original list).

Binding is different from assignment.  If bound to a signature, the C<< <a
b> >> list will be promoted to a C<Capture> object, but if bound to a
parameter, it will make the flattening/slicing decision based on the nature
of the individual parameter.  That is, if you pass C<< <a b> >> as an
argument, it will bind as a single item to a positional or slice parameter,
and as two items to a slurpy parameter.

But note that under the parenthesis-rewrite rule, a single value will still
act like a single value.  These are all the same:

    $a = < a >;
    $a = ('a');
    $a = 'a';

Strings within angle brackets are subject to C<val()> processing, and any
component that parses successfully as a numeric literal will become both a
string and a number.  See L</Allomorphic value semantics> below.
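
A sketch of item assignment and C<val()> processing, assuming current Rakudo; the variable names are illustrative:

```raku
my $words = < a b >;
say $words.WHAT;       # (List)

my $single = < a >;
say $single.WHAT;      # (Str): a single word stays a single value

my $answer = < 42 >;
say $answer.WHAT;      # (IntStr)
say $answer + 1;       # 43
say $answer eq '42';   # True
```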

=head3 Explicit List construction

As the previous section shows, a list is not automatically constructed by
parens; the list is actually constructed by the comma, not by the parens.
To force a single value to become a composite object in item context, either
add a comma inside parens, or use an appropriate constructor or composer for
clarity as well as correctness:

    $a = (< a >,);
    $a = ('a',);
    $a = List.new('a');
    $a = ['a'];
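
A quick check that the comma, not the parentheses, builds the list, assuming current Rakudo:

```raku
my $plain = ('a');
say $plain.WHAT;       # (Str): parentheses alone only group

my $one = ('a',);
say $one.WHAT;         # (List): the comma makes the list
say $one.elems;        # 1

my $array = ['a'];
say $array.WHAT;       # (Array)
```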

=head3 Empty List

Even though there is no comma in it, the empty list is represented by
C<()>.  Normally, one could itemize this by prefixing a C<$> (as in C<$()>),
but that translates to the special syntax form C<$( $/.made // Str($/) )>.
Instead, one can write C<().item>, or less legibly, introduce a space as in
C<$( )>.

=head3 Disallowed forms

The degenerate case C<< <> >> is disallowed as a probable attempt to do IO
in the style of Perl 5; that is now written C<lines()>.  (C<< <STDIN> >> is
also disallowed.)  Empty lists are better written with C<()> in any case
because C<< <> >> will often be misread as meaning C<('')>.  (Likewise the
subscript form C<< %foo<> >> should be written C<%foo{}> to avoid misreading
as C<@foo{''}>.) If you really want the angle form for stylistic reasons,
you can suppress the error by putting a space inside: C<< < > >>.

=head3 Relationship between <> and «»

Much like the relationship between single quotes and double quotes, single
angles do not interpolate while double angles do.  The double angles may be
written either with French quotes, C<«$foo @bar[]»>, or with ASCII quotes,
C<<< <<$foo @bar[]>> >>>, as the ASCII workaround.  The implicit split is
done after interpolation, but respects quotes in a shell-like fashion, so
that C<«'$foo' "@bar[]"»> is guaranteed to produce a list of two "words"
equivalent to C<< ('$foo', "@bar[]") >>.  C<Pair> notation is also
recognized inside C<«...»> and such "words" are returned as C<Pair> objects.

Colon pairs (but not arrow pairs) are recognized within double angles.  In
addition, the double angles allow for comments beginning with C<#>.  These
comments work exactly like ordinary comments in Perl code.  Unlike in the
shells, any literal C<#> must be quoted, even ones without whitespace in
front of them, but note that this comes more or less for free with a colon
pair like C<< :char<#x263a> >>, since comments only work in double angles,
not single.
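
A small sketch of interpolate-then-split and of double-quote protection inside double angles, assuming current Rakudo; C<$foo> and C<@bar> are hypothetical:

```raku
my $foo = 'x';
my @bar = <y z>;

# Interpolation happens first, then the result is split into words.
my @words = «$foo @bar[]»;
say @words.elems;      # 3: x, y, z

# Double quotes protect a group from being split.
my @grouped = «$foo "@bar[]"»;
say @grouped.elems;    # 2
say @grouped[1];       # y z
```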

=head2 Adverbial Pair forms

There is now a generalized adverbial form of Pair notation, also known
as a "colon pair" form.  The following table shows the correspondence
to the "fatarrow" notation:

    Fat arrow           Adverbial pair  Paren form
    =========           ==============  ==========
    a => True           :a
    a => False          :!a
    a => 0              :a(0)
    a => $x             :a($x)
    a => 'foo'          :a<foo>         :a(<foo>)
    a => <foo bar>      :a<foo bar>     :a(<foo bar>)
    a => «$foo @bar»    :a«$foo @bar»   :a(«$foo @bar»)
    a => {...}          :a{...}         :a({...})
    a => [...]          :a[...]         :a([...])
    a => $a             :$a
    a => @a             :@a
    a => %a             :%a
    a => &a             :&a
    a => %foo<a>        %foo<a>:p

The fatarrow construct may be used only where a term is expected, because
the fatarrow itself is parsed as a normal infix operator (even when
autoquoting an identifier on its left), so the whole construct is an
expression in its own right.  Because the left side is a general expression,
the fatarrow form may be used to create a Pair with I<any> value as the key.
On the other hand, when used as above to generate C<Pair> objects, the
adverbial forms are restricted to the use of identifiers as keys.  You must
use the fatarrow form to generate a C<Pair> where the key is not an
identifier.
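
Several rows of the table can be checked with C<eqv>.  This is a sketch
that assumes a current Rakudo, and C<$x> and C<$p> are illustrative names:

    my $x = 42;
    say (:a)    eqv (a => True);    # True
    say (:!a)   eqv (a => False);   # True
    say (:a(0)) eqv (a => 0);       # True
    say (:$x)   eqv (x => 42);      # True
    my $p = 3.5 => 'half';          # non-identifier key: fatarrow only
    say $p.key.^name;               # Rat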

Despite that restriction, it's possible for other things to come between a
colon and its brackets; however, all of the possible non-identifier
adverbial keys are reserved for special syntactical forms.  Perl 6 currently
recognizes decimal numbers and the null key.  In the following table the
first and second columns do I<not> mean the same thing:

    Simple pair         DIFFERS from    which means
    ===========         ============    ===========
    2 => <101010>       :2<101010>      radix literal 0b101010
    8 => <123>          :8<123>         radix literal 0o123
    16 => <deadbeef>    :16<deadbeef>   radix literal 0xdeadbeef
    16 => $somevalue    :16($somevalue) radix conversion function
    '' => $x            :($x)           signature literal
    '' => ($x,$y)       :($x,$y)        signature literal
    '' => <x>           :<x>            name extension
    '' => «x»           :«x»            name extension
    '' => [$x,$y]       :[$x,$y]        name extension
    '' => { .say }      :{ .say }       adverbial block (not allowed on names)
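
A quick check of the radix forms, assuming current Rakudo behavior (the
variable names are illustrative).  Only the parenthesized form accepts a
value computed at run time:

    my $bin  = :2<101010>;       # 42
    my $oct  = :8<123>;          # 83
    my $hex  = :16<deadbeef>;    # 3735928559
    my $str  = 'ff';
    my $conv = :16($str);        # 255, converted at run time
    say "$bin $oct $hex $conv";  # 42 83 3735928559 255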

All of the adverbial forms (including the normal ones with identifier keys)
are considered special tokens and are recognized in various positions in
addition to term position.  In particular, when used where an infix would be
expected they modify the previous topmost operator that is tighter in
precedence than "loose unary" (see S03):

    1 == 100 :fuzz(3)     # calls: infix:<==>(1, 100, fuzz => 3)

Within signatures, the adverbial form is used to give a named parameter an
external name that differs from its internal variable name:

    sub foo ( :externalname($myname) ) {...}
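
For example (a minimal sketch; C<greet> is a hypothetical sub), the caller
uses the external name while the body sees only the internal variable:

    sub greet(:name($who)) { "Hello, $who!" }
    say greet(name => 'Ann');   # Hello, Ann!
    say greet(:name<Bob>);      # Hello, Bob!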

Adverbs modify the meaning of various quoting forms:

    q:x 'cat /etc/passwd'

When appended to an identifier (that is, in postfix position), the adverbial
syntax is used to generate unique variants of that identifier; this syntax
is used for naming operators such as C<< infix:<+> >> and
multiply-dispatched grammatical rules such as C<statement_control:if>.  When
so used, the adverb is considered an integral part of the name, so C<<
infix:<+> >> and C<< infix:<-> >> are two different operators.  Likewise C<<
prefix:<+> >> is different from C<< infix:<+> >>.  (The notation also has
the benefit of grouping distinct identifiers into easily accessible sets;
this is how the standard Perl 6 grammar knows the current set of infix
operators, for instance.)
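
Because the adverb is part of the name, an operator can be called like an
ordinary sub under its full name, and a new one is declared the same way.
A sketch, with C<avg> as a hypothetical operator:

    say infix:<+>(2, 3);        # 5
    say prefix:<->(5);          # -5
    sub infix:<avg>($x, $y) { ($x + $y) / 2 }
    say 3 avg 5;                # 4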

Only bracketed forms that produce a list of one or more values (preferably
strings) are allowed as name extensions; closures do not qualify as values,
so the C<:{...}> form is not allowed as a name extender.  This frees up the
block form after a method name, which allows us to parse a block as a method
argument:

    @stuff.sort:{ +$_ }

This might look like it is using a pair, but it is really equivalent to

    @stuff.sort: { +$_ }
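
The block argument is easy to see with string data (the array is
illustrative), since the default C<sort> compares strings:

    my @stuff = '10', '9', '100';
    say @stuff.sort;              # (10 100 9)   string order
    say @stuff.sort: { +$_ };     # (9 10 100)   numeric order via the block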

So the colons here are not really introducing pairs, but rather introducing
the argument list of the method.  In any other location, C<:{...}> would be
taken in one of two ways, depending on whether the brackets define a closure
or a hash.  If taken as a closure, C<:{...}> creates a pair mapping the null
key to the closure.  If taken as a hash composer, the null key is ignored,
and C<:{...}> creates an object-keyed hash rather than the string-keyed hash
that C<{...}> would produce without the colon.

Either fatarrow or adverbial pair notation may be used to pass named
arguments as terms to a function or method.  After a call with parenthesized
arguments, only the adverbial syntax may be used to pass additional
arguments.  This is typically used to pass an extra block:

    find($directory) :{ when not /^\./ }

This just naturally falls out from the preceding rules because the adverbial
block is in operator position, so it modifies the "find operator".  (Parens
aren't considered an operator.)

Note that (as usual) the C<{...}> form (either identifier-based or special)
can indicate either a closure or a hash depending on the contents.  It does
I<not> indicate a subscript, since C<:key{}> is really equivalent to C<< key
=> {} >>, and the braces are not behaving as a postfix at all.  (The
function to which it is passed can I<use> the value as a subscript if it
chooses, however.)

Note also that the C<< <a b> >> form is not a subscript and is therefore
equivalent not to C<.{'a','b'}> but rather to C<('a','b')>.  Bare C<< <a> >>
turns into C<('a')> rather than C<('a',)>.  (However, as with the other
bracketed forms, the value may end up being used as a subscript depending on
context.)

Two or more adverbs can always be strung together without intervening
punctuation anywhere a single adverb is acceptable.  When used as named
arguments in an argument list, you I<may> put commas between them, because they're
just ordinary named arguments to the function, and a fatarrow pair would
work the same.  However, this comma is allowed only when the first pair
occurs where a term is expected.  Where an infix operator is expected, the
adverb is always taken as modifying the nearest preceding operator that is
not hidden within parentheses, and if you string together multiple such
pairs, you may not put commas between, since that would cause subsequent
pairs to look like terms.  (The fatarrow form is not allowed at all in
operator position.) See S06 for the use of adverbs as named arguments.

The negated form (C<:!a>) and the sigiled forms (C<:$a>, C<:@a>, C<:%a>)
never take an argument and don't care what the next character is.  They are
considered complete.  These forms require an identifier to serve as the key.
A sigiled form that includes a twigil will not include the twigil in the
key.
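
A sketch of the sigiled forms, including the twigil case.  The variables
and the C<Point> class are illustrative:

    my $x = 1; my @y = 2, 3; my %z = k => 4;
    say (:$x).key, ' ', (:@y).key, ' ', (:%z).key;   # x y z
    class Point {
        has $.x = 5;
        method as-pair { (:$!x) }    # the twigil is dropped from the key
    }
    say Point.new.as-pair;           # x => 5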

For identifiers that take a non-negative integer argument, it is allowed to
abbreviate, for example, C<:sweet(16)> to C<:16sweet>. (This is
distinguishable from the C<< :16<deadbeef> >> form, which never has an
alphabetic character following the number.) Only literal non-negative
integers may be swapped this way.  Note that this abbreviation allows forms
such as:

  s:2nd/foo/bar/  # or 3rd, 4th, 5th etc.
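
Both uses of the abbreviation appear in this sketch (current Rakudo
assumed; C<sweet> is just an arbitrary key):

    say (:16sweet) eqv (sweet => 16);   # True
    my $s = 'foo foo foo';
    $s ~~ s:2nd/foo/bar/;               # replace only the second match
    say $s;                             # foo bar foo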

The other forms of adverb (including the bare C<:a> form) I<always> look for
an immediate bracketed argument, and will slurp it up.  If that's not
intended, you must use whitespace between the adverb and the opening
bracket.  The syntax of individual adverbs is the same everywhere in Perl 6.
There are no exceptions based on whether an argument is wanted or not.
(There is a minor exception for quote and regex adverbs, which accept
I<only> parentheses as their bracketing operator, and ignore other brackets,
which must be placed in parens if desired.  See "Paren form" in the table
above.)

Except as noted above, the parser always looks for the brackets.  Despite
not indicating a true subscript, the brackets are similarly parsed as
postfix operators.  As postfixes the brackets may be separated from their
initial C<:foo> with either unspace or dot (or both), but nothing else.

Regardless of syntax, adverbs used as named arguments (in either term or
infix position) generally show up as optional named parameters to the
function in question--even if the function is an operator or macro.  The
function in question neither knows nor cares how weird the original syntax
was.

=head2 C<Q> forms

In addition to C<q> and C<qq>, there is now the base form C<Q> which does
I<no> interpolation unless explicitly modified to do so.  So C<q> is really
short for C<Q:q> and C<qq> is short for C<Q:qq>.  In fact, all quote-like
forms derive from C<Q> with adverbs:

    q//         Q :q //
    qq//        Q :qq //
    rx//        Q :regex //
    s///        Q :subst ///
    tr///       Q :trans ///

Adverbs such as C<:regex> change the language to be parsed by switching to a
different parser.  This can completely change the interpretation of any
subsequent adverbs as well as the quoted material itself.

    q:s//       Q :q :scalar //
    rx:s//      Q :regex :scalar //

Just as C<q[...]> has the short form C<'...'>, and C<qq[...]> has the short
form C<"...">, the completely quoted C<Q[...]> has a short form that uses
halfwidth corner brackets: C<｢...｣>.
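
A side-by-side sketch of how much each form interpolates (the variable is
illustrative; output as produced by current Rakudo):

    my $x = 42;
    say Q[$x\n];         # $x\n      nothing is special
    say q[$x\n];         # $x\n      only \\ and the delimiters are special
    say qq[$x];          # 42
    say ｢$x {1+1}｣;      # $x {1+1}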

=head2 Adverbs on quotes

Generalized quotes may now take adverbs:

    Short       Long            Meaning
    =====       ====            =======
    :x          :exec           Execute as command and return results
    :w          :words          Split result on words (no quote protection)
    :ww         :quotewords     Split result on words (with quote protection)
    :v          :val            Evaluate word or words for value literals
    :q          :single         Interpolate \\, \q and \' (or whatever)
    :qq         :double         Interpolate with :s, :a, :h, :f, :c, :b
    :s          :scalar         Interpolate $ vars
    :a          :array          Interpolate @ vars
    :h          :hash           Interpolate % vars
    :f          :function       Interpolate & calls
    :c          :closure        Interpolate {...} expressions
    :b          :backslash      Interpolate \n, \t, etc. (implies :q at least)
    :to         :heredoc        Parse result as heredoc terminator
                :regex          Parse as regex
                :subst          Parse as substitution
                :trans          Parse as transliteration
                :code           Quasiquoting
    :p          :path           Return a Path object (see S16 for more options)

You may omit the first colon by joining an initial C<Q>, C<q>, or C<qq> with
a single short form adverb, which produces forms like:

    qw /a b c/;                         # P5-esque qw// meaning q:w
    Qc '...{$x}...';                    # Q:c//, interpolate only closures
    qqx/$cmd @args[]/                   # equivalent to P5's qx//

(Note that C<qx//> doesn't interpolate.)

If you want to abbreviate further, just define a macro:

    macro qx { 'qq:x ' }          # equivalent to P5's qx//
    macro qTO { 'qq:x:w:to ' }    # qq:x:w:to//
    macro quote:<❰ ❱> ($text) { quasi { {{{$text}}}.quoteharder } }

All the uppercase adverbs are reserved for user-defined quotes.  All Unicode
delimiters above Latin-1 are reserved for user-defined quotes.

A consequence of the above is that we can now say:

    %hash = qw:c/a b c d {@array} {%hash}/;

or

    %hash = qq:w/a b c d {@array} {%hash}/;

to interpolate items into a C<qw>.  Conveniently, arrays and hashes
interpolate with only whitespace separators by default, so the subsequent
split on whitespace still works out.  (But the built-in C<«...»> quoter
automatically does interpolation equivalent to C<qq:ww:v/.../>.  The
built-in C<< <...> >> is equivalent to C<q:w:v/.../>.)
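
A minimal sketch of interpolating into a word list, using the combined
C<qqw> short form of C<qq:w> and an illustrative array:

    my @array = <x y>;
    my @w = qqw/a b @array[]/;     # interpolate, then split on whitespace
    say @w.elems;                  # 4
    say qw/a b c/.elems;           # 3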

=head2 The C<:val> modifier

The C<:v>/C<:val> modifier runs each word through the C<val()> function,
which will attempt to recognize literals as defined by the current slang.
(See L<Allomorphic value semantics> below.)  Only pure literals such as
numbers, versions, and enums are so recognized; all other words are left as
strings.  In any case, use of such an intuited value as a string will
reproduce the original string including any leading or trailing whitespace:

    say +val(' +2/4 ')   # '0.5'
    say ~val(' +2/4 ')   # ' +2/4 '

Of course, words derived from C<:w> and C<:ww> will not have any whitespace,
since that is what the words are split apart on.
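
The dual nature of a C<val()> result can be sketched against current
Rakudo, where a value that is both a number and a string has an allomorphic
type such as C<IntStr>:

    my $v = val('42');
    say $v.^name;                    # IntStr
    say $v + 1;                      # 43   numeric side
    say $v ~ '!';                    # 42!  string side
    say <42 foo>.map({ .^name });    # (IntStr Str)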

=head2 Whitespace before adverbs

Whitespace is allowed between the "q" and its adverb: C<q :w /.../>.

=head2 Overriding the definitions of quoting keywords

If you define an identifier (either as a term or a subroutine) that happens
to shadow one of the quoting or matching keywords, that keyword is no longer
available for quoting purposes:

    my \q = 42;    say q / 2;  # prints 21
    sub m { 42 };  say m / 2;  # prints 21

Unlike with keyword overrides, it doesn't matter whether there's whitespace
after it; the name will always just be parsed as a term or function call,
unless followed explicitly by a colon.  Generally you can work around such a
definition by using a related form of the same quote, or by adding a useless
modifier (either with or without the colon):

    my \q = 42;   say Q:q /2/;    # prints 2
    my \q = 42;   say qs  /2/;    # prints 2
    my \q = 42;   say q:s /2/;    # prints 2

    sub m { 42 }; say / 2 /;      # matches 2
    sub m { 42 }; say m:1st/ 2 /; # matches 2

=head2 Delimiters of quoting forms

For these "q" forms the choice of delimiters has no influence on the
semantics.  That is, C<''>, C<"">, C<< <> >>, C<«»>, C<``>, C<()>, C<[]>,
and C<{}> have no special significance when used in place of C<//> as
delimiters.  There may be whitespace before the opening delimiter; it is
mandatory for parentheses, because C<q()> is a subroutine call and
C<q:w(0)> is an adverb with an argument.  A colon may never be used as the delimiter since it
will always be taken to mean another adverb regardless of what's in front of
it.  Nor may a C<#> character be used as the delimiter since it is always
taken as whitespace (specifically, as a comment).  You may not use
whitespace or alphanumerics for delimiters.

=head2 Quotes from Macros

New quoting constructs may be declared as macros:

    macro quote:<qX> (*%adverbs) {...}

Note: macro adverbs are automatically evaluated at macro call time if the
adverbs are included in the parse.  If an adverb needs to affect the parsing
of the quoted text of the macro, then an explicit named parameter may be
passed on as a parameter to the C<is parsed> subrule, or used to select
which subrule to invoke.

=head2 Interpolating into a single-quoted string

You may interpolate double-quotish text into a single-quoted string using
the C<\qq[...]> construct.  Other "q" forms also work, including
user-defined ones, as long as they start with "q".  Otherwise you'll just
have to embed your construct inside a C<\qq[...]>.

=head2 Interpolation rules

Bare scalar variables always interpolate in double-quotish strings.  Bare
array, hash, and subroutine variables may I<never> be interpolated.
However, any scalar, array, hash or subroutine variable may start an
interpolation if it is followed by a sequence of one or more bracketed
dereferencers: that is, any of:

=over 4

=item 1. An array subscript

=item 2. A hash subscript

=item 3. A set of parentheses indicating a function call

=item 4. Any of 1 through 3 in their B<dot> form

=item 5. A method call that includes argument parentheses

=item 6. A sequence of one or more unparenthesized method calls, followed by any of 1 through 5

=back

In other words, this is legal:

    "Val = $a.ord.fmt('%x')\n"

and is equivalent to

    "Val = { $a.ord.fmt('%x') }\n"

However, no interpolated postfix may start with a backslash, so any
backslash or unspace is not recognized, but instead will be assumed to be
part of the string outside of the interpolation, and subject to the normal
backslashing rules of that quote context:

    my $a = 42;
    "Val = $a\[junk\]";  # Val = 42[junk]
    "Val = $a\[junk]";   # Val = 42[junk]
    "Val = $a\ [junk]";  # Val = 42 [junk]
    "Val = $a\.[junk]";  # Val = 42.[junk]


=head3 Arrays

In order to interpolate an entire array, it's necessary now to subscript
with empty brackets:

    print "The answers are @foo[]\n"

Note that this fixes the spurious "C<@>" problem in double-quoted email
addresses.

As with Perl 5 array interpolation, the elements are separated by a space.
(Except that a space is not added if the element already ends in some kind
of whitespace.  In particular, a list of pairs will interpolate with a tab
between the key and value, and a newline after the pair.)
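
A minimal sketch (the array is illustrative) of the empty-bracket form and
of the email-address case, where no bracket follows and so nothing
interpolates:

    my @foo = 1, 2, 3;
    say "The answers are @foo[]";    # The answers are 1 2 3
    say "Mail user@foo.com";         # Mail user@foo.com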

=head3 Hashes

In order to interpolate an entire hash, it's necessary to subscript with
empty braces or angles:

    print "The associations are:\n%bar{}"
    print "The associations are:\n%bar<>"

Note that this avoids the spurious "C<%>" problem in double-quoted printf
formats.

By default, keys and values are separated by tab characters, and pairs are
separated by newlines.  (This is almost never what you want, but if you
want something polished, you can be more specific.)
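
The same rule sketched for hashes (the hash is illustrative): a single pair
comes out as key, tab, value, while a C<%> with no bracket after it stays
literal text:

    my %bar = a => 1;
    say "Pairs:\n%bar{}";     # "Pairs:", then "a", a tab, and "1"
    say "100%bar";            # 100%bar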

=head3 Sub calls

In order to interpolate the result of a sub call, it's necessary to include
both the sigil and parentheses:

    print "The results are &baz().\n"

=head3 Method calls

In order to interpolate the result of a method call without arguments, it's
necessary to include parentheses or extend the call with something ending in
brackets:

    print "The attribute is $obj.attr().\n"
    print "The attribute is $obj.attr<Jan>.\n"

The method is called in item context.  (If it returns a list, that list is
interpolated as if it were an array.)

It is allowed to have a cascade of argumentless methods as long as the last
one ends with parens:

    print "The attribute is %obj.keys.sort.reverse().\n"

(The cascade is basically counted as a single method call for the
end-bracket rule.)
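
A short sketch of the end-bracket rule in practice (current Rakudo
behavior; C<baz> and C<%obj> are illustrative).  A method chain
interpolates only when it ends in brackets, so the bare C<.ord> on the
second line is left as literal text:

    my $a = 'A';
    sub baz { 'ok' }
    my %obj = b => 2, a => 1;
    say "Val = $a.ord.fmt('%x')";            # Val = 41
    say "Val = $a.ord";                      # Val = A.ord
    say "Result: &baz().";                   # Result: ok.
    say "Keys: %obj.keys.sort.reverse().";   # Keys: b a.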

=head3 Multiple dereferencers

Multiple dereferencers may be stacked as long as each one ends in some kind
of bracket or is a bare method:

    print "The attribute is @baz[3](1, 2, 3).gethash.{$xyz}<blurfl>.attr().\n"

Note that the final period above is not taken as part of the expression
since it doesn't introduce a bracketed dereferencer.  The parens are not
required on the C<.gethash>, but they are required on the C<.attr()>, since
that terminates the entire interpolation.

In no case may any of the top-level components be separated by whitespace or
unspace.  (These are allowed, though, inside any bracketing constructs, such
as in the C<(1, 2, 3)> above.)

=head3 Closures

A bare closure also interpolates in double-quotish context.  It may not be
followed by any dereferencers, since you can always put them inside the
closure.  The expression inside is evaluated in string item context.  You
can force list context on the expression using the C<list> operator if
necessary.  A closure in a string establishes its own lexical scope.
(Expressions that sneak in without curlies, such as C<$(...)>, do not
establish their own lexical scope, but use the outer scope, and may even
declare variables in the outer scope, since all the code inside (that isn't
in an C<EVAL>) is seen at compile time.)

The following means the same as the previous example.

    print "The attribute is { @baz[3](1,2,3).gethash.{$xyz}<blurfl>.attr }.\n"

The final parens are unnecessary since we're providing "real" code in the
curlies.  If you need to have double quotes that don't interpolate curlies,
you can explicitly remove the capability:

    qq:c(0) "Here are { $two uninterpolated } curlies";

or equivalently:

    qq:!c "Here are { $two uninterpolated } curlies";

Alternatively, you can build up capabilities from single quotes to tell it
exactly what you I<do> want to interpolate:

    q:s 'Here are { $two uninterpolated } curlies';
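
A sketch contrasting a closure in an ordinary double-quoted string with the
C<q:s> form, which interpolates scalars but leaves braces alone (C<$two> is
illustrative):

    my $two = 2;
    say "Sum: { 1 + $two }";                 # Sum: 3
    say q:s/Here are { $two } curlies/;      # Here are { 2 } curlies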

=head3 Twigils

Secondary sigils (twigils) have no influence over whether the primary sigil
interpolates.  That is, if C<$a> interpolates, so do C<$^a>, C<$*a>, C<$=a>,
C<$?a>, C<$.a>, etc.  It only depends on the C<$>.

=head3 Other expressions

No other expressions interpolate.  Use curlies.

=head3 Class methods

A class method may not be directly interpolated.  Use curlies:

    print "The dog bark is {Dog.bark}.\n"

=head3 Old disambiguation

The old disambiguation syntax:

    ${foo[$bar]}
    ${foo}[$bar]

is dead.  Use closure curlies instead:

    {$foo[$bar]}
    {$foo}[$bar]

(You may be detecting a trend here...)

=head3 Topical methods

To interpolate a topical method, use curlies: C<"{.bark}">.

=head3 Function calls

To interpolate a function call without a sigil, use curlies: C<"{abs $var}">.

=head3 Backslash sequences

Backslash sequences still interpolate, but there's no longer any C<\v> to
mean I<vertical tab>, whatever that is...  (C<\v> now matches vertical
whitespace in a regex.)  Literal character representations are:

    \a          BELL
    \b          BACKSPACE
    \t          TAB
    \n          LINE FEED
    \f          FORM FEED
    \r          CARRIAGE RETURN
    \e          ESCAPE

=head3 Other functions

There's also no longer any C<\L>, C<\U>, C<\l>, C<\u>, or C<\Q>.  Use
curlies with the appropriate function instead: C<"{tclc $word}">.
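
For instance (C<$word> is illustrative), the Perl 5 idiom C<\u\L$word>
becomes a call to C<tclc> inside curlies:

    my $word = 'hELLO wORLD';
    say "{tclc $word}";    # Hello world
    say "{uc $word}";      # HELLO WORLD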

=head3 Unicode codepoints

You may interpolate any Unicode codepoint by name using C<\c> and square
brackets:

    "\c[NEGATED DOUBLE VERTICAL BAR DOUBLE RIGHT TURNSTILE]"

Multiple codepoints constituting a single character may be interpolated with
a single C<\c> by separating the names with commas:

    "\c[LATIN CAPITAL LETTER A, COMBINING RING ABOVE]"

Whether that is regarded as one character or two depends on the Unicode
support level of the current lexical scope.  It is also possible to
interpolate multiple codepoints that do not resolve to a single character:

    "\c[LATIN CAPITAL LETTER A, LATIN CAPITAL LETTER B]"

[Note: none of the official Unicode character names contains a comma.]

You may also put one or more decimal numbers inside the square brackets:

    "\c[13,10]" # CRLF

Any single decimal number may omit the brackets:

    "\c8" # backspace

(Within a regex you may also use C<\C> to match a character that is not the
specified character.)
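
A sketch of the bracketed forms.  The character count relies on Rakudo's
default grapheme-level (NFG) strings, so a different Unicode support level
could give a different answer:

    say "\c[LATIN CAPITAL LETTER A, LATIN CAPITAL LETTER B]";        # AB
    say "\c[LATIN CAPITAL LETTER A, COMBINING RING ABOVE]".chars;    # 1
    say "\c[13,10]" eq "\r\n";                                       # True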

If the character following C<\c> or C<\C> is neither a left square bracket
nor a decimal digit, the single following character is turned into a control
character by the usual trick of XORing the 64 bit.  This allows C<\c@> for
NULL and C<\c?> for DELETE, but note that the ESCAPE character may not be
represented that way; it must be represented something like:

    \e
    \c[ESCAPE]
    \c27
    \x1B
    \o33

Obviously C<\e> is preferred when brevity is needed.

=head3 Backslashing

Any character that I<would> start an interpolation in the current quote
context may be protected from such interpolation by prefixing with
backslash.  The backslash is always removed in this case.

The treatment of backslashed characters that would I<not> have introduced an
interpolation varies depending on the type of quote:

=over 4

=item 1.

Any quoting form that includes C<qq> or C<:qq> in its semantic derivation
(including the normal double quote form) assumes that all backslashes are to
be considered meaningful.  The meaning depends on whether the following
character is alphanumeric; if it is, the non-interpolating sequence produces
a compile-time error.  If the character is non-alphanumeric, the backslash
is silently removed, on the assumption that the string was erroneously
backslashed by an overenthusiastic algorithm or programmer.

=item 2.

All other quoting forms (including standard single quotes) assume that
non-interpolating sequences are to be left unaltered because they are
probably intended to pass through to the result.  Backslashes are removed
I<only> for the terminating quote or for characters that would interpolate
if unbackslashed.  (In either case, a special exception is made for
brackets; if the left bracket would interpolate, the right bracket may
optionally also be backslashed, and if so, the backslash will be removed.
If brackets are used as the delimiters, both left and right I<must> be
backslashed the same, since they would otherwise be counted wrong in the
bracket count.)

=back

As a consequence, these all produce the same literal string:

    " \{ this is not a closure } "
    " \{ this is not a closure \} "
    q:c / \{ this is not a closure } /
    q:c / \{ this is not a closure \} /
    q:c { \{ this is not a closure \} }
    q { { this is not a closure } }
    q { \{ this is not a closure \} }

(Of course, matching backslashes is likely to make your syntax highlighter a
bit happier, along with any other naïve bracket counting algorithms...)
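
A quick check that two of these spellings agree (current Rakudo assumed):

    my $double = " \{ this is not a closure } ";
    my $single = q{ { this is not a closure } };
    say $double eq $single;      # True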

=head2 Bare identifiers

There are no barewords in Perl 6.  An undeclared bare identifier will always
be taken to mean a subroutine name, and be parsed as a list operator.
(Class names (and other type names) are predeclared, or prefixed with the
C<::> type sigil when you're declaring a new one.)  A consequence of this is
that there's no longer any "C<use strict 'subs'>".  Since the syntax for
method calls is distinguished from sub calls, it is only unrecognized sub
calls that must be treated specially.

You still must declare your subroutines, but a bareword with an unrecognized
name is provisionally compiled as a subroutine call, on the assumption that
such a declaration will occur by the end of the current compilation unit:

    foo;         # provisional call if neither &foo nor ::foo is defined so far
    foo();       # provisional call if &foo is not defined so far
    foo($x);     # provisional call if &foo is not defined so far
    foo($x, $y); # provisional call if &foo is not defined so far

    $x.foo;      # not a provisional call; it's a method call on $x
    foo $x:;     # not a provisional call; it's a method call on $x
    foo $x: $y;  # not a provisional call; it's a method call on $x

If a postdeclaration is not seen, the compile fails at C<CHECK> time, that
is, at the end of compilation for this compilation unit.  (You are still
free to predeclare subroutines explicitly, of course.) The postdeclaration
may be in any lexical or package scope that could have made the declaration
visible to the provisional call had the declaration occurred before rather
than after the provisional call.

This fixup is done only for provisional calls.  If there is I<any> real
predeclaration visible, it always takes precedence.

If the unrecognized subroutine name is followed by C<< postcircumfix:<( )>
>>, it is compiled as a provisional function call of the parenthesized form.
If it is not, it is compiled as a provisional function call of the list
operator form, which may or may not have an argument list.  When in doubt,
the attempt is made to parse an argument list.  As with any list operator,
an immediate postfix operator is illegal unless it is a form of parentheses,
whereas anything following whitespace will be interpreted as an argument
list if possible.

Some examples of how listops, methods and labels interact syntactically:

    foo.bar             # foo().bar
    foo .bar            # foo($_.bar)   -- no postfix starts with whitespace
    foo\ .bar           # foo().bar
    foo++               # foo()++
    foo 1,2,3           # foo(1,2,3)    -- args always expected after listop
    foo + 1             # foo(+1)       -- term always expected after listop
    foo;                # foo();        -- no postfix, but no args either
    foo:                #   label       -- must be label at statement boundary.
                                        -- ILLEGAL otherwise
    foo: bar:           #   two labels in a row, okay
    .foo: 1             # $_.foo: 1     -- must be "dot" method with : args
    .foo(1)             # $_.foo(1)     -- must be "dot" method with () args
    .foo                # $_.foo()      -- must be "dot" method with no args
    .$foo: 1            # $_.$foo: 1    -- indirect "dot" method with : args
    foo bar: 1          # bar.foo(1)    -- bar must be predecl as class
                                        -- sub bar allowed here only if 0-ary
                                        -- otherwise you must say (bar):
    foo bar 1           # foo(bar(1))   -- both subject to postdeclaration
                                        -- never taken as indirect object
    foo $bar: 1         # $bar.foo(1)   -- indirect object even if declared sub
                                        -- $bar considered one token
    foo (bar()): 1      # bar().foo(1)  -- even if foo declared sub
    foo bar():          # ILLEGAL       -- bar() is two tokens.
    foo .bar:           # foo(.bar:)    -- colon chooses .bar to listopify
    foo bar baz: 1      # foo(baz.bar(1)) -- colon controls "bar", not foo.
    foo (bar baz): 1    # bar(baz()).foo(1) -- colon controls "foo"
    $foo $bar           # ILLEGAL       -- two terms in a row
    $foo $bar:          # ILLEGAL       -- use $bar."$foo"() for indirection
    (foo bar) baz: 1    # ILLEGAL       -- use $baz.$(foo bar) for indirection

The indirect object colon only ever dominates a simple term, where "simple"
includes classes and variables and parenthesized expressions, but explicitly
not method calls, because the colon will bind to a trailing method call in
preference.  An indirect object that parses as more than one token must be
placed in parentheses, followed by the colon.

In short, only an identifier followed by a simple term followed by a postfix
colon is I<ever> parsed as an indirect object, but that form will I<always>
be parsed as an indirect object regardless of whether the identifier is
otherwise declared.

=head2 Dereferences

There's also no "C<use strict 'refs'>" because symbolic dereferences are now
syntactically distinguished from hard dereferences.  C<@($array)> must now
provide an actual array object, while C<@::($string)> is explicitly a
symbolic reference.  (Yes, this may give fits to the P5-to-P6 translator,
but I think it's worth it to separate the concepts.  Perhaps the symbolic
ref form will admit real objects in a pinch.)

=head2 Hash subscripts and bare keys

There is no hash subscript autoquoting in Perl 6.  Use C<< %x<foo> >> for
constant hash subscripts, or the old standby C<< %x{'foo'} >>.  (It also
works to say C<%x«foo»> as long as you realize it's subject to
interpolation.)

But C<< => >> still autoquotes any bare identifier to its immediate left
(horizontal whitespace allowed but not comments).  The identifier is not
subject to keyword or even macro interpretation.  If you say

    $x = do {
        call_something();
        if => 1;
    }

then C<$x> ends up containing the pair C<< ("if" => 1) >>.  Always.  (Unlike
in Perl 5, where version numbers didn't autoquote.)
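
The same example in runnable form; C<call_something> is a hypothetical
stand-in sub:

    sub call_something() { }
    my $x = do {
        call_something();
        if => 1;
    };
    say $x;            # if => 1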

You can also use the C<:key($value)> form to quote the keys of option pairs.
To align values of option pairs, you may use the "unspace" postfix forms:

    :longkey\  ($value)
    :shortkey\ <string>
    :fookey\   { $^a <=> $^b }

These will be interpreted as

    :longkey($value)
    :shortkey<string>
    :fookey{ $^a <=> $^b }

=head2 Double-underscore forms

The double-underscore forms are going away:

    Old                 New
    ---                 ---
    __LINE__            $?LINE
    __FILE__            $?FILE
    __PACKAGE__         $?PACKAGE
    __SUB__             &?ROUTINE
    __END__             =begin finish
    __DATA__            =begin data

The C<=begin finish> Pod stream (usually written as just C<=finish>) is
special in that it assumes there's no corresponding C<=end finish> before
end of file. Anything in a source file after a C<=finish> is always treated
as Pod.

There is no longer any special C<DATA> stream--any Pod block in the current
file can be accessed via a Pod object, such as C<< $=data >> or C<<
$=SYNOPSIS >> or C<< $=UserBlock >> etc.; that is, a variable with the same
name as the desired block, and a C<=> twigil.

These Pod objects can be used as C<Positional>s (indexed by their block
sequence). They can also be treated as C<Associative>s (indexed by C<:key>
options specified with the block). Either way, each C<Positional> or
C<Associative> element represents the entire contents of the corresponding
Pod block. You have to split those contents into lines yourself. Each chunk
has a C<.range> property that indicates its line number range within the
source file.

[Speculative] It may also be possible to treat a Pod object as an
IO::Handle, to read the Pod information line-by-line (like the C<DATA>
filehandle in Perl 5, but for I<any> Pod block).

The lexical routine itself is C<&?ROUTINE>; you can get its name with
C<&?ROUTINE.name>.  The current block is C<&?BLOCK>.  If the block has any
labels, those show up in C<&?BLOCK.labels>.  Within the lexical scope of a
statement with a label, the label is a pseudo-object representing the
I<dynamically> visible instance of that statement.  (If inside multiple
dynamic instances of that statement, the label represents the innermost
one.) This is known as I<lexotic> semantics.

When you say:

    next LINE;

it is really a method call on this pseudo-object, and

    LINE.next;

would work just as well.  You can exit any labeled block early by saying

    MyLabel.leave(@results);
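
A minimal sketch of a labeled loop as it runs in current Rakudo.  The label
name C<ROW> is arbitrary.  Per the text above, C<ROW.next> should be an
equivalent spelling, but the sketch uses the statement form C<next ROW>:

    my @pairs;
    ROW: for 1..3 -> $i {
        for 1..3 -> $j {
            next ROW if $j > $i;      # abandon the inner loop, resume ROW
            @pairs.push: "$i$j";
        }
    }
    say @pairs;    # [11 21 22 31 32 33]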

=head2 Heredocs

Heredocs are no longer written with C<<< << >>>, but with an adverb on any
other quote construct:

    print qq:to/END/;
        Give $amount to the man behind curtain number $curtain.
        END

Other adverbs are also allowed, as are multiple heredocs within the same
expression:

    print q:c:to/END/, q:to/END/;
        Give $100 to the man behind curtain number {$curtain}.
        END
        Here is a $non-interpolated string
        END

=head3 Optional whitespace

Heredocs allow optional whitespace both before and after the terminating
delimiter.  Leading whitespace equivalent to the indentation of the
delimiter will be removed from all preceding lines.  If a line is deemed to
have less whitespace than the terminator, only whitespace is removed, and a
warning may be issued.  (Hard tabs will be assumed to align to the next
multiple of C<< ($?TABSTOP // 8) >> spaces, but as long as tabs and spaces
are used consistently that doesn't matter.)  A null terminating delimiter
terminates on the next line consisting only of whitespace, but such a
terminator will be assumed to have no indentation.  (That is, it's assumed
to match at the beginning of any whitespace.)
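
A minimal sketch of the indentation rule, assuming current Rakudo (the
variable names are illustrative).  The terminator's indentation is removed
from every body line, so a line indented more deeply keeps its extra
spaces:

    my $amount  = 'a dollar';
    my $curtain = 3;
    my $text = qq:to/END/;
        Give $amount to the man behind curtain number $curtain.
          This line keeps its two extra spaces.
        END
    print $text;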

=head3 One-pass heredoc parsing

There are two possible ways to parse heredocs.  One is to look ahead for the
newline and grab the lines corresponding to the heredoc, and then parse the
rest of the original line.  This is how Perl 5 does it.  Unfortunately this
suffers from the problem pervasive in Perl 5 of multi-pass parsing, which is
masked somewhat because there's no way to hide a newline in Perl 5.  In
Perl 6, however, we can use "unspace" to hide a newline, which means that an
algorithm looking ahead to find the newline must do a full parse (with
possible untoward side effects) in order to locate the newline.

Instead, Perl 6 takes the one-pass approach, and just lazily queues up the
heredocs it finds in a line, and waits until it sees a "real" newline to
look for the text and attach it to the appropriate heredoc.  The downside of
this approach is a slight restriction--you may not use the actual text of
the heredoc in code that must run before the line finishes parsing.  Mostly
that just means you can't write:

    BEGIN { say q:to/END/ }; morestuff();
        Say me!
        END

You must instead put the entire heredoc into the C<BEGIN>:

    BEGIN {
        say q:to/END/;
        Say me!
        END
    }; morestuff();

The parser is, however, smart enough to recognize that it's already at
the end of a line if you don't put C<morestuff()> there.  Hence this works:

    BEGIN { say q:to/END/ }
        Say me!
        END

=head2 Version literals

A version literal is written with a 'v' followed by the version number in
dotted form.  This always constructs a C<Version> object, not a string.
Only integers and certain wildcards are allowed; for anything fancier you
must coerce a string to a C<Version>:

    v1.2.3                      # okay
    v1.2.*                      # okay, wildcard version
    v1.2.3+                     # okay, wildcard version
    v1.2.3beta                  # illegal
    Version('1.2.3beta')        # okay

Note though that most places that take a version number in Perl accept it as
a named argument, in which case saying C<< :ver<1.2.3beta> >> is fine.  See
S11 for more on using versioned modules.
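
A few version literals in a minimal sketch, assuming current Rakudo
behavior for C<Version> objects.  Current Rakudo spells the string coercion
C<Version.new(...)>:

    my $v = v1.2.10;
    say $v.parts;                 # (1 2 10)
    say v1.2.3 before v1.2.10;    # True: positions compare numerically
    say v1.2.5 ~~ v1.2.*;         # True: wildcard position
    say v1.3 ~~ v1.2.3+;          # True: "this version or later"
    my $beta = Version.new('1.2.3beta');   # non-integer parts: coerce a string
    say $beta ~~ Version;         # True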

Version objects have a predefined sort order that follows most people's
intuition about versioning: each sorting position sorts numerically between
numbers, alphabetically between alphas, and alphabetics in a position before
numerics.  Missing final positions are assumed to be '.0'.  Except for '0'
itself, numbers ignore leading zeros.  For splitting into sort positions, if
any alphabetics (including underscore) are immediately adjacent to a number,
a dot is assumed between them.  Likewise any non-alphanumeric character is
assumed to be equivalent to a dot.  So these are all equivalent:

    1.2.1alpha1.0
    1.2.1alpha1
    1.2.1.alpha1
    1.2.1alpha.1
    1.2.1.alpha.1
    1.2-1+alpha/1

And these are also equivalent:

    1.2.1_01
    1.2.1_1
    1.2.1._1
    1.2.1._.1
    001.0002.0000000001._.00000000001
    1.2.1._.1.0.0.0.0.0

So these are in sorted version order:

    1.2.0.999
    1.2.1_01
    1.2.1_2
    1.2.1_003
    1.2.1a1
    1.2.1.alpha1
    1.2.1b1
    1.2.1.beta1
    1.2.1.gamma
    1.2.1α1
    1.2.1β1
    1.2.1γ
    1.2.1

Note how the last pair assumes that an implicit .0 sorts after anything
alphabetic, and that alphabetic is defined according to Unicode, not just
according to ASCII.  The intent of all this is to make sure that prereleases
sort before releases.  Note also that this is still a subset of the
versioning schemes seen in the real world.  Modules with such strange
versions can still be used by Perl since by default Perl imports external
modules by exact version number.  (See S11.)  Only range operations will be
compromised by an unknown foreign collation order, such as a system that
sorts "delta" before "gamma".

=head2 Allomorphic value semantics

When C<val()> processing is attempted on any list of strings (typically on
the individual words within angle brackets), the function attempts to
determine if the intent of the programmer or user might have been to provide
a numeric value.

For any item in the list that appears to be numeric, the literal is stored
as an object with both a string and a numeric nature, where the string
nature always returns the original string.  This is implemented via multiple
inheritance, to truly represent the allomorphic nature of a literal value
that has not committed to which type the user intends.  The numeric type
chosen depends on the appearance of the literal.  Hence:

    < 1 1/2 6.02e23 1+2i >

produces objects of classes defined as:

    class IntStr is Int is Str {...}; IntStr('1')
    class RatStr is Rat is Str {...}; RatStr('1/2')
    class NumStr is Num is Str {...}; NumStr('6.02e23')
    class ComplexStr is Complex is Str {...}; ComplexStr('1+2i')

One purpose of this is to facilitate compile-time analysis of multi-method
dispatch, when the user prefers angle notation as the most readable way to
represent a list of numbers, which it often is.  Due to the MI semantics,
the new object is equally a string and a number, and can be bound as-is to
either a string or a numeric parameter.

If multiple dispatch determines that it could dispatch as either string
or number, a tie results, which may produce an ambiguous dispatch error.
You'll need to use prefix C<+> or C<~> on the argument to resolve the
ambiguity in that case.

[Conjecture: we may someday find a way to make strings bind a little looser
than the numeric types, but for now we conservatively outlaw the dispatch as
ambiguous, and watch how this plays out in use.]

The allomorphic behavior of angle brackets is not a special case; it's
actually an example of a more general process of figuring out type
information by parsing text that comes from any situation where the user is
forced to enter text when they really mean other kinds of values.  A
function prompting the user for a single value might usefully pass the
result through C<val()> to intuit the proper type.
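
A minimal sketch of C<val()> and allomorphs as they behave in current
Rakudo.  The C<describe> multi is hypothetical, written only to show how
prefix C<+> and C<~> select one nature of the value:

    multi describe(Int $x) { "Int $x" }
    multi describe(Str $x) { "Str $x" }

    my @vals = < 1 1/2 6.02e23 1+2i >;
    say @vals.map(*.^name);        # (IntStr RatStr NumStr ComplexStr)

    my $n = val('42');             # intuit a type from user-supplied text
    say $n ~~ Int && $n ~~ Str;    # True: both natures at once
    say $n + 1;                    # 43
    say describe(+$n);             # Int 42
    say describe(~$n);             # Str 42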

The angle form with a single value serves as the literal form of numbers
such as C<Rat> and C<Complex> that would otherwise have to be constructed
via constant folding.  It also gives us a reasonable way of visually
isolating any known literal format as a single syntactic unit:

    <-1+2i>.polar
    (-1+2i).polar       # same, but only by constant folding

Any such literal, when written without spaces, produces a pure numeric value
without a stringy allomorphism.  Put spaces to override that:

    <1/2>       # a Rat
    < 1/2 >     # a RatStr

Or use the C<«»> form of quotewords, which is always allomorphic:

    «1/2»       # a RatStr
    « 1/2 »     # a RatStr
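
A minimal sketch to check which kind of object a given angle form produces
in current Rakudo:

    say <1/2>.^name;       # Rat
    say < 1/2 >.^name;     # RatStr
    say <-1+2i>.^name;     # Complex
    say <-1+2i>.polar;     # magnitude and angle of -1+2i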

=head3 Allomorphic Rats

Any rational literal that would overflow a C<Rat64> in the denominator is
also stored as a string.  (That is, angle brackets will be assumed in this
case, producing a C<RatStr>.) If a coercion to a wider type, such as
C<FatRat>, is requested, the literal reconverts from the entire original
string, rather than just the value that would fit into a C<Rat64>.  (It may
then cache that converted value for next time, of course.) So if you declare
a constant with excess precision, it does not automatically become a
C<FatRat>, which would force all calculations into the pessimal C<FatRat>
type.

    constant pi is export = 3.14159_26535_89793_23846_26433_83279_50288;
    say pi.norm.nude; # 1570796326794896619 500000000000000000 (as Rat, reduced)
    say pi.perl;     # 3.14159_26535_89793_23846_26433_83279_50288
    say pi.Num;      # 3.14159265358979 (approximately)
    say pi.Str;      # 3.14159_26535_89793_23846_26433_83279_50288
    say pi.FatRat;   # 3.14159265358979323846264338327950288

=head1 Context

=over 4

=item *

Perl still has the three main contexts: sink (aka void), item (aka scalar),
and list.

=item *

In addition to undifferentiated items, we also have these item contexts:

    Context     Type    OOtype   Operator
    -------     ----    ------   --------
    boolean     bit     Bit      ?
    integer     int     Integral int
    numeric     num     Num      +
    string      buf     Str      ~

There are also various container contexts that require particular kinds of
containers (such as slice and hash context; see S03 for details).

=item *

Unlike in Perl 5, objects are no longer always considered true.  It depends
on the state of their C<.Bool> method, which may either be a synthetic
attribute or an explicitly represented bit in the object.  Classes get to
decide which of their values are true and which are false. In general,
most classes choose a single distinguished value to be false but defined,
such as 0 for the various numeric types, or the empty string for string types.
Individual objects can override the class definition:

    return 0 but True;

This overrides the C<.Bool> method of the C<0> without changing its official
type (by mixing the method into an anonymous derived type).

=item *

The definition of C<.Bool> for the most ancestral type (that is, the C<Mu>
type) is equivalent to C<.defined>.  Since type objects are considered
undefined, all type objects (including C<Mu> itself) are false unless the
type overrides the definition of C<.Bool> to include undefined values.
Instantiated objects default to true unless the class overrides the
definition.  Note that if you could instantiate a C<Mu> it would be
considered defined, and thus true.  (It is not clear that this is allowed,
however.)

=item *

In general any container types should return false if they are empty, and
true otherwise.  This is true of all the standard container types except
C<Scalar>, which always defers the definition of truth to its contents.
Non-container types define truthiness much as Perl 5 does, except
that the string C<"0"> is now considered true.  Coerce to numeric with
C<< prefix:<+> >> if you want the other semantics.

Just as with the standard types, user-defined types should feel free to
partition their defined values into true and false values if such a
partition makes sense in control flow using boolean contexts.

=back
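
A minimal sketch that checks the truth rules above, assuming current
Rakudo:

    my $zero-but-true = 0 but True;
    say ?$zero-but-true;      # True: the mixin overrides .Bool
    say $zero-but-true + 1;   # 1: numerically it is still 0
    say ?Int;                 # False: type objects are undefined
    say ?[];                  # False: empty container
    say ?[0];                 # True: non-empty container
    say ?"0";                 # True: the string "0" is true
    say ?+"0";                # False: numeric context first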

=head1 Lists

=head2 Lazy flattening

List context in Perl 6 is by default lazy.  This means a list can contain
infinite generators without blowing up.  No flattening happens to a lazy
list until it is bound to the signature of a function or method at call time
(and maybe not even then).  We say that such an argument list is "lazily
flattened", meaning that we promise to flatten the list on demand, but not
before.
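
A minimal sketch of an infinite list that is only evaluated on demand,
assuming current Rakudo:

    my @squares = (1 .. *).map(* ** 2);   # infinite: nothing computed yet
    say @squares.is-lazy;                 # True
    say @squares[^4];                     # (1 4 9 16)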

=head2 C<list>, C<flat>, C<item>, and C<.tree>

There is a "C<list>" operator which imposes a list context on its arguments
even if C<list> itself occurs in item context.

To force explicit flattening, use the C<flat> contextualizer.  This
recursively flattens all lists into a 1-dimensional list.  When bound to a
slurpy parameter, a capture flattens the rest of its positional arguments.

To reform a list so that sub-lists turn into tree nodes, use the C<.tree>
method, which is essentially a level-sensitive C<map>, with one closure
provided for remapping the lists at each level:

    $p.tree                # recursively set all levels to item
    $p.tree(*)             # same thing
    $p.tree(*.item)        # force level 1 lists to item
    $p.tree(1)             # same thing
    $p.tree(*.item,*.list) # force level 1 lists to item, level 2 to list
    $p.tree(*.Array,*)     # Turn all sublists into item recursively

When bound to a slice parameter (indicated with C<**>), a capture reforms
the rest of its positional arguments with one level of "treeness",
equivalent to C<@args.tree(1)>, that is, a list of lists, or C<LoL>.  The
sublists are not automatically flattened; that is, if a sublist is a
C<List>, it remains a list until subsequent processing decides how flat or
treelike the sublist should be.

To force a non-flattening item context, use the "C<item>" operator.
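
A minimal sketch of C<list>, C<flat>, and item context, assuming current
Rakudo.  The element counts show where sub-lists are, or are not,
flattened.  C<$(...)> is the sigil form of the item contextualizer:

    say (1, (2, 3), 4).elems;           # 3: the sub-list stays one element
    say flat(1, (2, 3), 4).elems;       # 4: flat flattens recursively
    say flat(1, $(2, 3), 4).elems;      # 3: an itemized sub-list is kept whole
    say list(42).elems;                 # 1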

=head2 Forcing capture context

The C<|> prefix operator may be used to force "capture" context on its
argument and I<also> defeat any scalar argument checking imposed by
subroutine signature declarations.  Any resulting list arguments are then
evaluated lazily.
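
A minimal sketch of prefix C<|>, assuming current Rakudo.  C<add> is a
hypothetical example routine; the array's elements become separate
positional arguments rather than one array argument:

    sub add($a, $b) { $a + $b }
    my @args = 3, 4;
    say add(|@args);    # 7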

=head2 The C<eager> operator

To force non-lazy list processing, use the C<eager> list operator.  List
assignment is also implicitly eager.  (Actually, when we say "eager" we
usually mean "mostly eager" as defined in L<S07>.)

    eager $filehandle.lines;    # read all remaining lines

By contrast,

    $filehandle.lines;

makes no guarantee about how many lines ahead the iterator has read.
Iterators feeding a list are allowed to process in batches, even when stored
within an array.  The array knows that it is extensible, and calls the
iterator as it needs more elements.  (Counting the elements in the array
will also force eager completion.)
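
A minimal sketch of the work-ahead difference, assuming current Rakudo and
using the method form C<.eager>.  The counter records how many elements the
generator has produced so far:

    my $count = 0;
    my $tens  = (1..5).map({ $count++; $_ * 10 });   # a lazy Seq
    say $count;            # 0: no element has been generated yet
    my @all = $tens.eager; # force every element now
    say $count;            # 5
    say @all;              # [10 20 30 40 50]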

This operator is agnostic towards flattening or slicing.  It merely changes
the work-ahead policy for the value generator.

=head2 The C<hyper> operator

A variant of C<eager> is the C<hyper> list operator, which declares not only
that you want all the values generated now, but that you want them badly
enough that you don't care what order they're generated in, as long as the
results come back in the right order.  That is, C<eager> requires sequential
evaluation of the list, while C<hyper> requests (but does not require)
parallel evaluation.  In any case, it declares that you don't care about the
evaluation order, only the result order.

This operator is agnostic towards flattening or slicing.  It merely changes
the work-ahead policy for the value generator.
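
Current Rakudo exposes a similar policy through the C<.hyper> method.  In
the following minimal sketch the batch size is an arbitrary choice.  The
work may be done in parallel batches, yet the results keep their input
order:

    my @squares = (1..20).hyper(batch => 4).map(* ** 2);
    say @squares[^5];    # (1 4 9 16 25)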

=head2 The C<race> operator

A further variant of C<hyper> is the C<race> list operator, which declares
that you want the results so badly that you don't even care what order they
come back in.  Within its arguments, the C<race> operator forces parallel
evaluation of any iterator, hyper, or junction, such that if any single
thread dies or hangs its computation, it does not block any other thread
from returning its results to the race list.  When the demand for the race
list drops, hung threads may be killed.  You can think of it as a C<gather>
with a 'C<try take start {...}>' on parallel computation.  Note that
exceptions are trapped by default; if your car crashes, you simply do not
finish the race.  If you want notifications of some sort back to the pit
crew, you'll have to arrange them yourself.

This operator is agnostic towards flattening or slicing.  It merely changes
the work-ahead policy for the value generator.  It is a transitive
contextualizer insofar as iterators will have to pass on the policy to
subiterators.
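
Current Rakudo offers a C<.race> method in the same spirit.  It relaxes
result order, but it may not implement every detail described above, such
as trapping exceptions.  A minimal sketch, which sorts the results because
their order is not guaranteed:

    my @squares = (1..20).race(batch => 4).map(* ** 2);
    say @squares.sort.head(5);   # (1 4 9 16 25)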

=head2 The C<< => >> operator

The C<< => >> operator now constructs C<Pair> objects rather than merely
functioning as a comma.  Both sides are in item context.
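
A minimal sketch, assuming current Rakudo:

    my $p = 'answer' => 42;
    say $p ~~ Pair;    # True
    say $p.key;        # answer
    say $p.value;      # 42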

=head2 The C<< .. >> operator
X<..>

The C<< .. >> operator now constructs a C<Range> object rather than merely
generating a list of values.  Both sides are in item context.  Semantically,
the C<Range> acts like a list of its values to the extent possible, but does
so lazily, unlike Perl 5's eager range operator.
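
A minimal sketch of a C<Range> used both as a lazy list and as a
membership test, assuming current Rakudo:

    my $r = 1 .. *;      # infinite Range: no list is built
    say $r ~~ Range;     # True
    say $r[^3];          # (1 2 3)
    say 5 ~~ 1 .. 10;    # True: smartmatch tests membership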

=head2 Hash assignment

There is no such thing as a hash list context.  Assignment to a hash
produces an ordinary list context.  You may assign alternating keys and
values just as in Perl 5.  You may also assign lists of C<Pair> objects, in
which case each pair provides a key and a value.  You may, in fact, mix the
two forms, as long as the pairs come when a key is expected.  If you wish to
supply a C<Pair> as a key, you must compose an outer C<Pair> in which the
key is the inner C<Pair>:

    %hash = (($keykey => $keyval) => $value);
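
A minimal sketch of the mixed form, assuming current Rakudo.  The C<Pair>
arrives exactly where a key is expected:

    my %hash = 'a', 1, b => 2, 'c', 3;   # alternating keys/values plus a Pair
    say %hash<a b c>;                    # (1 2 3)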

=head2 The anonymous C<enum> function

The anonymous C<enum> function takes a list of keys or pairs, and adds
values to any keys that are not already part of a pair.  The value added is
one more than the previous key or pair's value.  This works nicely with the
new C<qq:ww> form:

    %hash = enum <<:Mon(1) Tue Wed Thu Fri Sat Sun>>;
    %hash = enum « :Mon(1) Tue Wed Thu Fri Sat Sun »;

are the same as:

    %hash = ();
    %hash<Mon Tue Wed Thu Fri Sat Sun> = 1..7;
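
A minimal sketch, assuming that the anonymous C<enum> in current Rakudo
returns a C<Map> that can be assigned to a hash:

    my %days = enum <<:Mon(1) Tue Wed Thu Fri Sat Sun>>;
    say %days<Wed>;     # 3
    say %days<Sun>;     # 7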

=head2 Hash binding

In contrast to assignment, binding to a hash requires a C<Hash> (or C<Pair>)
object, or anything that does the C<Associative> role.
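
A minimal sketch, assuming current Rakudo.  The commented-out line shows a
binding that would fail:

    my %config := Map.new((debug => True));   # a Map does Associative
    say %config<debug>;                        # True
    # my %oops := (1, 2);   # would die: a List is not Associative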

=head1 Files

=over 4

=item *

Filename globs are no longer done with angle brackets.  Use the C<glob>
function.

=item *

Input from a filehandle is no longer done with angle brackets.  Instead of

    while (<HANDLE>) {...}

you now write

    for $handle.lines {...}

=back
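
A minimal sketch of line-by-line input, assuming current Rakudo.  The
temporary file name is arbitrary, and the file is removed afterwards:

    my $path = $*TMPDIR.add('s02-lines-demo.txt');
    $path.spurt("alpha\nbeta\n");
    my @seen;
    my $handle = $path.open;
    for $handle.lines -> $line {
        @seen.push: $line;          # each line arrives with its newline chomped
    }
    $handle.close;
    $path.unlink;
    say @seen;                      # [alpha beta]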

=head1 Grammatical Categories

Lexing in Perl 6 is controlled by a system of grammatical categories.  At
each point in the parse, the lexer knows which subset of the grammatical
categories is possible at that point, and follows the longest-token rule
across all the active alternatives, including those representing any
grammatical categories that are ready to match.  See L<S05> for a detailed
description of this process.

To get a list of the current categories, grep 'token category:' from
STD.pm6.

Category names are used as the short name of both various operators and the
rules that parse them, though the latter include an extra "sym":

    infix:<cmp>           # the infix cmp operator
    infix:sym<cmp>        # the rule that parses cmp

As you can see, the extension of the name uses colon pair notation.  The
C<:sym> typically takes an argument giving the string name of the operator;
some of the "circumfix" categories require two arguments for the opening and
closing strings.  Since there are so many match rules whose symbol is an
identifier, we allow a shorthand:

    infix:cmp             # same as infix:sym<cmp> (not infix:<cmp>)
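
The operator form of these names is also how user code adds new operators.
A minimal sketch, assuming current Rakudo.  The C<±> and C<√> operators are
hypothetical examples, not built-ins:

    sub infix:<±>(Numeric $x, Numeric $tolerance) {
        ($x - $tolerance) .. ($x + $tolerance)    # returns a Range
    }
    sub prefix:<√>(Numeric $x) { sqrt $x }

    say 10 ± 2;          # 8..12
    say 9 ~~ 10 ± 2;     # True
    say √16;             # 4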

Conjecturally, we might also have other kinds of rules, such as tree rewrite
rules:

    infix:match<cmp>      # rewrite a match node after reducing its arguments
    infix:ast<cmp>        # rewrite an ast node after reducing its arguments

Within a grammar, matching the proto subrule C<< <infix> >> will match all
visible rules in the infix category as parallel alternatives, as if they
were separated by 'C<|>'.
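
The same proto mechanism is available in user grammars.  A minimal sketch,
assuming current Rakudo.  The C<Calc> grammar is a hypothetical example:

    grammar Calc {
        token TOP { <num> <op> <num> }
        token num { \d+ }
        proto token op {*}            # every op:sym<...> is an alternative
        token op:sym<+> { <sym> }     # <sym> matches the literal "+"
        token op:sym<-> { <sym> }
    }
    say Calc.parse('3+4')<op>.Str;    # +
    say so Calc.parse('3*4');         # False: no op:sym<*> rule exists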

Here are some of the names of parse rules in STD:

    category:sym<prefix>                           prefix:<+>
    circumfix:sym<[ ]>                             [ @x ]
    dotty:sym<.=>                                  $obj.=method
    infix_circumfix_meta_operator:sym['»','«']     @a »+« @b
    infix_postfix_meta_operator:sym<=>             $x += 2;
    infix_prefix_meta_operator:sym<!>              $x !~~ 2;
    infix:sym<+>                                   $x + $y
    package_declarator:sym<role>                   role Foo;
    postcircumfix:sym<[ ]>                         $x[$y] or $x.[$y]
    postfix_prefix_meta_operator:sym('»')          @array »++
    postfix:sym<++>                                $x++
    prefix_circumfix_meta_operator:sym<[ ]>        [*]
    prefix_postfix_meta_operator:sym('«')          -« @magnitudes
    prefix:sym<!>                                  !$x (and $x.'!')
    quote:sym<qq>                                  qq/foo/
    routine_declarator:sym<sub>                    sub foo {...}
    scope_declarator:sym<has>                      has $.x;
    sigil:sym<%>                                   %hash
    special_variable:sym<$!>                       $!
    statement_control:sym<if>                      if $condition { 1 } else { 2 }
    statement_mod_cond:sym<if>                     .say if $condition
    statement_mod_loop:sym<for>                    .say for 1..10
    statement_prefix:sym<gather>                   gather for @foo { .take }
    term:sym<!!!>                                  $x = { !!! }
    trait_mod:sym<does>                            my $x does Freezable
    twigil:sym<?>                                  $?LINE
    type_declarator:sym<subset>                    subset Nybble of Int where ^16

Note that some of these produce correspondingly named operators, but not all
of them.  When they do correspond (such as in the C<cmp> example above),
this is by convention, not by enforcement.  (However, matching C<< <sym> >>
within one of these rules instead of the literal operator makes it easier to
set up this correspondence in subsequent processing.)

The STD::Regex grammar also adds these:

    assertion:sym<!>                         /<!before \h>/
    backslash:sym<w>                         /\w/ and /\W/
    metachar:sym<.>                          /.*/
    mod_internal:sym<P5>                     m:/ ... :P5 ... /
    quantifier:sym<*>                        /.*/

=head1 Deprecations

A language that doesn't evolve is a dead language.  Constructs that seem
like a good idea now may turn out not to be such a good idea in the future.
Such constructs will thus need to be deprecated.  To mark a construct as
being deprecated, one can add the C<is DEPRECATED($alternative)> trait to a
class, an attribute, or a sub / method.  During execution, this will cause
the call sites to be recorded without any warnings.  When execution
finishes, a report should be printed to STDERR stating which deprecated
features were called where.
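
A minimal sketch, assuming the trait as implemented for routines in current
Rakudo.  The routine names are hypothetical:

    sub new-name { 42 }
    sub old-name is DEPRECATED('new-name') { new-name() }
    say old-name();    # 42; a deprecation report goes to STDERR at exit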

=head1 AUTHORS

    Larry Wall <larry@wall.org>

=for vim:set expandtab sw=4: