Commit

Merge branch 'master' of git@github.com:rakudo/rakudo
chromatic committed Feb 15, 2010
2 parents 100bb16 + c65c28b commit cfe8cb3
Showing 7 changed files with 138 additions and 114 deletions.
2 changes: 1 addition & 1 deletion build/PARROT_REVISION
@@ -1 +1 @@
43953
43956
210 changes: 106 additions & 104 deletions docs/compiler_overview.pod
@@ -8,8 +8,9 @@ This document describes the architecture and operation of the Rakudo
Perl 6 (or simply Rakudo) compiler. The F<README> describes how to
build and run Rakudo.

Rakudo has six main parts (source code paths are relative to Rakudo's
F<src/> directory):
Rakudo has six main parts summarized below. Source code paths are
relative to Rakudo's F<src/> directory, and platform specific filename
extensions such as F<.exe> are sometimes omitted for brevity.

=over 4

@@ -42,11 +43,15 @@ F<core/*.pm>, F<glue/*.pir>, F<metamodel/*>)

=back

The F<Makefile> (generated from F<build/Makefile.in> by F<Configure.pl>)
compiles all the parts to form the F<perl6.pbc> executable and the
F<perl6> or F<perl6.exe> "fake executable". We call it fake because it
has only a small stub of code to launch the Parrot executable and pass
itself as a chunk of bytecode for Parrot to execute.
The F<Makefile> (generated from F<build/Makefile.in> by
F<../Configure.pl>) compiles all the parts to form the F<perl6.pbc>
executable and the F<perl6> or F<perl6.exe> "fake executable". We call
it fake because it has only a small stub of code to launch the Parrot
executable, and passes itself as a chunk of bytecode for Parrot to
execute. The source code of the "fakecutable" is generated as
F<perl6.c> with the stub at the very end. The entire contents of
F<perl6.pbc> are represented as escaped octal characters in one huge
string called C<program_code>. What a hack!
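
The octal-string trick described above can be illustrated with a short Python sketch. This is not the generator that Parrot or Rakudo actually uses; the function names and the C<bytecode_size> constant are hypothetical, and only the C<program_code> name comes from the text. It simply shows how arbitrary bytes can be written as escaped octal characters inside a C string literal.

```python
def to_c_octal_literal(data: bytes) -> str:
    """Return a C string literal in which every byte is a three-digit octal escape."""
    return '"' + "".join(f"\\{byte:03o}" for byte in data) + '"'


def emit_program_code(bytecode: bytes) -> str:
    """Emit C source declaring the bytecode as one big string.

    `bytecode_size` is a hypothetical companion constant: a C string literal
    gains a trailing NUL, so the real length has to be recorded separately.
    """
    literal = to_c_octal_literal(bytecode)
    return (
        f"const unsigned char program_code[] = {literal};\n"
        f"const int bytecode_size = {len(bytecode)};\n"
    )


# Three sample bytes standing in for the contents of a .pbc file.
sample = emit_program_code(b"\x00A\xff")
```

Using a fixed three-digit escape for every byte keeps the output unambiguous, because a shorter octal escape followed by a literal digit would be read by the C compiler as a different byte.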

=head2 1. NQP[-RX]

@@ -55,17 +60,32 @@ Perl 6, the remainder in Parrot Intermediate Representation (PIR) or C.
Not Quite Perl (nqp) provides the bootstrap step of compiling compiler
code (yes!) written in a subset of Perl 6, into PIR.

The latest version is called B<nqp-rx> because it now also includes a
powerful Perl 6 regex engine. This has produced a streamlined compiler
framework on which to build a very functional Perl 6 implementation.
The latest version of NQP is called B<nqp-rx> because it now also
includes a powerful Perl 6 regex engine. This gives a streamlined
compiler framework on which to build a very functional Perl 6
implementation.

NQP itself is also written in PIR, is an important part of the Parrot
Compiler Toolkit (PCT), and is installed with Parrot. PCT is a standard
framework to make and use Parrot based languages. The source code of
NQP is in F<../parrot/ext/nqp-rx/> and the resulting (compiler-)
compiler is F<../parrot_install/bin/parrot-nqp>. Note, NQP only
I<builds> the Rakudo compiler, and does not compile or run user
programs.
NQP is in F<../parrot/ext/nqp-rx/> and the resulting compiler is
F<../parrot_install/bin/parrot-nqp>. Note, NQP only I<builds> the
Rakudo compiler, and does not compile or run user programs.

=head3 Stages

NQP[-RX] compiles us a very good compiler in F<gen/perl6.pbc>, referred
to as "stage-1", or C<S1_PERL6_PBC> in the F<Makefile>. This version
would be limited in production though, because libraries of classes and
methods available at run time (for example Complex) have not yet been
added.

The "stage-1" compiler (note: not NQP) compiles all Rakudo's Perl 6 code
again, this time including all the library modules (F<gen/core.pm>), to
make F<perl6.pbc> (note: not in F<gen/>). That F<gen/core.pm> file is
generated by F<build/gen_core_pm.pl> from a list called C<CORE_SOURCES>
in F<Makefile>. Thanks to the staging process, a large and growing
proportion of Rakudo's source code is written in Perl 6.

We can conceivably use the Rakudo compiler to compile itself to PIR and
eliminate the need for NQP entirely. At some point as Rakudo matures we
@@ -140,13 +160,79 @@ the user program.

=head2 3. Actions

The C<Perl6/Actions.pm> file defines what the compiler must output when
it matches certain tokens or rules. The output is a tree hierarchy of
objects representing language syntax elements, such as a statement.
The tree is called a Parrot Abstract Syntax Tree (PAST).
The F<Perl6/Actions.pm> file defines the code that the compiler
generates when it matches each token or rule. The output is a tree
hierarchy of objects representing language syntax elements, such as a
statement. The tree is called a Parrot Abstract Syntax Tree (PAST).

The C<Perl6::Actions> class inherits from C<HLL::Actions>, another part
of the Parrot Compiler Toolkit. The source is in
F<../parrot/ext/nqp-rx/stage0/src/HLL-s0.pir>; look for the several
instances of C<.namespace ["HLL";"Actions"]>.

When the PCT calls the C<'parse'> method on a grammar, it passes not
only the program source code, but also a pointer to a parseactions class
such as our compiled C<Perl6::Actions>. Then, each time the parser
matches a named regex in the grammar, it automatically calls the same
named method in the actions class.
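
The name-based dispatch described above can be sketched in Python. This is a toy model under stated assumptions, not PCT code: C<ToyGrammar>, C<ToyActions>, C<Match> and the two rules are all hypothetical names invented for the illustration. The only point it demonstrates is that the parser looks up a method on the actions object by the name of the rule that just matched.

```python
import re


class Match:
    """Minimal stand-in for a match object: rule name, matched text, attached AST."""

    def __init__(self, rule, text):
        self.rule = rule
        self.text = text
        self.ast = None  # set by the action method, playing the role of `make`


class ToyGrammar:
    # Hypothetical named rules; a real grammar is far richer than a regex table.
    rules = {"integer": r"\d+", "word": r"[A-Za-z]+"}

    def parse(self, source, actions=None):
        matches, pos = [], 0
        while pos < len(source):
            if source[pos].isspace():
                pos += 1
                continue
            for name, pattern in self.rules.items():
                found = re.compile(pattern).match(source, pos)
                if found:
                    match = Match(name, found.group())
                    # Dispatch: call the action method with the same name as the rule.
                    action = getattr(actions, name, None)
                    if action is not None:
                        action(match)
                    matches.append(match)
                    pos = found.end()
                    break
            else:
                raise SyntaxError(f"no rule matches at offset {pos}")
        return matches


class ToyActions:
    def integer(self, match):
        match.ast = int(match.text)

    def word(self, match):
        match.ast = match.text.upper()


asts = [m.ast for m in ToyGrammar().parse("answer 42", ToyActions())]
```

Because the lookup is by name, a grammar can be reused with a different actions object, or with none at all, without changing the rules themselves.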

For example, here's the parse rule for Rakudo's C<unless> statement
(in F<Perl6/Grammar.pm>):

    token statement_control:sym<unless> {
        <sym> :s
        <xblock>
        [ <!before 'else'> ||
            <.panic: 'unless does not take "else", please rewrite using "if"'>
        ]
    }

This token says that an C<unless> statement consists of the word
"unless" (captured into C<< $<sym> >>), and then an expression followed
by a block. If that all matches, the parser invokes the corresponding
action method for C<< statement_control:sym<unless> >>.

Remember that for a match, not only must the C<< <sym> >> match the word
C<unless>, the C<< <xblock> >> must also match the C<xblock> token. If
you read more of F<Perl6/Grammar.pm>, you will learn that C<xblock> in
turn tries to match an C<< <EXPR> >> and a C<< <pblock> >>, which in
turn tries to match .....

This is why parsing source code this way is called Recursive Descent.

Back to the C<unless> example, here's the action method for the
C<unless> statement (from F<Perl6/Actions.pm>):

    method statement_control:sym<unless>($/) {
        my $past := xblock_immediate( $<xblock>.ast );
        $past.pasttype('unless');
        make $past;
    }

When the parser invokes this action method, the current match object
containing the parsed statement is passed into the method as C<$/>.
In Perl 6, this means that the expression C<< $<xblock> >> refers to
whatever the parser matched to the C<xblock> token. Similarly there
are C<< $<EXPR> >> and C<< $<pblock> >> objects, and so on, until the
end of the recursive descent. By the way, C<< $<xblock> >> is Perl 6 syntactic
sugar for C< $/{'xblock'} >.

The magic occurs in the C<< $<xblock>.ast >> and C<make> expressions in
the method body. The C<.ast> method retrieves the PAST made already for
the C<xblock> subtree. Thus C<$past> becomes a node object describing
code to conditionally execute the block in the subtree.

The C<make> statement at the end of the method sets the node returned
by C<xblock_immediate> as the PAST representation of the C<unless>
statement that was just parsed.
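
The bottom-up flow of C<.ast> and C<make> can be modelled with a small Python sketch. Everything in it is an assumption made for illustration: C<Op> and C<Match> are toy classes, the C<xblock> action producing an C<if> node by default is a guess at a plausible shape, and the real C<xblock_immediate> helper is not reproduced. It shows only that an inner action attaches a node first, and the outer action then retrieves that node, adjusts it, and attaches it to its own match.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Toy stand-in for a PAST::Op node."""
    pasttype: str
    children: list = field(default_factory=list)


@dataclass
class Match:
    """Toy match object: named sub-matches plus the AST attached by make()."""
    subs: dict = field(default_factory=dict)
    ast: object = None

    def __getitem__(self, name):
        # Plays the role of $<name>, i.e. $/{'name'}.
        return self.subs[name]

    def make(self, node):
        self.ast = node


class Actions:
    def xblock(self, m):
        # Assumption: an xblock becomes a conditional node over (condition, block).
        m.make(Op("if", [m["EXPR"].ast, m["pblock"].ast]))

    def statement_control_unless(self, m):
        past = m["xblock"].ast      # retrieve the node made for the subtree
        past.pasttype = "unless"    # flip the sense of the conditional
        m.make(past)                # attach it to the statement's own match


# Build the match tree by hand, innermost matches first.
expr = Match(ast="$x > 3")
pblock = Match(ast="{ say 'small' }")
xblock = Match(subs={"EXPR": expr, "pblock": pblock})
stmt = Match(subs={"xblock": xblock})

actions = Actions()
actions.xblock(xblock)                  # inner rule matched first
actions.statement_control_unless(stmt)  # then the enclosing statement
```

After both calls, C<stmt.ast> is the same node object that the inner action created, now tagged as an C<unless> operation.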

The Parrot Compiler Toolkit provides a wide variety of PAST node types
for representing the various components of a HLL program -- for more
details about the available node types, see PDD 26
(L<http://svn.parrot.org/parrot/trunk/docs/pdds/pdd26_ast.pod>).

The PAST representation is the final stage of processing in Rakudo
itself. The PAST datastructure is then passed on to Parrot directly.
itself. The PAST data structure is then passed on to Parrot directly.
Parrot does the remainder of the work translating from PAST to PIR and
then to bytecode.

@@ -158,85 +244,6 @@ far easier to write this component using PIR instead of a
regular expression, but otherwise it acts just like any other
rule in the grammar.

The action methods (in F<src/parser/actions.pm>) are used to convert the nodes
of the parse tree (produced by the parse grammar) into an equivalent Parrot
Abstract Syntax Tree (PAST) representation, which is then passed on to Parrot.

The action methods are where the Rakudo compiler does the bulk of the work of
creating an executable program. Action methods are written in Perl 6, but we
use NQP to compile them into PIR as F<src/gen_actions.pir>.

When Rakudo is compiling a Perl 6 program, action methods are invoked
by the C< {*} > symbols in the parse grammar. Each C< {*} > in a rule
causes the action method corresponding to the rule's name to be
invoked, passing the current match object as an argument. If the
rule source line containing C< {*} > also contains a comment
starting with C< #= >, any text after the comment is passed as a
separate key argument to the action method. (This is similar to
the approach that STD.pm uses to mark and distinguish actions.)

For example, here's the parse rule for Rakudo's C<unless> statement
(in src/parser/grammar.pg):

    rule unless_statement {
        $<sym>=[unless] <EXPR> <block>
        {*}
    }

This rule says that an unless statement consists of the word "unless"
(captured into C<< $<sym> >>), followed by an expression and then a block.
If all of those match successfully, then the C< {*} > invokes the
corresponding action method for unless_statement. Here's the action
method for the unless statement (from src/parser/actions.pm):

    method unless_statement($/) {
        my $then := $( $<block> );
        $then.blocktype('immediate');
        my $past := PAST::Op.new( $( $<EXPR> ), $then,
                                  :pasttype('unless'),
                                  :node( $/ )
                                );
        make $past;
    }

When this action method is invoked from the unless_statement rule,
the current match object containing the parsed statement is passed
into the method as C< $/ >. In Perl 6, this means that the
expressions C<< $<EXPR> >> and C<< $<block> >> will refer to
whatever was matched by the C<< <EXPR> >> and C<< <block> >>
subrules of the C<unless_statement> rule. ( C<< $<block> >>
is Perl 6 syntactic sugar for C< $/{'block'} >.)

Now then, the purpose of the action methods in our compiler is
to convert the parsed elements of the source program into their
abstract syntax tree (PAST) equivalents. The magic for this
occurs in the C< $(...) > and C<make> expressions in the method
body. The C< $(...) > operator is used to retrieve the PAST
representation of a parsed subtree. Thus, the first two statements
of C<unless_statement> retrieve the PAST representation of the
C<< <block> >> subtree into C<$then>, and set that block to
be an immediately executed block.

The third statement creates a new C<PAST::Op> node for the
unless statement, using the PAST representation of C<< <EXPR> >>
as the condition to be tested, the C<$then> block as the body,
and C<:pasttype('unless')> as the type of operation to be
performed. The C<:node($/)> argument is used to link this
PAST node back to the source code that generated it (e.g., for
error reporting).

Finally, the C<make> statement at the end of the method sets
the newly created PAST::Op node as the PAST representation of
the unless statement that was just parsed.

The Parrot Compiler Toolkit provides a wide variety of PAST
node types for representing the various components of a HLL
program -- for more details about the available node types,
see PDD 26 (L<http://svn.parrot.org/parrot/trunk/docs/pdds/pdd26_ast.pod>).




=head2 6. Builtin functions and runtime support

The last component of the compiler are the various builtin
@@ -245,11 +252,6 @@ have available when it is running. These include functions
for the basic operations (C<< infix:<+> >>, C<< prefix:<abs> >>)
as well as common global functions such as C<say> and C<print>.

Currently, most of the builtins are written in PIR, either because
it's simpler to write them that way or because they represent
very primitive operations (e.g., math primitives) or they're
easier to write in PIR than in Perl 6 or some other language.

=head2 Still to be documented

* Rakudo PMCs
@@ -259,7 +261,7 @@ easier to write in PIR than in Perl 6 or some other language.
=head1 AUTHORS

Patrick Michaud <pmichaud@pobox.com> is the primary author and
maintainer.
maintainer of Rakudo. The other contributors are named in F<CREDITS>.

=head1 COPYRIGHT

17 changes: 17 additions & 0 deletions src/core/Any-list.pm
@@ -150,6 +150,20 @@ augment class Any {
}
return @args[0];
}

    # This needs a way of taking a user-defined comparison
    # specifier, but AFAIK nothing has been spec'd yet.
    # CHEAT: Almost certainly should be hashed on something
    # other than the stringification of the objects.
    multi method uniq() {
        my %seen;
        gather for @.list {
            unless %seen{$_} {
                take $_;
                %seen{$_} = 1;
            }
        }
    }
}

our proto sub join (Str $separator = '', *@values) { @values.join($separator); }
@@ -158,5 +172,8 @@ our multi sub reverse(*@v) { @v.reverse; }
our proto sub end(@array) { @array.end; }
our proto sub grep($test, @values) { @values.grep($test); }
our proto sub first($test, @values) { @values.first($test); }
our proto sub min($by, *@values) { @values.min($by); }
our proto sub max($by, *@values) { @values.max($by); }
our proto sub uniq(@values) { @values.uniq; }

# vim: ft=perl6
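
The C<uniq> method added in this file, including the limitation its CHEAT comment admits, can be mirrored in a few lines of Python. This is a sketch for illustration only, not a translation that Rakudo uses: the generator plays the role of C<gather>/C<take>, and keying on C<str(value)> is an assumption standing in for the stringified hash keys of the Perl 6 version.

```python
def uniq(values):
    """Yield each value the first time its string form is seen, preserving order."""
    seen = set()
    for value in values:
        key = str(value)  # the "cheat": identity is decided by stringification
        if key not in seen:
            seen.add(key)
            yield value


# The integer 1 and the string "1" stringify alike, so the second is dropped.
result = list(uniq([1, "1", 2, 1, 3]))
```

The example makes the trade-off visible: values of different types that share a string form are treated as duplicates, which is why the comment suggests hashing on something else eventually.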
2 changes: 1 addition & 1 deletion src/core/Any-num.pm
@@ -1,6 +1,6 @@
augment class Any {
multi method abs() {
self.Num.abs;
pir::abs__Nn(self.Num);
}

multi method exp() {
10 changes: 7 additions & 3 deletions src/core/Complex.pm
@@ -320,9 +320,13 @@ multi sub prefix:<->(Complex $a) {
Complex.new(-$a.re, -$a.im);
}

#multi sub infix:<**>(Complex $a, $b) is default {
# ($a.log * $b).exp;
#}
multi sub infix:<**>(Complex $a, Complex $b) {
    ($a.log * $b).exp;
}

multi sub infix:<**>(Complex $a, $b) {
    ($a.log * $b).exp;
}

multi sub infix:<**>($a, Complex $b) {
    ($a.log * $b).exp;
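
All three C<< infix:<**> >> candidates rely on the identity a ** b = exp(log(a) * b). A minimal Python sketch of the same identity, using the standard C<cmath> module, is given here as an illustration; C<complex_pow> is a hypothetical name, and the sketch deliberately leaves a zero base unhandled because the logarithm of zero is undefined.

```python
import cmath


def complex_pow(a, b):
    """Compute a ** b as exp(log(a) * b), using the principal branch of log.

    Assumption: a is non-zero; cmath.log raises ValueError for a zero argument.
    """
    return cmath.exp(cmath.log(complex(a)) * b)


i_squared = complex_pow(1j, 2)   # close to -1, up to floating-point rounding
two_cubed = complex_pow(2, 3)    # close to 8, up to floating-point rounding
```

Results carry small rounding errors in both the real and imaginary parts, so comparisons should use a tolerance rather than exact equality.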
1 change: 1 addition & 0 deletions src/core/Str.pm
@@ -4,4 +4,5 @@ augment class Str {
# CHEAT: this implementation is a bit of a cheat,
# but works fine for now.
multi method Int { (+self).Int; }
multi method Num { (+self).Num; }
}
10 changes: 5 additions & 5 deletions t/spectest.data
@@ -443,15 +443,15 @@ S32-list/join.t
# S32-list/reduce.t
S32-list/reverse.t
# S32-list/sort.t
# S32-list/uniq.t
S32-list/uniq.t
S32-num/abs.t
S32-num/complex.t
S32-num/exp.t
S32-num/int.t
S32-num/log.t
# S32-num/pi.t
# S32-num/polar.t
# S32-num/power.t
S32-num/power.t
# S32-num/rand.t
S32-num/rat.t
S32-num/roots.t
@@ -471,7 +471,7 @@ S32-str/chop.t
S32-str/flip.t
S32-str/index.t
S32-str/lcfirst.t
# S32-str/lc.t
S32-str/lc.t
# S32-str/p5chomp.t
# S32-str/p5chop.t
S32-str/pos.t
@@ -484,11 +484,11 @@ S32-str/split-simple2.t # CHEAT! simplified version of split-simple.t
# S32-str/substr.t
# S32-str/trim.t
S32-str/ucfirst.t
# S32-str/uc.t # icu
S32-str/uc.t # icu
# S32-str/unpack.t
## S32-str/words.t # icu
# S32-temporal/Temporal.t
# S32-trig/e.t
S32-trig/e.t
# S32-trig/pi.t
# S32-trig/sin.t
# S32-trig/cos.t