This document describes the architecture and operation of the Rakudo Perl 6 (or simply Rakudo) compiler. The README describes how to build and run Rakudo.
Rakudo has six main parts summarized below. Source code paths are relative to Rakudo's src/ directory, and platform specific filename extensions such as .exe are sometimes omitted for brevity.
* Not Quite Perl builds Perl 6 source code parts into Rakudo
* A main program drives parsing, code generation and runtime execution (Perl6/Compiler.pir)
* A grammar parses user programs (Perl6/Grammar.pm)
* Action methods build a Parrot Abstract Syntax Tree (Perl6/Actions.pm)
* Parrot extensions provide Perl 6 run time behavior (TODO: describe) (binder/*, ops/*, pmc/*)
* Libraries provide functions at run time (builtins/*.pir, cheats/*, core/*.pm, glue/*.pir, metamodel/*)
The Makefile (generated from build/Makefile.in by ../Configure.pl) compiles all the parts to form the perl6.pbc executable and the perl6 or perl6.exe "fake executable". We call it fake because it has only a small stub of code to launch the Parrot executable, and passes itself as a chunk of bytecode for Parrot to execute. The source code of the "fakecutable" is generated as perl6.c with the stub at the very end. The entire contents of perl6.pbc are represented as escaped octal characters in one huge string called program_code. What a hack!
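The embedding trick can be illustrated with a short Python sketch. It is not the generator Rakudo uses: the function names, the bytecode_size variable, and the shape of the stub are assumptions made for illustration. Only the idea of one octal-escaped string called program_code, followed by a stub, comes from the description above.

```python
def to_c_string_literal(data: bytes) -> str:
    """Render raw bytes as a C string literal made of octal escapes."""
    # Always emit three octal digits, so a following digit character
    # can never be absorbed into the previous escape.
    return '"' + "".join(f"\\{byte:03o}" for byte in data) + '"'


def fakecutable_source(bytecode: bytes) -> str:
    """Build a toy C source file: the bytecode string first, a stub last."""
    return (
        "static const unsigned char program_code[] = "
        f"{to_c_string_literal(bytecode)};\n"
        # bytecode_size is a hypothetical name used only in this sketch.
        f"static const unsigned long bytecode_size = {len(bytecode)};\n"
        "/* hypothetical stub: a real one would hand program_code to Parrot */\n"
        "int main(void) { return 0; }\n"
    )


source = fakecutable_source(b"\x00A\xff")
```

Here the three bytes 0x00, 'A' and 0xFF become the escapes \000, \101 and \377 inside the C string literal.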
The source files of Rakudo are preferably and increasingly written in Perl 6, the remainder in Parrot Intermediate Representation (PIR) or C. Not Quite Perl (nqp) provides the bootstrap step of compiling compiler code (yes!) written in a subset of Perl 6, into PIR.
The latest version of NQP is called nqp-rx because it now also includes a powerful Perl 6 regex engine. This gives a streamlined compiler framework on which to build a very functional Perl 6 implementation.
NQP itself is also written in PIR, is an important part of the Parrot Compiler Toolkit (PCT), and is installed with Parrot. PCT is a standard framework to make and use Parrot based languages. The source code of NQP is in ../parrot/ext/nqp-rx/ and the resulting compiler is ../parrot_install/bin/parrot-nqp. Note, NQP only builds the Rakudo compiler, and does not compile or run user programs.
NQP (nqp-rx) compiles a capable compiler into gen/perl6.pbc, referred to as "stage-1", or S1_PERL6_PBC in the Makefile. This version would be of limited use in production, though, because the libraries of classes and methods available at run time (for example Complex) have not yet been added.
The "stage-1" compiler (note: not NQP) compiles all Rakudo's Perl 6 code again, this time including all the library modules (gen/core.pm), to make perl6.pbc (note: not in gen/). That gen/core.pm file is generated by build/gen_core_pm.pl from a list called CORE_SOURCES in the Makefile. Thanks to the staging process, a large and growing proportion of Rakudo's source code is written in Perl 6.
We can conceivably use the Rakudo compiler to compile itself to PIR and eliminate the need for NQP entirely. At some point as Rakudo matures we will probably do this. However, for the time being it's slightly easier to manage the process if we keep a distinction between the two tools, and using NQP for this stage also helps us to limit ourselves to using a regular, well-defined, and relatively easy-to-implement subset of Perl 6 for the core compiler. So, while it's possible for us to eliminate NQP from the process, there are some good reasons not to do so just yet. (If at some point we discover that we need something for the compiler that NQP can't or won't support, then that will probably be a good point to switch.)
A subroutine called 'main', in Perl6/Compiler.pir, starts the source parsing and bytecode generation work. It creates a Perl6::Compiler object for the 'perl6' source type. The Perl6::Compiler class inherits from the HLLCompiler class of the Parrot Compiler Toolkit; see ../parrot/compilers/pct/src/PCT/HLLCompiler.pir.
Before tracing Rakudo's execution further, a few words about Parrot process and library initialization.
Parrot execution does not simply begin with 'main'. When Parrot executes a bytecode file, it first calls all subroutines in it that are marked with the :init modifier. Rakudo has over 50 such subroutines, brought in by .include directives in Perl6/Compiler.pir, to create classes and objects in Parrot's memory.
Similarly, when the executable loads libraries, Parrot automatically calls subs having the :load modifier. The Rakudo :init subs are usually also :load, so that the same startup sequence occurs whether Rakudo is run as an executable or loaded as a library.
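The startup ordering can be modelled in a few lines of Python. This is only an analogy for the :init and :load behaviour described above: the decorator, the two registries and the function names are all hypothetical and have no counterpart in Parrot's API.

```python
INIT_SUBS = []  # called before main when the file runs as a program
LOAD_SUBS = []  # called when the file is loaded as a library

log = []        # records the order in which things happen


def startup(init=False, load=False):
    """Register a function, loosely in the spirit of :init and :load."""
    def register(func):
        if init:
            INIT_SUBS.append(func)
        if load:
            LOAD_SUBS.append(func)
        return func
    return register


@startup(init=True, load=True)  # marked both ways, like most Rakudo subs
def create_classes():
    log.append("classes created")


def main():
    log.append("main")


def run_as_program():
    for sub in INIT_SUBS:  # every registered init sub runs first
        sub()
    main()


def load_as_library():
    for sub in LOAD_SUBS:  # same setup, but main is never called
        sub()
```

Because create_classes is registered in both lists, the setup work happens on either path, which is the property the paragraph above describes.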
Perl6/Compiler.pir has three .loadlib commands early on, for perl6_group, perl6_ops and math_ops. These dynamically extend Parrot with, respectively, Rakudo-specific PMCs (PolyMorphic Containers, formerly Parrot Magic Cookies), opcodes, and mathematical operators. The source is in pmc/*, ops/* and parrot/src/ops/math.ops.
So far, Rakudo's 'main' subroutine has created a Perl6::Compiler object. Next, 'main' invokes the 'command_line' method on this object, passing the command line arguments in a PMC called args_str. The 'command_line' method is inherited from the HLLCompiler parent class (part of the PCT, remember).
And that's it, apart from a '!fire_phasers'('END') and an exit. Well, as far as 'main' is concerned. The remaining work is divided between PCT, grammar and actions.
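The overall shape of 'main' can be sketched in Python as follows. The class and function bodies are stand-ins invented for this illustration; the real command_line method in HLLCompiler does far more (option handling, reading the source, running the result), and the phasers dictionary is a hypothetical simplification.

```python
class HLLCompiler:
    """Stand-in for the PCT parent class; only the call shape is modelled."""

    def __init__(self, language):
        self.language = language

    def command_line(self, args):
        # Toy behaviour: describe the work instead of doing it.
        program = args[1] if len(args) > 1 else "<interactive>"
        return f"{self.language}: compile and run {program}"


class Perl6Compiler(HLLCompiler):
    """Inherits command_line unchanged, as Perl6::Compiler does."""


def fire_phasers(name, phasers):
    """Run every block registered under the given phaser name."""
    return [block() for block in phasers.get(name, [])]


def main(args, phasers):
    compiler = Perl6Compiler("perl6")      # compiler for the 'perl6' type
    result = compiler.command_line(args)   # inherited method does the work
    ended = fire_phasers("END", phasers)   # END blocks run just before exit
    return result, ended


result, ended = main(["perl6", "hello.pl"], {"END": [lambda: "bye"]})
```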
The make target PERL6_G uses parrot-nqp to compile Perl6/Grammar.pm to gen/perl6-grammar.pir.
The top-level portion of the grammar is written using Perl 6 rules (Synopsis 5) and is based on the STD.pm grammar in the Pugs repository (http://svn.pugscode.org/pugs/src/perl6/STD.pm). There are a few places where Rakudo's grammar deviates from STD.pm, but the ultimate goal is for the two to converge. The grammar inherits from HLL::Grammar, which provides the <.panic> rule to throw exceptions for syntax errors.
The compiler works by calling the TOP method in Perl6/Grammar.pm. After some initialization, TOP matches the user program to the comp_unit (meaning compilation unit) token. That triggers a series of matches to other tokens and rules (two kinds of regex) depending on the source in the user program.
The Perl6/Actions.pm file defines the code that the compiler generates when it matches each token or rule. The output is a tree hierarchy of objects representing language syntax elements, such as a statement. The tree is called a Parrot Abstract Syntax Tree (PAST).
The Perl6::Actions class inherits from HLL::Actions, another part of the Parrot Compiler Toolkit. The source is in ../parrot/ext/nqp-rx/stage0/src/HLL-s0.pir; look for several instances of .namespace ["HLL";"Actions"].
When the PCT calls the 'parse' method on a grammar, it passes not only the program source code, but also a reference to a parse actions class such as our compiled Perl6::Actions. Then, each time the parser matches a named regex in the grammar, it automatically calls the method of the same name in the actions class.
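The name-based dispatch from grammar to actions can be sketched in Python. Everything here is hypothetical (ToyGrammar, ToyActions, the two rules and the Match class); it is a toy recursive descent parser, not PCT code, and only shows a matched rule triggering the action method of the same name.

```python
import re


class Match(dict):
    """A successful match: named sub-matches as keys, plus text and an AST."""

    def __init__(self, text):
        super().__init__()
        self.text = text
        self.ast = None


class ToyGrammar:
    """Hypothetical two-rule grammar: addition := number '+' number."""

    def __init__(self, source, actions=None):
        self.source = source
        self.pos = 0
        self.actions = actions

    def call_rule(self, name):
        """Run the rule `name`; on success, call the same-named action."""
        start = self.pos
        match = getattr(self, name)()
        if match is None:
            self.pos = start  # rule failed: rewind to where it started
            return None
        action = getattr(self.actions, name, None)
        if action is not None:
            action(match)
        return match

    def literal(self, pattern):
        """Consume text matching `pattern` at the current position, if any."""
        found = re.compile(pattern).match(self.source, self.pos)
        if found is None:
            return None
        self.pos = found.end()
        return found.group(0)

    def number(self):
        text = self.literal(r"\s*\d+")
        return None if text is None else Match(text.strip())

    def addition(self):
        start = self.pos
        left = self.call_rule("number")
        if left is None or self.literal(r"\s*\+") is None:
            return None
        right = self.call_rule("number")
        if right is None:
            return None
        match = Match(self.source[start:self.pos])
        match["left"], match["right"] = left, right
        return match


class ToyActions:
    """Method names mirror rule names; that is how the parser finds them."""

    def number(self, match):
        match.ast = int(match.text)

    def addition(self, match):
        match.ast = ("add", match["left"].ast, match["right"].ast)


tree = ToyGrammar("1 + 2", ToyActions()).call_rule("addition").ast
```

The grammar never mentions ToyActions by name. Swapping in a different actions object changes what gets built without touching the parsing rules, which is the same separation Rakudo keeps between Perl6/Grammar.pm and Perl6/Actions.pm.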
For example, here's the parse rule for Rakudo's unless statement (in Perl6/Grammar.pm):
    token statement_control:sym<unless> {
        <sym> :s
        <xblock>
        [ <!before 'else'> ||
            <.panic: 'unless does not take "else", please rewrite using "if"'>
        ]
    }
This token says that an unless statement consists of the word "unless" (captured into $<sym>), and then an expression followed by a block. If that all matches, the parser invokes the corresponding action method for statement_control:sym<unless>.
Remember that for a match, not only must the <sym> match the word unless, the <xblock> must also match the xblock token. If you read more of Perl6/Grammar.pm, you will learn that xblock in turn tries to match an <EXPR> and a <pblock>, which in turn tries to match .....
This is why parsing source code this way is called Recursive Descent.
Back to the unless example, here's the action method for the unless statement (from Perl6/Actions.pm):
    method statement_control:sym<unless>($/) {
        my $past := xblock_immediate( $<xblock>.ast );
        $past.pasttype('unless');
        make $past;
    }
When the parser invokes this action method, the current match object containing the parsed statement is passed into the method as $/. In Perl 6, this means that the expression $<xblock> refers to whatever the parser matched to the xblock token. Similarly there are $<EXPR> and $<pblock> objects, and so on, until the end of the recursive descent. By the way, $<xblock> is Perl 6 syntactic sugar for $/{'xblock'}.
The magic occurs in the $<xblock>.ast and make expressions in the method body. The .ast method retrieves the PAST made already for the xblock subtree. Thus $past becomes a node object describing code to conditionally execute the block in the subtree.
The make statement at the end of the method sets the newly created xblock_immediate node as the PAST representation of the unless statement that was just parsed.
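The interplay of .ast and make can be mimicked in Python. This is a sketch under stated assumptions: Node and Match are hypothetical stand-ins, the xblock action is assumed to build an "if" node from the expression and the block, and the extra work done by xblock_immediate is left out.

```python
class Node:
    """Minimal stand-in for a PAST node: a type tag plus child nodes."""

    def __init__(self, pasttype, *children):
        self.pasttype = pasttype
        self.children = list(children)


class Match(dict):
    """Named sub-matches as keys; make stores the AST, .ast reads it back."""

    def __init__(self):
        super().__init__()
        self.ast = None

    def make(self, ast):
        self.ast = ast


def xblock_action(match):
    # Assumption for this sketch: an xblock becomes an "if" node whose
    # children are the condition and the block.
    match.make(Node("if", match["EXPR"].ast, match["pblock"].ast))


def unless_action(match):
    past = match["xblock"].ast  # AST already made for the subtree
    past.pasttype = "unless"    # same node, retagged as an unless
    match.make(past)            # becomes the AST of the whole statement


# Build the match tree by hand, innermost actions first, as a parser would.
expr, block = Match(), Match()
expr.make(Node("condition"))
block.make(Node("block"))

xblock = Match()
xblock["EXPR"], xblock["pblock"] = expr, block
xblock_action(xblock)

statement = Match()
statement["xblock"] = xblock
unless_action(statement)
```

The statement's AST is the very node the xblock action produced, only retagged, so each action method has to deal with nothing beyond its direct sub-matches.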
The Parrot Compiler Toolkit provides a wide variety of PAST node types for representing the various components of a HLL program -- for more details about the available node types, see PDD 26 (http://svn.parrot.org/parrot/trunk/docs/pdds/pdd26_ast.pod).
The PAST representation is the final stage of processing in Rakudo itself. The PAST data structure is then passed on to Parrot directly. Parrot does the remainder of the work, translating from PAST to PIR and then to bytecode.
--- ng update progress point
Lastly, the src/parser/quote_expression.pir file implements code to parse the various forms of Perl 6 quoting rules. It's far easier to write this component using PIR instead of a regular expression, but otherwise it acts just like any other rule in the grammar.
The last component of the compiler is the set of builtin functions and libraries that a Perl 6 program expects to have available when it is running. These include functions for the basic operations (infix:<+>, prefix:<abs>) as well as common global functions such as say and print.
* Rakudo PMCs
* The relationship between Parrot classes and Rakudo classes
* Protoobject implementation and basic class hierarchy
Patrick Michaud <pmichaud@pobox.com> is the primary author and maintainer of Rakudo. The other contributors are named in CREDITS.
Copyright (C) 2007-2009, The Perl Foundation.