
compiler_overview.pod


RAKUDO COMPILER OVERVIEW

How the Rakudo Perl 6 compiler works

This document describes the architecture and operation of the Rakudo Perl 6 (or simply Rakudo) compiler. The README describes how to build and run Rakudo.

Rakudo has six main parts summarized below. Source code paths are relative to Rakudo's src/ directory, and platform specific filename extensions such as .exe are sometimes omitted for brevity.

  1. Not Quite Perl builds Perl 6 source code parts into Rakudo

  2. A main program drives parsing, code generation and runtime execution (Perl6/Compiler.pir)

  3. A grammar parses user programs (Perl6/Grammar.pm)

  4. Action methods build a Parrot Abstract Syntax Tree (Perl6/Actions.pm)

  5. Parrot extensions provide Perl 6 run time behavior (TODO: describe) (binder/*, ops/*, pmc/*)

  6. Libraries provide functions at run time (builtins/*.pir, cheats/*, core/*.pm, glue/*.pir, metamodel/*)

The Makefile (generated from build/Makefile.in by ../Configure.pl) compiles all the parts to form the perl6.pbc executable and the perl6 or perl6.exe "fake executable". We call it fake because it has only a small stub of code to launch the Parrot executable, and passes itself as a chunk of bytecode for Parrot to execute. The source code of the "fakecutable" is generated as perl6.c with the stub at the very end. The entire contents of perl6.pbc are represented as escaped octal characters in one huge string called program_code. What a hack!
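The embedding trick can be illustrated with a short Python sketch. This is not Rakudo's build code: the function names, the stand-in bytes and the exact form of the C declaration are assumptions made for illustration. Only the idea of storing the bytecode as octal escapes in a string called program_code comes from the description above.

```python
# Sketch only: mimics embedding bytecode in a C string literal as octal escapes.
# to_octal_literal / from_octal_literal are hypothetical names, not Rakudo's.

def to_octal_literal(data: bytes) -> str:
    """Render every byte as a backslash plus three octal digits."""
    return "".join("\\%03o" % byte for byte in data)

def from_octal_literal(literal: str) -> bytes:
    """Invert to_octal_literal by reading back each three-digit octal group."""
    chunks = literal.split("\\")[1:]  # drop the empty piece before the first backslash
    return bytes(int(chunk, 8) for chunk in chunks)

# Arbitrary stand-in bytes; a real perl6.pbc is far larger.
bytecode = bytes([0xFE, 0x50, 0x42, 0x43, 0x0A, 0x00])
literal = to_octal_literal(bytecode)

# Assumed shape of the generated C declaration.
c_source = 'static const char program_code[] = "%s";' % literal
print(c_source)
```

Escaping every byte, including printable ones, keeps the generator trivial and avoids any special handling of quotes, backslashes or embedded NUL bytes.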

1. NQP[-RX]

The source files of Rakudo are preferably and increasingly written in Perl 6, the remainder in Parrot Intermediate Representation (PIR) or C. Not Quite Perl (nqp) provides the bootstrap step of compiling compiler code (yes!) written in a subset of Perl 6, into PIR.

The latest version of NQP is called nqp-rx because it now also includes a powerful Perl 6 regex engine. This gives a streamlined compiler framework on which to build a very functional Perl 6 implementation.

NQP itself is also written in PIR, is an important part of the Parrot Compiler Toolkit (PCT), and is installed with Parrot. PCT is a standard framework to make and use Parrot based languages. The source code of NQP is in ../parrot/ext/nqp-rx/ and the resulting compiler is ../parrot_install/bin/parrot-nqp. Note, NQP only builds the Rakudo compiler, and does not compile or run user programs.

Stages

NQP[-RX] compiles us a very good compiler in gen/perl6.pbc, referred to as "stage-1", or S1_PERL6_PBC in the Makefile. This version would be limited in production though, because libraries of classes and methods available at run time (for example Complex) have not yet been added.

The "stage-1" compiler (note: not NQP) compiles all Rakudo's Perl 6 code again, this time including all the library modules (gen/core.pm), to make perl6.pbc (note: not in gen/). That gen/core.pm file is generated by build/gen_core_pm.pl from a list called CORE_SOURCES in Makefile. Thanks to the staging process, a large and growing proportion of Rakudo's source code is written in Perl 6.
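As a rough illustration of how a single gen/core.pm can come from a list of sources, here is a Python sketch. It assumes the generator simply concatenates the listed files in order; the real build/gen_core_pm.pl may do more than that, and the function name gen_core and the sample file names and contents are hypothetical.

```python
import os
import tempfile

def gen_core(sources, out_path):
    """Concatenate the listed source files, in order, into one output file."""
    with open(out_path, "w") as out:
        for path in sources:
            # A comment line marks where each piece came from.
            out.write("# From %s\n" % os.path.basename(path))
            with open(path) as src:
                out.write(src.read().rstrip("\n") + "\n\n")

# Throwaway files standing in for entries of CORE_SOURCES.
workdir = tempfile.mkdtemp()
core_sources = []
for name, body in [("Any.pm", "class Any { }"), ("Complex.pm", "class Complex { }")]:
    path = os.path.join(workdir, name)
    with open(path, "w") as handle:
        handle.write(body + "\n")
    core_sources.append(path)

core_pm = os.path.join(workdir, "core.pm")
gen_core(core_sources, core_pm)
with open(core_pm) as handle:
    generated = handle.read()
print(generated)
```

Because the order of the list is preserved, a class that others depend on can be placed earlier in the list so that it is compiled first.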

We can conceivably use the Rakudo compiler to compile itself to PIR and eliminate the need for NQP entirely. At some point as Rakudo matures we will probably do this. However, for the time being it's slightly easier to manage the process if we keep a distinction between the two tools, and using NQP for this stage also helps us to limit ourselves to using a regular, well-defined, and relatively easy-to-implement subset of Perl 6 for the core compiler. So, while it's possible for us to eliminate NQP from the process, there are some good reasons not to do so just yet. (If at some point we discover that we need something for the compiler that NQP can't or won't support, then that will probably be a good point to switch.)

2. Compiler main program

A subroutine called 'main', in Perl6/Compiler.pir, starts the source parsing and bytecode generation work. It creates a Perl6::Compiler object for the 'perl6' source type. The Perl6::Compiler class inherits from the HLLCompiler class of the Parrot Compiler Toolkit; see ../parrot/compilers/pct/src/PCT/HLLCompiler.pir.

Before tracing Rakudo's execution further, a few words about Parrot process and library initialization.

Parrot execution does not simply begin with 'main'. When Parrot executes a bytecode file, it first calls all subroutines in it that are marked with the :init modifier. Rakudo has over 50 such subroutines, brought in by .include directives in Perl6/Compiler.pir, to create classes and objects in Parrot's memory.

Similarly, when the executable loads libraries, Parrot automatically calls subs having the :load modifier. The Rakudo :init subs are usually also :load, so that the same startup sequence occurs whether Rakudo is run as an executable or loaded as a library.
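The startup order described above can be modelled in a few lines of Python. This is only an analogy: Parrot's :init and :load are subroutine modifiers in PIR, while the decorators, lists and function names below are hypothetical.

```python
# Analogy only: decorators stand in for Parrot's :init and :load modifiers.
INIT_SUBS = []   # run before 'main' when the bytecode is executed
LOAD_SUBS = []   # run when the bytecode is loaded as a library
log = []

def init(func):
    INIT_SUBS.append(func)
    return func

def load(func):
    LOAD_SUBS.append(func)
    return func

@init
@load
def create_classes():
    # Marked with both modifiers, as most of Rakudo's startup subs are.
    log.append("create_classes")

def main():
    log.append("main")

def run_as_executable():
    for sub in INIT_SUBS:
        sub()
    main()

def load_as_library():
    # 'main' is not called when loading as a library.
    for sub in LOAD_SUBS:
        sub()

run_as_executable()
print(log)
```

Marking a sub with both modifiers is what makes the two entry paths share one setup sequence.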

Perl6/Compiler.pir has three .loadlib commands early on, for perl6_group, perl6_ops and math_ops. These dynamically extend Parrot with, respectively, Rakudo-specific PMCs (PolyMorphic Containers, formerly Parrot Magic Cookies), opcodes, and mathematical operators. The source is in pmc/*, ops/* and parrot/src/ops/math.ops.

So, that Rakudo 'main' subroutine had created a Perl6::Compiler object. Next, 'main' invokes the 'command_line' method on this object, passing the command line arguments in a PMC called args_str. The 'command_line' method is inherited from the HLLCompiler parent class (part of the PCT, remember).
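The division of labour between 'main' and the inherited 'command_line' method can be sketched in Python as follows. The class bodies are hypothetical stand-ins: the real HLLCompiler also handles options, interactive mode and several compilation stages, none of which are modelled here.

```python
class HLLCompiler:
    """Hypothetical stand-in for the PCT base class."""

    def __init__(self, language):
        self.language = language

    def command_line(self, args):
        # args[0] is the program name; the rest are treated as source files.
        program_name, *files = args
        return [self.compile(name) for name in files]

    def compile(self, source_name):
        raise NotImplementedError

class Perl6Compiler(HLLCompiler):
    """Inherits command_line unchanged and supplies only the language-specific part."""

    def compile(self, source_name):
        return "%s: compiled %s" % (self.language, source_name)

def main(args):
    compiler = Perl6Compiler("perl6")
    return compiler.command_line(args)

print(main(["perl6", "hello.p6"]))
```

The point of the sketch is that 'main' stays tiny because the generic driver logic lives in the parent class.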

And that's it, apart from a '!fire_phasers'('END') and an exit. Well, as far as 'main' is concerned. The remaining work is divided between PCT, grammar and actions.

3. Grammar

The make target PERL6_G uses parrot-nqp to compile Perl6/Grammar.pm to gen/perl6-grammar.pir.

The top-level portion of the grammar is written using Perl 6 rules (Synopsis 5) and is based on the STD.pm grammar in the Pugs repository (http://svn.pugscode.org/pugs/src/perl6/STD.pm). There are a few places where Rakudo's grammar deviates from STD.pm, but the ultimate goal is for the two to converge. The grammar inherits from HLL::Grammar, which provides the <.panic> rule to throw exceptions for syntax errors.

The compiler works by calling the TOP method in Perl6/Grammar.pm. After some initialization, TOP matches the user program to the comp_unit (meaning compilation unit) token. That triggers a series of matches to other tokens and rules (two kinds of regex) depending on the source in the user program.

4. Actions

The Perl6/Actions.pm file defines the code that the compiler generates when it matches each token or rule. The output is a tree hierarchy of objects representing language syntax elements, such as a statement. The tree is called a Parrot Abstract Syntax Tree (PAST).

The Perl6::Actions class inherits from HLL::Actions, another part of the Parrot Compiler Toolkit. The source is in ../parrot/ext/nqp-rx/stage0/src/HLL-s0.pir; look for the several instances of .namespace ["HLL";"Actions"].

When the PCT calls the 'parse' method on a grammar, it passes not only the program source code, but also a pointer to a parseactions class such as our compiled Perl6::Actions. Then, each time the parser matches a named regex in the grammar, it automatically calls the same named method in the actions class.
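The name-based dispatch from grammar to actions can be sketched in Python. This is a toy tokenizer, not the PCT parser: the Grammar, Actions and Match classes and the rule names are hypothetical, and the only idea taken from the text is that a successful match of a named rule triggers the action method with the same name.

```python
import re

class Match:
    def __init__(self, name, text):
        self.name, self.text, self.ast = name, text, None

class Actions:
    # Method names deliberately mirror the rule names in Grammar.RULES.
    def integer(self, match):
        match.ast = int(match.text)

    def identifier(self, match):
        match.ast = ("var", match.text)

class Grammar:
    RULES = {
        "integer": r"\d+",
        "identifier": r"[A-Za-z_]\w*",
        "whitespace": r"\s+",   # no action method exists for this rule
    }

    def parse(self, source, actions):
        matches, pos = [], 0
        while pos < len(source):
            for name, pattern in self.RULES.items():
                found = re.compile(pattern).match(source, pos)
                if found:
                    match = Match(name, found.group())
                    # Call the same-named action method, if there is one.
                    method = getattr(actions, name, None)
                    if method is not None:
                        method(match)
                    matches.append(match)
                    pos = found.end()
                    break
            else:
                raise SyntaxError("no rule matches at position %d" % pos)
        return matches

result = Grammar().parse("42 foo", Actions())
print([(m.name, m.ast) for m in result])
```

Rules without a matching method are simply skipped, so the actions class only needs methods for the constructs that produce tree nodes.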

For example, here's the parse rule for Rakudo's unless statement (in Perl6/Grammar.pm):

token statement_control:sym<unless> {
  <sym> :s
  <xblock>
  [ <!before 'else'> ||
    <.panic: 'unless does not take "else", please rewrite using "if"'>
  ]
}

This token says that an unless statement consists of the word "unless" (captured into $<sym>), and then an expression followed by a block. If that all matches, the parser invokes the corresponding action method for statement_control:sym<unless>.

Remember that for a match, not only must the <sym> match the word unless, the <xblock> must also match the xblock token. If you read more of Perl6/Grammar.pm, you will learn that xblock in turn tries to match an <EXPR> and a <pblock>, which in turn tries to match .....

This is why parsing source code this way is called Recursive Descent.
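A hand-written Python sketch of the same descent may make the call structure easier to see. It is a deliberately tiny parser with hypothetical names: expressions are single words, blocks are flat lists of words without nested braces, and only the "else" check and its message come from the token shown above.

```python
import re

class ParseError(Exception):
    pass

class Parser:
    def __init__(self, source):
        self.tokens = re.findall(r"\w+|[{}]", source)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, word):
        if self.peek() != word:
            raise ParseError("expected %r, found %r" % (word, self.peek()))
        self.pos += 1

    def statement_unless(self):
        self.expect("unless")              # plays the role of <sym>
        condition, block = self.xblock()   # descend into <xblock>
        if self.peek() == "else":          # plays the role of <!before 'else'>
            raise ParseError('unless does not take "else", please rewrite using "if"')
        return ("unless", condition, block)

    def xblock(self):
        # xblock in turn descends into an expression and a block.
        return self.expr(), self.pblock()

    def expr(self):
        token = self.peek()
        if token is None or token in ("{", "}"):
            raise ParseError("expected an expression")
        self.pos += 1
        return token

    def pblock(self):
        self.expect("{")
        body = []
        while self.peek() not in ("}", None):
            body.append(self.tokens[self.pos])
            self.pos += 1
        self.expect("}")
        return body

tree = Parser("unless done { work }").statement_unless()
print(tree)
```

Each method corresponds to one named rule and calls the methods for the rules it contains, which is the recursive part of recursive descent.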

Back to the unless example, here's the action method for the unless statement (from Perl6/Actions.pm):

method statement_control:sym<unless>($/) {
  my $past := xblock_immediate( $<xblock>.ast );
  $past.pasttype('unless');
  make $past;
}

When the parser invokes this action method, the current match object containing the parsed statement is passed into the method as $/. In Perl 6, this means that the expression $<xblock> refers to whatever the parser matched to the xblock token. Similarly there are $<EXPR> and $<pblock> objects, and so on, until the end of the recursive descent. By the way, $<xblock> is Perl 6 syntactic sugar for $/{'xblock'} .

The magic occurs in the $<xblock>.ast and make expressions in the method body. The .ast method retrieves the PAST made already for the xblock subtree. Thus $past becomes a node object describing code to conditionally execute the block in the subtree.

The make statement at the end of the method sets the newly created xblock_immediate node as the PAST representation of the unless statement that was just parsed.
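The interplay of .ast and make can be mimicked in Python. The classes below are hypothetical stand-ins for PAST nodes and match objects, xblock_immediate is reduced to an identity function, and the initial pasttype of "if" on the xblock node is an assumption made for the demo.

```python
class PastNode:
    """Hypothetical stand-in for a PAST node."""
    def __init__(self, children, pasttype=None):
        self.children = children
        self.pasttype = pasttype

class Match(dict):
    """Sub-matches are stored by name; .ast holds the node set by make()."""
    def __init__(self, **submatches):
        super().__init__(**submatches)
        self.ast = None

    def make(self, node):
        self.ast = node

def xblock_immediate(xblock_ast):
    # Stand-in only: the real helper adjusts the block; here it returns its input.
    return xblock_ast

def statement_control_unless(match):
    past = xblock_immediate(match["xblock"].ast)  # like $<xblock>.ast
    past.pasttype = "unless"                      # like $past.pasttype('unless')
    match.make(past)                              # like make $past

# The xblock sub-match already carries a node built by its own action method.
xblock = Match()
xblock.make(PastNode(["condition", "block"], pasttype="if"))

statement = Match(xblock=xblock)
statement_control_unless(statement)
print(statement.ast.pasttype, statement.ast.children)
```

Note that the node is reused rather than copied: the action method only changes its type and then attaches it to the enclosing match.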

The Parrot Compiler Toolkit provides a wide variety of PAST node types for representing the various components of a HLL program -- for more details about the available node types, see PDD 26 (http://svn.parrot.org/parrot/trunk/docs/pdds/pdd26_ast.pod).

The PAST representation is the final stage of processing in Rakudo itself. The PAST data structure is then passed on to Parrot directly. Parrot does the remainder of the work translating from PAST to PIR and then to bytecode.

--- ng update progress point

Lastly, the src/parser/quote_expression.pir file implements code to parse the various forms of Perl 6 quoting rules. It's far easier to write this component using PIR instead of a regular expression, but otherwise it acts just like any other rule in the grammar.

6. Builtin functions and runtime support

The last component of the compiler is the set of builtin functions and libraries that a Perl 6 program expects to have available when it is running. These include functions for the basic operations (infix:<+>, prefix:<abs>) as well as common global functions such as say and print.
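One way to picture such a library is as a table from operator names to functions, sketched here in Python. The table, the call_builtin helper and the Python implementations are hypothetical; Rakudo defines its builtins in PIR and Perl 6 and does not necessarily look them up this way.

```python
import operator

# Hypothetical lookup table keyed by Perl 6 style operator names.
BUILTINS = {
    "infix:<+>": operator.add,
    "prefix:<abs>": abs,
    "say": lambda *args: print(*args),
}

def call_builtin(name, *args):
    """Look a builtin up by name and call it with the given arguments."""
    try:
        func = BUILTINS[name]
    except KeyError:
        raise NameError("no such builtin: %s" % name)
    return func(*args)

call_builtin("say", call_builtin("infix:<+>", 2, 3))
```

Naming operators as category:<symbol> lets ordinary functions and operators share one namespace.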

Still to be documented

* Rakudo PMCs
* The relationship between Parrot classes and Rakudo classes
* Protoobject implementation and basic class hierarchy

AUTHORS

Patrick Michaud <pmichaud@pobox.com> is the primary author and maintainer of Rakudo. The other contributors are named in CREDITS.

COPYRIGHT

Copyright (C) 2007-2009, The Perl Foundation.