From 6663abf809eecc708037c0dae69fd44237bb8c05 Mon Sep 17 00:00:00 2001
From: Martin Berends
Date: Thu, 18 Feb 2010 21:17:12 +0000
Subject: [PATCH] [docs/compiler_overview.pod] clarified some explanations, total around 85% complete

---
 docs/compiler_overview.pod | 99 ++++++++++++++++++++++----------------
 1 file changed, 58 insertions(+), 41 deletions(-)

diff --git a/docs/compiler_overview.pod b/docs/compiler_overview.pod
index da736550e37..ae438ab1a7a 100644
--- a/docs/compiler_overview.pod
+++ b/docs/compiler_overview.pod
@@ -65,8 +65,8 @@ includes a powerful Perl 6 regex engine. This gives a streamlined
 compiler framework on which to build a very functional Perl 6
 implementation.
 
-NQP itself is also written in PIR, is an important part of the Parrot
-Compiler Toolkit (PCT), and is installed with Parrot. PCT is a standard
+NQP itself is also written in PIR. It is an important part of the Parrot
+Compiler Toolkit (PCT) and is installed with Parrot. PCT is a standard
 framework to make and use Parrot based languages. The source code of NQP
 is in F<../parrot/ext/nqp-rx/> and the resulting compiler is
 F<../parrot_install/bin/parrot-nqp>. Note, NQP only I the
@@ -104,8 +104,8 @@ to switch.)
 A subroutine called C<'main'>, in F, starts the
 source parsing and bytecode generation work. It creates a
 C object for the C<'perl6'> source type. The
-C class inherits from the C class of the
-Parrot Compiler Toolkit, look in
+C class inherits from the Parrot Compiler Toolkit's
+C class, see
 F<../parrot/compilers/pct/src/PCT/HLLCompiler.pir>.
 
 Before tracing Rakudo's execution further, a few words about Parrot
@@ -122,13 +122,6 @@ calls subs having the C<:load> modifier. The Rakudo C<:init> subs are
 usually also C<:load>, so that the same startup sequence occurs whether
 Rakudo is run as an executable or loaded as a library.
 
-F has three C<.loadlib> commands early on, for
-C, C and C. All three dynamically
-extend Parrot with respectively Rakudo specific PMC's (Poly Morphic
-Containers, formerly Parrot Magic Cookies), opcodes, and mathematical
-operators. The source is in F, F and
-F.
-
 So, that Rakudo 'main' subroutine had created a C
 object. Next, 'main' invokes the C<'command_line'> method on this
 object, passing the command line arguments in a PMC called C.
@@ -139,43 +132,17 @@ And that's it, apart from a C<'!fire_phasers'('END')> and an C.
 Well, as far a C<'main'> is concerned. The remaining work is divided
 between PCT, grammar and actions.
 
-=head2 2. Grammar
+=head2 3. Grammar
 
 Using C, C target C uses F to
 compile F to F.
 
-The top-level portion of the grammar is written using Perl 6 rules
-(Synopsis 5) and is based on the STD.pm grammar in the Pugs repository
-(L). There are a few
-places where Rakudo's grammar deviates from STD.pm, but the ultimate
-goal is for the two to converge. The grammar inherits from
-C, which provides the C<< <.panic> >> rule to throw
-exceptions for syntax errors.
-
 The compiler works by calling C method in F.
 After some initialization, TOP matches the user program to the comp_unit
 (meaning compilation unit) token. That triggers a series of matches to
 other tokens and rules (two kinds of regex) depending on the source in
 the user program.
 
-=head2 3. Actions
-
-The F file defines the code that the compiler
-generates when it matches each token or rule. The output is a tree
-hierarchy of objects representing language syntax elements, such as a
-statement. The tree is called a Parrot Abstract Syntax Tree (PAST).
-
-The C class inherits from C, another part
-of the Parrot Compiler Toolkit. The source is in
-F<../parrot/ext/nqp-rx/stage0/src/HLL-s0.pir>, look for several
-instances of C<.namespace ["HLL";"Actions"]>.
-
-When the PCT calls the C<'parse'> method on a grammar, it passes not
-only the program source code, but also a pointer to a parseactions class
-such as our compiled C. Then, each time the parser
-matches a named regex in the grammar, it automatically calls the same
-named method in the actions class.
-
 For example, here's the parse rule for Rakudo's C statement
 (in F):
 
@@ -189,8 +156,7 @@ For example, here's the parse rule for Rakudo's C statement
 
 This token says that an C statement consists of the word
 "unless" (captured into C<< $ >>), and then an expression followed
-by a block. If that all matches, the parser invokes the corresponding
-action method for C<< statement_control:sym >>.
+by a block.
 
 Remember that for a match, not only must the C<< >> match the word
 C, the C<< >> must also match the C token. If
@@ -198,7 +164,33 @@ you read more of F, you will learn that C
 in turn tries to match an C<< >> and a C<< >>, which in
 turn tries to match .....
 
-This is why parsing source code this way is called Recursive Descent.
+That is why this parsing algorithm is called Recursive Descent.
+
+The top-level portion of the grammar is written using Perl 6 rules
+(Synopsis 5) and is based on the STD.pm grammar in the Pugs repository
+(F). There are a few
+places where Rakudo's grammar deviates from STD.pm, but the ultimate
+goal is for the two to converge. Rakudo's grammar inherits from PCT's
+C, which provides the C<< <.panic> >> rule to throw
+exceptions for syntax errors.
+
+=head2 4. Actions
+
+The F file defines the code that the compiler
+generates when it matches each token or rule. The output is a tree
+hierarchy of objects representing language syntax elements, such as a
+statement. The tree is called a Parrot Abstract Syntax Tree (PAST).
+
+The C class inherits from C, another part
+of the Parrot Compiler Toolkit. Look in
+F<../parrot/ext/nqp-rx/stage0/src/HLL-s0.pir> for several instances of
+C<.namespace ["HLL";"Actions"]>.
+
+When the PCT calls the C<'parse'> method on a grammar, it passes not
+only the program source code, but also a pointer to a parseactions class
+such as our compiled C. Then, each time the parser
+matches a named regex in the grammar, it automatically invokes the same
+named method in the actions class.
 
 Back to the C example, here's the action method for the
 C statement (from F):
@@ -236,6 +228,21 @@ itself. The PAST data structure is then passed on to Parrot directly.
 Parrot does the remainder of the work translating from PAST to pir and
 then to bytecode.
 
+=head2 5. Parrot extensions
+
+F has three C<.loadlib> commands early on, for
+C, C and C. All three dynamically
+extend Parrot with respectively Rakudo specific PMC's (Poly Morphic
+Containers, formerly Parrot Magic Cookies), opcodes, and mathematical
+operators. The source is in F, F and
+F.
+
+(F)
+
+(F)
+
 --- ng update progress point
 
 Lastly, the F file implements
@@ -252,6 +259,16 @@ have available when it is running. These include functions for the
 basic operations (C<< infix:<+> >>, C<< prefix: >>) as well as
 common global functions such as C and C.
 
+(F)
+
+(F)
+
+(F)
+
+(F)
+
+(F)
+
 =head2 Still to be documented
 
 * Rakudo PMCs