From a66802273d9100135dc57d82ad1e70a0364001e8 Mon Sep 17 00:00:00 2001 From: Stefan O'Rear Date: Mon, 11 Oct 2010 01:55:36 -0700 Subject: [PATCH] Start on the documentation rewrite --- docs/nam.pod | 203 +++++++++++++++++++++++++++++++++++++ docs/overview.pod | 49 +++++++++ notes.pod | 251 ---------------------------------------------- 3 files changed, 252 insertions(+), 251 deletions(-) create mode 100644 docs/nam.pod create mode 100644 docs/overview.pod delete mode 100644 notes.pod diff --git a/docs/nam.pod b/docs/nam.pod new file mode 100644 index 00000000..403e3459 --- /dev/null +++ b/docs/nam.pod @@ -0,0 +1,203 @@ +=head1 Synopsis + +This document describes NAM, aka CgOp, the Niecza Abstract Machine. It is a +language used to connect the portable parts of Niecza to the unportable, and +as such requires a fairly strong definition. Unfortunately this document +is rather incomplete. + +=head1 General model + +NAM code consists of one or more units. Each unit contains a set of fixups +(details TBD) which allow compiled code to find metaobject data, and a set of +compilable function bodies. One unit shall contain a body named MAIN, which +is run to start execution. + +A body contains some basic metadata such as the number of lexical slots +required, and also a tree of operations. This tree is structured much like a +Lisp program and obeys similar evaluation rules. One difference is that NAM +nodes have two kinds of children, scalar children and node children, which are +treated separately. + +NAM code must be statically typable but this may not always be enforced. +Different data objects have logical types, which can map many-to-one onto +lower-level types, especially in type-poor environments such as Parrot and +JavaScript. + +=head1 Runtime data objects, by static type + +=head2 int + +A native integer, suitable for loop variables and similar purposes. + +=head2 num + +A native float, suitable for the Perl 6 Num class. 
+ +=head2 bool + +A native bool, as returned by comparison operators. + +=head2 str + +A reference to a native immutable string. + +=head2 strbuf + +A reference to a native mutable string. + +=head2 var + +A Perl 6 variable, with identity, potentially mutable and tied. + +=head2 obj + +A reference to a Perl 6 object; not a variable and cannot be assigned to. + +=head2 varhash + +A hash table mapping strings to Perl 6 variables. + +=head2 fvarlist + +An array of Perl 6 variables fixed in length at creation. + +=head2 vvarlist + +An array of Perl 6 variables supporting OZ<>(1) deque operations. + +=head2 stab + +The nexus of HOW, WHAT, WHO, and REPR. Details subject to flux. + +=head2 treader + +A reference to a native text input object. + +=head2 twriter + +A reference to a native text output object. + +=head2 lad + +A node in the LTM Automaton Descriptor metaobject tree. + +=head2 cc + +A reference to a compiled character class. + +=head2 cursor + +A reference to a low-level cursor. Currently a subtype of obj. + +=head2 frame + +A reference to a call frame. Currently a subtype of obj. + +=head1 Operations + +=head2 Arithmetic and logical operations + +=head2 Sequence control + +=head3 prog + +=head3 span + +=head3 ehspan + +=head3 sink + +=head3 labelhere + +=head3 cgoto + +=head3 goto + +=head3 ncgoto + +=head3 ternary + +=head3 whileloop + +=head3 take + +=head3 cotake + +=head2 Data control + +=head3 letn + +=head3 fetch + +=head3 assign + +=head3 newscalar + +=head3 newrwscalar + +=head3 newblankrwscalar + +=head3 newrwlistvar + +=head3 null + +=head3 sink + +=head3 cast + +=head2 Object model + +=head3 getslot + +=head3 setslot + +=head3 how + +=head3 obj_is_defined + +=head3 obj_llhow + +=head3 obj_isa + +=head3 obj_does + +=head3 subcall + +=head3 methodcall + +=head2 I/O + +=head3 say + +=head3 slurp + +=head3 treader_stdin + +=head2 Native operations (CLR) + +These are for user code only. Core library code should define custom system +primitives instead. 
+ +=head3 getfield + +=head3 setfield + +=head3 rawsget + +=head3 rawsset + +=head3 rawscall + +=head3 rawcall + +=head3 setindex + +=head3 getindex + +=head3 rawnew + +=head3 rawnewarr + +=head3 rawnewzarr + +=head3 labelid diff --git a/docs/overview.pod b/docs/overview.pod new file mode 100644 index 00000000..c9ff4352 --- /dev/null +++ b/docs/overview.pod @@ -0,0 +1,49 @@ +=head1 Synopsis + +This is an overview of the Niecza ecosystem, which is currently contained +entirely inside the repository. + +=head1 Compiler + +Found mostly in C, this converts Perl 6 source to C#. See +C for more details. + +=head1 Runtime system + +This comprises C and C; it is the body of C# +primitives necessary for compiler output to function, or that underlie the +lowest levels of library functionality. + +=head1 Core library + +C and C are used automatically in Perl 6 +programs and provide definitions of all Perl 6 functions. They use Niecza +extensions fairly heavily, especially inline NAM code and references into +the runtime. + +=head1 Other libraries + +C currently provides multithreading and a TAP stub. I hope to see +more here eventually. + +=head1 Build system + +C is in charge of getting the compiler and libraries into a +usable state and running the tests. It is supported by C and +C, a Microsoft.Build plugin for running Perl 5 code; not having +to load Niecza more than once saves quite a few seconds. + +=head1 Documentation + +Watch this space. + +=head1 Miscellany + +C contains various scripts and tools used to microbenchmark changes in +Niecza. C contains a handful of unit tests (unmaintained). + +=head1 Test suite + +C is the main test suite; all tests in it are expected to pass. +C and C are much smaller and allowed to contain failing +tests; I use them as a TDD staging area. 
diff --git a/notes.pod b/notes.pod deleted file mode 100644 index 729d9aab..00000000 --- a/notes.pod +++ /dev/null @@ -1,251 +0,0 @@ -=head1 Notes on P6/CLR mapping - -=head2 Classes and methods - -Perl6 has pervasive duck typing. C actually means -C<< my $x where .^does(Str) >>; and can be overriden. Thus, our methods have to -be able to, in principle, handle any representation type. This is compouned -by the notion of per-object representational polymorphism in Perl 6. - -Three representations will be used often. The null representation provides for -protoobjects, and fails any attempt to access attributes. A special dynamic -representation can have its shape modified at any time, and is thus needed for -pre-CHECK code. Finally, the CLR native representation is used for normal -run-time objects. CLR native representation is postponed for now. - -There is also a CLR external representation. More on why it's needed later. - -Representations define the essential Perl 6 object operations - calling methods, -interrogative pronouns, and slot access. Since all objects are handled as CLR -objects, the natural way to implement representation operations is as special -methods. In the interests of allomorphic and polymorphic operation, the -representation interface is exposed to the CLR as an IPerl6Object interface. - -Every Perl6 class creates a CLR class and a CLR interface for the CLR native -representation. The interface makes efficient-ish multiple inheritance and -representation polymorphism possible. Note that code has to use IPerl6Object, -not the specific interface, prior to specialization. Specialization can go all -the way to the class form. - -Methods present as one or more CLR methods. Role methods (and non-method subs) -are static. Every method has multiple supported calling conventions, -especially if old code can't be thrown out after a model change. 
Capture -conversion is handled in the generic IPerl6Object; specialized representation -interfaces can offer more specific calling conventions. - -=head2 Pessimizers - -We have objects called "pessimizers". They encapsulate optimizations that need -to be delayed, changes to the program that could happen. For instance, a -pessimizer could represent the possibility that SomeClass could be augmented. -Pessimizers watch the program model; they also have a "cancel" operation that -allows the optimizations to be performed. Pessimizers can be linked, such that -cancelling the outer pessimizer cancels the inner. When a use statement is -compiled, or any other recursive compiler invocation, the used code's -pessimizer is linked to the user. After the outermost call to the compiler -returns, the main program CHECK time in the first such case but also after -requires and evals, the resulting pessimizer is automatically cancelled. Since -we change our model so much, we like to generate IL lazily if we can. - -=head2 Multiple dispatch - -In all multiple dispatch situations, the full set of candidates is known at -some places. Lexical multiple dispatch has a fixed set in any lexical scope; -method multiple dispatch has a fixed set in any class; package multiple -dispatch in namespaces. So we just need to compile a dispatcher right there. -Generation of decision trees for pattern matching is well studied; see -'Compiling pattern matching' by Lennart 'augustss' Augustsson. - -=head2 CLR imports - -Namespaces from other CLR assemblies manifest as sealed packages. Classes so -obtained will be unsuitable for usage in multiple inheritance. Individual -objects cannot be directly used, so they will be wrapped up in an object with -the "CLR external" representation; this, conveniently, provides IPerl6Object -and any mapped role classes. This is invisible to the user, as WHICH and WHERE -are delegated to the repr and work correctly. 
- -=head2 CLR exports - -A set of Perl6 modules could be compiled to an assembly. Since CLR has no -pessimizers, any module so compiled would need immediate optimization. Many -questions remain. - -=head2 gather/take and CPS - -C and C require the ability to stop a sub in mid-execution and -continue it later. Since any unknown function can call C, all functions -(until we have pessimizers working) need to be encoded with explicit stack -frame objects; and we need to B the transformation in all cases. - -For CLR compatibility all exposed functions which use CPS internally need to -provide a recursive-runloop-starting face... - -=head2 Control exceptions - -Control flow in Perl 6 is logically handled using exceptions. However, there -is a beautifully simple implementation of this (from pmichaud I think?); just -store the frame to return to in a closed-over lexical variable, and rely on -outward continuation semantics to implement the lexotic control exception. - -=head2 Resumable exceptions - -In Perl 6, throwing exceptions doesn't unwind the stack; it just calls a -handler, which can unwind the stack itself using lexotic control flow if it -wants. (Yes, this is circular with the last paragraph. Deal.) For normal -exceptions, the handler is not allowed to return normally, which makes the -distinction moot; warnings and take work differently. - -=head2 Lexical continuation - -Perl 6 eval requires the ability to access outer lexicals. That's just -introspection on the caller's frame. The setting is a special case of this; -there's a function in the compiler (probably written in C# or something) which -defines a bunch of primitives, then evals the user code. Or maybe it uses an -internal API directly. - -=head2 SPECIALIZE - -SPECIALIZE (or SPECIALISE) is parsed as a phaser; in it is a list of type -objects. The enclosing function gets extra multis (or not quite; it shouldn't -affect dispatch) for the cases. 
Will probably be involved in the -implementation of CLR object usage; functions being specialized on IRealAny. -Specialization, and lazy code generation, are the crux of optimization in -Niecza. - -=head2 BEGIN - -C must take special care to not deal with classes being defined, as once -a type has been submitted to ref emit, it cannot be changed. This will result -in some slowdown. I'm OK with that; most C++ template engines don't compile -to native code either. - -=head2 Three kinds of pad - -Normal run-time pads are CLR objects of some dedicated class, based loosely on -Perlesque frame objects. Compile time pads are hashes, so they can be extended -as things are declared. This is very similar to the hash/clr representation -dichotomy with objects; unification could be worthwhile. Package pads will be -hashes like any uncloned pad, but need special handling in CLR export. - -=head2 Variables - -When C is executed, two objects are created, a C and a -C. The C is a native type; it is B a Perl 6 -object, although it can be reified as one. The symbol table points to the -variable, which points to the container, which points to the value. The -variable keeps its identity through binds; the container doesn't. A variable -is a boxed object, representing the runtime manifestation of a bvalue. -Functions return variables, in order that postcircumfixes can be bound; -however, functions do not take variables. C<< &infix:<:=> >> and -C<< prefix: >> are macros, not functions. Since variables are not bound -to functions, the variable object for a lexical often need not be generated; -the container pointer itself is what's passed to functions and it can be stored -in the frame. - -=head2 RIP lexotic continuations - -Because of C, all control exceptions which pass through a statically -unknown set of frames need to use the handler-chain mechanism, except for -C/C's own unwinding. A set of handlers is used to relay -exceptions when the CLR wants to start a recursive runloop. 
Since we're now -only using CPS for C/C, using Mono's native delimited -continuations is back on the table. - -=head1 Notes on the compiler - -These are in chronological order. If two conflict, the later is right. - -=head2 Finitary bodies and scopes - -We need to keep the C and C objects, which are constructed by -the parser, typologically distinct from C and C. By avoiding -issues of cloning, it is possible to track all such objects created during a -compilation. The compiler is not reentrant in a conventional sense; any call -to C or any C statement adds code to the I compilation. -At C time (or maybe sooner if C time user code needs to run?), -C and C objects are I. This means that the compiler -determines exactly what they need to contain and provides gifts of code. The -objects themselves do not control the compilation; this is necessary to keep -the compiler out of the kernel. - -=head2 The two metamodels - -C is used for most compiling tasks, and is available much -earlier in the setting. A single instance of it can manage many classes, since -it corresponds to a class I, which may be in a clonable context. -The business here is in the compiler; a CIL class is generated and used to -instantiate the metaobject. - -C is able to actually generate unrelated classes at run time. - -[Edit: StaticClassHOW is not going to be used in the setting due to augment -Mu concerns. Mu will be defined using ClassHOW, stripped down.] - -=head2 Weird eval rules - -In order for precompilation to work, we need to fix the code which the compiler -sees. Therefore, C blocks are not allowed to use C<&CORE::eval> when -the precompiler is activated. - -We may go further and ban all I/O from C blocks under these -circumstances. If this is done, the compiler rule becomes unneeded. - -=head2 New interface model - -Interfaces represent duck typing; they are somewhat inferred, like Go, and need -not share referential identity, like COM. 
If a role or class Foo exists, IFoo -exists; an object can create an IFoo presentation if and only if C<.^does(Foo)>. -A C argument is optimized to directly involve IFoo, so gradual typing -actually eliminates most late binding in this model. Difficulties exist with -monkey typing; the simple solution is to say IFoo is the original version of -Foo, and any added methods must be late bound. We could also version things, -with subtyping (IFoo3 implies IFoo2). - -=head2 Units only - -We don't need to deeply track every single Body and ClassTemplate in use. -Merely tracking compilation units is enough. - -=head2 A new IL - -(Very conjectural) For glue code, we provide SIR, an externalized form of the -CodeGen API. It has a surface syntax based on Forth and provides a stack model -with built-in CPS but mapping near injectively to CIL. - -=head2 PRELUDE and PRE-INIT - -Two special phasers are provided by the Alpha metacompiler (Beta uses the -standard Perl 6 phasers). PRE-INIT runs while runtime structures are being -constructed. It can inject things into the nascent protolexpad. PRELUDE runs -before the structures are constructed at all; the definitions of Frame stuff -needs to go here. - -=head2 Protoclasses - -A special kind of class-like object exists in protopads. It is the protoclass, -and it functions much like a protopad. Protoclasses carry unbound methods; -they exist in a static scope. A protoclass can be cloned into a full class by -associating it with an OUTER:: scope. A protoclass defined in more than one -place (using augment or supercede) will require more than one OUTER::; the main -clone sets up the first OUTER, subsequent ones set up by the runtime -manifestation of augments. - -=head2 Death of NIL - -Due to persistant stack issues, NIL is no more. Instead, use CgOp, which is -a Lisp-like presentation of the low level optree. - -=head1 Brief overview of the compiler - -User runs C. Entry point is in C, the driver. 
-C parses arguments, decides the user wants to run some Perl -6 code (gasp!), passes it off to C. C calls action methods in -C, which construct a bunch of objects and metaobjects; mostly -specced things like C and C and C, but a few compiler -things too, like C. The top level code, as a C, is -passed back to C. C, seeing that it was -told to B the code, calls C<< postcircumfix:<( )> >>. -C<< postcircumfix:<( )> >> sees that the C<$!Method> property is null and -constructs a CLR method. The method is run. (Eventually, pessimizers and -modules will enter this picture)