Clone this wiki locally
See also Engine architecture, since the design of the compiler and the design of the engine are of course intertwined.
As described at Branches, NetLogo development work has diverged into two separate branches: 5.x and hexy.
The compiler is rather different in these branches. A lot of refactoring work on the compiler was done for NetLogo-Headless, which was later merged into hexy, described at zOld: Compiler Refactoring Notes.
Major changes on hexy include:
dist/depend.graffleupdated for package structure changes. prospective compiler hackers will want to study this.
compilepackage split into
parse(aka "the front end"),
compile.backpackages (formalizing the informal division that already existed).
- Most common stuff, including the AST classes for representing procedure bodies, has been moved to the
parsepackage. depends only on
compilewhich still has grungy ties to
StructureParserrewritten from scratch, using Scala's parser combinators library. top-level
StructureParserclass is in
parse; the combinators live in
StructureConvertertakes the nice clean top-level AST classes in
parse.StructureDeclarationsand turns them into grungy
IdentifierParseris now called
Namerand was rewritten by decomposing it into a bunch of independent
- Some things that were formerly considered optimizations, like
let-verification, agent type checking, and task variable verification, now live in the parser so that they can be used by Tortoise.
The code for the middle and back ends hasn't changed much. The action has all been on the front end because the goal of these changes is to better support Tortoise, and Tortoise only uses the front end. In some instances code that was inappropriate for Tortoise was relocated from the front end to the middle end.
The rest of this page describes the 5.x compiler. The 5.x information remains relevant since it will probably be maintained indefinitely and people are still building extensions and so forth for it. (Perhaps we should make separate wiki pages for the hexy compiler?)
The compiler is in the
Main entry points
The primary class is
Compiler, which implements
CompilerInterface. The main entry points are
compileMoreCode(). These correspond to two different modes of operation: we're either compiling the entire Code tab, or we're compiling an additional snippet of code (e.g. code in a button, or typed into the Command Center) which is compiled with reference to an already-compiled Code tab.
If compilation fails, a
CompilerException is raised.
If compilation succeeds, the result of compilation is a
case class CompilerResults(procedures: Seq[Procedure], program: Program)
In the case of
procedures will contain only a single Procedure.
Other entry points
Some additional entry points support things like syntax highlighting, syntax checking, etc.
Compilation proceeds in phases. Each phase is implemented by a different class. If you read through
Compiler.compile(), you will see each phase in order. The phases are: Tokenizer, StructureParser, IdentifierParser, ExpressionParser, Visitors, Optimizer, TypeParser, ArgumentStuffer, Assembler, Generator.
The phases may informally be divided into three groups: front end, middle end, and back end. The front end does parsing and some semantic analysis; the middle end does AST rewriting and further semantic analysis; the back end does linearization of control structures and code generation.
The phases are described individually below.
Front end phases
Stringcontaining program text
The entry points are in
Tokenizer, which uses the
TokenLexer class automatically generated by JFlex from the specification in
Tokenizing (a.k.a lexing) NetLogo isn't very different from tokenizing any other language. One odd thing we do is distinguish between different kinds of identifiers (command primitives, reporter primitives, built-in variables, etc.) at tokenization time (using a helper class called
TokenMapper); you might have expected that would be handled by IdentifierParser. (I'm not aware of any special reason we do it here instead of there.)
Map[String, Procedure], and a
StructureParser breaks the code up into declarations and procedures. The information in the declarations (such as globals, breeds, etc.) is stored in the new
Program that is returned.
Procedures are created, and the tokens in each procedure body are stored in a map.
- input: for each procedure, a
IdentifierParser resolves named references in the program. For example it connects procedure calls to the Procedure objects they call and connects variable references to the variables they refer to (be they observer, patch, turtle, breed, parameters, locals, or let).
ExpressionParser converts tokens to parse trees. This is what is usually meant by "parsing". It involves figuring out which reporters are inputs for which other reporters and commands, and interpreting brackets and parentheses, while taking operator precedence into account.
The result type is
Seq[ProcedureDefinition] rather than just
ProcedureDefinition because any task bodies inside the procedure result in separate
Middle end phases
- side effects: mutates ASTs
This is actually several classes, each of which could be considered a distinct phase of compilation, but since each class does a very small job, we lump them together as a single "phase".
Each visitor implements the
AstVisitor interface. This use of the Visitor Pattern makes it very easy to write new classes that do some kind of processing of the parse tree. Putting each kind of processing in a separate visitor keeps the code modular.
At present there are eight visitors:
ReferenceVisitor: fill in reference field of prims that take references to agent variables
ConstantFolder: as an optimization, perform computations at compile time (see http://en.wikipedia.org/wiki/Constant_folding)
TaskVisitor: process reporter tasks and references to task variables
LocalsVisitor: change let variables into Procedure-level local variables (slots in the Activation) where possible
_setcalls more specific, e.g.
_errormessagereporters to their enclosing
Technically this is just a ninth visitor, but it's significant enough to deserve treatment as a phase unto itself.
- side effects: mutates ASTs
Rewrites parts of the parse tree into forms which should run faster.
For example the combination of
_with is rewritten to a call to
_anywith which exits early once an example is found.
Another example is the replacing of
Each optimization is a singleton object within
Here's a probably very out of date list of the optimizations performed:
@ = got lost between 4.0 and 4.1 when the optimizer was rewritten from the ground up. maybe should be re-added someday. == Visitors reduce interpreter overhead by reducing arity: _of(_turtle/patch/breedvariable) => _turtle/patch/breedvariableof [SimpleOfVisitor] avoid constructing LetMap & Let objects: _let/_setletvariable => _setprocedurevariable [LocalsVisitor] _letvariable => _procedurevariable [LocalsVisitor] == Optimizer early loop exit: _any(_with) => _anywith _greatherthan(_countwith,_constdouble:0) => _anywith _notequal(_countwith,_constdouble:0) => _anywith _equal(_countwith,_constdouble:0) => _not(_anywith) _oneof(_with) => _oneofwith @ _any(_breedon) => _anybreedon @ _any(_turtleson) => _anyturtleson @ _equal(_count,_constdouble:0) => _not(_any) @ _greatherthan(_count,_constdouble:0) => _any @ _notequal(_count,_constdouble:0) => _any avoid constructing AgentSet: _count(_with) => _countwith _other(_with) => _otherwith _any(_other) => _anyother _any(_otherwith) => _anyotherwith _count(_other) => _countother _count(_otherwith) => _countotherwith @ _other(_turtleshere) => _otherturtleshere @ _other(_breedhere) => _otherbreedhere @ _count(_turtleshere) => _countturtleshere @ _count(_otherturtleshere) => _countotherturtleshere @ _count(_breedhere) => _countbreedhere @ _count(_otherbreedhere) => _countotherbreedhere generate simpler code when something about arguments is known: _random(_constdouble) => _randomconst if arg is positive integer @@ _call => _tailcall @@ _callreport => _tailcallreport avoid extra work by optimizing special cases: _fd(_constdouble:1) => _fd1 _patchat(-1/0/1,-1/0/1) => _patchnorth/south/etc. _with(_patches,_equal1double(px/ycor)) => _patchcol/_patchrow _with(_patches,_equal(px/ycor,_observer/procedurevariable)) => _patchcol/_patchrow avoid constructing list, avoid shuffling (and for some, reduce interpreter overhead): _sum(_patchvariableof(_neighbors4)) => _nsum4 _sum(_patchvariableof(_neighbors)) => _nsum @ _max/mean/min/sum(_of) => _max/min/mean/sumof // Flocking, GasLab, Ising @ _max/mean/min/sum(_patch/turtle/breedvariableof) => _max/mean/min/sumpatch/turtle/breedvariableof @ (uh oh, there's also "turtlevariabledouble", "turtleorlinkvariable", etc. this gets complicated)
- side effects: mutates
This phase is unique to NetLogo. It involves figuring out what is observer code, what is turtle code, what is patch code and what is link code. Code can be runnable by all four, or just by some of them. If it isn't runnable by any agent type, a compiler error occurs.
A long comment at the top of the class explains the details.
Back end phases
- side effects: mutates args arrays in Instructions
Fills the args arrays, in all of the Instructions anywhere in the Procedure, with Reporters.
No real action happens here. We're basically just discarding the "scaffolding" of the tree of AstNodes, leaving just the Command and Reporter objects themselves.
- output: stores
Mainly we're just discarding the "scaffolding" of the tree of AstNodes, leaving just the Command and Reporter objects themselves.
Exception: optimization of some types of tail recursion happens here, though. (Perhaps that could be reimplemented as a Visitor?)
Most commands are assembled generically, but some commands are “custom assembled”. Typically these are control structures such as
while. Each custom assembled command has an
assemble() method which directs its own assembly via an
- side effects: alters/replaces
Reporterobjects in the procedure's
Translates some commands and reporters directly into JVM byte code. Replaces Command and Reporter instances with instances of built-on-the-fly custom subclasses of GeneratedCommand and GeneratedReporter.
Optional phase, can be disabled. Doesn't run in applet, since applets can't use custom classloaders.