Compiler architecture

Robert Grider edited this page Mar 7, 2018 · 33 revisions

See also Engine Architecture and Package Guide, since the design of the compiler and the design of the engine are of course intertwined.

Compiler Versions

When thinking about the NetLogo compiler, it helps to think of it as three separate compilers which share a number of components. The three compilers are:

  • The Desktop compiler: designed to be run in the front-end GUI, supports all desktop extensions
  • The Headless compiler: a stripped-down JVM-only compiler designed to support and validate NetLogo Web
  • The NetLogo Web compiler: a NetLogo-to-JavaScript compiler written in Scala.js and designed to be run in-browser.

The compiler infrastructure can be divided into several distinct components, which makes it easier to compare these versions:

  • Main: The top level of the compiler which coordinates all other activities
  • Lex: The process of turning a character stream into syntactic tokens
  • Parse: The process of turning a stream of tokens into an AST
  • AST Validation: Inspecting the AST to ensure that it represents a valid NetLogo program
  • AST Conversion: Connecting parsed primitives with their runtime representation
  • AST Transformation: Inspecting, optimizing, and transforming the AST for use in a particular runtime
  • Last-Line Optimization: Optimizations which mutate the AST substantially and may make other optimizations impossible (such as constant folding)
  • Code Generation ("Back end"): Turning the AST into JVM or JavaScript code

We can make a table comparing which classes / packages implement which functionality in each version of the compiler. N/A indicates that a given compiler phase is skipped for that compiler.

| Phase \ Compiler | Desktop | Headless | Web |
| --- | --- | --- | --- |
| Main | org.nlogo.compile.CompilerMain | org.nlogo.compile.Compiler | org.nlogo.tortoise.Compiler |
| Lex | org.nlogo.lex | org.nlogo.lex | org.nlogo.lex |
| Parse | org.nlogo.parse | org.nlogo.parse | org.nlogo.parse |
| AST Validation | org.nlogo.parse | org.nlogo.parse | org.nlogo.parse |
| AST Conversion | org.nlogo.compile.middle.FrontMiddleBridge | org.nlogo.compile.middle.FrontMiddleBridge | N/A |
| AST Transformation | org.nlogo.compile.middle.MiddleEnd | org.nlogo.compile.middle.MiddleEnd | N/A |
| Last-Line Optimization | org.nlogo.compile | org.nlogo.compile.back | N/A |
| Code Generation | org.nlogo.compile / org.nlogo.generate | org.nlogo.compile.back / org.nlogo.generate | org.nlogo.tortoise.{ Handler, Prims } |

Main entry points

The primary class is Compiler, which implements CompilerInterface. The main entry points are compileProgram() and compileMoreCode(). These correspond to two different modes of operation: we're either compiling the entire Code tab, or we're compiling an additional snippet of code (e.g. code in a button, or typed into the Command Center) which is compiled with reference to an already-compiled Code tab.

If compilation fails, a CompilerException is raised.

If compilation succeeds, the result of compilation is a CompilerResults object:

case class CompilerResults(procedures: Seq[Procedure], program: Program)

In the case of compileMoreCode(), procedures will contain only a single Procedure.
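The two modes can be pictured with a toy model. This is only a sketch: the `Procedure` and `Program` stand-ins, the simplified signatures, and the "unknown name" check are assumptions made for illustration and do not mirror the real `CompilerInterface`.

```scala
object EntryPointsSketch {
  // Toy stand-ins; the real Procedure and Program classes carry far more state.
  final case class Procedure(name: String, calls: Seq[String])
  final case class Program(globals: Seq[String])
  final case class CompilerResults(procedures: Seq[Procedure], program: Program)
  final class CompilerException(message: String) extends Exception(message)

  // Fail if a procedure calls a name that is not among the known procedures.
  private def check(proc: Procedure, known: Set[String]): Unit =
    proc.calls.find(c => !known(c)).foreach { c =>
      throw new CompilerException(s"Nothing named $c has been defined")
    }

  // Mode 1 (hypothetical signature): compile the whole Code tab at once.
  def compileProgram(procs: Seq[Procedure], program: Program): CompilerResults = {
    val known = procs.map(_.name).toSet
    procs.foreach(check(_, known))
    CompilerResults(procs, program)
  }

  // Mode 2 (hypothetical signature): compile one snippet against an earlier result.
  def compileMoreCode(snippet: Procedure, previous: CompilerResults): CompilerResults = {
    check(snippet, previous.procedures.map(_.name).toSet)
    // Only the newly compiled snippet is returned, alongside the existing program.
    CompilerResults(Seq(snippet), previous.program)
  }
}
```

In this model a button's code is compiled by passing the earlier `CompilerResults` back in, which is why the snippet can refer to procedures from the Code tab without recompiling them.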

Other entry points

Some additional entry points support things like syntax highlighting, syntax checking, etc.

Phases

Compilation proceeds in phases. Each phase is implemented by a different class. If you read through Compiler.compile(), you will see each phase in order. The phases are: Tokenizer, StructureParser, IdentifierParser, ExpressionParser, Visitors, Optimizer, TypeParser, ArgumentStuffer, Assembler, Generator.

The phases may informally be divided into three groups: front end, middle end, and back end. The front end does parsing and some semantic analysis; the middle end does AST rewriting and further semantic analysis; the back end does linearization of control structures and code generation.
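As a rough mental model, the three groups can be thought of as functions chained end to end. The sketch below is not NetLogo code: the `Phase` type and the trivial "front", "middle", and "back" stages are hypothetical and exist only to show the shape of a phased pipeline.

```scala
// Toy model of a phased compiler: every name here is hypothetical.
object PhasePipelineSketch {
  // A phase is a named function from one intermediate form to the next.
  final case class Phase[A, B](name: String, run: A => B) {
    // Chain two phases; the combined name records the order they run in.
    def andThen[C](next: Phase[B, C]): Phase[A, C] =
      Phase[A, C](s"$name -> ${next.name}", run.andThen(next.run))
  }

  // Front end: text to "tokens" (here, whitespace-separated words).
  val frontEnd: Phase[String, List[String]] =
    Phase("front", (src: String) => src.trim.split("\\s+").toList.filter(_.nonEmpty))

  // Middle end: rewrite the intermediate form (here, normalize case).
  val middleEnd: Phase[List[String], List[String]] =
    Phase("middle", (tokens: List[String]) => tokens.map(_.toLowerCase))

  // Back end: linearize into the final output (here, one word per line).
  val backEnd: Phase[List[String], String] =
    Phase("back", (tokens: List[String]) => tokens.mkString("\n"))

  val compiler: Phase[String, String] =
    frontEnd.andThen(middleEnd).andThen(backEnd)
}
```

Because each stage only sees the output of the previous one, a stage can be swapped or skipped without touching its neighbours, as long as the types still line up.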

This diagram illustrates the overall structure of compilation:

(See the diagram legend below for how to read this diagram)

The phases are described individually below.

Front End: Parse

We can subdivide the front end into two subphases: parsing and transformations. We will cover these in order.

StructureParser.parseSources

  • input: String containing program text
  • output: (StructureResults, Seq[FrontEndProcedure])

StructureParser uses Tokenizer to convert program text into Tokens, then uses Scala parser combinators to determine program declarations. The information in the declarations (such as globals, breeds, etc.) is stored in the new Program that is returned as a part of StructureResults. It also creates new FrontEndProcedures containing the name, arguments, and whether the procedure is a reporter. The tokens in each procedure body are kept as an Iterable[Token], available in a Map on the returned StructureResults.
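The split between declarations and unparsed procedure bodies can be illustrated with a small hand-rolled parser. This is a sketch under loud assumptions: it uses plain pattern matching rather than parser combinators, tokens are bare strings, only `globals` is recognised as a declaration, procedure arguments are ignored, and the `StructureResults` and `FrontEndProcedure` case classes here are simplified stand-ins for the real ones.

```scala
object StructureParseSketch {
  // Simplified stand-ins for the real classes (assumption).
  final case class FrontEndProcedure(name: String, isReporter: Boolean)
  final case class StructureResults(
    globals: Seq[String],
    procedures: Seq[FrontEndProcedure],
    bodies: Map[String, Seq[String]])

  // Crude tokenizer: pad brackets with spaces, then split on whitespace.
  def tokenize(src: String): List[String] =
    src.replace("[", " [ ").replace("]", " ] ").split("\\s+").toList.filter(_.nonEmpty)

  @annotation.tailrec
  def parse(
    tokens: List[String],
    acc: StructureResults = StructureResults(Nil, Nil, Map.empty)): StructureResults =
    tokens match {
      case Nil => acc
      case "globals" :: "[" :: rest =>
        val (names, after) = rest.span(_ != "]")
        parse(after.drop(1), acc.copy(globals = acc.globals ++ names))
      case (kw @ ("to" | "to-report")) :: name :: rest =>
        // The body is stored as raw tokens; it is not parsed at this stage.
        val (body, after) = rest.span(_ != "end")
        val proc = FrontEndProcedure(name, isReporter = kw == "to-report")
        parse(
          after.drop(1),
          acc.copy(procedures = acc.procedures :+ proc, bodies = acc.bodies + (name -> body)))
      case other :: _ =>
        throw new IllegalArgumentException(s"Expected a declaration or procedure, found $other")
    }
}
```

Keeping the bodies as token sequences is what lets a later phase parse each procedure once every procedure and global name is already known.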

StructureParser.usedNames

  • input: StructureResults, ProceduresMap containing old procedures
  • output: SymbolTable

usedNames produces a global symbol table including names of primitives, breeds, agent variables, and procedure names. This is used by ExpressionParser to determine whether a symbol is undefined. A nice refactor in the future would be to have SymbolTable hold references to the appropriate primitive or agent variable instead of having TransformableTokenStream use TokenMapper through Namer.

ExpressionParser

  • input: A TransformableTokenStream of named tokens constituting a single procedure body and a SymbolTable
  • output: A ProcedureDefinition AST object

ExpressionParser is responsible for taking a stream of tokens and turning it into a core AST. The ProcedureDefinition returned is the primary output of the FrontEnd of the compiler.

Front End: Transformations

The Compiler Front-End handles parsing, but also performs several checks and transformations on the resulting AST. These are described below.

LetReducer

  • input: ProcedureDefinition
  • output: ProcedureDefinition

Removes from the AST a special _letname primitive that is used only during parsing.

CarefullyVisitor

  • input: ProcedureDefinition
  • output: ProcedureDefinition

Connects the Let set by a carefully primitive on error with the error-message primitive used to access the value of that Let.

ClosureTagger

  • input: ProcedureDefinition
  • output: ProcedureDefinition

Determines the variables that anonymous commands and reporters close over and attaches those to the respective _commandlambda and _reporterlambda primitives.
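The idea of "closing over" a variable can be sketched as a free-variable computation. The AST below is a hypothetical toy, not the real core AST, and it treats every name that is not a lambda parameter as closed over; the real pass may well be more selective about which kinds of variables it records.

```scala
object ClosureSketch {
  sealed trait Expr
  final case class Var(name: String) extends Expr
  final case class Num(value: Double) extends Expr
  final case class Call(prim: String, args: List[Expr]) extends Expr
  // An anonymous reporter such as [ x -> x + y ]; closedOver is filled in by tagging.
  final case class Lambda(
    params: List[String],
    body: Expr,
    closedOver: Set[String] = Set.empty) extends Expr

  // Names used in an expression that are not bound by a lambda parameter.
  def freeVars(e: Expr): Set[String] = e match {
    case Var(n)                  => Set(n)
    case Num(_)                  => Set.empty
    case Call(_, args)           => args.flatMap(freeVars).toSet
    case Lambda(params, body, _) => freeVars(body) -- params
  }

  // Return a copy of the tree in which every lambda records what it closes over.
  def tag(e: Expr): Expr = e match {
    case Call(p, args)               => Call(p, args.map(tag))
    case l @ Lambda(params, body, _) => Lambda(params, tag(body), freeVars(l))
    case other                       => other
  }
}
```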

SourceTagger

  • input: ProcedureDefinition
  • output: ProcedureDefinition

Sets the source for each _commandlambda and _reporterlambda primitive.

LetVerifier

  • input: ProcedureDefinition
  • output: ProcedureDefinition

Forbids statements of the form let x x.
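A check of this kind can be sketched over a toy AST. Everything here is hypothetical, and the sketch rejects any initializer that mentions the variable being declared, of which `let x x` is the simplest case; whether the real pass is this broad is an assumption.

```scala
object LetVerifierSketch {
  sealed trait Expr
  final case class Var(name: String) extends Expr
  final case class Num(value: Double) extends Expr
  final case class Call(prim: String, args: List[Expr]) extends Expr
  final case class Let(name: String, value: Expr)

  // Does the expression mention the given variable anywhere?
  def mentions(e: Expr, name: String): Boolean = e match {
    case Var(n)        => n == name
    case Num(_)        => false
    case Call(_, args) => args.exists(mentions(_, name))
  }

  // Left(message) for a self-referential let, Right(let) otherwise.
  def verify(let: Let): Either[String, Let] =
    if (mentions(let.value, let.name))
      Left(s"${let.name} is used in its own definition")
    else
      Right(let)
}
```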

AgentTypeChecker

  • input: Seq[ProcedureDefinition]
  • output: mutates FrontEndProcedure attached to ASTs passed in and core prims contained in the AST

Iterates through procedures and blocks determining their agent class (which agents they can be run by). Attempts to constrain each block and procedure as tightly as possible. Raises errors if there are any blocks which cannot be run by any type of agent (for instance [ show end1 pcolor ]).
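One way to picture this constraint solving is as set intersection: each primitive allows some set of agent kinds, and a block can be run only by kinds that every primitive in it allows. The sketch below is a simplified model with a hypothetical lookup table; the real checker works on the AST and records its results on procedures and primitives.

```scala
object AgentClassSketch {
  sealed trait Kind
  case object Observer extends Kind
  case object Turtle extends Kind
  case object Patch extends Kind
  case object Link extends Kind

  val all: Set[Kind] = Set(Observer, Turtle, Patch, Link)

  // Hypothetical table of which agents may run each primitive.
  val allowed: Map[String, Set[Kind]] = Map(
    "show"   -> all,
    "pcolor" -> Set(Turtle, Patch), // a turtle can read the patch it stands on
    "end1"   -> Set(Link),
    "xcor"   -> Set(Turtle))

  // Narrow the block to the kinds accepted by every primitive it contains.
  def blockKinds(prims: Seq[String]): Either[String, Set[Kind]] = {
    val kinds = prims.foldLeft(all)((acc, p) => acc.intersect(allowed.getOrElse(p, all)))
    if (kinds.isEmpty) Left(s"No agent can run [ ${prims.mkString(" ")} ]")
    else Right(kinds)
  }
}
```

With this table, `show end1 pcolor` narrows to the empty set, because `end1` allows only links while `pcolor` allows only turtles and patches, which is the situation the error case describes.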

ControlFlowVerifier

  • input: ProcedureDefinition
  • output: raises exception if invalid

Validates that the following constraints hold:

  • stop may not be used inside a reporter
  • report may not be used inside a command or an anonymous command
  • report must appear immediately inside a reporter and not inside an ask inside a reporter.
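These rules amount to a tree walk that tracks the enclosing context. The following sketch uses a hypothetical statement type and context enumeration. It omits anonymous commands, and it leaves `stop` inside a nested `ask` unchecked, since the rules above do not say how that case is treated.

```scala
object ControlFlowSketch {
  sealed trait Stmt
  case object Stop extends Stmt
  case object Report extends Stmt
  final case class Ask(body: List[Stmt]) extends Stmt
  final case class Other(name: String) extends Stmt

  sealed trait Context
  case object InCommand extends Context
  case object InReporter extends Context
  case object InAskInsideReporter extends Context

  // Collect one message per violated rule, descending into ask bodies.
  def errors(body: List[Stmt], ctx: Context): List[String] = body.flatMap {
    case Stop if ctx == InReporter =>
      List("stop may not be used inside a reporter")
    case Report if ctx == InCommand =>
      List("report may not be used inside a command")
    case Report if ctx == InAskInsideReporter =>
      List("report must appear immediately inside a reporter")
    case Ask(inner) =>
      // An ask nested in a reporter changes what report is allowed to do.
      errors(inner, if (ctx == InCommand) InCommand else InAskInsideReporter)
    case _ => Nil
  }
}
```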

Front-Middle Bridge

This part of compilation turns core.ProcedureDefinitions with core primitives into nvm.ProcedureDefinitions with nvm primitives. It also turns FrontEndProcedures into nvm.Procedures (unless they are already those).

Middle end phases

The Middle-End of the compiler is all about platform-specific transformations. Note that all of these transformations are desktop-only, so they are not yet implemented in NetLogo Web. Initially, all of the stages in the middle end mutated the AST instead of producing a new one. While most of the stages in the middle end still do mutate the AST, it is strongly recommended that new changes in the middle end be implemented as transformations on the AST instead of mutations.

LambdaLift

  • input: ProcedureDefinition
  • output: Seq[ProcedureDefinition]

Lambda lifting turns all anonymous commands into their own procedures. This is important because it enables further compilation and optimization for each of those procedures. Several of the other stages in middle end optimize by turning let-variables (stored in nvm.Binding) into local variables (stored in an array on nvm.Activation). Lifting command-lambdas into procedures makes them eligible for these types of optimizations.
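The transformation can be sketched on a toy statement tree: each anonymous command is replaced by a reference to a freshly named procedure, and the lifted procedures are returned alongside the rewritten original. The node types and the naming scheme below are hypothetical, and the sketch ignores closed-over variables, which a real implementation has to pass along somehow.

```scala
object LambdaLiftSketch {
  sealed trait Stmt
  final case class Prim(name: String) extends Stmt
  final case class Foreach(body: Stmt) extends Stmt
  final case class CommandLambda(body: List[Stmt]) extends Stmt
  final case class LiftedRef(procName: String) extends Stmt
  final case class Proc(name: String, body: List[Stmt])

  // Returns the rewritten procedure first, followed by every lifted procedure.
  def lift(proc: Proc): Seq[Proc] = {
    var counter = 0
    val lifted = scala.collection.mutable.ListBuffer.empty[Proc]

    def go(s: Stmt): Stmt = s match {
      case CommandLambda(body) =>
        val name = s"${proc.name}-lambda-$counter"
        counter += 1
        // Lift nested lambdas inside this body as well.
        lifted += Proc(name, body.map(go))
        LiftedRef(name)
      case Foreach(b) => Foreach(go(b))
      case other      => other
    }

    val main = Proc(proc.name, proc.body.map(go))
    main +: lifted.toList
  }
}
```

The output being a sequence rather than a single procedure matches the `Seq[ProcedureDefinition]` result type listed for this stage.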

ReferenceTransformer

  • input: ProcedureDefinition
  • output: ProcedureDefinition

Referencer primitives are primitives which accept the name of an agent variable as an argument and operate on that named variable without getting its value from a particular agent. The canonical example is the diffuse primitive, although uphill and downhill primitives also have this behavior, as do certain primitives in the GIS extension. ReferenceTransformer finds these primitives in the AST, turns the appropriate argument into an agent type and variable number, combines that information with the existing Referencer primitive to form a new primitive to replace the original, and removes the reference argument from the AST. ReferenceTransformer also affects extension primitives which accept arguments of ReferenceType. In the case of extension primitives it works slightly differently: it replaces the argument primitive, turning it from an _agentvariable primitive into a constant primitive which returns the agent type, variable number, and name.

ScopeTransformer

  • input: ProcedureDefinition
  • output: ProcedureDefinition

SourceTagger

  • input: ProcedureDefinition
  • side effects: mutates ASTs

Walks through the AST and tags each primitive with its source code. Primitives have two source fields: source and fullSource. source is given by the user text found in the appropriate SourceLocation (this is the name of the primitive as the user typed it, like "pcolor" or "PI"). fullSource is given by the user text of a primitive and the text of all of its arguments, including any blocks or anonymous procedures. This information is most commonly used to provide effective errors to users.

SimpleOfVisitor

  • input: ProcedureDefinition
  • side effects: mutates ASTs

Converts _of(_*variable) to _*variableof.
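As an illustration, the rewrite can be written as a bottom-up pattern match on a generic tree. The `Node` type is a hypothetical stand-in, and the assumption that the variable reporter is the first argument of `_of` is made only for this sketch.

```scala
object SimpleOfSketch {
  final case class Node(prim: String, args: List[Node] = Nil)

  def rewrite(n: Node): Node = {
    val args = n.args.map(rewrite)
    (n.prim, args) match {
      // _of(_xvariable, agents) becomes _xvariableof(agents)
      case ("_of", Node(v, Nil) :: rest) if v.endsWith("variable") =>
        Node(v + "of", rest)
      case _ =>
        Node(n.prim, args)
    }
  }
}
```

Only a bare variable reporter is collapsed; anything more complex under `_of` is left as it was.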

LambdaVariableVisitor

  • input: ProcedureDefinition
  • side effects: mutates ASTs

LocalsVisitor

  • input: ProcedureDefinition
  • side effects: mutates ASTs

Changes let variables into Procedure-level local variables (slots in the Activation) where possible.

RepeatVisitor

  • input: ProcedureDefinition
  • side effects: mutates ASTs

SetVisitor

  • input: ProcedureDefinition
  • side effects: mutates ASTs

Make _set calls more specific, e.g. _setobservervariable.

Optimizer

This is just a ninth visitor, but it's significant enough to deserve treatment as a phase unto itself.

  • input: ProcedureDefinition
  • side effects: mutates ASTs

Rewrites parts of the parse tree into forms which should run faster.

For example the combination of _any and _with is rewritten to a call to _anywith which exits early once an example is found.

Another example is the replacing of _with(_patches,_equal(_patchvariable:PXCOR,_constdouble:5.0)) with _patchcolumn:5.

Each optimization is a subclass of compile.api.ReporterMunger or compile.api.CommandMunger.
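The `_any` / `_with` rewrite can be sketched as a pattern-matching rule applied bottom-up. This is a toy model: `Node`, `Rule`, and `optimize` are hypothetical names, and the real optimizations are written against the ReporterMunger and CommandMunger classes rather than as partial functions.

```scala
object AnyWithSketch {
  final case class Node(prim: String, args: List[Node] = Nil)

  // A rule fires only on the tree shape it understands.
  type Rule = PartialFunction[Node, Node]

  // _any(_with(agents, block)) becomes _anywith(agents, block)
  val anyWith: Rule = {
    case Node("_any", List(Node("_with", withArgs))) => Node("_anywith", withArgs)
  }

  // Rewrite children first, then try each rule on the rebuilt node.
  def optimize(n: Node, rules: List[Rule]): Node = {
    val rebuilt = Node(n.prim, n.args.map(optimize(_, rules)))
    rules.collectFirst { case r if r.isDefinedAt(rebuilt) => r(rebuilt) }.getOrElse(rebuilt)
  }
}
```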

A full list of optimizations and how they are performed can be found in Optimization List. Note that the Desktop (GUI) and Headless compilers use slightly different sets of optimizations, which can produce slightly different numerical results.

Back end phases

Constant Folder

  • input: ProcedureDefinition
  • side effects: mutates Pure Instructions into constants

Performs computations at compile time (see the Wikipedia article on constant folding).
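A minimal sketch of constant folding on a toy expression tree follows. The `Expr` types are hypothetical and only two operators are handled; the real pass instead evaluates instructions it knows to be pure.

```scala
object ConstantFoldSketch {
  sealed trait Expr
  final case class Const(value: Double) extends Expr
  final case class Var(name: String) extends Expr
  final case class BinOp(op: String, left: Expr, right: Expr) extends Expr

  // Fold children first; if both sides end up constant, compute the result now.
  def fold(e: Expr): Expr = e match {
    case BinOp(op, l, r) =>
      (op, fold(l), fold(r)) match {
        case ("+", Const(a), Const(b)) => Const(a + b)
        case ("*", Const(a), Const(b)) => Const(a * b)
        case (_, fl, fr)               => BinOp(op, fl, fr)
      }
    case other => other
  }
}
```

For example, `x + 2 * 3` folds to `x + 6`: the multiplication is computed at compile time, while the addition stays because `x` is not known until run time.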

ArgumentStuffer

  • input: ProcedureDefinition
  • side effects: mutates args arrays in Instructions

Fills the args arrays, in all of the Instructions anywhere in the Procedure, with Reporters.

No real action happens here. We're basically just discarding the "scaffolding" of the tree of AstNodes, leaving just the Command and Reporter objects themselves.

Assembler

  • input: ProcedureDefinition
  • side effects: stores Array[Command] in code field of Procedure

Mainly we're just discarding the "scaffolding" of the tree of AstNodes, leaving just the Command and Reporter objects themselves.

One exception: optimization of some types of tail recursion also happens here. (Perhaps that could be reimplemented as a Visitor?)

Most commands are assembled generically, but some commands are “custom assembled”. Typically these are control structures such as ask and while. Each custom assembled command has an assemble() method which directs its own assembly via an AssemblerAssistant.
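Linearizing a control structure means turning a nested tree into a flat instruction array with jumps. The sketch below shows the idea for a `while` loop. The instruction set, the layout (test at the top, jump back at the bottom), and all names are assumptions made for illustration and are not taken from the real Assembler or AssemblerAssistant.

```scala
object AssemblerSketch {
  sealed trait Stmt
  final case class Simple(name: String) extends Stmt
  final case class While(cond: String, body: List[Stmt]) extends Stmt

  sealed trait Instr
  final case class Run(name: String) extends Instr
  final case class JumpIfFalse(cond: String, target: Int) extends Instr
  final case class Jump(target: Int) extends Instr

  // base is the absolute address of the first instruction emitted by this call.
  def assemble(stmts: List[Stmt], base: Int = 0): Vector[Instr] =
    stmts.foldLeft(Vector.empty[Instr]) { (code, stmt) =>
      val here = base + code.length
      stmt match {
        case Simple(n) =>
          code :+ Run(n) // generic assembly: one instruction
        case While(cond, body) =>
          val inner = assemble(body, here + 1)   // body starts right after the test
          val exit = here + 1 + inner.length + 1 // first address after the back jump
          ((code :+ JumpIfFalse(cond, exit)) ++ inner) :+ Jump(here)
      }
    }
}
```

In this model `Simple` plays the role of a generically assembled command, while `While` is "custom assembled" because it decides for itself where the jumps go.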

Generator

  • input: Procedure
  • side effects: alters/replaces Command and Reporter objects in the procedure's code field

Translates some commands and reporters directly into JVM bytecode. Replaces Command and Reporter instances with instances of built-on-the-fly custom subclasses of GeneratedCommand and GeneratedReporter.

This phase is optional and can be disabled.

Diagram Legend
