Diamondback

In this assignment, you'll implement a compiler for a small language with functions declarations and function calls, conforming to the C stack layout. You'll also add some static well-formedness checks to the compiler.

This is the best I could do:

Double-word
Intel Architecture
MOstly dynamic
Nested-expression
(Diamondback supports recursion)
Boolean-tagged
ANF-transformed
Compiler.
K?

The Diamondback Language

As usual, we have concrete and abstract syntaxes, along with a specification of semantics.

Concrete Syntax

The major addition to Diamondback are function declarations. Our programs are now a sequence of zero or more function declarations, followed by a single main expression.

<program> :=
  | <decls> <expr>
  | <expr>

<decls> :=
  | <decl>
  | <decl> <decls>

<decl> :=
  | def <identifier>(<ids>): <expr>
  | def <identifier>(): <expr>

<ids> :=
  | <identifier>
  | <identifier> , <ids>

<expr> :=
  | let <bindings> in <expr>
  | if <expr>: <expr> else: <expr>
  | <binop-expr>

<binop-expr> :=
  | <identifier>
  | <number>
  | true
  | false
  | add1(<expr>)
  | sub1(<expr>)
  | isnum(<expr>)
  | isbool(<expr>)
  | print(<expr>)
  | <identifier>(<exprs>)
  | <identifier>()
  | <expr> + <expr>
  | <expr> - <expr>
  | <expr> * <expr>
  | <expr> < <expr>
  | <expr> > <expr>
  | <expr> == <expr>
  | ( <expr> )

<exprs> :=
  | <expr>
  | <expr> , <exprs>

<bindings> :=
  | <identifier> = <expr>
  | <identifier> = <expr>, <bindings>

The other addition is function applications, which are written <identifier>(<exprs>), for example f(1, 2, 3). This is the syntax for a call to a function.

Abstract Syntax

As usual, we have a user-facing syntax and a compiler-facing syntax. In this assignment, these definitions have been split into the separate expr.ml file.

type prim1 =
  | Add1
  | Sub1
  | Print
  | IsNum
  | IsBool

type prim2 =
  | Plus
  | Minus
  | Times
  | Less
  | Greater
  | Equal

type expr =
  | ELet of (string * expr) list * expr
  | EPrim1 of prim1 * expr
  | EPrim2 of prim2 * expr * expr
  | EApp of string * expr list
  | EIf of expr * expr * expr
  | ENumber of int
  | EBool of bool
  | EId of string

type decl =
  | DFun of string * string list * expr

type program =
  | Program of decl list * expr


type immexpr =
  | ImmNumber of int
  | ImmBool of bool
  | ImmId of string

and cexpr =
  | CPrim1 of prim1 * immexpr
  | CPrim2 of prim2 * immexpr * immexpr
  | CApp of string * immexpr list
  | CIf of immexpr * aexpr * aexpr
  | CImmExpr of immexpr

and aexpr =
  | ALet of string * cexpr * aexpr
  | ACExpr of cexpr

and adecl =
  | ADFun of string * string list * aexpr

and aprogram =
  | AProgram of adecl list * aexpr

The EApp and CApp constructors correspond to function applications, and the decl and adecl types respresent function declarations. A program is a list of decls (surface) or adecls (ANF), with a main expression.

Semantics

There are several distinguishing features of Diamondback. The first is function applications. A function application should give the answer we'd get if we followed the rules for substituting argument values for parameter names. So, for example:

def f(x, y):
  x + y

f(1, 2)

Should produce 3.

Your compiler should use the rules for C stacks discussed in class and at this assembly guide to implement this behavior.

There are a number of new errors that can occur now that we have function declarations and calls. Your implementation should catch all of these cases statically; that is, before the program runs:

A function application with the wrong number of arguments should signal an error containing the string "arity"
A function application of a non-existent function should signal an error containing the string "not defined"
An identifier without a corresponding binding location should report an error containing the string "unbound"
A let binding with duplicate names should report an error containg the string "duplicate binding"
A function declaration with duplicate names in the argument list should report an error containg the string "duplicate parameter"
If there are multiple function definitions with the same name, report an error containing the string "duplicate function"
If a numeric constant is too large (as discussed in class), report an error containing the string "too large"

Again, these errors should stop the program from compiling, not happen at runtime. You can continue to assume that all identifiers within a function body have different names, which is a requirement for ANF (we could implement another pass to rename variables, but we won't do that here). See the notes on well_formed below for implementation details.

Implementation

You shouldn't need any new assembly instructions to tackle this implementation. You're free to add your own new instructions (some of you used sar and test and jg, for example, and you can carry those over).

There are a few new pieces to the implementation:

A new structure for ANF designed to reduce the number of intermediate variables created. You should study and understand it so you can add the ECall case. This is the first thing you should add, and you'll probably want to define a helper that performs ANF on a list of expressions, and takes a "k" that expects to receive a list of results.
A new set of well_formed functions, which are called prior to performing ANF and are used to report the errors above. These functions all return a string list, which represents all the errors that are found in a program or expression. So a program like
```
def f(x, x):
  y
f(1)
```
would report three errors, one for y being unbound, one for duplicated x parameters, and one for the arity mismatch on the call to f. Each string should contain (at least) the corresponding substring above.

These errors can be reported in any order; in general (and in grading), it's easy to test for one at a time. But reporting many makes using the main of the compiler much more pleasant, and is a nice view into the kinds of compiler ergonomics we should expect from a modern compiler.
The AProgram structure, which contains both function declarations and the expression for our_code_starts_here. You will probably want to generate an assembly structure that looks like:
```
;; extern and global stuff
```

fun_decl1: ;; code for fun_decl1, including stack management fun_decl2: ;; code for fun_decl2, including stack management ... our_code_starts_here: ;; main entrypoint, as before, with stack management internal_error_non_number: ;; errors, as before ...


### Testing

There is one new testing function provided, `tvg`.  This works just like `t`,
except it runs `valgrind` on the executable it creates, and checks whether or
not it reports errors.  If `valgrind` does report memory errors, the test
fails and the errors are reported.  The test succeeds if there are no memory
errors and the answer is correct.

You can test the well-formedness errors using the error tester as usual.

### Handin

Hand a completed implementation in by **Friday**, March 4 at 11:59pm (there is
no programming assignment out over break).

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
input		input
output		output
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
compile.ml		compile.ml
expr.ml		expr.ml
instruction.ml		instruction.ml
lexer.mll		lexer.mll
main.c		main.c
main.ml		main.ml
parser.mly		parser.mly
runner.ml		runner.ml
test.ml		test.ml

compilers-course-materials/starter-diamondback

Folders and files

Latest commit

History

Repository files navigation

Diamondback

The Diamondback Language

Concrete Syntax

Abstract Syntax

Semantics

Implementation

About

Resources

Stars

Watchers

Forks

Languages