This is a compiler for a subset of scheme. It was largely written by myself, Solomon Kritz, over the course of the Fall 2017 Semester as part of UMD's CMSC430 course, although the testing and utils frameworks originated with Javran Cheng and Professor Gilray.
- Install Racket
- Install llvm and clang++
- Install the Boehm-Demers-Weiser Garbage Collector
- Clone this repository
- Change the variables libgc-path, gc-include-path, and clang++-path in utils.rkt to reflect the locations of libgc.a, the BDWGC include folder, and clang++.
- If using Windows, you may need to make slight changes to the function(s) eval-llvm and/or compile-llvm in utils.rkt as I developed this on linux.
To compile a file, ensure it contains only forms from the input language below and has the extension .scm, then place it in the folder tests/compile/. At this point, running "racket tests.rkt compile <filename, excluding .scm extension>" will compile it to a binary named bin, as well as create a combined.ll file in LLVM IR.
To write a test for the compiler, again create a program in the input language with the extension .scm. Place it in one of the folders in tests/ besides compile/. Running "racket tests.rkt <filename, excluding .scm extension>" will evaluate the program before compilation, compile it, and then evaluate the program again after compilation to ensure the results are the same.
You can add more folders to tests/, just be sure to also add them to either the tests or to-compile functions in tests.rkt
Note: This language is a subset of Scheme, and all the forms below are the same as their Scheme counterparts except for defines. There is no support for top-level defines (i.e., all defines are treated as internal and desugared to letrec*).
e ::= (define x e)
| (define (x x ... defaultparam ...) e ...+)
| ( define (x x ... . x) e ...+)
| (letrec* ([x e] ...) e ...+)
| (letrec ([x e] ...) e ...+)
| (let* ([x e] ...) e ...+)
| (let ([x e] ...) e ...+)
| (let x ([x e] ...) e ...+)
| (lambda (x ... defaultparam ...) e ...+)
| (lambda x e ...+)
| (lambda (x ...+ . x) e ...+)
| (dynamic-wind e e e)
| (guard (x cond-clause ...) e ...+)
| (raise e)
| (delay e)
| (force e)
| (and e ...)
| (or e ...)
| (cond cond-clause ...)
| (case e case-clause ...)
| (if e e e)
| (when e e ...+)
| (unless e e ...+)
| (set! x e)
| (begin e ...+)
| (call/cc e)
| (let/cc x e)
| (apply e e)
| (e e ...)
| x
| op
| (quasiquote qq)
| (quote dat)
| char | nat | string | #t | #f
| e
- cond-clause ::= (e) | (e e e ...) | (else e e ...)
- case-clause ::= ((dat ...) e e ...) | (else e e ...)
- in all cases, else clauses must come last
- dat is a datum satisfying datum? from utils.rkt
- x is a variable (satisfies symbol?)
- defaultparam ::= (x e)
- op is a primitive from the list below.
- qq ::= e | dat | (unquote qq) | (unquote e) | (quasiquote qq) | (qq ...+) | (qq ...+ . qq)
Note: All primitives are supported both using standard application and with a variadic apply version.
Prim | Arguments | Returns | Description |
---|---|---|---|
= | Int Int ...+ | Bool | Returns #t if all the arguments are numerically equal, #f otherwise. |
> | Int Int ...+ | Bool | Returns #t if the arguments in the given order are strictly decreasing, #f otherwise. |
< | Int Int ...+ | Bool | Returns #t if the arguments in the given order are strictly increasing, #f otherwise. |
<= | Int Int ...+ | Bool | Returns #t if the arguments in the given order are non-decreasing, #f otherwise. |
>= | Int Int ...+ | Bool | Returns #t if the arguments in the given order are non-increasing, #f otherwise. |
+ | Int … | Int | Returns the sum of the arguments, adding pairwise from left to right. If no arguments are provided, the result is 0. |
- | Int Int ...+ | Int | Returns the subtraction of the latter arguments from first argument working pairwise from left to right. |
* | Int … | Int | Returns the product of the arguments, multiplying pairwise from left to right. If no arguments are provided, the result is 1. |
/ | Int ...+ | Int | If only one argument is provided, that argument is returned. Otherwise, returns the division of the first argument by the latter arguments working pairwise from left to right. Can cause a "RUNTIME ERROR: Cannot divide by zero" (string). |
cons? | Any | Bool | Returns #t if argument is a cons cell, #f otherwise. |
null? | Any | Bool | Returns #t if argument is the empty list (null), #f otherwise. |
cons | Any Any | Cons Cell | Returns a newly allocated cons cell whose first element is the first argument and second element is the second argument. |
car | Cons Cell | Any | Returns the first element of the cons-cell provided. |
cdr | Cons Cell | Any | Returns the second element of the cons-cell provided. |
list | Any … | List | Returns a newly allocated list containing the arguments provided as elements. |
first | List | Any | Returns the first element of the list provided. |
second | List | Any | Returns the second element of the list provided. |
third | List | Any | Returns the third element of the list provided. |
fourth | List | Any | Returns the fourth element of the list provided. |
length | List | Int | Returns the number of elements in the list provided. |
drop | List Int | Any | Returns the tail of the list provided in the first argument, from the index provided in the second argument until the end of the list. |
memv | Any List | List OR Bool | Locates the first element of the list provided in the second argument that is equal to the first argument. If such an element exists, the tail of the list starting with that element is returned. Otherwise, the result is #f. |
map | Proc List | List | Applies the procedure provided to the elements of the list, from the first element in to the last. The result is a list containing each result of the procedure in order. |
append | List … OR List … Any | List OR Any | When given all list arguments, the result is a list that contains all of the elements of the given lists in order. The last argument is used directly in the tail of the result. The last argument need not be a list, in which case the result is an "improper list." |
foldl | Proc Any List | Any | This applies a procedure to the elements of a list, and combines the return values in an arbitrary way determined by the procedure. The procedure must take 2 arguments. It is initially invoked with the first item of the list and the second argument to foldl. In subsequent invocations the second argument to the procedure is not the original second argument provided to foldl, but the return value from the previous invocation of the procedure. Input lists are traversed from left to right. The final result is the result of the last invocation of the procedure. |
foldr | Proc Any List | Any | The same as foldl (see above), but the list is traversed from right to left. |
vector? | Any | Bool | Returns #t if argument is a vector, #f otherwise. |
vector | Any … | Vector | Returns a newly allocated mutable vector with as many slots arguments provided, where the slots are initialized to contain the given arguments in order. |
make-vector | Int Any | Vector | Returns a mutable vector with whose size is given by the first argument, where all slots are initialized to contain the second argument. |
vector-ref | Vector Int | Any | Returns the element of the vector found in the slot is given by the second argument. The slots are 0 indexed. An attempt to access a slot larger than or equal to the vector's length causes a "RUNTIME ERROR: Attempt to use a vector index that is out of bounds" (string). |
vector-set! | Vector Int Any | Void | Updates the slot given by the second argument in the vector given by the first argument to be the element given by the third argument. An attempt to access a slot larger than or equal to the vector's length causes a "RUNTIME ERROR: Attempt to use a vector index that is out of bounds" (string). |
vector-length | Vector | Int | Returns the length of the vector |
list? | Any | Bool | Returns #t if v is a list: either the empty list, or a pair whose second element is a list. |
void? | Any | Bool | Returns #t if v is the constant #<void>, #f otherwise. |
promise? | Any | Bool | Returns #t if v is a promise, #f otherwise. |
procedure? | Any | Bool | Returns #t if v is a procedure, #f otherwise. |
number? | Any | Bool | Returns #t if v is a integer, #f otherwise. |
integer? | Any | Bool | Returns #t if v is a integer, #f otherwise. |
void | Void | Returns the constant #<void>. | |
Any | Void | Prints the argument. | |
halt | Any | Any | Ends the program, returning the argument provided. |
eq? | Any Any | Bool | Return #t if the first and second arguments refer to the same object, #f otherwise. |
eqv? | Any Any | Bool | Two values are eqv? if and only if they are eq?, except for Ints, Strings, Syms, and Chars. Chars are eqv? if they have the same ascii value. Ints are eqv? if they have the same numerical value. Strings are eqv? if they have the same characters in the same positions. Syms are eqv? If their string equivalents are eqv?. |
not | Any | Bool | Returns #t if the argument is #f, #f otherwise. |
substring | String Int [Int] | String | Returns a new mutable string that goes from the index given by the second argument, inclusive, to the index given by the third argument, exclusive. If no third argument is given, this just goes to the end of the list. |
string-append | String … | String | Returns a new mutable string that is as long as the sum of the given strings' lengths, and that contains the concatenated characters of the given strings. If no strings are provided, the result is a zero-length string. |
string-ref | String Int | Char | Returns the character at the position given by the second argument in the string given by the first argument. The first position corresponds to 0. |
string->list | String | List of Chars | Returns a new list of characters corresponding to the contents of the string. |
string | Char … | String | Returns a new mutable string whose length is the number of provided chars, and whose positions are initialized with the given chars. |
haltwithtoomanyerr | String | Ends the program, causes a "RUNTIME ERROR: Too many arguments provided to a function" (string). | |
haltwithtoofewerr | String | Ends the program, causes a "RUNTIME ERROR: Too few arguments provided to a function" (string). |
This compiler is fairly weak in the error handling department. However, there are several classs of errors I do handle. First of all, the primitives functions check that they are given appropriate types and raise an error message if they are not. Additionally:
- If one provides too many arguments to a function the program will halt with "RUNTIME ERROR: Too many arguments provided to a function" (string). This also works with primitive functions like map.
- If one provides too few arguments to a function the program will halt with "RUNTIME ERROR: Too few arguments provided to a function" (string). However, this does not work with primitive functions or variadic argument functions.
- If a non-function value is applied (e.g., ("Not a Function" 6) or (apply "hey" '(6))) then the program will halt with "RUNTIME ERROR: Cannot apply non-function value".
- If one attempts to divide by zero, then the program will halt with "RUNTIME ERROR: Cannot divide by zero" (string).
- If one attempts to access a vector index that is out of bounds, either by set! or ref, then the program will halt with a "RUNTIME ERROR: Attempt to use a vector index that is out of bounds" (string).
Tests for the 5 bulleted errors above can be found in the tests/exceptions/ folder. The testing framework accounts for these errors by recognizing when the evaluation of the original scheme was throwing a corresponding error, catching it, and then returning the text of the applicable error above. More error cases can be added to the testing framework in the same manner.
There are still many errors this compiler doesn't handle, or handle well. For instance, using string-ref or string set! to access an out of bounds index is not handled and will likely seg-fault. Trying to use an unitialized variable causes compilation to fail with a meaningless error message. There is no memory cap on programs that are compiled. Syntax errors typically fail to compile with a meaningless error. An attempt to use an unsupported primitive causes compilation to fail with an ugly, although comprehensible, message. etc.
First and foremost I would like to add pattern-matching. This is a key-feature in any functional programming language. I am not sure how feasible adding top-level defines would be, but that would be good to look into as well. Next I would like to add a better system for errors than just halting, and add more types of exceptions with more dynamic messages. After that I would like to add more data-structures. Currently there are lists and vectors, but that's it. Professor Gilray recommended adding functional hash-maps and/or hash-sets with HAMT, and it would also be good to add mutable hash-sets. Finally, it is important to work on efficiency, as I was primarily focused on getting this to work over anything else, and it shows in that there is a lot of bloat.
I pledge on my honor that I have not given or received any unauthorized assistance on this compiler.
- Solomon Kritz