Skip to content

SolomonKritz/scheme-compiler

Repository files navigation

Overview

This is a compiler for a subset of scheme. It was largely written by myself, Solomon Kritz, over the course of the Fall 2017 Semester as part of UMD's CMSC430 course, although the testing and utils frameworks originated with Javran Cheng and Professor Gilray.

Setup

  • Install Racket
  • Install llvm and clang++
  • Install the Boehm-Demers-Weiser Garbage Collector
  • Clone this repository
  • Change the variables libgc-path, gc-include-path, and clang++-path in utils.rkt to reflect the locations of libgc.a, the BDWGC include folder, and clang++.
  • If using Windows, you may need to make slight changes to the function(s) eval-llvm and/or compile-llvm in utils.rkt as I developed this on linux.

Usage

To compile a file, ensure it contains only forms from the input language below and has the extension .scm, then place it in the folder tests/compile/. At this point, running "racket tests.rkt compile <filename, excluding .scm extension>" will compile it to a binary named bin, as well as create a combined​.​ll file in LLVM IR.

To write a test for the compiler, again create a program in the input language with the extension .scm. Place it in one of the folders in tests/ besides compile/. Running "racket tests.rkt <filename, excluding .scm extension>" will evaluate the program before compilation, compile it, and then evaluate the program again after compilation to ensure the results are the same.

You can add more folders to tests/, just be sure to also add them to either the tests or to-compile functions in tests.rkt

Supported Input Language

Note: This language is a subset of Scheme, and all the forms below are the same as their Scheme counterparts except for defines. There is no support for top-level defines (i.e., all defines are treated as internal and desugared to letrec*).

e ::= (define x e)
      | (define (x x ... defaultparam ...) e ...+)
      | ( define (x x ... . x) e ...+)
      | (letrec* ([x e] ...) e ...+)
      | (letrec ([x e] ...) e ...+)
      | (let* ([x e] ...) e ...+)
      | (let ([x e] ...) e ...+)
      | (let x ([x e] ...) e ...+)
      | (lambda (x ... defaultparam ...) e ...+)
      | (lambda x e ...+)
      | (lambda (x ...+ . x) e ...+)
      | (dynamic-wind e e e)
      | (guard (x cond-clause ...) e ...+)
      | (raise e)
      | (delay e)
      | (force e)
      | (and e ...)
      | (or e ...)
      | (cond cond-clause ...)
      | (case e case-clause ...)
      | (if e e e)
      | (when e e ...+)
      | (unless e e ...+)
      | (set! x e)
      | (begin e ...+)
      | (call/cc e)
      | (let/cc x e)
      | (apply e e)
      | (e e ...)
      | x
      | op
      | (quasiquote qq)
      | (quote dat)
      | char | nat | string | #t | #f
      | e

  • cond-clause ::= (e) | (e e e ...) | (else e e ...)
  • case-clause ::= ((dat ...) e e ...) | (else e e ...)
    • in all cases, else clauses must come last
  • dat is a datum satisfying datum? from utils.rkt
  • x is a variable (satisfies symbol?)
  • defaultparam ::= (x e)
  • op is a primitive from the list below.
  • qq ::= e | dat | (unquote qq) | (unquote e) | (quasiquote qq) | (qq ...+) | (qq ...+ . qq)

Supported Primitives

Note: All primitives are supported both using standard application and with a variadic apply version.

Prim Arguments Returns Description
= Int Int ...+ Bool Returns #t if all the arguments are numerically equal, #f otherwise.
> Int Int ...+ Bool Returns #t if the arguments in the given order are strictly decreasing, #f otherwise.
< Int Int ...+ Bool Returns #t if the arguments in the given order are strictly increasing, #f otherwise.
<= Int Int ...+ Bool Returns #t if the arguments in the given order are non-decreasing, #f otherwise.
>= Int Int ...+ Bool Returns #t if the arguments in the given order are non-increasing, #f otherwise.
+ Int … Int Returns the sum of the arguments, adding pairwise from left to right. If no arguments are provided, the result is 0.
- Int Int ...+ Int Returns the subtraction of the latter arguments from first argument working pairwise from left to right.
* Int … Int Returns the product of the arguments, multiplying pairwise from left to right. If no arguments are provided, the result is 1.
/ Int ...+ Int If only one argument is provided, that argument is returned. Otherwise, returns the division of the first argument by the latter arguments working pairwise from left to right. Can cause a "RUNTIME ERROR: Cannot divide by zero" (string).
cons? Any Bool Returns #t if argument is a cons cell, #f otherwise.
null? Any Bool Returns #t if argument is the empty list (null), #f otherwise.
cons Any Any Cons Cell Returns a newly allocated cons cell whose first element is the first argument and second element is the second argument.
car Cons Cell Any Returns the first element of the cons-cell provided.
cdr Cons Cell Any Returns the second element of the cons-cell provided.
list Any … List Returns a newly allocated list containing the arguments provided as elements.
first List Any Returns the first element of the list provided.
second List Any Returns the second element of the list provided.
third List Any Returns the third element of the list provided.
fourth List Any Returns the fourth element of the list provided.
length List Int Returns the number of elements in the list provided.
drop List Int Any Returns the tail of the list provided in the first argument, from the index provided in the second argument until the end of the list.
memv Any List List OR Bool Locates the first element of the list provided in the second argument that is equal to the first argument. If such an element exists, the tail of the list starting with that element is returned. Otherwise, the result is #f.
map Proc List List Applies the procedure provided to the elements of the list, from the first element in to the last. The result is a list containing each result of the procedure in order.
append List … OR List … Any List OR Any When given all list arguments, the result is a list that contains all of the elements of the given lists in order. The last argument is used directly in the tail of the result. The last argument need not be a list, in which case the result is an "improper list."
foldl Proc Any List Any This applies a procedure to the elements of a list, and combines the return values in an arbitrary way determined by the procedure. The procedure must take 2 arguments. It is initially invoked with the first item of the list and the second argument to foldl. In subsequent invocations the second argument to the procedure is not the original second argument provided to foldl, but the return value from the previous invocation of the procedure. Input lists are traversed from left to right. The final result is the result of the last invocation of the procedure.
foldr Proc Any List Any The same as foldl (see above), but the list is traversed from right to left.
vector? Any Bool Returns #t if argument is a vector, #f otherwise.
vector Any … Vector Returns a newly allocated mutable vector with as many slots arguments provided, where the slots are initialized to contain the given arguments in order.
make-vector Int Any Vector Returns a mutable vector with whose size is given by the first argument, where all slots are initialized to contain the second argument.
vector-ref Vector Int Any Returns the element of the vector found in the slot is given by the second argument. The slots are 0 indexed. An attempt to access a slot larger than or equal to the vector's length causes a "RUNTIME ERROR: Attempt to use a vector index that is out of bounds" (string).
vector-set! Vector Int Any Void Updates the slot given by the second argument in the vector given by the first argument to be the element given by the third argument. An attempt to access a slot larger than or equal to the vector's length causes a "RUNTIME ERROR: Attempt to use a vector index that is out of bounds" (string).
vector-length Vector Int Returns the length of the vector
list? Any Bool Returns #t if v is a list: either the empty list, or a pair whose second element is a list.
void? Any Bool Returns #t if v is the constant #<void>, #f otherwise.
promise? Any Bool Returns #t if v is a promise, #f otherwise.
procedure? Any Bool Returns #t if v is a procedure, #f otherwise.
number? Any Bool Returns #t if v is a integer, #f otherwise.
integer? Any Bool Returns #t if v is a integer, #f otherwise.
void Void Returns the constant #<void>.
print Any Void Prints the argument.
halt Any Any Ends the program, returning the argument provided.
eq? Any Any Bool Return #t if the first and second arguments refer to the same object, #f otherwise.
eqv? Any Any Bool Two values are eqv? if and only if they are eq?, except for Ints, Strings, Syms, and Chars. Chars are eqv? if they have the same ascii value. Ints are eqv? if they have the same numerical value. Strings are eqv? if they have the same characters in the same positions. Syms are eqv? If their string equivalents are eqv?.
not Any Bool Returns #t if the argument is #f, #f otherwise.
substring String Int [Int] String Returns a new mutable string that goes from the index given by the second argument, inclusive, to the index given by the third argument, exclusive. If no third argument is given, this just goes to the end of the list.
string-append String … String Returns a new mutable string that is as long as the sum of the given strings' lengths, and that contains the concatenated characters of the given strings. If no strings are provided, the result is a zero-length string.
string-ref String Int Char Returns the character at the position given by the second argument in the string given by the first argument. The first position corresponds to 0.
string->list String List of Chars Returns a new list of characters corresponding to the contents of the string.
string Char … String Returns a new mutable string whose length is the number of provided chars, and whose positions are initialized with the given chars.
haltwithtoomanyerr String Ends the program, causes a "RUNTIME ERROR: Too many arguments provided to a function" (string).
haltwithtoofewerr String Ends the program, causes a "RUNTIME ERROR: Too few arguments provided to a function" (string).

Error Handling

This compiler is fairly weak in the error handling department. However, there are several classs of errors I do handle. First of all, the primitives functions check that they are given appropriate types and raise an error message if they are not. Additionally:

  • If one provides too many arguments to a function the program will halt with "RUNTIME ERROR: Too many arguments provided to a function" (string). This also works with primitive functions like map.
  • If one provides too few arguments to a function the program will halt with "RUNTIME ERROR: Too few arguments provided to a function" (string). However, this does not work with primitive functions or variadic argument functions.
  • If a non-function value is applied (e.g., ("Not a Function" 6) or (apply "hey" '(6))) then the program will halt with "RUNTIME ERROR: Cannot apply non-function value".
  • If one attempts to divide by zero, then the program will halt with "RUNTIME ERROR: Cannot divide by zero" (string).
  • If one attempts to access a vector index that is out of bounds, either by set! or ref, then the program will halt with a "RUNTIME ERROR: Attempt to use a vector index that is out of bounds" (string).

Tests for the 5 bulleted errors above can be found in the tests/exceptions/ folder. The testing framework accounts for these errors by recognizing when the evaluation of the original scheme was throwing a corresponding error, catching it, and then returning the text of the applicable error above. More error cases can be added to the testing framework in the same manner.

There are still many errors this compiler doesn't handle, or handle well. For instance, using string-ref or string set! to access an out of bounds index is not handled and will likely seg-fault. Trying to use an unitialized variable causes compilation to fail with a meaningless error message. There is no memory cap on programs that are compiled. Syntax errors typically fail to compile with a meaningless error. An attempt to use an unsupported primitive causes compilation to fail with an ugly, although comprehensible, message. etc.

Future Features

First and foremost I would like to add pattern-matching. This is a key-feature in any functional programming language. I am not sure how feasible adding top-level defines would be, but that would be good to look into as well. Next I would like to add a better system for errors than just halting, and add more types of exceptions with more dynamic messages. After that I would like to add more data-structures. Currently there are lists and vectors, but that's it. Professor Gilray recommended adding functional hash-maps and/or hash-sets with HAMT, and it would also be good to add mutable hash-sets. Finally, it is important to work on efficiency, as I was primarily focused on getting this to work over anything else, and it shows in that there is a lot of bloat.

Honor Pledge

I pledge on my honor that I have not given or received any unauthorized assistance on this compiler.
- Solomon Kritz

About

A basic compiler for a subset of scheme

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published