Bytecode interpreter for a small imperative Lisp-1 dialect.
Forms are evaluated according to the following rules:
-
(F ...)
whereF
is a special form is evaluated as per below. -
(M ARG...)
whereM
is a symbol globally bound to(F)
gets recursively expanded to the result of applyingF
toARG...
, before the top-level form is evaluated. -
(F ARG...)
first evaluatesF
and then applies the result to the result of evaluating each ofARG...
.Tail-call elimination in return-positions is guaranteed.
-
SYM
evaluates to the value of lexically-scoped variable namedSYM
. -
nil
and integers evaluate to themselves.
The language consists of the following special forms:
-
(fn (ARG...) BODY...)
Returns a new closure.
-
(let ([(SYM VAL)]...) BODY...)
Evaluates
(progn BODY...)
with eachSYM
bound to the value of its correspondingVAL
form, which may refer to earlierSYM
s. -
(set SYM VAL)
Sets the value of the lexically-scoped variable named
SYM
to the result of evaluatingVAL
. -
(if COND THEN ELSE...)
If
COND
evaluates to a non-nil value, thenTHEN
is evaluated, otherwise(progn ELSE...)
is evaluated. -
(quote ARG)
Returns
ARG
unevaluated. The reader shorthand'
exists, e.g.'a
is the same as(quote a)
. The mistake ofquote
returning reader constants is perpetuated.
Furthermore, currently the following set of built-in functions are defined:
print
, cons
, car
, cdr
, +
and <
.
This project is both an opportunity to learn about writing Lisp interpreters, and an exploration of the Emacs-esque Lisp environment design space. While I prefer immutable statically-typed languages in general, interactive programming, which is inherently mutable, has its upsides. The uniform syntax of Lisp is good at facilitating selective evaluation of subexpressions.
GNU Emacs, while a decent program, has a couple of aspects that I would like to see if they can be done differently:
-
GNU Emacs 28 introduced an AOT Lisp compiler to native code. However, given the dynamic nature of the Lisp environment there are very few "safe" optimizations it can perform. Indeed, it still does performs some naive optimizations such as inlining redefinable built-in functions. As such, the speedup mostly comes from making function calls cheaper. More interesting would be a JIT compiler, that was able to specialize on the observed types of runtime values.
-
"Symbols with position" were introduced to be able to include the source code span of the offending construct in error reports. This meant that all places in the code that interacted with symbols now has to have a branch on whether the symbol has a position attached... Instead, the standard solution used by all other interpreters is to just have the reader return Token × Span pairs (possibly interleaving the reader and the compiler to make it zero-cost), and storing a compressed map from bytecode to line numbers in Lisp function values.
-
GNU Emacs uses gap buffers to store buffer text which is a fine choice. However, that means line length caches must be maintained separately. If a rope was used instead it could easily be adapted to carry also that information.
-
Symbol properties should not be.
Instead "open closures," i.e. closures with plists, should be embraced, as they already have uses for documentation strings,
interactive
specifications and advices. -
Designing the interpreter from the start not to rely on global state, makes it easy to spin up unsafe parallel interpreter threads. It is terrible compared to the state-of-the-art parallelism offered by purely functional languages, but also a very Lispy solution.
-
To avoid duplication, this project does not implement a tree-walking interpreter in addition to the bytecode interpreter.
This means the bytecode compiler has to be written in C instead of Lisp. On the other hand, it will actually be fast.
-
The built-in package manager
package.el
manages to be simultaneously over-engineered and inadequate.My preferred solution would be to just provide a thin wrapper around Nix or similar, to avoid reinventing the wheel poorly.