-
Notifications
You must be signed in to change notification settings - Fork 0
Parser
A parser typically defines a process that converts a stream of tokens into an Abstract Syntax Tree. An Abstract Syntax Tree (AST) is the most convenient internal representation of a program.
The Lambda Calculus has an extremely simple syntax. In tree form we can represent any term as one of four cases: Variable, Literal, Lambda Abstraction, or Application. In our language we will represent these terms as follows.
(Var variable_name)
(Lit literal_value)
(Abs lhs rhs)
(App function argument)
We have also extended the core calculus to include global bindings. We may represent these as follows.
(Global global_name term_value)
Putting this together, the goal of parsing will be to take a stream of tokens like this
f := λx. y z
and turn it into something like this
(Global f (Abs
(Var x)
(App (Var y) (Var z))
))
Parsing can be implemented with a simple recursive descent function that nibbles on tokens.
parse-program := λtoks. (
(local program)
(local pme)
(while toks (
match toks (
((key (:= remainder)) (
(set pme (parse-many-expressions remainder))
(set program ((Global key (head pme)) program))
(set toks (tail pme))
))
(remainder (
(set pme (parse-many-expressions remainder))
(set program ((head pme) program))
(set toks (tail pme))
))
)
))
program
);
The parse-program function decides whether the prefix expression should be interpreted as a global binding or as a global expression.
The parse-many-expressions
function is then called to strip off the top-level expression, returning an expression and possibly some remaining suffix tokens.
It is important to note that our tokens are oriented to the right, which allows us to nibble tokens off the beginning of the stream without needing to copy or move the whole list.
A Token Stream can be thought of as having the following shape.
(tok1 (tok2 (tok3 etc.)))
To parse expressions we look for tokens up until a semicolon or right-parenthesis.
parse-many-expressions := λtoks. (
(local pme)
(local expr)
(local remainder)
(while toks (
(match toks (
((; tl) (
(set remainder tl)
(set toks ())
))
((\] tl) (
(set remainder tl)
(set toks ())
))
(_ (
(set pme (parse-one-expression toks))
(if expr (
(set expr (App expr (head pme)))
) (
(set expr (head pme))
))
(set toks (tail pme))
))
))
))
(expr remainder)
);
The parse-one-expression
rule tries to pull a single atomic expression from the token stream.
parse-one-expression := λtoks. (
(local pme)
(local remainder)
(local expr)
(match toks (
( () (
(set expr Nil)
(set remainder ())
))
( ( \l r ) (
(set expr (Lambda (parse-lambda r)))
(set remainder ())
))
( ( \] r ) (
(unexpect (head toks))
))
( ( \[ r ) (
(set pme (parse-many-expressions r))
(expect \] (tail pme))
(set expr (head pme))
(set remainder (tail pme))
))
( ( \\ (\' r) ) (
(set expr (Literal \'))
(set remainder r)
))
( ( \' (i r) ) (
(set expr (Literal i))
(set remainder r)
))
( (a r) (
(if (is-variable a) (
(set expr (Variable a))
) (
(set expr (Literal a))
))
(set remainder r)
))
))
(expr remainder)
);
The parse-lambda
rule is a special helper to parse the left-hand-side and right-hand-side of lambda expressions.
parse-lambda := λtoks. (
(local pme)
(set pme (parse-one-expression toks))
(local lmb)
(set lmb (head er))
(set toks (tail er))
(match lmb (
((Literal .) (
(set lmb (Nil (parse-many-expressions toks)))
(set toks ())
))
))
(while toks (
(set pme (parse-one-expression toks))
(match pme (
(((Literal .) r) (
(set lmb (lmb (parse-many-expressions r)))
(set toks ())
))
((e ()) (
(expect .)
))
((e r) (
(set lmb (App (lmb e)))
(set toks r)
))
))
))
lmb
);
The Bootstrap Book is released under the terms of the attached permissive MIT license. If you quote or copy the book, please make a reference. That is all I ask, please.