Skip to content
Andrew Johnson edited this page Feb 9, 2024 · 11 revisions

A parser typically defines a process that converts a stream of tokens into an Abstract Syntax Tree. An Abstract Syntax Tree (AST) is the most convenient internal representation of a program.

The Abstract Syntax Tree

The Lambda Calculus has an extremely simple syntax. In tree form we can represent any term as one of four cases: Variable, Literal, Lambda Abstraction, or Application. In our language we will represent these terms as follows.

(Var variable_name)

(Lit literal_value)

(Abs lhs rhs)

(App function argument)

We have also extended the core calculus to include global bindings. We may represent these as follows.

(Global global_name term_value)

Putting this together, the goal of parsing will be to take a stream of tokens like this

f := λx. y z

and turn it into something like this

(Global f (Abs
   (Var x)
   (App (Var y) (Var z))
))

The parse-program Function

Parsing can be implemented with a simple recursive descent function that nibbles on tokens.

parse-program := λtoks. (
   (local program)
   (local pme)
   (while toks (
      match toks (
         ((key (:= remainder)) (
            (set pme (parse-many-expressions remainder))
            (set program ((Global key (head pme)) program))
            (set toks (tail pme))
         ))
         (remainder (
            (set pme (parse-many-expressions remainder))
            (set program ((head pme) program))
            (set toks (tail pme))
         ))
      )
   ))
   program
);

The parse-program function decides whether the prefix expression should be interpreted as a global binding or as a global expression. The parse-many-expressions function is then called to strip off the top-level expression, returning an expression and possibly some remaining suffix tokens. It is important to note that our tokens are oriented to the right, which allows us to nibble tokens off the beginning of the stream without needing to copy or move the whole list. A Token Stream can be thought of as having the following shape.

(tok1 (tok2 (tok3 etc.)))

The parse-many-expressions Function

To parse expressions we look for tokens up until a semicolon or right-parenthesis.

parse-many-expressions := λtoks. (
   (local pme)
   (local expr)
   (local remainder)
   (while toks (
      (match toks (
         ((; tl) (
            (set remainder tl)
            (set toks ())
         ))
         ((\] tl) (
            (set remainder tl)
            (set toks ())
         ))
         (_ (
            (set pme (parse-one-expression toks))
            (if expr (
               (set expr (App expr (head pme)))
            ) (
               (set expr (head pme))
            ))
            (set toks (tail pme))
         ))
      ))
   ))
   (expr remainder)
);

The parse-one-expression Function

The parse-one-expression rule tries to pull a single atomic expression from the token stream.

parse-one-expression := λtoks. (
   (local pme)
   (local remainder)
   (local expr)
   (match toks (
      ( () (
         (set expr Nil)
         (set remainder ())
      ))
      ( ( \l r ) (
         (set expr (Lambda (parse-lambda r)))
         (set remainder ())
      ))
      ( ( \] r ) (
         (unexpect (head toks))
      ))
      ( ( \[ r ) (
         (set pme (parse-many-expressions r))
         (expect \] (tail pme))
         (set expr (head pme))
         (set remainder (tail pme))
      ))
      ( ( \\ (\' r) ) (
         (set expr (Literal \'))
         (set remainder r)
      ))
      ( ( \' (i r) ) (
         (set expr (Literal i))
         (set remainder r)
      ))
      ( (a r) (
         (if (is-variable a) (
            (set expr (Variable a))
         ) (
            (set expr (Literal a))
         ))
         (set remainder r)
      ))
   ))
   (expr remainder)
);

The parse-lambda Function

The parse-lambda rule is a special helper to parse the left-hand-side and right-hand-side of lambda expressions.

parse-lambda := λtoks. (
   (local pme)
   (set pme (parse-one-expression toks))
   (local lmb)
   (set lmb (head er))
   (set toks (tail er))
   (match lmb (
      ((Literal .) (
         (set lmb (Nil (parse-many-expressions toks)))
         (set toks ())
      ))
   ))
   (while toks (
      (set pme (parse-one-expression toks))
      (match pme (
         (((Literal .) r) (
            (set lmb (lmb (parse-many-expressions r)))
            (set toks ())
         ))
         ((e ()) (
            (expect .)
         ))
         ((e r) (
            (set lmb (App (lmb e)))
            (set toks r)
         ))
      ))
   ))
   lmb
);