# Building a Complete Interpreter with Lezer

In this notebook, we build a fully functional interpreter for a simple `C`-like language.
We will implement the Scanner and Parser using **Lezer**, transform the result into a clean **AST**, and finally write an **Interpreter** that executes the code.

## Imports

To build our interpreter, we rely on a specialized set of tools. These imports handle everything from reading source files and generating the parser to visualizing the results.

### The Parsing Engine (`Lezer`)
Lezer is our primary tool for lexical and syntactic analysis.
- **`buildParser`**: The compiler-compiler. It takes our `grammarString` and generates a fully functional `LRParser`.
- **`TreeCursor`**: Allows us to navigate the **Concrete Syntax Tree (CST)** efficiently during the transformation phase.
- **`LRParser`**: The class type for the generated parser.

### File System & Visualization
- **`readFileSync`**: A core Node.js function used to load our source code files (e.g., `sum.sl`) from the disk into the interpreter.
- **`display`**: A utility from the `tslab` kernel. It allows us to render rich content like SVG images directly in the notebook output cells.
- **`instance`** (from `@viz-js/viz`): Creates the rendering engine. It converts DOT strings into SVG diagrams so we can visualize our ASTs.

In [None]:
import { readFileSync } from "fs";
import { buildParser } from "@lezer/generator";
import { Tree, TreeCursor } from "@lezer/common";
import { LRParser } from "@lezer/lr";
import { instance } from "@viz-js/viz";
import { display } from "tslab";
const viz = await instance();

## The Language Specification

Our target language supports arithmetic, variables, control flow (`if`, `while`), and function calls.
Formally, the grammar is defined as follows:

```ebnf
program
    : λ
    | stmnt program
    
stmnt 
    : IF '(' bool_expr ')' stmnt                 
    | WHILE '(' bool_expr ')' stmnt
    | '{' program '}' 
    | IDENTIFIER ':=' expr ';'  
    | expr ';'       

bool_expr 
    : expr '==' expr     
    | expr '!=' expr     
    | expr '<=' expr     
    | expr '>=' expr     
    | expr '<'  expr      
    | expr '>'  expr     
 
expr: expr '+' product                 
    | expr '-' product
    | product
              
product
    : product '*' factor               
    | product '/' factor
    | product '%' factor 
    | factor

factor
    : '(' expr ')' 
    | NUMBER
    | IDENTIFIER
    | IDENTIFIER '(' expr_list ')'

expr_list
    : λ
    | ne_expr_list

ne_expr_list
    : expr
    | expr ',' ne_expr_list
```

## Grammar Definition (Lezer)

We implement the grammar exactly as defined in the EBNF above. 
To maintain a **1:1 structural mapping**, we avoid Lezer's built-in repetition operators (like `*` or `+`) and instead model lists using explicit recursion. This ensures that the resulting tree structure mirrors the formal derivation steps.

### Structural Nuances: Lambda and Recursion

* **The Empty Production ($\lambda$):**
  In formal grammar, $\lambda$ (or $\epsilon$) denotes an empty match. Lezer does not have a specific keyword for this; instead, we use an empty string literal `""`.
  * **EBNF:** `program : λ`
  * **Lezer:** `Program { "" }`
  This allows the parser to complete a program successfully when no further statements follow.

* **Recursive Lists without Extra Nodes:**
  We use the `Program` rule itself recursively: 
  `Program { Stmnt Program | "" }`
  This creates a "right-leaning" chain in the Concrete Syntax Tree (CST), where each program consists of a statement followed by another program, or terminates with an empty match. This is ideal for demonstrating how recursive derivation works in practice.

### Token Specialization (Keywords)

A common issue in language design is that keywords like `if` or `while` are technically valid `Identifiers`. To ensure the parser distinguishes them correctly, we use `@specialize`:

```javascript
IF { @specialize<Identifier, "if"> }
```

This instructs the tokenizer: "First, read an Identifier. If the text is exactly 'if', convert the token into an `IF`." This prevents conflicts and ensures that variables such as `if_counter` are still recognized as normal identifiers.
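
The idea behind specialization can be imitated in a few lines of plain TypeScript. The following is only a sketch of the principle, not Lezer's actual implementation; the names `KEYWORDS`, `IDENTIFIER`, and `classifyWord` are hypothetical and exist only for this illustration.

```typescript
// Hypothetical keyword table: identifier text -> specialized token name.
const KEYWORDS = new Map<string, string>([
    ["if", "IF"],
    ["while", "WHILE"],
]);

// Same shape as the Identifier token of our grammar, anchored at the start.
const IDENTIFIER = /^[a-zA-Z][a-zA-Z0-9_]*/;

// Read one identifier at the start of `text`, then specialize it if its
// complete text is a keyword. Returns null if no identifier starts here.
function classifyWord(text: string): { type: string; value: string } | null {
    const match = IDENTIFIER.exec(text);
    if (match === null) return null;
    const value = match[0];
    const type = KEYWORDS.get(value) ?? "Identifier";
    return { type, value };
}

console.log(classifyWord("if (x < 1)"));  // the word "if" becomes an IF token
console.log(classifyWord("if_counter"));  // the whole word stays an Identifier
```

Because the identifier is read in full before the keyword table is consulted, `if_counter` never matches the entry for `if`.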

### Tokenization and the "Slash Conflict"

The character `/` is ambiguous in our language as it serves three purposes:
1.  **Line Comment** (`//`)
2.  **Block Comment** (`/*`)
3.  **Division Operator** (`/`)

Without explicit guidance, the tokenizer might see the first `/`, match it as a division operator, and fail on the subsequent characters. We resolve this using the `@precedence` block within the `@tokens` section:

```javascript
@precedence { LineComment, BlockComment, "/" }
```

**The Logic:**
By defining this order, we tell Lezer: "If you encounter a slash, check if it forms a comment first (since these are longer matches). Only if it doesn't fit a comment pattern, treat it as a division operator." This is essential for the tokenizer to correctly "skip" comments.
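
The ordering can be pictured as a hand-written decision at the position of a slash. This is a simplified sketch, assuming a scanner that inspects the input character by character; Lezer itself compiles the token definitions rather than running code like this, and `classifySlash` is a hypothetical name.

```typescript
type SlashToken = "LineComment" | "BlockComment" | "/";

// Decide which token starts at position `pos`, assuming input[pos] === "/".
// The two-character comment openers are tried before the one-character operator.
function classifySlash(input: string, pos: number): SlashToken {
    if (input.startsWith("//", pos)) return "LineComment";
    if (input.startsWith("/*", pos)) return "BlockComment";
    return "/";
}

console.log(classifySlash("a / b", 2));        // "/"
console.log(classifySlash("x; // done", 3));   // "LineComment"
console.log(classifySlash("/* note */ x", 0)); // "BlockComment"
```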

In [None]:
const grammarString = `
    @top Script { Program }

    @tokens {
        Identifier { $[a-zA-Z] $[a-zA-Z0-9_]* }
        Number     { "0" | $[1-9] $[0-9]* }

        "+" "-" "*" "/" "%"
        ":=" "==" "!=" "<=" ">=" "<" ">"
        "(" ")" "{" "}"
        ";" "," 

        space { $[ \t\n\r]+ }
        LineComment { "//" ![\n]* }
        BlockComment { "/*" ( ![*] | "*" + ![*/] )* "*"+ "/" }
        @precedence { LineComment, BlockComment, "/" }
    }

    @skip { space | LineComment | BlockComment }

    IF    { @specialize<Identifier, "if"> }
    WHILE { @specialize<Identifier, "while"> }

    Program {
        Stmnt Program | 
        ""
    }

    Stmnt {
        IF "(" BoolExpr ")" Stmnt    |
        WHILE "(" BoolExpr ")" Stmnt |
        "{" Program "}"              |
        Identifier ":=" Expr ";"     |
        Expr ";"
    }

    BoolExpr {
        Expr "==" Expr |
        Expr "!=" Expr |
        Expr "<=" Expr |
        Expr ">=" Expr |
        Expr "<"  Expr |
        Expr ">"  Expr
    }

    Expr {
        Expr "+" Product |
        Expr "-" Product |
        Product
    }

    Product {
        Product "*" Factor |
        Product "/" Factor | 
        Product "%" Factor |
        Factor
    }

    Factor {
        "(" Expr ")"                |
        Number                      |
        Identifier                  |
        Identifier "(" ExprList ")"
    }

    ExprList {
        NeExprList | 
        ""
    }

    NeExprList {
        Expr |
        Expr "," NeExprList
    }
`;

In [None]:
const parser = buildParser(grammarString);

The type `AST` represents an *abstract syntax tree*.  We assume that any operator takes at most four arguments.
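
As a concrete illustration, here are a few hand-written values of this shape. The node layouts are inferred from how the interpreter later in this notebook reads its nodes (for example, `:=` carries a name and an expression); the exact trees produced by `cst2ast` may differ in details, and the variable names are hypothetical.

```typescript
type Operator = string;
type AST = string | number |
           [Operator, AST]                |
           [Operator, AST, AST]           |
           [Operator, AST, AST, AST]      |
           [Operator, AST, AST, AST, AST];

// x := 1 + 2;
const assign: AST = [":=", "x", ["+", 1, 2]];

// while (i < 3) i := i + 1;
const loop: AST = ["WHILE", ["<", "i", 3], [":=", "i", ["+", "i", 1]]];

// print(x);  with the argument list stored as a dotted list
const call: AST = ["Call", "print", [".", "x", ""]];

console.log(JSON.stringify(assign));
console.log(JSON.stringify(loop));
console.log(JSON.stringify(call));
```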

In [None]:
type Operator = string;
type AST = string | number | 
           [Operator, AST]                |
           [Operator, AST, AST]           |
           [Operator, AST, AST, AST]      |
           [Operator, AST, AST, AST, AST];

The function `cst2ast` takes four arguments:
 * `cursor` is a pointer into the concrete syntax tree that is generated by the parser.
 * `input` is the string that has been parsed.
 * `operators` is a list of those strings that should appear as operators in the `AST` that is returned.
 * `listVars` is a list of those grammar variables that represent lists of
    various kinds, e.g. lists of statements or lists of expressions.
    A list is represented in the `AST` as a *dotted list*.  For example,
    the list `[1, 2, 3]` is represented as the `AST` `['.', 1, ['.', 2, ['.', 3, '']]]`.
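
To make the dotted-list encoding tangible, the following sketch converts between ordinary arrays and dotted lists. It is not part of `CST2AST`; the type `ListAST` is a simplified stand-in for `AST`, and `toDotted` and `fromDotted` are hypothetical helper names.

```typescript
// Simplified stand-in for the notebook's AST type, sufficient for lists.
type ListAST = string | number | [string, ListAST, ListAST];

// Build the dotted list for an array of items, starting from the right end.
function toDotted(items: ListAST[]): ListAST {
    let result: ListAST = "";  // the empty list
    for (let i = items.length - 1; i >= 0; i--) {
        result = [".", items[i]!, result];
    }
    return result;
}

// Walk a dotted list and collect its items into a flat array.
function fromDotted(list: ListAST): ListAST[] {
    const items: ListAST[] = [];
    let rest = list;
    while (Array.isArray(rest) && rest[0] === ".") {
        items.push(rest[1]);
        rest = rest[2];
    }
    return items;
}

console.log(JSON.stringify(toDotted([1, 2, 3])));
// [".",1,[".",2,[".",3,""]]]
console.log(JSON.stringify(fromDotted(toDotted([1, 2, 3]))));
// [1,2,3]
```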

In [None]:
import { cst2ast } from './CST2AST';

In [None]:
const myOperators = [
    "IF", "WHILE", ":=",              
    "+", "-", "*", "/", "%",
    "==", "!=", "<", ">", "<=", ">="
];

In [None]:
const myListVars = ["Program", "NeExprList"];

This is the main parsing function. It takes a filename and returns an abstract syntax tree.

In [None]:
function parse(fileName: string): AST {
    const source = readFileSync(fileName, "utf8");
    const tree   = parser.parse(source);
    return cst2ast(tree.cursor(), source, myOperators, myListVars);
}

In [None]:
const astSum = parse("sum.sl");
console.dir(astSum, { depth: null });

In [None]:
const astFact = parse("factorial.sl");
console.dir(astFact, { depth: null });

## Graphical Visualization (Graphviz)

We use a custom `ast2dot` function to render the tree. The visualizer transforms our AST into a Directed Graph using the DOT language.

**Key Feature: List Flattening**
A raw AST using cons-lists (`.`) creates very deep, right-leaning trees. Our visualizer applies a "transparency optimization": it hides the `.` nodes and attaches statements directly to their containing block, making the graph much easier to read.

**Visual Guide:**
* **Green Boxes:** Code Blocks `{ ... }` (Scopes).
* **Blue Diamonds:** Control Flow (`IF`, `WHILE`).
* **Red Octagons:** Function Calls (`print`, `read`).
* **Circles:** Operators (`+`, `:=`, `<`).
* **Yellow Ellipses:** Leaf nodes (Variables, Numbers).

In [None]:
import { ast2dot } from "./AST2Dot";

In [None]:
const dotSum = ast2dot(astSum);
display.html(viz.renderString(dotSum, { format: "svg" }));

In [None]:
const dotFact = ast2dot(astFact);
display.html(viz.renderString(dotFact, { format: "svg" }));

## Interpreter

The interpreter brings our **Abstract Syntax Tree** to life. It recursively traverses the tree structure and executes the operations described by each node.

We divide the implementation into three specialized functions:
1. **`execute`**: Handles **Statements** (side effects like assignments, loops, conditionals)
2. **`evaluate`**: Handles **Expressions** and returns a `number`
3. **`evaluateBool`**: Handles **Conditions** and returns a `boolean`

### State Management

We use a simple `Map` to represent the program's memory (environment):
- **Keys:** Variable names (`string`)
- **Values:** Current values (`number`)

We also simulate **Input/Output** using a global `inputStream` array acting as STDIN.
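
The two pieces of state behave as sketched below. The helpers `lookup` and `readNumber` are hypothetical names used only here; they mirror what the `evaluate` function in this notebook does inline, including the convention that an exhausted input stream yields `0`.

```typescript
type Env = Map<string, number>;

// Look up a variable; an unassigned variable is an error, not 0.
function lookup(env: Env, name: string): number {
    const value = env.get(name);
    if (value === undefined) throw new Error(`Undefined var ${name}`);
    return value;
}

// Take the next input line from the queue; an exhausted queue yields 0.
function readNumber(stream: string[]): number {
    const line = stream.shift();
    return line !== undefined ? Number(line) : 0;
}

const demoEnv: Env = new Map();
const demoInput: string[] = ["6"];

demoEnv.set("n", readNumber(demoInput));       // like  n := read();
demoEnv.set("n", lookup(demoEnv, "n") + 1);    // like  n := n + 1;
console.log(lookup(demoEnv, "n"));             // 7
console.log(readNumber(demoInput));            // 0, the queue is empty
```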

In [None]:
type Environment = Map<string, number>;
let inputStream: string[] = [];

In [None]:
let execute:      (node: AST, env: Environment) => void;
let executeList:  (list: AST, env: Environment) => void;
let evaluate:     (node: AST, env: Environment) => number;
let evaluateBool: (node: AST, env: Environment) => boolean;

### Statement Execution

The `executeList` function processes statement sequences. Our AST represents sequences as cons-lists with the `'.'` tag, where each node contains a head (current statement) and tail (remaining statements).

In [None]:
executeList = (list: AST, env: Environment) => {
    if (list === "") return;
    
    if (Array.isArray(list) && list[0] === '.') {
        const head = list[1];
        const tail = list[2];

        if (head !== undefined && tail !== undefined) {
            execute(head, env);
            executeList(tail, env);
        }
    }
};

The `execute` function dispatches based on the AST node tag. It handles:
- **Blocks**: Execute all statements inside
- **Assignments** (`:=`): Evaluate the right-hand side and store in environment
- **IF statements**: Execute body only if condition is true
- **WHILE loops**: Repeatedly execute body while condition holds
- **Function calls**: Delegate to `evaluate` for side effects


In [None]:
execute = (node: AST, env: Environment): void => {    
    if (!Array.isArray(node)) return;
    const tag = node[0];

    if (tag === '.') {
        executeList(node, env);
        return;
    }

    if (tag === "block") {
        const content = node[1];
        if (content !== undefined) {
            executeList(content, env);
        }
    }
    else if (tag === ":=") {
        const name = node[1];
        const expr = node[2];
        if (typeof name === "string" && expr !== undefined) {
            env.set(name, evaluate(expr, env));
        }
    }
    else if (tag === "IF") {
        const cond = node[1];
        const body = node[2];
        if (cond !== undefined && body !== undefined) {
            if (evaluateBool(cond, env)) execute(body, env);
        }
    }
    else if (tag === "WHILE") {
        const cond = node[1];
        const body = node[2];
        if (cond !== undefined && body !== undefined) {
            while (evaluateBool(cond, env)) execute(body, env);
        }
    }
    else if (tag === "Call") {
        evaluate(node, env);
    }
};

### Expression Evaluation

The `evaluate` function computes numeric values. It handles:
- **Literals**: Return number directly
- **Variables**: Lookup in environment
- **Arithmetic operators**: `+`, `-`, `*`, `/`, `%` (integer division and modulo)
- **Built-in functions**:
  - `print(expr)`: Output to console and return 0
  - `read()`: Read next value from input stream

Note: Division and modulo by zero throw an error.
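
One detail is worth seeing in isolation: the interpreter computes `/` with `Math.floor`, which rounds toward negative infinity, while `%` is JavaScript's remainder operator, whose result takes the sign of the left operand. For non-negative operands the two agree with the usual integer division and modulo; for negative operands they do not form a matching pair. The sketch below uses the hypothetical helper names `intDiv` and `rem` to show this.

```typescript
// Integer division as used by the interpreter: round toward negative infinity.
function intDiv(l: number, r: number): number {
    if (r === 0) throw new Error("Division by zero");
    return Math.floor(l / r);
}

// JavaScript's remainder: the result takes the sign of the left operand.
function rem(l: number, r: number): number {
    if (r === 0) throw new Error("Modulo by zero");
    return l % r;
}

console.log(intDiv(7, 2), rem(7, 2));    // 3 1
console.log(intDiv(-7, 2), rem(-7, 2));  // -4 -1
```

In the second case `intDiv(-7, 2) * 2 + rem(-7, 2)` equals `-9`, not `-7`, so programs that rely on the identity `(a / b) * b + a % b == a` should stick to non-negative operands.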

In [None]:
evaluate = (node: AST, env: Environment): number => {
    if (typeof node === "number") return node;
    if (typeof node === "string") {
        const val = env.get(node);
        if (val === undefined) throw new Error(`Undefined var ${node}`);
        return val;
    }

    if (Array.isArray(node)) {
        const tag = node[0];

        if (["+", "-", "*", "/", "%"].includes(tag)) {
             const leftNode = node[1];
             const rightNode = node[2];

             if (leftNode !== undefined && rightNode !== undefined) {
                 const l = evaluate(leftNode, env);
                 const r = evaluate(rightNode, env);
                 switch(tag) {
                     case "+": return l + r;
                     case "-": return l - r;
                     case "*": return l * r;
                     case "/": 
                         if (r === 0) throw new Error("Division by zero");
                         return Math.floor(l / r);
                     case "%": 
                         if (r === 0) throw new Error("Modulo by zero");
                         return l % r;
                 }
             }
        }

        if (tag === "Call") {
             const fn = node[1];
             const argsNode = node[2]; 

             if (fn === "print" && argsNode !== undefined) {
                 if (Array.isArray(argsNode) && argsNode[0] === '.') {
                     const actualArg = argsNode[1];
                     if (actualArg !== undefined) {
                        console.log(">> STDOUT:", evaluate(actualArg, env));
                     }
                 }
                 return 0;
             }
             if (fn === "read") {
                 const input = inputStream.shift();
                 const val = input !== undefined ? Number(input) : 0;
                 console.log("<< STDIN:", val);
                 return val;
             }
        }
    }
    throw new Error(`Invalid expression node: ${JSON.stringify(node)}`);
};

### Boolean Evaluation

The `evaluateBool` function evaluates comparison operators: `==`, `!=`, `<`, `>`, `<=`, `>=`.

According to our grammar, every `BoolExpr` **must** contain a comparison operator; there is no implicit truthiness conversion.

In [None]:
evaluateBool = (node: AST, env: Environment): boolean => {
    if (Array.isArray(node)) {
        const tag = node[0];
        if (["==", "!=", "<", ">", "<=", ">="].includes(tag)) {
             const leftNode = node[1];
             const rightNode = node[2];

             if (leftNode !== undefined && rightNode !== undefined) {
                 const l = evaluate(leftNode, env);
                 const r = evaluate(rightNode, env);
                 switch(tag) {
                     case "==": return l === r;
                     case "!=": return l !== r;
                     case "<":  return l < r;
                     case ">":  return l > r;
                     case "<=": return l <= r;
                     case ">=": return l >= r;
                 }
             }
        }
    }
    throw new Error("Invalid boolean expression");
};

### Testing the Interpreter

We create a helper function to run programs with simulated input. It parses the source file and executes the resulting AST with a fresh environment.

In [None]:
function runProgram(fileName: string, inputs: string[] = []) {
    console.log(`\n--- Executing File: ${fileName} ---`);
    console.log(`--- Inputs: [${inputs.join(", ")}] ---`);
    inputStream = [...inputs];
    console.log(inputStream);
    try {
        const ast = parse(fileName);
        execute(ast, new Map());
    } catch (e) { console.error(e); }
}

Let's test with two example programs:

In [None]:
runProgram('sum.sl', ["6"]);

In [None]:
runProgram('factorial.sl', ["6"]);