In [None]:
import { display } from "tslab";
import { readFileSync } from "fs";

const css = readFileSync("../style.css", "utf8");
display.html(`<style>${css}</style>`);

# Building a Complete Interpreter with Lezer

In this notebook, we build a fully functional interpreter for a simple `C`-like language.
We will implement the Scanner and Parser using **Lezer**, transform the result into a clean **AST**, and finally write an **Interpreter** that executes the code.

## Imports

To build our interpreter, we rely on a specialized set of tools. These imports handle everything from reading source files and generating the parser to visualizing the results.

### File System Operations (`fs`)
* **`readFileSync` / `writeFileSync`**: Standard Node.js utilities. We use them to load our source code files (e.g., `sum.sl`) and save generated outputs or debug logs.

### The Parsing Engine (`Lezer`)
Lezer is our primary tool for lexical and syntactic analysis.
* **`buildParser`**: The compiler-compiler. It takes our `grammarString` and generates a fully functional `LRParser`.
* **`Tree` & `TreeCursor`**: These represent the **Concrete Syntax Tree (CST)**. The cursor allows us to navigate this detailed tree efficiently during the transformation phase.
* **`LRParser`**: The class type for the generated parser. An LR parser reads the input from **L**eft to right and constructs a **R**ightmost derivation in reverse.



### AST Structure & Transformation (`./AST2Dot`)
This internal module contains our custom language definitions and the mapping engine.
* **AST Nodes**: We import specialized classes like `NumNode`, `IfNode`, `BinaryExpr`, and others. These are the building blocks of our logical program structure.
* **`ListNode`**: Our "secret weapon" for type-safety. It temporarily holds recursive list data (like statements or arguments) before they are flattened into arrays.
* **`cstToAST`**: The transformation engine that takes the CST and produces a clean, executable AST using our `ParserConfig`.
* **`ast2dot`**: A helper that converts our tree into the DOT language for visualization.

### State Management & Visualization
* **`RecursiveMap`**: Unlike standard JavaScript objects, this map provides **value semantics**. This is crucial for our interpreter's memory (environment), as it allows us to compare different program states based on their content rather than their memory address.
* **`viz-js/viz`**: The rendering engine. It takes the DOT strings generated by `ast2dot` and turns them into SVG diagrams so we can see our ASTs directly in the notebook.

In [None]:
import { readFileSync, writeFileSync } from "fs";
import { buildParser } from "@lezer/generator";
import { Tree, TreeCursor } from "@lezer/common";
import { LRParser } from "@lezer/lr";
import { 
    AST, cstToAST, ast2dot, ParserConfig, NumNode, VarNode,
    ExprStmtNode, BlockNode, AssignNode, BinaryExpr, WhileNode,
    IfNode, CallNode, NilNode, ListNode, getVarName,
    getBlockStmts, ASTNode } from "./AST2Dot";
import { RecursiveMap } from "recursive-set";
import { instance } from "@viz-js/viz";
const viz = await instance();

## The Language Specification

Our target language supports arithmetic, variables, control flow (`if`, `while`), and function calls.
Formally, the grammar is defined as follows:

```ebnf
program
    : λ 
    | stmnt program
    
stmnt 
    : IF '(' bool_expr ')' stmnt                 
    | WHILE '(' bool_expr ')' stmnt
    | '{' program '}' 
    | IDENTIFIER ':=' expr ';'  
    | expr ';'       

bool_expr 
    : expr '==' expr     
    | expr '!=' expr     
    | expr '<=' expr     
    | expr '>=' expr     
    | expr '<'  expr      
    | expr '>'  expr     
 
expr: expr '+' product                 
    | expr '-' product
    | product
              
product
    : product '*' factor               
    | product '/' factor
    | product '%' factor 
    | factor

factor
    : '(' expr ')' 
    | NUMBER
    | IDENTIFIER
    | IDENTIFIER '(' expr_list ')'

expr_list
    : λ 
    | ne_expr_list

ne_expr_list
    : expr
    | expr ',' ne_expr_list
```

## Grammar Definition (Lezer)

We implement the grammar exactly as defined in the EBNF above. To maintain a **1:1 structural mapping**, we avoid Lezer's built-in repetition operators (like `*` or `+`) and instead model lists using explicit recursion. This ensures that the resulting tree structure mirrors the formal derivation steps.

### Structural Nuances: Lambda and Recursion

* **The Empty Production ($\lambda$):**
  In formal grammar, $\lambda$ (or $\epsilon$) denotes an empty match. Lezer does not have a specific keyword for this; instead, we use an empty string literal `""`.
  * **EBNF:** `program : λ`
  * **Lezer:** `program { "" }`
  This tells the parser: "If no statement follows, matching 'nothing' is a valid way to complete a program."

* **Recursive Lists (`Cons`):**
  We use a "Head-Tail" approach. A `Cons` node consists of one `statement` followed by the rest of the `program`. This creates a deeply nested, right-leaning tree that mirrors the recursive derivation steps of the grammar.



### Tokenization and the "Slash Conflict"

While the grammar rules handle the logic, the tokenizer must first break the raw text into meaningful chunks. A classic problem in lexer design is the **Slash Ambiguity**. The character `/` is ambiguous because it could be the start of three different things:
1.  A **Line Comment** (`//`)
2.  A **Block Comment** (`/*`)
3.  A **Division Operator** (`/`)

Without guidance, a tokenizer might just see the first `/`, match it as `OpDivide`, and then get confused by the next character. To fix this, we use the `@precedence` block:

```javascript
@precedence { LineComment, BlockComment, OpDivide }
```



**Why this order?**
By placing `LineComment` and `BlockComment` before `OpDivide`, we tell the tokenizer: *"If you see a slash, check if it's the start of a comment first. Only if those don't match, treat it as a division operator."* This ensures that comments are swallowed as a single block instead of being misinterpreted as a sequence of mathematical operators.
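
The same priority can be illustrated outside of Lezer with a small hand-written check. This is only a sketch: `classifySlash` and `SlashKind` are hypothetical names introduced here, and Lezer's generated tokenizer works differently internally. The point is simply that both comment forms are tested before the slash is accepted as a division operator.

```typescript
// Hypothetical helper (not part of Lezer): decide what a "/" at `pos` starts.
type SlashKind = "LineComment" | "BlockComment" | "OpDivide";

function classifySlash(input: string, pos: number): SlashKind {
    if (input[pos] !== "/") {
        throw new Error(`Expected '/' at position ${pos}`);
    }
    const next = input[pos + 1];
    // Comments are checked first, mirroring the order in the @precedence block.
    if (next === "/") return "LineComment";
    if (next === "*") return "BlockComment";
    // Only when neither comment form matches is the slash a division operator.
    return "OpDivide";
}

console.log(classifySlash("a / b", 2));       // OpDivide
console.log(classifySlash("// note", 0));     // LineComment
console.log(classifySlash("/* note */", 0));  // BlockComment
```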

In [None]:
const grammarString = `
    @top Script { program }
    program {
        Cons { statement program } |
        "" 
    }
    
    KwIf    { @specialize<Identifier, "if"> }
    KwWhile { @specialize<Identifier, "while"> }

    statement {
        IfStatement    { KwIf LParen BoolExpr RParen statement } |
        WhileStatement { KwWhile LParen BoolExpr RParen statement } |
        Block          { LBrace program RBrace } |
        Assignment     { Identifier OpAssign Expr Semi } |
        ExprStatement  { Expr Semi }
    }

    BoolExpr {
        Compare { Expr OpEq Expr } |
        Compare { Expr OpNe Expr } |
        Compare { Expr OpLe Expr } |
        Compare { Expr OpGe Expr } |
        Compare { Expr OpLt Expr } |
        Compare { Expr OpGt Expr }
    }

    Expr {
        BinaryExpr { Expr OpPlus Product } |
        BinaryExpr { Expr OpMinus Product } |
        Product
    }

    Product {
        BinaryExpr { Product OpTimes Factor } |
        BinaryExpr { Product OpDivide Factor } |
        BinaryExpr { Product OpModulo Factor } |
        Factor
    }

    Factor {
        ParenExpr { LParen Expr RParen } |
        Number |
        Identifier |
        Call { Identifier LParen ExprList RParen }
    }

    ExprList {
        NeExprList |
        ""
    }

    NeExprList {
        Recurse { Expr Comma NeExprList } |
        Expr
    }

    @tokens {
        space { $[ \t\n\r]+ }
        LineComment { "//" ![\n]* }
        BlockComment { "/*" blockCommentRest }
        blockCommentRest { ![*] blockCommentRest | "*" blockCommentAfterStar }
        blockCommentAfterStar { "/" | "*" blockCommentAfterStar | ![/*] blockCommentRest }

        Identifier { $[a-zA-Z] $[a-zA-Z0-9_]* }
        Number { "0" | $[1-9] $[0-9]* }

        OpAssign { ":=" }

        OpEq { "==" } OpNe { "!=" }
        OpLe { "<=" } OpGe { ">=" }
        OpLt { "<" }  OpGt { ">" }

        OpPlus { "+" } OpMinus { "-" } 
        OpTimes { "*" } OpDivide { "/" } OpModulo { "%" }

        LParen { "(" } RParen { ")" }
        LBrace { "{" } RBrace { "}" }
        Semi { ";" } Comma { "," }

        @precedence { LineComment, BlockComment, OpDivide }
    }

    @skip { space | LineComment | BlockComment }
`;

In [None]:
const parser : LRParser = buildParser(grammarString);
"Parser generated successfully.";

## Testing the Scanner (Token Verification)

Since Lezer combines tokenization and parsing, we do not get a flat list of tokens automatically. However, we can simulate a scanner check by extracting the leaves of the resulting **Concrete Syntax Tree (CST)**.

**Formal Definition:**

Let $S$ be the input string and $T$ be the Syntax Tree generated by the parser.
We define the sequence of Tokens $\mathcal{T}$ as the ordered sequence of all leaf nodes in $T$.

A node $n \in T$ is considered a leaf if and only if it has no children:
$$\text{isLeaf}(n) \iff \text{degree}^{+}(n) = 0$$

The function `testScanner` performs a Depth-First Traversal over $T$. For every node $n$ visited by the cursor $C$, we execute the following logic:

$$
\text{Output}(n) =
\begin{cases}
    \texttt{print}(n.\text{type}, S[n.\text{from} \dots n.\text{to}]) & \text{if } \text{isLeaf}(n) \\
    \text{continue} & \text{otherwise}
\end{cases}
$$
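
To make the definition concrete without depending on Lezer, the following sketch collects the leaves of a small hand-built tree. The `SimpleNode` interface, the `collectLeaves` function, and the sample tree are assumptions made for illustration; Lezer's real `Tree` and `TreeCursor` expose a richer API.

```typescript
// Hypothetical minimal tree type; Lezer's real Tree/TreeCursor API is richer.
interface SimpleNode {
    type: string;
    from: number;
    to: number;
    children: SimpleNode[];
}

// Depth-first, left-to-right traversal that keeps only nodes without children.
function collectLeaves(node: SimpleNode, source: string, out: string[] = []): string[] {
    if (node.children.length === 0) {
        out.push(`${node.type}: ${source.slice(node.from, node.to)}`);
    } else {
        for (const child of node.children) {
            collectLeaves(child, source, out);
        }
    }
    return out;
}

// Hand-built example tree for a single statement.
const demoSource = "x := 1;";
const demoTree: SimpleNode = {
    type: "Assignment", from: 0, to: 7,
    children: [
        { type: "Identifier", from: 0, to: 1, children: [] },
        { type: "OpAssign",   from: 2, to: 4, children: [] },
        { type: "Number",     from: 5, to: 6, children: [] },
        { type: "Semi",       from: 6, to: 7, children: [] },
    ],
};

console.log(collectLeaves(demoTree, demoSource).join(" | "));
// Identifier: x | OpAssign: := | Number: 1 | Semi: ;
```

The inner node `Assignment` is never printed because it has children; only the four leaves appear, in source order.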

In [None]:
import { Tree, TreeCursor } from "@lezer/common";

function testScanner(fileName: string): void {
    const input: string = readFileSync(fileName, "utf8");
    
    console.log(`--- Scanning ${fileName} ---`);
    console.log(input);
    console.log("Tokens:");
    
    const tree: Tree = parser.parse(input);
    const cursor: TreeCursor = tree.cursor();
    
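    do {
        // Inspect the node without moving the cursor: calling cursor.firstChild()
        // here would step into the child, and the following cursor.next() would
        // then move past it, so a leaf in first-child position was never printed.
        if (cursor.node.firstChild === null) {
            const tokenText = input.slice(cursor.from, cursor.to);
            const safeText = tokenText.replace(/\n/g, "\\n");
            console.log(`[${cursor.name}]`.padEnd(15) + `: ${safeText}`);
        }
    } while (cursor.next());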
}

In [None]:
testScanner('sum.sl');

In [None]:
testScanner('factorial.sl');

## The CST-to-AST Transformation

Lezer generates a **Concrete Syntax Tree (CST)**. This tree records the full syntactic detail of the source code, including comments, parentheses, and semicolons. While perfect for syntax highlighting, this structure is too noisy for interpretation.

We need to transform this CST into an **Abstract Syntax Tree (AST)**. An AST focuses purely on the logical structure of the program (e.g., "This is an assignment") rather than the syntactic sugar (e.g., "There is a semicolon here").

The `ParserConfig` object defines this transformation layer. It bridges the gap between the CST produced by Lezer and the structured **Abstract Syntax Tree (AST)**.


### Syntactic Noise Reduction (`ignore`)

To simplify the resulting tree, tokens that only serve as "scaffolding" are discarded. These are necessary for the grammar to parse correctly but carry no semantic data for the execution phase.

* **Delimiters:** $L = \{ \text{'(', ')', '{', '}', ',', ';'} \}$
* **Static Keywords:** `if`, `while`
* **Operators:** `:=` (The `Assignment` rule handles the logic, making the operator token itself redundant).
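
The effect of the `ignore` set can be sketched as a simple filter over the children of a CST node. The `CstChild` record and the `dropScaffolding` function are hypothetical; the actual `cstToAST` implementation lives in `./AST2Dot` and is not shown in this notebook, so it may work differently in detail.

```typescript
// Hypothetical child record; the real cstToAST in ./AST2Dot is not shown here.
interface CstChild {
    name: string;  // node type name, e.g. "Identifier" or "Semi"
    text: string;  // matched source text
}

// Same token names as in the `ignore` set of the configuration.
const ignoredTokens = new Set<string>([
    "LParen", "RParen", "LBrace", "RBrace",
    "Semi", "Comma", "OpAssign", "KwIf", "KwWhile",
]);

// Keep only the children that carry semantic information.
function dropScaffolding(children: CstChild[]): CstChild[] {
    return children.filter(child => !ignoredTokens.has(child.name));
}

// Direct children of an Assignment node for the input `x := 1;`
const assignmentChildren: CstChild[] = [
    { name: "Identifier", text: "x" },
    { name: "OpAssign",   text: ":=" },
    { name: "Expr",       text: "1" },
    { name: "Semi",       text: ";" },
];

console.log(dropScaffolding(assignmentChildren).map(c => c.name).join(", "));
// Identifier, Expr
```

After filtering, exactly two children remain, which is why a rule for `Assignment` can destructure its input as a target and an expression.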

---

### Transformation Rules (`rules`)

The `rules` define how each grammar production is reduced into an AST Node.

#### Terminal & Operator Mapping
Atomic values and operators are converted into specific nodes or strings.

| Token | Reduction Logic | Resulting Node |
| :--- | :--- | :--- |
| `Number` | `parseInt(text)` | `NumNode` |
| `Identifier` | `text` | `VarNode` |
| Operator tokens | Maps token type to its symbol (e.g., `OpPlus` $\rightarrow$ `"+"`) | `VarNode` (carries the operator string) |

#### Recursive List Flattening
The parser uses a recursive approach to handle sequences (like lists of statements or arguments). This configuration flattens these into a linear array to avoid deep, lopsided trees.

For a recursive list $L$ defined by an element $e$ and a tail $T$:
$$L_{items} = \begin{cases} [e] + T.items & \text{if } T \text{ is a ListNode} \\ [e] & \text{otherwise} \end{cases}$$

This logic is implemented in `Cons` (for program statements) and `Recurse` (for expression lists).
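
The flattening step can be traced on a three-element list. The classes `Item` and `ItemList` and the function `cons` below are simplified, hypothetical stand-ins for the notebook's `ASTNode`, `ListNode`, and the `Cons` rule; they only illustrate the formula above.

```typescript
// Hypothetical stand-ins for the notebook's AST classes, for illustration only.
class Item {
    constructor(public readonly label: string) {}
}
class ItemList {
    constructor(public readonly items: Item[]) {}
}

// One reduction step: prepend the head element to the already flattened tail.
// A tail that is not a list contributes no further items.
function cons(head: Item, tail: Item | ItemList | undefined): ItemList {
    const rest = tail instanceof ItemList ? tail.items : [];
    return new ItemList([head, ...rest]);
}

// The nested shape Cons(s1, Cons(s2, Cons(s3, λ))) is reduced from the inside out.
const inner  = cons(new Item("s3"), undefined);  // [s3]
const middle = cons(new Item("s2"), inner);      // [s2, s3]
const outer  = cons(new Item("s1"), middle);     // [s1, s2, s3]

console.log(outer.items.map(i => i.label).join(", "));
// s1, s2, s3
```

Because every step copies the tail's items into a new array, the final list is flat regardless of how deeply the CST was nested.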

---

### Structural Semantics

#### Control Flow & Scoping
* **`IfStatement` / `WhileStatement`:** Captures the condition and the body to create branching logic.
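* **`Block`:** Converts a `ListNode` of statements into a single `BlockNode`. The block groups its statements into one compound statement; because the interpreter keeps a single environment for all variables, a block does not introduce a new variable scope.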

#### Expressions & Safety
* **Pass-through Rules:** Rules like `Expr` or `Factor` act as transparent bridges. They return their child directly, preventing unnecessary "wrapper" nodes in the tree.
* **Binary Expressions:** Validates that the operator is present and constructs a `BinaryExpr` node:
  $$\text{BinaryExpr}(Left, Op, Right)$$
* **Assignments:** Includes a safety check ensuring the left-hand side is a valid `VarNode` (identifier), throwing an error otherwise to prevent invalid code like `5 := x`.

#### Function Calls
The `Call` rule extracts the function name and flattens the `ExprList` into a clean array of arguments passed to the `CallNode`.

In [None]:
const config: ParserConfig = {
    ignore: new Set([
        "LParen", "RParen",
        "LBrace", "RBrace",
        "Semi", "Comma",
        "OpAssign",
        "KwIf", "KwWhile"
    ]),
    
    rules: {
        // --- Terminals ---
        "Number": (_, text) => new NumNode(parseInt(text)),
        "Identifier": (_, text) => new VarNode(text),

        // --- Operators ---
        "OpPlus": () => new VarNode("+"), "OpMinus": () => new VarNode("-"),
        "OpTimes": () => new VarNode("*"), "OpDivide": () => new VarNode("/"), 
        "OpModulo": () => new VarNode("%"),
        "OpEq": () => new VarNode("=="), "OpNe": () => new VarNode("!="),
        "OpLe": () => new VarNode("<="), "OpGe": () => new VarNode(">="),
        "OpLt": () => new VarNode("<"),  "OpGt": () => new VarNode(">"),

        // --- Recursive Lists ---
        "Cons": ([statement, tailNode]) => {
            const tail = (tailNode instanceof ListNode) ? tailNode.items : [];
            return new ListNode([statement, ...tail]);
        },
        
        "program": ([listNode]) => {
            return (listNode instanceof ListNode) ? listNode : new ListNode([]);
        },
        
        "Script": ([programNode]) => {
            const stmts = (programNode instanceof ListNode) ? programNode.items : [];
            return new BlockNode(stmts);
        },

        "Recurse": ([expr, tailNode]) => {
            const tail = (tailNode instanceof ListNode) ? tailNode.items : [];
            return new ListNode([expr, ...tail]);
        },
        
        "NeExprList": ([child]) => {
            return (child instanceof ListNode) ? child : new ListNode([child]);
        },
        
        "ExprList": ([listNode]) => {
             return (listNode instanceof ListNode) ? listNode : new ListNode([]);
        },


        // --- Statements ---
        "IfStatement": ([condition, body]) => new IfNode(condition, body),
        "WhileStatement": ([condition, body]) => new WhileNode(condition, body),

        "Block": ([programList]) => {
            const stmts = (programList instanceof ListNode) ? programList.items : [];
            return new BlockNode(stmts);
        },
        
        "Assignment": ([target, expr]) => {
            if (target instanceof VarNode) return new AssignNode(target.name, expr);
            throw new Error("Invalid Assignment Target");
        },

        "ExprStatement": ([expr]) => new ExprStmtNode(expr),


        // --- Expressions (Pass-through) ---
        "Expr":      ([child]) => child,
        "Product":   ([child]) => child,
        "Factor":    ([child]) => child,
        "ParenExpr": ([child]) => child,
        "BoolExpr":  ([child]) => child,

        "BinaryExpr": ([left, op, right]) => {
             if (op instanceof VarNode) return new BinaryExpr(left, op.name, right);
             throw new Error("Missing Operator");
        },
        
        "Compare": ([left, op, right]) => {
             if (op instanceof VarNode) return new BinaryExpr(left, op.name, right);
             throw new Error("Missing Operator");
        },

        "Call": ([functionName, argsNode]) => {
            if (functionName instanceof VarNode) {
                const args = (argsNode instanceof ListNode) ? argsNode.items : [];
                return new CallNode(functionName.name, args); 
            }
            throw new Error("Invalid Function Call");
        }
    }
};

In [None]:
function parse(fileName: string): AST {
    const source = readFileSync(fileName, "utf8");
    const tree = parser.parse(source);
    return cstToAST(tree.cursor(), source, config);
}

### Visualizing the AST

To verify that our configuration correctly transforms the CST into the intended AST structure, we utilize our `AST2Dot` library. This renders the tree structure, showing the **Program** root, the statements, and the nested expressions.

In [None]:
const astSum = parse("sum.sl");
const dotSum = ast2dot(astSum);
display.html(viz.renderString(dotSum, { format: "svg" }));

In [None]:
const astFact = parse("factorial.sl");
const dotFact = ast2dot(astFact);
display.html(viz.renderString(dotFact, { format: "svg" }));

## The Interpreter

The interpreter breathes life into our **Abstract Syntax Tree (AST)**. It recursively traverses the tree structure and performs the operations described by the nodes.

We divide the implementation into three specialized functions, mirroring the structure of our AST:

1.  **`execute`**: Handles **Statements** (side effects). It modifies the program state but returns nothing (`void`).
2.  **`evaluate`**: Handles **Arithmetic Expressions**. It calculates and returns a `number`.
3.  **`evaluateBool`**: Handles **Conditions**. It returns a `boolean` to decide control flow paths.

### State Management

Instead of plain JavaScript objects (which rely on reference identity), we use `RecursiveMap` from the `recursive-set` library. This ensures that our environment itself has **Value Semantics**.

* **Keys**: Variable names (`string`).
* **Values**: The stored numbers (`number`).
* **API**: We use `.get(key)` to retrieve values and `.set(key, val)` to update the state.
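
The difference between reference identity and value semantics can be sketched with built-in `Map` objects. The helper `sameState` is hypothetical and written only for this illustration; how `RecursiveMap` exposes content-based comparison is not shown in this notebook and may look different.

```typescript
// Two environments with identical contents but different identities.
const envA = new Map<string, number>([["n", 5], ["sum", 15]]);
const envB = new Map<string, number>([["n", 5], ["sum", 15]]);

// Built-in maps compare by reference, so two distinct objects are never equal.
console.log(envA === envB);  // false

// Hypothetical helper that compares two environments by content instead.
function sameState(a: Map<string, number>, b: Map<string, number>): boolean {
    if (a.size !== b.size) return false;
    return Array.from(a).every(([name, value]) => b.has(name) && b.get(name) === value);
}

console.log(sameState(envA, envB));  // true
envB.set("sum", 16);
console.log(sameState(envA, envB));  // false
```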

In [None]:
type Variables = RecursiveMap<string, number>;
let inputStream: string[] = [];
let execute: (node: AST, values: Variables) => void;
let evaluate: (node: AST, values: Variables) => number;
let evaluateBool: (node: AST, values: Variables) => boolean;

### Executing Statements (`execute`)

The `execute` function serves as the control center. It accepts an `AST` node and the current `values` (memory).

**The Logic:**
Instead of a switch-statement on strings, we now use **Type Guards** (`instanceof`) to determine the specific class of the node. This allows TypeScript to narrow down the type safely.

* **`BlockNode`**: Iterates through its `statements` array and recursively calls `execute` for each one.
* **`AssignNode`**: Computes the right-hand side using `evaluate` and updates the `values` map.
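* **`ExprStmtNode`**: Evaluates the expression for its side effects (e.g. a call to `print`) and discards the result.
* **`IfNode` / `WhileNode`**: Control flow structures relying on `evaluateBool`.
* **`NilNode`**: Represents an absent statement and does nothing.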

In [None]:
execute = (node: AST, values: Variables): void => {    
    if (node instanceof BlockNode) {
        for (const stmt of node.statements) {
            execute(stmt, values);
        }
        return;
    }
    if (node instanceof AssignNode) {
        const result = evaluate(node.expr, values);
        values.set(node.id, result); 
        return;
    }
    if (node instanceof ExprStmtNode) {
        evaluate(node.expr, values);
        return;
    }
    if (node instanceof IfNode) {
        if (evaluateBool(node.cond, values)) {
            execute(node.thenB, values);
        } else if (!(node.elseB instanceof NilNode)) {
            execute(node.elseB, values);
        }
        return;
    }
    if (node instanceof WhileNode) {
        while (evaluateBool(node.cond, values)) {
            execute(node.body, values);
        }
        return;
    }
    if (node instanceof NilNode) return;
    throw new Error(`Runtime Error: Cannot execute node of type: ${node.constructor.name}`);
}

### Evaluating Conditions (`evaluateBool`)

**The Logic:**
1.  **Comparisons (`BinaryExpr`)**: We check the operator. If it is a relational operator (e.g. `<`), we evaluate both operands and return the boolean result.
2.  **C-Style Truthiness**: If the node is not an explicit comparison (e.g. just a number or an arithmetic term), we follow the convention: `0` is `false`, everything else is `true`.

In [None]:
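evaluateBool = (node: AST, values: Variables): boolean => {
    if (node instanceof BinaryExpr) {
        // Evaluate the operands only for relational operators, so that an
        // arithmetic BinaryExpr is not evaluated twice by the fallback below.
        switch (node.op) {
            case '==': return evaluate(node.left, values) === evaluate(node.right, values);
            case '!=': return evaluate(node.left, values) !== evaluate(node.right, values);
            case '<':  return evaluate(node.left, values) <   evaluate(node.right, values);
            case '>':  return evaluate(node.left, values) >   evaluate(node.right, values);
            case '<=': return evaluate(node.left, values) <=  evaluate(node.right, values);
            case '>=': return evaluate(node.left, values) >=  evaluate(node.right, values);
        }
    }
    // C-style truthiness: 0 is false, every other number is true.
    return evaluate(node, values) !== 0;
};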

### Evaluating Expressions (`evaluate`)

This function computes the actual numerical values.

**The Logic:**
* **Leaves**:
    * **Numbers (`NumNode`)**: Returned "as is".
    * **Variables (`VarNode`)**: Looked up in the `values` map. We throw an error if the variable is undefined.
* **Binary Expressions**: We recursively evaluate the `left` and `right` operands and apply the arithmetic operator. Note that `/` performs **integer division** via `Math.floor`, which rounds toward negative infinity (unlike C, which truncates toward zero). Division by zero is not checked.
* **Function Calls**:
    * **`print`**: Evaluates its argument, logs it to the console (simulating `stdout`), and returns `0`.
    * **`read`**: Simulates reading from `stdin`. It takes the next value from the global `inputStream` queue. If the queue is empty, it throws a Runtime Error.
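
The rounding behaviour of `/` only matters for negative operands. The following sketch compares it with truncating division; `floorDiv` and `truncDiv` are names introduced here for illustration and do not appear in the interpreter.

```typescript
// Floor division, the same computation the interpreter applies for "/".
const floorDiv = (l: number, r: number): number => Math.floor(l / r);
// Truncating division, shown only for comparison (this is what C does).
const truncDiv = (l: number, r: number): number => Math.trunc(l / r);

console.log(floorDiv(7, 2), truncDiv(7, 2));    // 3 3
console.log(floorDiv(-7, 2), truncDiv(-7, 2));  // -4 -3
console.log(-7 % 2);                            // -1 (the sign follows the dividend)
console.log(floorDiv(1, 0));                    // Infinity (no division-by-zero check)
```

For non-negative operands both variants agree, so programs such as a sum or a factorial are unaffected by the choice.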

In [None]:
evaluate = (node: AST, values: Variables): number => {
    if (node instanceof NumNode) {
        return node.value;
    }
    if (node instanceof VarNode) {
        const val = values.get(node.name);
        if (val === undefined) {
            throw new Error(`Runtime Error: Undefined variable '${node.name}'`);
        }
        return val;
    }
    if (node instanceof BinaryExpr) {
        const l = evaluate(node.left, values);
        const r = evaluate(node.right, values);
        switch (node.op) {
            case '+': return l + r;
            case '-': return l - r;
            case '*': return l * r;
            case '/': return Math.floor(l / r); 
            case '%': return l % r;
            default: throw new Error(`Runtime Error: Operator '${node.op}' yields boolean, expected number.`);
        }
    }
    if (node instanceof CallNode) {
        if (node.fn === "print") {
            const val = node.args.length > 0 ? evaluate(node.args[0], values) : 0;
            console.log(">> STDOUT:", val);
            return 0;
        }
        if (node.fn === "read") {
            const input = inputStream.shift();
            if (input === undefined) {
                throw new Error("Runtime Error: 'read()' called but Input Stream is empty!");
            }
            console.log(`<< STDIN: Read '${input}'`);
            const num = parseInt(input, 10);
            if (isNaN(num)) {
                throw new Error(`Runtime Error: Input '${input}' is not a number.`);
            }
            return num;
        }

        throw new Error(`Runtime Error: Unknown function '${node.fn}'`);
    }
    throw new Error(`Runtime Error: Cannot evaluate expression from node type: ${node.constructor.name}`);
};


## Execution

Finally, we define a `main` function that puts everything together:
1.  Read the source file.
2.  Parse it into an AST.
3.  Initialize an empty memory.
4.  Execute the AST.

In [None]:
function main(fileName: string, inputs: string[] = []) {
    console.log(`\n--- Executing ${fileName} with inputs [${inputs.join(", ")}] ---`);
    inputStream = [...inputs]; 
    
    try {
        const ast = parse(fileName);
        const values: Variables = new RecursiveMap();
        
        execute(ast, values);
        
        console.log("Final Memory State:", values);
    } catch (e) {
        if (e instanceof Error) {
            console.error(e.message);
        } else {
            console.error("Unknown error:", String(e));
        }
    }
}

In [None]:
main('sum.sl', ["5"]);

In [None]:
main('sum.sl');

In [None]:
main('sum.sl', ["6"]);

In [None]:
main('factorial.sl', ["10"]);

In [None]:
main('factorial.sl', ["hello"]);