In [None]:
import { display } from "tslab";
import { readFileSync } from "fs";

const css = readFileSync("../style.css", "utf8");
display.html(`<style>${css}</style>`);

# Building a Complete Interpreter with Lezer

In this notebook, we build a fully functional interpreter for a simple `C`-like language.
We will implement the Scanner and Parser using **Lezer**, transform the result into a clean **AST**, and finally write an **Interpreter** that executes the code.

## Imports

To build our interpreter, we rely on a specialized set of tools. These imports handle everything from reading source files and generating the parser to visualizing the results.

### The Parsing Engine (`Lezer`)
Lezer is our primary tool for lexical and syntactic analysis.
* **`buildParser`**: The compiler-compiler. It takes our `grammarString` and generates a fully functional `LRParser`.
* **`Tree`, `TreeCursor`**: The `Tree` is the **Concrete Syntax Tree (CST)** produced by the parser; a `TreeCursor` lets us navigate this detailed tree efficiently during the transformation phase.
* **`LRParser`**: The class type for the generated parser. "LR" stands for a left-to-right scan of the input that constructs a rightmost derivation in reverse.


### State Management & Visualization
* **`viz-js/viz`**: The rendering engine. It takes the DOT strings generated by `astToDot` (from our `AST2Dot` module) and turns them into SVG diagrams so we can see our ASTs directly in the notebook.

In [None]:
import { buildParser } from "@lezer/generator";
import { Tree, TreeCursor } from "@lezer/common";
import { LRParser } from "@lezer/lr";
import { instance } from "@viz-js/viz";
const viz = await instance();

## The Language Specification

Our target language supports arithmetic, variables, control flow (`if`, `while`), and function calls.
Formally, the grammar is defined as follows:

```ebnf
program
    : λ
    | stmnt program
    
stmnt 
    : IF '(' bool_expr ')' stmnt                 
    | WHILE '(' bool_expr ')' stmnt
    | '{' program '}' 
    | IDENTIFIER ':=' expr ';'  
    | expr ';'       

bool_expr 
    : expr '==' expr     
    | expr '!=' expr     
    | expr '<=' expr     
    | expr '>=' expr     
    | expr '<'  expr      
    | expr '>'  expr     
 
expr: expr '+' product                 
    | expr '-' product
    | product
              
product
    : product '*' factor               
    | product '/' factor
    | product '%' factor 
    | factor

factor
    : '(' expr ')' 
    | NUMBER
    | IDENTIFIER
    | IDENTIFIER '(' expr_list ')'

expr_list
    : λ
    | ne_expr_list

ne_expr_list
    : expr
    | expr ',' ne_expr_list
```

## Grammar Definition (Lezer)

We implement the grammar exactly as defined in the EBNF above. To maintain a **1:1 structural mapping**, we avoid Lezer's built-in repetition operators (like `*` or `+`) and instead model lists using explicit recursion. This ensures that the resulting tree structure mirrors the formal derivation steps.

### Structural Nuances: Lambda and Recursion

* **The Empty Production ($\lambda$):**
  In formal grammar, $\lambda$ (or $\epsilon$) denotes an empty match. Lezer does not have a specific keyword for this; instead, we use an empty string literal `""`.
  * **EBNF:** `program : λ`
  * **Lezer:** `Program { "" }`
  This allows the parser to complete a program successfully when no further statements follow.

* **Recursive Lists without Extra Nodes:**
  We use the `Program` rule itself recursively: 
  `Program { Stmnt Program | "" }`
  This creates a "right-leaning" chain in the Concrete Syntax Tree (CST), where each program consists of a statement followed by another program, or terminates with an empty match. This is ideal for demonstrating how recursive derivation works in practice.



### Token Specialization (Keywords)

A common issue in language design is that keywords like `if` or `while` are technically valid `Identifiers`. To ensure the parser distinguishes them correctly, we use `@specialize`:

```javascript
KwIf { @specialize<Identifier, "if"> }
```

This instructs the tokenizer: "First, read an Identifier. If the text is exactly 'if', convert the token into a `KwIf`." This prevents conflicts and ensures that variables such as `if_counter` are still recognized as normal identifiers.
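
The two-step behaviour described above can be modelled in plain TypeScript. This is only a sketch of the idea, not Lezer's actual implementation; `keywordTable` and `classifyWord` are hypothetical names introduced for illustration.

```typescript
// Hypothetical model of token specialization: match a generic identifier
// first, then promote it to a keyword token only on an exact text match.
const keywordTable = new Map<string, string>([
    ["if", "KwIf"],
    ["while", "KwWhile"],
]);

function classifyWord(text: string): string {
    // Same shape as the notebook's Identifier token:
    // one letter, then any number of letters, digits, or underscores.
    if (!/^[a-zA-Z][a-zA-Z0-9_]*$/.test(text)) {
        throw new Error(`Not an identifier: '${text}'`);
    }
    // Only the complete text is looked up, so a mere prefix never matches.
    return keywordTable.get(text) ?? "Identifier";
}

console.log(classifyWord("if"));         // KwIf
console.log(classifyWord("if_counter")); // Identifier
console.log(classifyWord("while"));      // KwWhile
```

Because the lookup uses the complete token text, `if_counter` is never split into `if` plus `_counter`.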

### Tokenization and the "Slash Conflict"

The character `/` is ambiguous in our language as it serves three purposes:
1.  **Line Comment** (`//`)
2.  **Block Comment** (`/*`)
3.  **Division Operator** (`/`)

Without explicit guidance, the tokenizer might see the first `/`, match it as a division operator, and fail on the subsequent characters. We resolve this using the `@precedence` block within the `@tokens` section:

```javascript
@precedence { LineComment, BlockComment, "/" }
```

**The Logic:**
By defining this order, we tell Lezer: "If you encounter a slash, check if it forms a comment first (since these are longer matches). Only if it doesn't fit a comment pattern, treat it as a division operator." This is essential for the tokenizer to correctly "skip" comments.
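
The priority order can be pictured as a hand-written check. The sketch below is purely illustrative and assumes nothing about how Lezer's generated tokenizer works internally; `classifySlash` is a hypothetical helper, and it only inspects how a token starts, not whether a block comment is properly closed.

```typescript
// Hypothetical hand-written decision at a slash, mirroring the order
// LineComment, BlockComment, "/" from the @precedence declaration.
type SlashToken = "LineComment" | "BlockComment" | "/";

function classifySlash(input: string, pos: number): SlashToken {
    if (input[pos] !== "/") throw new Error(`No slash at position ${pos}`);
    const next = input[pos + 1];
    if (next === "/") return "LineComment";  // "//" opens a line comment
    if (next === "*") return "BlockComment"; // "/*" opens a block comment
    return "/";                              // otherwise: division operator
}

console.log(classifySlash("a / b", 2));      // /
console.log(classifySlash("x; // note", 3)); // LineComment
console.log(classifySlash("/* c */ 1", 0));  // BlockComment
```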

In [None]:
const grammarString = `
    @top Script { Program }

    @tokens {
        Identifier { $[a-zA-Z] $[a-zA-Z0-9_]* }
        Number     { "0" | $[1-9] $[0-9]* }

        "+" "-" "*" "/" "%"
        ":=" "==" "!=" "<=" ">=" "<" ">"
        "(" ")" "{" "}"
        ";" "," 

        space { $[ \t\n\r]+ }
        LineComment { "//" ![\n]* }
        BlockComment { "/*" ( ![*] | "*" + ![*/] )* "*"+ "/" }
        @precedence { LineComment, BlockComment, "/" }
    }

    @skip { space | LineComment | BlockComment }

    KwIf    { @specialize<Identifier, "if"> }
    KwWhile { @specialize<Identifier, "while"> }

    Program {
        Stmnt Program | 
        ""
    }

    Stmnt {
        KwIf "(" BoolExpr ")" Stmnt |
        KwWhile "(" BoolExpr ")" Stmnt |
        "{" Program "}" |
        Identifier ":=" Expr ";" |
        Expr ";"
    }

    BoolExpr {
        Expr "==" Expr |
        Expr "!=" Expr |
        Expr "<=" Expr |
        Expr ">=" Expr |
        Expr "<"  Expr |
        Expr ">"  Expr
    }

    Expr {
        Expr "+" Product |
        Expr "-" Product |
        Product
    }

    Product {
        Product "*" Factor |
        Product "/" Factor | 
        Product "%" Factor |
        Factor
    }

    Factor {
        "(" Expr ")" |
        Number |
        Identifier |
        Identifier "(" ExprList ")"
    }

    ExprList {
        NeExprList | 
        ""
    }

    NeExprList {
        Expr |
        Expr "," NeExprList
    }
`;

In [None]:
const parser: LRParser = buildParser(grammarString);
"Parser generated successfully.";

In [None]:
function testScanner(fileName: string): void {
    const input: string = readFileSync(fileName, "utf8");
    console.log(input); 
    console.log(`--- Scanning ${fileName} ---`);
    console.log("Tokens:");
    
    const tree: Tree = parser.parse(input);
    const cursor: TreeCursor = tree.cursor();
    loop: while (true) {
        if (cursor.firstChild()) {
            continue;
        }
        const tokenText = input.slice(cursor.from, cursor.to);
        const safeText = tokenText.replace(/\n/g, "\\n");
        console.log(`[${cursor.name}]`.padEnd(15) + `: ${safeText}`);
        if (cursor.nextSibling()) {
            continue;
        }
        while (cursor.parent()) {
            if (cursor.nextSibling()) {
                continue loop;
            }
        }
        break;
    }
}

In [None]:
testScanner('sum.sl');

## CST to AST

While the **Concrete Syntax Tree (CST)** generated by Lezer records every syntactic detail that produces a node (comments, punctuation tokens, and the nesting imposed by the grammar's precedence levels), it is too verbose for direct evaluation.

We need to transform it into an **Abstract Syntax Tree (AST)**. Our AST design adopts a **Recursive Linked List** structure (often called *Cons-Cells* in functional programming). This mirrors the recursive nature of our grammar (e.g., a `Program` is a `Statement` followed by another `Program`).

* **Discriminated Unions (Tuples):** Nodes are represented as tuples where the first element is a "tag" (e.g., `['if', cond, body]`).
* **Recursive Lists (`ASTList`):** Blocks and function arguments are stored as nested pairs `[Head, Tail]`, ending with `null`.
* **Strict Types:** We define explicit types for Operators, Variables, and Literals to ensure type safety.
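
To make the `[Head, Tail]` encoding concrete, here is a minimal, self-contained sketch that converts between flat arrays and nested pairs. The names `ConsList`, `fromArray`, and `toArray` are assumptions made for this illustration and are not part of the notebook's code.

```typescript
// Generic cons list: either empty (null) or a [head, tail] pair.
type ConsList<T> = null | [T, ConsList<T>];

// Build the nested pairs from a flat array, folding from the right
// so that the first array element ends up as the outermost head.
function fromArray<T>(items: T[]): ConsList<T> {
    return items.reduceRight<ConsList<T>>((tail, head) => [head, tail], null);
}

// Walk the chain and collect the heads into a flat array again.
function toArray<T>(list: ConsList<T>): T[] {
    const out: T[] = [];
    let cur: ConsList<T> = list;
    while (cur !== null) {
        out.push(cur[0]);
        cur = cur[1];
    }
    return out;
}

const numbers = fromArray([1, 2, 3]);
console.log(JSON.stringify(numbers));          // [1,[2,[3,null]]]
console.log(JSON.stringify(toArray(numbers))); // [1,2,3]
```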

In [None]:
type Variable = string;
type Literal  = number;
type Operator = "+" | "-" | "*" | "/" | "%" | "==" | "!=" | "<" | ">" | "<=" | ">=";
type ASTList  = null | [AST, ASTList];

type BinOp      = [Operator, AST, AST];
type Assignment = [":=", Variable, AST];
type IfStmt     = ["if", AST, AST];
type WhileStmt  = ["while", AST, AST];
type Block      = ["block", ASTList];
type Call       = ["call", Variable, ASTList];

type AST = 
    | Variable 
    | Literal 
    | BinOp 
    | Assignment 
    | IfStmt 
    | WhileStmt 
    | Block 
    | Call;

### Type Guards & Helpers

Since our AST is built using **Discriminated Unions** (tuples), TypeScript needs help distinguishing between them at runtime.

We define a `helpers` object containing **Type Guards**. These functions serve two purposes:
1.  **Runtime Check:** They verify the structure of a node (e.g., is the first element `":="`?).
2.  **Compile-Time Narrowing:** They tell the TypeScript compiler exactly what type a node is. For example, `isOperator` verifies that a string is a valid mathematical or comparison operator, removing the need for unsafe casting (`as ...`).
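
The narrowing effect is easiest to see on a reduced example. The sketch below uses a deliberately smaller node type (`MiniNode`, `MiniAssign`, `isMiniAssign`, and `describe` are hypothetical names for this illustration) rather than the notebook's full `AST`.

```typescript
// Reduced node type for illustration only; the notebook's AST has more cases.
type MiniAssign = [":=", string, MiniNode];
type MiniNode = number | string | MiniAssign;

// The `node is MiniAssign` return type is what lets the compiler narrow.
function isMiniAssign(node: MiniNode): node is MiniAssign {
    return Array.isArray(node) && node[0] === ":=";
}

function describe(node: MiniNode): string {
    if (isMiniAssign(node)) {
        // Narrowed to MiniAssign: node[1] is known to be a string here.
        return `assign to ${node[1].toUpperCase()}`;
    }
    if (typeof node === "number") {
        return `literal ${node.toFixed(1)}`;
    }
    // Only `string` remains, so string methods type-check without a cast.
    return `variable ${node.toLowerCase()}`;
}

console.log(describe([":=", "x", 1])); // assign to X
console.log(describe(4));              // literal 4.0
console.log(describe("Y"));            // variable y
```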

In [None]:
const mathOps = new Set(["+", "-", "*", "/", "%"]);
const compOps = new Set(["==", "!=", "<", ">", "<=", ">="]);
const allOps  = new Set([...mathOps, ...compOps]);

const helpers = {
    isLiteral: (node: AST): node is number => typeof node === "number",
    isVariable: (node: AST): node is string => typeof node === "string",
    isOperator: (op: string): op is Operator => allOps.has(op),
    isAssignment: (node: AST): node is Assignment => 
        Array.isArray(node) && node[0] === ":=",
    isBlock: (node: AST): node is Block => 
        Array.isArray(node) && node[0] === "block",
    isIf: (node: AST): node is IfStmt => 
        Array.isArray(node) && node[0] === "if",
    isWhile: (node: AST): node is WhileStmt => 
        Array.isArray(node) && node[0] === "while",
    isCall: (node: AST): node is Call => 
        Array.isArray(node) && node[0] === "call",
    isMathOp: (node: AST): node is BinOp => 
        Array.isArray(node) && typeof node[0] === "string" && mathOps.has(node[0]),
    isCompOp: (node: AST): node is BinOp => 
        Array.isArray(node) && typeof node[0] === "string" && compOps.has(node[0])
};

### Forward Declarations

We need to declare our transformation functions before implementing them. This is necessary because our AST transformation is **mutually recursive**:
* Expressions can contain Function Calls.
* Function Calls contain Argument Lists.
* Argument Lists contain Expressions.

By declaring the function signatures first, we allow them to call each other regardless of definition order.
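
The same declare-first, assign-later pattern can be shown in isolation. This is a minimal sketch with two hypothetical functions, `isEven` and `isOdd`, that call each other; it assumes non-negative integer inputs.

```typescript
// Declare both signatures first; the bodies are assigned afterwards.
let isEven: (n: number) => boolean;
let isOdd: (n: number) => boolean;

// Each body may refer to the other variable, because the lookup only
// happens when the function is called, after both are assigned.
isEven = (n) => (n === 0 ? true : isOdd(n - 1));
isOdd = (n) => (n === 0 ? false : isEven(n - 1));

console.log(isEven(10)); // true
console.log(isOdd(7));   // true
```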

In [None]:
let transformAST: (cursor: TreeCursor, doc: string) => AST;
let transformExprList: (cursor: TreeCursor, doc: string) => ASTList;
let transformNeExprList: (cursor: TreeCursor, doc: string) => ASTList;

function getNodeText(cursor: TreeCursor, doc: string): string {
    return doc.slice(cursor.from, cursor.to);
}

function isNoise(name: string): boolean {
    return name === "LineComment" || name === "BlockComment";
}

function skipToNextMeaningful(cursor: TreeCursor): boolean {
    while (cursor.nextSibling()) {
        if (!isNoise(cursor.name)) return true;
    }
    return false;
}

### Recursive Transformation: Program Structure

Our grammar defines a program recursively (`Program -> Stmnt Program`). We preserve the recursive structure in our AST.

The `transformProgram` function traverses the CST and builds a **Linked List** (`ASTList`):
* It transforms the current statement (the **Head**).
* It recursively calls itself for the next sibling to get the rest of the list (the **Tail**).
* If no siblings remain, the tail is `null`.
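
The head/tail recursion can be sketched without Lezer by using a mock tree. `MockNode`, `StmtList`, `leaf`, and `programToList` are hypothetical stand-ins; the actual implementation walks a `TreeCursor` and also has to skip comment nodes.

```typescript
// Hypothetical stand-in for a CST node; the real code walks a Lezer TreeCursor.
interface MockNode {
    name: string;
    text?: string;
    children: MockNode[];
}
type StmtList = null | [string, StmtList];

// A Program node is either empty or holds a Stmnt followed by a nested Program.
function programToList(program: MockNode): StmtList {
    const head = program.children.find((c) => c.name === "Stmnt");
    if (head === undefined) return null; // empty production: end of the chain
    const rest = program.children.find((c) => c.name === "Program");
    const tail = rest === undefined ? null : programToList(rest);
    return [head.text ?? "", tail];      // [Head, Tail]
}

const leaf = (text: string): MockNode => ({ name: "Stmnt", text, children: [] });
const emptyProgram: MockNode = { name: "Program", children: [] };
const mockCst: MockNode = {
    name: "Program",
    children: [
        leaf("a := 1;"),
        { name: "Program", children: [leaf("print(a);"), emptyProgram] },
    ],
};

console.log(JSON.stringify(programToList(mockCst)));
// ["a := 1;",["print(a);",null]]
```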

In [None]:
function transformProgram(cursor: TreeCursor, doc: string): ASTList {
    const inner = cursor.node.cursor();
    if (!inner.firstChild()) return null;
    function build(c: TreeCursor): ASTList {
        if (c.name === "Stmnt") {
            const head = transformAST(c.node.cursor(), doc);
            if (c.nextSibling()) return [head, build(c)];
            return [head, null];
        }
        if (c.name === "Program") {
            const child = c.node.cursor();
            if (child.firstChild()) return build(child);
        }
        if (c.nextSibling()) return build(c);
        return null;
    }
    return build(inner);
}

### Recursive Transformation: Argument Lists

Function arguments (`ExprList`) follow the same recursive pattern as the program structure.

We implement helper functions to transform the nested `Expr , NeExprList` structure from the CST into our clean `ASTList` format (`[Expr, [Expr, null]]`).

In [None]:
transformExprList = (cursor: TreeCursor, doc: string): ASTList => {
    const inner = cursor.node.cursor();
    if (!inner.firstChild()) return null;
    if (inner.name === "NeExprList") {
        return transformNeExprList(inner.node.cursor(), doc);
    }
    return null;
};

In [None]:
transformNeExprList = (cursor: TreeCursor, doc: string): ASTList => {
    const inner = cursor.node.cursor(); // Enter NeExprList
    if (!inner.firstChild()) return null;
    const head = transformAST(inner.node.cursor(), doc);
    while (inner.nextSibling()) {
        if (inner.name === "NeExprList") {
            return [head, transformNeExprList(inner.node.cursor(), doc)];
        }
    }
    return [head, null];
};

### The Main Transformer

This function is the core of our parser. It traverses the CST using a `TreeCursor` and switches on the `nodeName` to determine the corresponding AST node.

**Key Transformations:**
1.  **Strict Operator Checking:** When parsing expressions like `BinOp`, we validate strings against our allowed `Operator` type using our helper to ensure type safety.
2.  **Control Flow:** Constructs like `if` and `while` are mapped to their specific tuple formats.
3.  **Recursion:** Every child node is processed by recursively calling `transformAST`.

In [None]:
transformAST = (cursor: TreeCursor, doc: string): AST => {
    const nodeName = cursor.name;
    switch (nodeName) {
        case "Script":
        case "Program": {
            const stmts = transformProgram(cursor, doc);
            return ["block", stmts];
        }
        case "Stmnt": {
            cursor.firstChild();
            
            while (isNoise(cursor.name)) {
                if (!cursor.nextSibling()) break;
            }

            const first = cursor.name;
            
            if (first === "KwIf" || first === "KwWhile") {
                 const type = first === "KwIf" ? "if" : "while";
                 skipToNextMeaningful(cursor); 
                 skipToNextMeaningful(cursor); 
                 const cond = transformAST(cursor.node.cursor(), doc);
                 skipToNextMeaningful(cursor); 
                 skipToNextMeaningful(cursor); 
                 const body = transformAST(cursor.node.cursor(), doc);
                 return [type, cond, body];
            }

            if (first === "{") {
                let found = false;
                while (cursor.nextSibling()) {
                    if (cursor.name === "Program") {
                        found = true;
                        break;
                    }
                }
                return found ? transformAST(cursor.node.cursor(), doc) : ["block", null];
            }
            if (first === "Identifier") {
                const id = getNodeText(cursor, doc);
                if (cursor.nextSibling() && cursor.name === ":=") {
                    cursor.nextSibling(); 
                    const val = transformAST(cursor.node.cursor(), doc);
                    return [":=", id, val];
                }
                return transformAST(cursor.node.cursor(), doc);
            }
            return transformAST(cursor.node.cursor(), doc);
        }
        case "BoolExpr":
        case "Expr":
        case "Product": {
            const current = cursor.node.cursor();
            if (!current.firstChild()) return ""; 
            const left = transformAST(current.node.cursor(), doc);
            // Skip comment nodes so they are not mistaken for the operator or operand.
            if (skipToNextMeaningful(current)) {
                const opStr = getNodeText(current, doc);
                if (!helpers.isOperator(opStr)) {
                    throw new Error(`Transformer Error: Invalid operator '${opStr}'`);
                }
                const op = opStr;
                skipToNextMeaningful(current);
                const right = transformAST(current.node.cursor(), doc);
                return [op, left, right];
            }
            return left;
        }
        case "Factor": {
            cursor.firstChild();  
            if (cursor.name === "(") {
                cursor.nextSibling(); 
                return transformAST(cursor.node.cursor(), doc);
            }
            if (cursor.name === "Number") {
                return Number(getNodeText(cursor, doc));
            }
            if (cursor.name === "Identifier") {
                const id = getNodeText(cursor, doc);
                const checkCall = cursor.node.cursor();
                if (checkCall.nextSibling() && checkCall.name === "(") {
                    checkCall.nextSibling(); 
                    const args = transformExprList(checkCall, doc);
                    return ["call", id, args];
                }
                return id;
            }
            return "";
        }
        default:
            if (isNoise(nodeName)) return "";
            throw new Error(`Transformer Error: Unknown node type '${nodeName}' at ${cursor.from}`);
    }
};

### The Parsing Wrapper

Finally, we combine the file reading, the Lezer parser (which produces the CST), and our `transformAST` function into a single `parse` utility.

In [None]:
function parse(fileName: string): AST {
    const source = readFileSync(fileName, "utf8");
    const tree = parser.parse(source);
    return transformAST(tree.cursor(), source);
}

## Visualizing the AST

To verify that our recursive structure is correct, we inspect the AST both textually and visually.

### 1. Raw Tuple Inspection
Using `console.dir`, we can see the raw **Linked List** structure. Notice the "staircase" pattern:
`['block', [stmt1, [stmt2, [stmt3, null]]]]`.
This confirms we are strictly following our recursive type definitions.

In [None]:
const astSum = parse("sum.sl");
console.dir(astSum, { depth: null })

In [None]:
const astFact = parse("factorial.sl");
console.dir(astFact, { depth: null })

### 2. Graphical Visualization (Graphviz)

We use our `AST2Dot` library to render the tree. The visualizer traverses the `ASTList` structure recursively, drawing edges from parent blocks to their children in the chain.

**Visual Guide:**
* **Boxes:** Structural nodes (Statements, Blocks).
* **Circles:** Leaf nodes (Variables, Numbers).
* **Edges:** Represent the flow of data or the sequence of statements.

In [None]:
import { astToDot } from "./AST2Dot";

In [None]:
const dotFact = astToDot(astFact);
display.html(viz.renderString(dotFact, { format: "svg" }));

In [None]:
const dotSum = astToDot(astSum);
display.html(viz.renderString(dotSum, { format: "svg" }));

## The Interpreter

The interpreter breathes life into our **Abstract Syntax Tree**. It recursively traverses the tree structure and performs the operations described by the nodes.

We divide the implementation into three specialized functions:
1.  **`execute`**: Handles **Statements** (side effects). It modifies the environment.
2.  **`evaluate`**: Handles **Expressions**. It returns a `number`.
3.  **`evaluateBool`**: Handles **Conditions**. It returns a `boolean`.

### State Management

We use a simple `Map` to represent the program's memory (Environment).
* **Keys:** Variable names (`string`).
* **Values:** Current values (`number`).

We also simulate **Input/Output** using a global `inputStream` array (acting as STDIN).
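
As a minimal sketch of how these two pieces of state behave, the snippet below uses a `Map` as memory and an array as a queue of input lines. The helper `readNumber` and the names `Env`, `memory`, and `stdin` are assumptions for this illustration only.

```typescript
type Env = Map<string, number>;

const memory: Env = new Map();
const stdin: string[] = ["6", "2"]; // simulated STDIN, consumed front to back

// Hypothetical helper: take the next input line and convert it to a number.
function readNumber(queue: string[]): number {
    const line = queue.shift();
    if (line === undefined) throw new Error("STDIN empty");
    const value = Number(line);
    if (Number.isNaN(value)) throw new Error(`Invalid input '${line}'`);
    return value;
}

memory.set("a", readNumber(stdin));
memory.set("b", readNumber(stdin));
// Setting an existing key overwrites it, just as a second assignment would.
memory.set("a", (memory.get("a") ?? 0) + (memory.get("b") ?? 0));

console.log(memory.get("a")); // 8
console.log(stdin.length);    // 0
```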

In [None]:
type Environment = Map<string, number>;
let inputStream: string[] = [];

let execute: (node: AST, env: Environment) => void;
let evaluate: (node: AST, env: Environment) => number;
let evaluateBool: (node: AST, env: Environment) => boolean;

### Executing Statements (`execute`)

The `execute` function manages the control flow. Since we are using linked lists, we use a recursive helper `executeList`.

**The Logic:**
* **Blocks:** `executeList` processes the **Head** (current statement) and then recursively calls itself with the **Tail** (rest of the statements).
* **Assignments:** Computes the value and updates the `env`.
* **Control Flow:** `if` and `while` delegate logic to `evaluateBool` and recursively call `execute` for their bodies.

In [None]:
const executeList = (list: ASTList, env: Environment) => {
    if (list === null) return;
    const [head, tail] = list;
    execute(head, env);
    executeList(tail, env);
};

execute = (node: AST, env: Environment): void => {    
    if (helpers.isLiteral(node) || helpers.isVariable(node)) return;

    if (helpers.isBlock(node)) {
        executeList(node[1], env);
    } else if (helpers.isAssignment(node)) {
        const id = node[1];
        const val = evaluate(node[2], env);
        env.set(id, val);
    } else if (helpers.isIf(node)) {
        if (evaluateBool(node[1], env)) {
            execute(node[2], env);
        }
    } else if (helpers.isWhile(node)) {
        while (evaluateBool(node[1], env)) {
            execute(node[2], env);
        }
    } else if (helpers.isCall(node) || helpers.isMathOp(node)) {
        evaluate(node, env);
    }
};

### Evaluating Conditions (`evaluateBool`)

This function determines truth values.
* **Comparisons:** It handles standard operators (`<`, `==`, etc.) by evaluating both sides.
* **Truthiness:** If a plain number is used as a condition, we follow C-style rules (`0` is false, everything else is true).

In [None]:
evaluateBool = (node: AST, env: Environment): boolean => {
    if (helpers.isCompOp(node)) {
        const op = node[0];
        const l = evaluate(node[1], env);
        const r = evaluate(node[2], env);

        switch (op) {
            case "==": return l === r;
            case "!=": return l !== r;
            case "<":  return l < r;
            case ">":  return l > r;
            case "<=": return l <= r;
            case ">=": return l >= r;
        }
    }
    return evaluate(node, env) !== 0;
};

### Evaluating Expressions (`evaluate`)

This function computes numerical values and handles Input/Output.

**Key Features:**
* **Recursion:** Math operations recursively evaluate their left and right operands.
* **I/O Logging:**
    * **`read`**: Fetches from `inputStream` and logs `<< STDIN`.
    * **`print`**: Evaluates the first argument in the list (the Head), logs it to `>> STDOUT`, and returns the value.
* **Safety:** We include exhaustive error handling for unknown nodes or invalid operations (like division by zero).
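
One detail of the arithmetic is worth seeing in isolation: the division case rounds with `Math.floor`, which differs from C's truncating division once an operand is negative. The sketch below compares both conventions; `floorDiv` and `truncDiv` are hypothetical helpers for this illustration, not functions from the interpreter.

```typescript
// Integer division that rounds toward negative infinity.
function floorDiv(l: number, r: number): number {
    if (r === 0) throw new Error("Division by zero");
    return Math.floor(l / r);
}

// Integer division that rounds toward zero, as C does.
function truncDiv(l: number, r: number): number {
    if (r === 0) throw new Error("Division by zero");
    return Math.trunc(l / r);
}

console.log(floorDiv(7, 2));  // 3
console.log(floorDiv(-7, 2)); // -4
console.log(truncDiv(-7, 2)); // -3
console.log(-7 % 2);          // -1 (JavaScript's % keeps the sign of the dividend)
```

For non-negative operands both conventions agree, so the difference only shows up once a subtraction has produced a negative value.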

In [None]:
evaluate = (node: AST, env: Environment): number => {
    if (helpers.isLiteral(node)) return node;
    
    if (helpers.isVariable(node)) {
        const val = env.get(node);
        if (val === undefined) throw new Error(`Runtime Error: Undefined variable '${node}'`);
        return val;
    }

    if (helpers.isCall(node)) {
        const [, fnName, args] = node;

        if (fnName === "print") {
            if (args !== null) {
                const [argNode] = args;
                const val = evaluate(argNode, env);
                console.log(">> STDOUT:", val);
                return val;
            }
            console.log(">> STDOUT: 0");
            return 0;
        }

        if (fnName === "read") {
            const input = inputStream.shift();
            if (input === undefined) throw new Error("Runtime Error: STDIN empty!");
            
            const num = Number(input);
            if (isNaN(num)) throw new Error(`Runtime Error: Invalid input '${input}'`);
            console.log("<< STDIN: ", num);
            return num;
        }
        throw new Error(`Runtime Error: Unknown function '${fnName}'`);
    }

    if (helpers.isMathOp(node)) {
        const [op, left, right] = node;
        const l = evaluate(left, env);
        const r = evaluate(right, env);

        switch (op) {
            case "+": return l + r;
            case "-": return l - r;
            case "*": return l * r;
            case "/": 
                if (r === 0) throw new Error("Runtime Error: Division by zero");
                return Math.floor(l / r);
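            case "%":
                // Without this check, `l % 0` would silently yield NaN.
                if (r === 0) throw new Error("Runtime Error: Modulo by zero");
                return l % r;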
            default:
                throw new Error(`Interpreter Error: Invalid Math Op '${op}'`);
        }
    }
    
    throw new Error(`Interpreter Error: Unhandled node ${JSON.stringify(node)}`);
};

## Execution

We combine everything into a `runProgram` function. It reads the source file, parses it into our recursive AST, initializes the environment, and triggers the recursive execution.

In [None]:
function runProgram(fileName: string, inputs: string[] = []) {
    console.log(`\n--- Executing File: ${fileName} ---`);
    console.log(`--- Inputs: [${inputs.join(", ")}] ---`);
    
    inputStream = [...inputs];

    try {
        const sourceCode = readFileSync(fileName, "utf8");
        const tree = parser.parse(sourceCode);
        const ast = transformAST(tree.cursor(), sourceCode);
        const env: Environment = new Map();

        console.log(">> Runtime Log:");
        execute(ast, env);

        console.log("\n--- Final Memory State ---");
        if (env.size === 0) {
            console.log("(empty)");
        } else {
            const sortedKeys = Array.from(env.keys()).sort();
            for (const key of sortedKeys) {
                const val = env.get(key);
                console.log(`  ${key.padEnd(10)} : ${val}`);
            }
        }

    } catch (e) {
        if (e instanceof Error) {
            console.error(`\n[Error] ${e.message}`);
        } else {
            console.error("\n[Unknown Error]", JSON.stringify(e));
        }
    }
}

In [None]:
runProgram('sum.sl', ["5+2"]);

In [None]:
runProgram('sum.sl');

In [None]:
runProgram('sum.sl', ["6", "2"]);

In [None]:
runProgram('factorial.sl', ["5"]);

In [None]:
runProgram('factorial.sl', ["hello"]);