In [None]:
import { display } from "tslab";
import { readFileSync } from "fs";

const css = readFileSync("../style.css", "utf8");
display.html(`<style>${css}</style>`);

## Imports and Library Setup

In [None]:
import { buildParser } from '@lezer/generator';
import { Tree, TreeCursor } from '@lezer/common';
import { LRParser } from '@lezer/lr';
import { RecursiveSet, RecursiveMap, Tuple } from "recursive-set";

# A Symbolic Calculator with Lezer & RecursiveSet

This notebook demonstrates the implementation of a **symbolic calculator** that parses, analyzes, and evaluates assignments and arithmetic expressions.

The distinctive feature of this implementation is the use of **value semantics** for the Abstract Syntax Tree (AST) and the environment. Instead of ordinary JavaScript objects, we utilize the `recursive-set` library to enable structural equality and hash-based lookup.

## Language Grammar

The grammar for our calculator language is formally defined as follows:

$$
\begin{array}{lcl}
\texttt{stmnt} & \rightarrow & \texttt{IDENTIFIER}\; \texttt{':='}\; \texttt{expr}\; \texttt{';'} \\
& \mid & \texttt{expr}\; \texttt{';'} \\[0.3cm]
\texttt{expr} & \rightarrow & \texttt{expr}\; \texttt{'+'}\; \texttt{product} \\
& \mid & \texttt{expr}\; \texttt{'-'}\; \texttt{product} \\
& \mid & \texttt{product} \\[0.3cm]
\texttt{product} & \rightarrow & \texttt{product}\; \texttt{'*'}\; \texttt{factor} \\
& \mid & \texttt{product}\; \texttt{'/'}\; \texttt{factor} \\
& \mid & \texttt{factor} \\[0.3cm]
\texttt{factor} & \rightarrow & \texttt{'('}\; \texttt{expr}\; \texttt{')'} \\
& \mid & \texttt{'+'}\; \texttt{factor} \\
& \mid & \texttt{'-'}\; \texttt{factor} \\
& \mid & \texttt{NUMBER} \\
& \mid & \texttt{IDENTIFIER}
\end{array}
$$

This grammar ensures proper operator precedence (multiplication and division before addition and subtraction) through its hierarchical structure. Left recursion in the $\texttt{expr}$ and $\texttt{product}$ rules guarantees left-associative evaluation.
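
To see why associativity matters, consider `8 - 4 - 2`. The following is a minimal sketch in plain TypeScript, independent of the parser; the helper names `foldLeftSub` and `foldRightSub` are hypothetical and exist only for this illustration. It contrasts the left-associative grouping that the left-recursive rules produce with the right-associative alternative.

```typescript
// Left-associative grouping: ((8 - 4) - 2), matching the left-recursive rules.
function foldLeftSub(xs: number[]): number {
    const [first, ...rest] = xs;
    if (first === undefined) throw new Error("need at least one operand");
    return rest.reduce((acc, x) => acc - x, first);
}

// Right-associative grouping: (8 - (4 - 2)), shown only for contrast.
function foldRightSub(xs: number[]): number {
    const last = xs[xs.length - 1];
    if (last === undefined) throw new Error("need at least one operand");
    return xs.slice(0, -1).reduceRight((acc, x) => x - acc, last);
}

console.log(foldLeftSub([8, 4, 2]));  // 2
console.log(foldRightSub([8, 4, 2])); // 6
```

Only the first result agrees with the usual reading of `8 - 4 - 2`, which is why the grammar recurses on the left operand.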

## Architecture & Design Philosophy

The interpreter pipeline consists of four distinct stages:

1. **Grammar Definition:** We declaratively specify the language syntax using **Lezer**.
2. **Runtime Parsing:** `buildParser` generates an efficient LR parser from the grammar.
3. **AST Transformation (Generic Mapper):** The concrete syntax tree (CST) produced by Lezer is transformed into a typed AST by the generic `cstToAST` mapper, which is driven by a declarative `ParserConfig`.
4. **Semantic Analysis & Evaluation:** We employ `RecursiveSet` for variable analysis and `RecursiveMap` as a variable store.

### Why Tuple and RecursiveSet?

In classical AST implementations, two nodes `new BinOp("+", 1, 2)` and `new BinOp("+", 1, 2)` are **not equal** under JavaScript's default reference equality.

Our `ASTNode` base class solves this by wrapping the data in a `Tuple`, providing **Structural Equality**:
- **Equality:** Two AST nodes with identical content compare as equal, even if they are distinct objects in memory.
- **Hashing:** AST nodes can serve as keys in maps or elements in sets.
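
The difference between reference equality and structural equality can be sketched without the library. The class `PlainBinOp` and its `structuralKey` method below are hypothetical stand-ins, not the notebook's `ASTNode` or the `Tuple` from `recursive-set`; the canonical JSON string merely imitates what a structural hash provides.

```typescript
// Hypothetical stand-in for an AST node; not the notebook's ASTNode.
class PlainBinOp {
    readonly op: string;
    readonly left: number;
    readonly right: number;

    constructor(op: string, left: number, right: number) {
        this.op = op;
        this.left = left;
        this.right = right;
    }

    // Assumption: a canonical string serves as the structural key for illustration.
    structuralKey(): string {
        return JSON.stringify([this.op, this.left, this.right]);
    }
}

const nodeA = new PlainBinOp("+", 1, 2);
const nodeB = new PlainBinOp("+", 1, 2);

// Reference equality: two distinct objects are never equal.
console.log(nodeA === nodeB); // false

// A built-in Set compares objects by reference, so both nodes are stored.
const byReference = new Set<PlainBinOp>([nodeA, nodeB]);
console.log(byReference.size); // 2

// Keying by structure collapses the two nodes into a single entry.
const byStructure = new Map<string, PlainBinOp>();
for (const node of [nodeA, nodeB]) byStructure.set(node.structuralKey(), node);
console.log(byStructure.size); // 1
```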


## Domain Types (The AST)

We define the language constructs as classes inheriting from `ASTNode`. This ensures type safety while maintaining the structural equality properties provided by the underlying `Tuple`.

In [None]:
import { 
    AST, NumNode, VarNode, BinaryExpr, AssignNode, ExprStmtNode, 
    cstToAST, ParserConfig, ast2dot 
} from "./AST2Dot";

type Expr = NumNode | VarNode | BinaryExpr;
type Stmnt = AssignNode | ExprStmtNode;

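**Type Hierarchy** (using `Num`, `Var`, `BinOp`, `Assign`, and `ExprStmnt` as shorthand for the classes `NumNode`, `VarNode`, `BinaryExpr`, `AssignNode`, and `ExprStmtNode`):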

$$
\begin{array}{lcl}
\texttt{Expr} & ::= & \texttt{Num} \mid \texttt{Var} \mid \texttt{BinOp} \\
\texttt{Stmnt} & ::= & \texttt{Assign} \mid \texttt{ExprStmnt}
\end{array}
$$

## Grammar Definition (Lezer)

The grammar is specified declaratively using Lezer's syntax. Uppercase rule names (e.g., `Expr`, `Factor`) ensure that Lezer creates dedicated nodes in the parse tree, simplifying subsequent traversal.

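- **Left Recursion Replaced by Repetition:** The left-recursive rules of the formal grammar are written as repetition patterns such as `Product ((Plus | Minus) Product)*`; left associativity is restored later, when the AST is built.
- **Precedence Rules:** Multiplication and division bind more tightly than addition and subtraction because `Product` (multiplicative operations) is nested inside `Expr` (additive operations).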

In [None]:
const grammarDefinition = `
    @top Program { statement+ }

    statement {
        Assignment { Identifier ":=" Expr ";" } |
        ExpressionStatement { Expr ";" }
    }

    Expr {
        Product ((Plus | Minus) Product)*
    }

    Product {
        Factor ((Mul | Div) Factor)*
    }

    Factor {
        ParenExpr { "(" Expr ")" } |
        UnaryExpr { (Plus | Minus) Factor } |
        Number |
        Identifier
    }

    @tokens {
        Number { (("0" | $[1-9] $[0-9]*) ("." $[0-9]*)?) }
        Identifier { $[a-zA-Z] $[a-zA-Z0-9_]* }
        Plus { "+" }
        Minus { "-" }
        Mul { "*" }
        Div { "/" }
        space { $[ \t\r]+ }
        "(" ")" ";" ":="
    }

    @skip { space }
`;

const parser: LRParser = buildParser(grammarDefinition);

### Tokenization Helper

In Lezer, tokenization is integrated with parsing. For debugging purposes, `tokenizeCalc` recovers the token sequence by walking the parse tree and keeping every node that is not one of the structural (non-terminal) nodes.

In [None]:
function tokenizeCalc(input: string): string[] {
    const tree = parser.parse(input);
    const cursor = tree.cursor();
    const tokens: string[] = [];
    const structuralNodes = new Set([
        "Program",
        "statement",
        "Assignment",
        "ExpressionStatement",
        "Expr",
        "Product",
        "Factor",
        "UnaryExpr",
        "ParenExpr",
    ]);
    do {
        if (!structuralNodes.has(cursor.name)) {
            const content = input.slice(cursor.from, cursor.to);
            let name = cursor.name;
            if (name === ":=") name = "Assign";
            if (name === ";") name = "Semicolon";

            tokens.push(`${name}('${content}')`);
        }
    } while (cursor.next());
    return tokens;
}


Example: 

In [None]:
console.log(tokenizeCalc("x := 3 + 4 * 2;"));

## Parser Logic (Generic CST to AST Transformation)

Instead of a monolithic switch-statement, we use the generic `cstToAST` mapper. The transformation logic is defined declaratively in a `ParserConfig` object.

**Key Features:**

- **Declarative Rules:** We map Lezer node names (e.g., "Assignment") directly to constructor calls (e.g., `new AssignNode(...)`).
- **Left Associativity (`reduceBinary`):**
  Because the grammar uses repetition, an expression like `1 + 2 + 3` arrives as a single `Expr` node with a flat child list: `[Product, Plus, Product, Plus, Product]`.
  The helper function `reduceBinary` folds this list from left to right into a proper tree structure: `((1 + 2) + 3)`.
- **Type Safety:** We use `instanceof` guards (e.g., in `Assignment`) to ensure that structural requirements are met at runtime.
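
The actual `cstToAST` is imported from `./AST2Dot` and is not shown in this notebook. To convey the idea of a rule-driven mapper, here is a toy sketch over a hand-built tree. All names (`ToyCst`, `ToyConfig`, `mapToyTree`) are hypothetical, and the pass-through behavior for nodes without a rule is an assumption of this sketch, not a statement about the real mapper.

```typescript
// Toy concrete-syntax node; the notebook's real input is a Lezer TreeCursor.
interface ToyCst {
    kind: string;
    text: string;
    children: ToyCst[];
}

// Hypothetical rule: builds a result from already-mapped children and the node text.
type ToyRule<T> = (children: T[], text: string) => T;

interface ToyConfig<T> {
    ignore: Set<string>;
    rules: Record<string, ToyRule<T>>;
}

// Bottom-up mapping: drop ignored nodes, map the children, then apply the rule.
// Assumption: a node without a rule passes its single child through unchanged.
function mapToyTree<T>(node: ToyCst, config: ToyConfig<T>): T {
    const mapped = node.children
        .filter((c) => !config.ignore.has(c.kind))
        .map((c) => mapToyTree(c, config));
    const rule = config.rules[node.kind];
    if (rule) return rule(mapped, node.text);
    const only = mapped[0];
    if (mapped.length === 1 && only !== undefined) return only;
    throw new Error(`No rule for node kind: ${node.kind}`);
}

const leaf = (kind: string, text: string): ToyCst => ({ kind, text, children: [] });

// Hand-built tree for the statement `x := 7;`.
const toyTree: ToyCst = {
    kind: "Assignment",
    text: "x := 7;",
    children: [
        leaf("Identifier", "x"),
        leaf(":=", ":="),
        { kind: "Factor", text: "7", children: [leaf("Number", "7")] },
        leaf(";", ";"),
    ],
};

// The target type is a plain string here, to keep the sketch short.
const toyConfig: ToyConfig<string> = {
    ignore: new Set([":=", ";"]),
    rules: {
        Number: (_, text) => text,
        Identifier: (_, text) => text,
        Assignment: (children) => `Assign(${children[0]}, ${children[1]})`,
    },
};

console.log(mapToyTree(toyTree, toyConfig)); // Assign(x, 7)
```

Adding a new language construct then amounts to adding one entry to the rule table, rather than extending a central switch statement.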

In [None]:
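// Folds the flat child list [operand, op, operand, op, ...] into a left-nested BinaryExpr.
function reduceBinary(children: AST[], _text: string): AST {
    let left = children[0];
    for (let i = 1; i < children.length; i += 2) {
        // Operators arrive as VarNode placeholders (see the Plus/Minus/Mul/Div rules below).
        const opNode = children[i];
        if (!(opNode instanceof VarNode)) throw new Error("Expected operator");

        const right = children[i + 1];
        if (right === undefined) throw new Error("Missing right operand");
        left = new BinaryExpr(left, opNode.name, right);
    }
    return left;
}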

const config: ParserConfig = {
    ignore: new Set(["(", ")", ";", ":="]),
    
    rules: {
        "Number": (_, text) => new NumNode(parseFloat(text)),
        "Identifier": (_, text) => new VarNode(text),
        "Plus": () => new VarNode("+"), "Minus": () => new VarNode("-"),
        "Mul": () => new VarNode("*"),  "Div": () => new VarNode("/"),
        "Assignment": (children) => {
            const idNode = children[0];
            if (idNode instanceof VarNode) {
                return new AssignNode(idNode.name, children[1]);
            }
            throw new Error("Invalid Assignment Target");
        },
        "ExpressionStatement": (children) => new ExprStmtNode(children[0]),
        "Program": (children) => children[0], 
        "Expr": reduceBinary,
        "Product": reduceBinary,
        "UnaryExpr": (children) => {
            const opNode = children[0];
            const val = children[1];
            
            if (opNode instanceof VarNode && opNode.name === "-") {
                return new BinaryExpr(new NumNode(-1), "*", val);
            }
            return val;
        }
    }
};

function parseStmnt(input: string): Stmnt {
    const tree = parser.parse(input);
    const ast = cstToAST(tree.cursor(), input, config);
    if (ast instanceof AssignNode || ast instanceof ExprStmtNode) {
        return ast;
    }
    throw new Error("Expected a statement (assignment or expression)");
}

## Static Analysis

Instead of a simple `Record<string, number>`, we use `RecursiveMap` for the environment.

**Advantages:**

- **Efficiency:** Optimized hashing for string keys
- **Deterministic Output:** `env.toString()` produces sorted, deterministic output of the storage state

Additionally, we implement `collectVars`, which gathers every variable name occurring in a statement or expression (including the target of an assignment) into a `RecursiveSet<string>`. Because the result is a set, each name is stored only once and can be combined with other sets in later analyses.

In [None]:
function collectVars(node: AST): RecursiveSet<string> {
    const vars = new RecursiveSet<string>();
    
    function visit(n: AST) {
        if (n instanceof VarNode) vars.add(n.name);
        else if (n instanceof AssignNode) { vars.add(n.id); visit(n.expr); }
        else if (n instanceof ExprStmtNode) visit(n.expr);
        else if (n instanceof BinaryExpr) { visit(n.left); visit(n.right); }
    }
    
    visit(node);
    return vars;
}
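
Once variable names are available as sets, simple static checks reduce to set operations. The sketch below uses the built-in `Set` as a stand-in, since the set-operation API of `RecursiveSet` is not shown in this notebook; the helper `differenceSketch` and the sample variable names are hypothetical.

```typescript
// Elements of `a` that are not contained in `b` (set difference).
function differenceSketch<T>(a: Set<T>, b: Set<T>): Set<T> {
    return new Set(Array.from(a).filter((x) => !b.has(x)));
}

// Variables read by the expression `z + w` and variables assigned so far.
const readVars = new Set<string>(["z", "w"]);
const assignedVars = new Set<string>(["x", "y", "z"]);

// Variables that would be undefined if the expression were evaluated now.
const undefinedVars = differenceSketch(readVars, assignedVars);
console.log(Array.from(undefinedVars)); // [ 'w' ]
```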

## Operational Semantics

The evaluation phase defines the semantic meaning of our AST nodes through mathematical operations on the abstract machine state.

### Expression Evaluation

The evaluation function $\mathcal{E}: \texttt{Expr} \times \texttt{Env} \to \mathbb{R}$ is defined recursively:

$$
\begin{array}{lcl}
\mathcal{E}(\texttt{Num}(n), \rho) & = & n \\[0.2cm]
\mathcal{E}(\texttt{Var}(x), \rho) & = & \rho(x) \\[0.2cm]
\mathcal{E}(\texttt{BinOp}(\oplus, e_1, e_2), \rho) & = & \mathcal{E}(e_1, \rho) \oplus \mathcal{E}(e_2, \rho)
\end{array}
$$

where $\rho: \texttt{String} \to \mathbb{R}$ represents the environment and $\oplus \in \{+, -, \times, \div\}$.
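
As a worked instance of these rules, take the expression `(x - 5) / 3` and assume an environment with $\rho(x) = 20$ (a value chosen here for illustration):

$$
\begin{array}{lcl}
\mathcal{E}(\texttt{BinOp}(\div, \texttt{BinOp}(-, \texttt{Var}(x), \texttt{Num}(5)), \texttt{Num}(3)), \rho)
 & = & \mathcal{E}(\texttt{BinOp}(-, \texttt{Var}(x), \texttt{Num}(5)), \rho) \div \mathcal{E}(\texttt{Num}(3), \rho) \\[0.2cm]
 & = & (\mathcal{E}(\texttt{Var}(x), \rho) - \mathcal{E}(\texttt{Num}(5), \rho)) \div 3 \\[0.2cm]
 & = & (\rho(x) - 5) \div 3 \\[0.2cm]
 & = & (20 - 5) \div 3 = 5
\end{array}
$$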

In [None]:
type Env = RecursiveMap<string, number>;

function evalExpr(e: AST, env: Env): number {
    if (e instanceof NumNode) return e.value;
    
    if (e instanceof VarNode) {
        const val = env.get(e.name);
        if (val === undefined) throw new Error(`Undefined variable: ${e.name}`);
        return val;
    }
    
    if (e instanceof BinaryExpr) {
        const l = evalExpr(e.left, env);
        const r = evalExpr(e.right, env);
        switch (e.op) {
            case "+": return l + r;
            case "-": return l - r;
            case "*": return l * r;
            case "/": return l / r;
            default: throw new Error(`Unknown op: ${e.op}`);
        }
    }
    throw new Error(`Cannot evaluate node type: ${e.constructor.name}`);
}

### Statement Execution

The execution function $\mathcal{S}: \texttt{Stmnt} \times \texttt{Env} \to \mathbb{R}$ produces a result and updates the environment via side effects:

$$
\begin{array}{lcl}
\mathcal{S}(\texttt{Assign}(x, e), \rho) & = & v \quad \text{where } v = \mathcal{E}(e, \rho) \text{ and } \rho \text{ is updated to } \rho[x \mapsto v] \\[0.2cm]
\mathcal{S}(\texttt{ExprStmnt}(e), \rho) & = & \mathcal{E}(e, \rho)
\end{array}
$$

In [None]:
function exec(s: Stmnt, env: Env): number {
    if (s instanceof AssignNode) {
        const val = evalExpr(s.expr, env);
        env.set(s.id, val);
        return val;
    }
    if (s instanceof ExprStmtNode) {
        return evalExpr(s.expr, env);
    }
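    // Unreachable for well-typed input; fail loudly instead of returning a silent 0.
    throw new Error("Unknown statement type");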
}

Set `inputProgram` for the calculator:

In [None]:
const inputProgram = `
x := 10 + 5 * 2;
y := (x - 5) / 3;
z := -y * 2;
z + 100;
`;

## Main Execution Loop

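The main loop processes the program line by line, in the style of a Read-Eval-Print Loop (REPL): each line is parsed, executed, and printed together with its AST and its set of variables. Each statement must therefore fit on a single line.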

In [None]:
const env = new RecursiveMap<string, number>();

const lines = inputProgram
    .split("\n")
    .map((l) => l.trim())
    .filter((l) => l.length > 0);

console.log("--- Calculation Start ---");

for (const l of lines) {

    try {
        const ast = parseStmnt(l);
        const res = exec(ast, env);
        const vars = collectVars(ast);
        
        // Thanks to ASTNode.toString(), the output here is nicely formatted
        console.log(`Code:   ${l}`);
        console.log(`AST:    ${ast}`);
        console.log(`Vars:   ${vars}`);
        console.log(`Result: ${res}`);
        console.log("-".repeat(30));
        
    } catch (e) {
        console.error(`Error: ${(e as Error).message}`);
    }
}

console.log("\n--- Final Environment ---");
console.log(env.toString());
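
Applying $\mathcal{S}$ and $\mathcal{E}$ by hand to the four statements of `inputProgram`, starting from an empty environment, gives the values below. The unary minus in the third statement is rewritten to a multiplication by $-1$, as in the `UnaryExpr` rule. The exact text printed for the AST, the variable set, and the environment depends on the `toString()` implementations of the libraries and is not reproduced here.

$$
\begin{array}{lcl}
\texttt{x := 10 + 5 * 2;} & \Rightarrow & 10 + (5 \times 2) = 20, \quad \rho = [x \mapsto 20] \\[0.2cm]
\texttt{y := (x - 5) / 3;} & \Rightarrow & (20 - 5) \div 3 = 5, \quad \rho = [x \mapsto 20,\; y \mapsto 5] \\[0.2cm]
\texttt{z := -y * 2;} & \Rightarrow & ((-1) \times 5) \times 2 = -10, \quad \rho = [x \mapsto 20,\; y \mapsto 5,\; z \mapsto -10] \\[0.2cm]
\texttt{z + 100;} & \Rightarrow & -10 + 100 = 90, \quad \rho \text{ unchanged}
\end{array}
$$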