In [None]:
import { display } from "tslab";
import { readFileSync } from "fs";

const css = readFileSync("../style.css", "utf8");
display.html(`<style>${css}</style>`);

# AST to DOT

## Foundational Dependencies

To support the architectural requirements of the AST—specifically **Structural Equality** and efficient **Tree Traversal**—we integrate two specialized libraries.

### Value Semantics (`recursive-set`)
Standard JavaScript objects and arrays utilize **Reference Equality** ($A = B \iff \text{address}(A) = \text{address}(B)$). However, in compiler construction, we often require **Structural Equality**, where two nodes are considered identical if their contents are isomorphic, regardless of their memory location.

* **`Tuple`**: An immutable, hashable sequence primitive. Unlike a standard Array, two Tuples containing the same elements produce the same hash code and are considered equal. We use this as the internal storage mechanism for all AST nodes.
* **`Structural`**: An interface that enforces the implementation of `equals(other)` and a `hashCode` property, allowing our custom AST nodes to be used efficiently in hash-based Sets and Maps (e.g., for memoization or common subexpression elimination).
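
To make the difference between the two notions of equality concrete, here is a minimal sketch. `PairSketch` is a hypothetical stand-in written for this illustration; it is not part of `recursive-set` and only mimics the `equals` / `hashCode` contract described above.

```typescript
// Reference equality: two arrays with the same contents are distinct objects.
const arrA = [1, 2];
const arrB = [1, 2];
const sameReference = arrA === arrB;                  // false

// Hypothetical stand-in for a structurally comparable value
// (NOT the recursive-set API, only the idea behind it).
class PairSketch {
    constructor(readonly first: number, readonly second: number) {}

    // Equal contents must yield equal hash codes.
    get hashCode(): number {
        return (this.first * 31 + this.second) | 0;
    }

    // Equality is decided by content, not by object identity.
    equals(other: unknown): boolean {
        return other instanceof PairSketch
            && other.first === this.first
            && other.second === this.second;
    }
}

const pairA = new PairSketch(1, 2);
const pairB = new PairSketch(1, 2);
const sameStructure = pairA.equals(pairB);            // true
const sameHash = pairA.hashCode === pairB.hashCode;   // true

// The built-in Set still compares by reference and keeps both objects,
// which is why a structural collection must rely on equals/hashCode.
const nativeSetSize = new Set([pairA, pairB]).size;   // 2
```

The last line shows why a dedicated library is needed at all: JavaScript's built-in collections never consult `equals` or `hashCode`.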

### Syntax Tree Traversal (`@lezer/common`)
Lezer stores its syntax tree compactly: rather than allocating one object per node, it packs small subtrees into flat numeric buffers.

* **`TreeCursor`**: To navigate this structure, we import the `TreeCursor`. This is a **stateful, mutable pointer** into the Concrete Syntax Tree (CST): `firstChild()`, `nextSibling()` and `parent()` move it in place and return `false` when no such node exists. Its `name`, `from` and `to` properties expose the current node's type name and source range without allocating a node object for every step.

In [None]:
import { Structural, Tuple } from "recursive-set";
import { TreeCursor } from "@lezer/common";

## Operator
The `Operator` type defines the set of permissible arithmetic and comparison operations within our language. We model this as a **String Union Type**, restricting valid values to a specific subset of literals. This ensures compile-time safety when constructing expressions.

Let $\mathcal{O}$ be the set of valid operators:
$$
\mathcal{O} = \{ "+", "-", "*", "**", "/", "\%", "==", "!=", "<", ">", "<=", ">=" \}
$$

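In our TypeScript implementation, this type documents which operations a `BinaryExpr` node may carry. Note that the `BinaryExpr` class defined later in this notebook types its `op` field as plain `string`, so the compiler only enforces the restriction where a value is explicitly annotated as `Operator`.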

In [None]:
type Operator = "+" | "-" | "*" | "**" | "/" | "%" | "==" | "!=" | "<" | ">" | "<=" | ">=";
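
Operator text read from source code arrives as an arbitrary `string`, so it has to be narrowed before it can be treated as an `Operator`. The following sketch shows one way to do that with a type predicate. `OperatorSketch`, `OPERATOR_LIST` and `isOperatorSketch` are hypothetical names introduced for this example; the notebook itself defines no such guard.

```typescript
// Local copy of the operator union so the sketch is self-contained.
type OperatorSketch =
    | "+" | "-" | "*" | "**" | "/" | "%"
    | "==" | "!=" | "<" | ">" | "<=" | ">=";

// Runtime mirror of the union (hypothetical helper table).
const OPERATOR_LIST: readonly OperatorSketch[] = [
    "+", "-", "*", "**", "/", "%", "==", "!=", "<", ">", "<=", ">=",
];

// Type predicate: narrows `string` to `OperatorSketch` when it returns true.
function isOperatorSketch(text: string): text is OperatorSketch {
    return OPERATOR_LIST.some((op) => op === text);
}

const okOp: OperatorSketch = "**";      // accepted at compile time
// const badOp: OperatorSketch = "^";   // would be rejected by the compiler

const fromSource: string = "<=";
const narrowed = isOperatorSketch(fromSource) ? fromSource : null;  // "<="
const rejected = isOperatorSketch("^");                             // false
```

The union gives compile-time safety for literals written in code, while the guard covers values that are only known at runtime.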

## ASTNode (Structural Equality)

The `ASTNode` serves as the abstract base class for all nodes in our syntax tree. Unlike traditional object-oriented implementations that rely on reference equality, this implementation enforces **Structural Equality**.

By implementing the `Structural` interface and wrapping internal data in a `Tuple`, two AST nodes are considered equal if and only if they are instances of the same class and their content is identical, regardless of their memory address.

$$
Node_A \equiv Node_B \iff \text{Class}(Node_A) = \text{Class}(Node_B) \land \text{Data}(Node_A) = \text{Data}(Node_B)
$$

**Key Characteristics:**
* **Immutability:** The internal state is stored in a read-only `Tuple`.
* **Hashability:** Nodes derive their hash code from their content, so structurally equal nodes always share the same hash (distinct nodes may still collide). This allows AST subtrees to be used as keys in hash-based Sets or Maps (useful for optimization passes like Common Subexpression Elimination).

In [None]:
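abstract class ASTNode<T extends Structural> implements Structural {
    // `data` holds the node's entire state as a structurally comparable value.
    constructor(protected readonly data: T) {}
    // Delegate hashing to the payload so equal content yields equal hashes.
    get hashCode(): number { return this.data.hashCode; }
    equals(other: unknown): boolean {
        if (!(other instanceof ASTNode)) return false;
        // Nodes of different classes are never equal, even with identical payloads.
        if (this.constructor !== other.constructor) return false;
        return this.data.equals(other.data);
    }
    abstract toString(): string;
}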
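
The following sketch illustrates why `hashCode` and `equals` are both needed when nodes serve as keys, for example in a table that reuses structurally equal subtrees. Everything here (`StructuralSketch`, `LeafSketch`, `InternTable`, `weakHash`) is a hypothetical stand-in written for this example; it mirrors the shape of the contract used above but does not use `recursive-set`.

```typescript
// Minimal contract assumed by this sketch.
interface StructuralSketch {
    readonly hashCode: number;
    equals(other: unknown): boolean;
}

// Deliberately weak string hash so that collisions are easy to provoke.
function weakHash(s: string): number {
    let h = 0;
    for (let i = 0; i < s.length; i++) h = (h + s.charCodeAt(i)) | 0;
    return h;
}

class LeafSketch implements StructuralSketch {
    constructor(readonly text: string) {}
    get hashCode(): number { return weakHash(this.text); }
    equals(other: unknown): boolean {
        return other instanceof LeafSketch
            && other.constructor === this.constructor
            && other.text === this.text;
    }
}

// Interning table: returns the first stored node that equals `candidate`.
class InternTable<T extends StructuralSketch> {
    private readonly buckets = new Map<number, T[]>();
    private count = 0;

    intern(candidate: T): T {
        // The hash selects a bucket; equals() decides within the bucket.
        const bucket = this.buckets.get(candidate.hashCode) ?? [];
        const existing = bucket.find((n) => n.equals(candidate));
        if (existing) return existing;
        bucket.push(candidate);
        this.buckets.set(candidate.hashCode, bucket);
        this.count++;
        return candidate;
    }

    get size(): number { return this.count; }
}

const table = new InternTable<LeafSketch>();
const first = table.intern(new LeafSketch("ab"));
const second = table.intern(new LeafSketch("ab")); // structurally equal: `first` is reused
const third = table.intern(new LeafSketch("ba"));  // same weak hash, different content: stored separately
```

Here `"ab"` and `"ba"` collide under `weakHash`, yet the table keeps them apart because `equals` has the final say. A hash code only narrows the search; it does not identify a node.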

## Atomic Nodes (Terminals)

These nodes represent the leaves of the Abstract Syntax Tree. They contain raw data values and do not have child nodes that require recursive traversal.

* **`NumNode`**: Wraps a numeric literal.
    * *Data:* `Tuple<[number]>`
    * *Example:* `42`

* **`VarNode`**: Represents a variable identifier.
    * *Data:* `Tuple<[string]>`
    * *Example:* `x`, `counter`

* **`NilNode`**: Represents the empty set or a non-existent branch.
    * *Data:* Empty Tuple `[]`
    * *Usage:* Used as a sentinel value, for example, to represent the absence of an `else` block in an `IfNode`.

In [None]:
class NumNode extends ASTNode<Tuple<[number]>> {
    constructor(val: number) { super(new Tuple(val)); }
    get value(): number { return this.data.get(0); }
    toString() { return this.value.toString(); }
}
class VarNode extends ASTNode<Tuple<[string]>> {
    constructor(name: string) { super(new Tuple(name)); }
    get name(): string { return this.data.get(0); }
    toString() { return this.name; }
}
class NilNode extends ASTNode<Tuple<[]>> {
    constructor() { super(new Tuple()); }
    toString() { return "∅"; }
}

## The AST Union Type

We define the **Recursive Union Type** `AST`. This type is a **closed enumeration** of every concrete node class: a value of type `AST` is always an instance of exactly one of them.

$$
\text{AST} = \text{Leaves} \cup \text{Composites}
$$

By defining `AST` as the union of all concrete classes (`NumNode | BinaryExpr | ...`), TypeScript allows us to use these types recursively in class definitions (e.g., `BinaryExpr` contains `AST`). This union is the central type used throughout the interpreter's traversal algorithms.

### BinaryExpr

The `BinaryExpr` node models a binary operation $\mathcal{B}$ comprising a left operand, an operator, and a right operand.

$$
\mathcal{B} = (\text{left}, \text{op}, \text{right})
$$

* **`left`** ($\text{AST}$): The first operand.
* **`op`** ($\text{Operator}$): The operation symbol (e.g., `+`).
* **`right`** ($\text{AST}$): The second operand.

This recursive structure allows us to build complex expression trees, such as $3 + (x * 5)$, by nesting nodes.

### Assignment and Expressions

* **`AssignNode`**: Represents a side-effect where a value is bound to a variable.
    * Structure: $(\text{Identifier}, \text{Expression})$
    * Logic: $id \leftarrow expr$

* **`ExprStmtNode`**: A wrapper node that lifts an expression (like a function call or assignment) into a statement context. This distinguishes between "calculating a value" and "executing a line of code".

### Control Flow Nodes

These nodes direct the execution flow of the program based on boolean conditions.

* **`WhileNode`**: Represents a loop.
    * Structure: $(\text{Condition}, \text{Body})$
    * Logic: Execute `Body` repeatedly as long as `Condition` evaluates to true.

* **`IfNode`**: Represents conditional branching.
    * Structure: $(\text{Condition}, \text{ThenBranch}, \text{ElseBranch})$
    * Note: The `ElseBranch` is mandatory in the data structure. If the source code lacks an `else` clause, this field is populated with a `NilNode`.
    
### Grouping and Calls

* **`BlockNode`**: Represents a scope or a sequence of statements enclosed in braces `{ ... }`.
    * Data: A list of AST nodes $[s_1, s_2, \dots, s_n]$.

* **`CallNode`**: Represents a function invocation.
    * Structure: $(\text{FunctionName}, \text{Arguments})$
    * *Implementation Detail:* To maintain structural equality, the arguments array is spread into a nested `Tuple`.

In [None]:
type AST = 
    | NumNode | VarNode | NilNode 
    | BinaryExpr | AssignNode | IfNode | WhileNode 
    | CallNode | ExprStmtNode | BlockNode;

class BinaryExpr extends ASTNode<Tuple<[AST, string, AST]>> {
    constructor(left: AST, op: string, right: AST) { 
        super(new Tuple(left, op, right)); 
    }
    get left(): AST { return this.data.get(0); }
    get op(): string { return this.data.get(1); }
    get right(): AST { return this.data.get(2); }
    toString() { return `(${this.left} ${this.op} ${this.right})`; }
}

class AssignNode extends ASTNode<Tuple<[string, AST]>> {
    constructor(id: string, expr: AST) { super(new Tuple(id, expr)); }
    get id(): string { return this.data.get(0); }
    get expr(): AST { return this.data.get(1); }
    toString() { return `${this.id} := ${this.expr}`; }
}

class IfNode extends ASTNode<Tuple<[AST, AST, AST]>> {
    constructor(cond: AST, thenB: AST, elseB?: AST) {
        super(new Tuple(cond, thenB, elseB ?? new NilNode())); 
    }
    get cond(): AST { return this.data.get(0); }
    get thenB(): AST { return this.data.get(1); }
    get elseB(): AST { return this.data.get(2); }
    toString() { return "if"; }
}

class WhileNode extends ASTNode<Tuple<[AST, AST]>> {
    constructor(cond: AST, body: AST) { super(new Tuple(cond, body)); }
    get cond(): AST { return this.data.get(0); }
    get body(): AST { return this.data.get(1); }
    toString() { return "while"; }
}

class CallNode extends ASTNode<Tuple<[string, Tuple<AST[]>]>> {
    constructor(fn: string, args: AST[]) { super(new Tuple(fn, new Tuple(...args))); }
    get fn(): string { return this.data.get(0); }
    get args(): AST[] { return [...this.data.get(1)]; }
    toString() { return `${this.fn}(...)`; }
}

class ExprStmtNode extends ASTNode<Tuple<[AST]>> {
    constructor(expr: AST) { super(new Tuple(expr)); }
    get expr(): AST { return this.data.get(0); }
    toString() { return "stmt"; }
}

class BlockNode extends ASTNode<Tuple<AST[]>> {
    constructor(stmts: AST[]) { super(new Tuple(...stmts)); }
    get statements(): AST[] { return [...this.data]; }
    toString() { return "{...}"; }
}
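
As a small worked example of how such a recursive union is consumed, the sketch below builds the tree for $3 + (x * 5)$ and walks it. To stay self-contained it uses simplified stand-ins (`NumSketch`, `VarSketch`, `BinSketch`, all hypothetical) instead of the notebook's classes, and it leaves out the `Tuple` storage.

```typescript
// Simplified stand-ins for NumNode, VarNode and BinaryExpr.
class NumSketch { constructor(readonly value: number) {} }
class VarSketch { constructor(readonly ident: string) {} }
class BinSketch {
    constructor(
        readonly left: ExprSketch,
        readonly op: string,
        readonly right: ExprSketch
    ) {}
}

// The union refers to BinSketch, and BinSketch refers back to the union.
type ExprSketch = NumSketch | VarSketch | BinSketch;

// instanceof narrows the union one variant at a time.
function renderSketch(e: ExprSketch): string {
    if (e instanceof NumSketch) return String(e.value);
    if (e instanceof VarSketch) return e.ident;
    if (e instanceof BinSketch) {
        return `(${renderSketch(e.left)} ${e.op} ${renderSketch(e.right)})`;
    }
    // If a new variant is added to the union but not handled above,
    // this assignment stops compiling.
    const unreachable: never = e;
    return unreachable;
}

function countNodes(e: ExprSketch): number {
    return e instanceof BinSketch
        ? 1 + countNodes(e.left) + countNodes(e.right)
        : 1;
}

// 3 + (x * 5)
const exprTree = new BinSketch(
    new NumSketch(3),
    "+",
    new BinSketch(new VarSketch("x"), "*", new NumSketch(5))
);
const rendered = renderSketch(exprTree);   // "(3 + (x * 5))"
const nodeTotal = countNodes(exprTree);    // 5
```

The `never` assignment is the practical benefit of a closed union: traversal functions fail at compile time, rather than at runtime, when a node class is forgotten.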

## Type Guards and Assertions

While TypeScript provides static type safety, the `AST` union type requires runtime verification when accessed in contexts where a specific node type is expected (e.g., ensuring a function body is a `BlockNode`).

We define specific **Assertion Functions** $P(n)$ that validate the type of a node $n$ at runtime.

$$
P(n) = \begin{cases} 
n.\text{property} & \text{if } n \in \text{ExpectedType} \\
\text{Error}(\dots) & \text{otherwise}
\end{cases}
$$

* **`getVarName(node)`**: Ensures the node is a `VarNode` and returns the identifier string. This is crucial for "L-Value" resolution during assignment.
* **`getBlockStmts(node)`**: Ensures the node is a `BlockNode` and retrieves the statement list. This is used when a control flow construct (like `If` or `While`) expects a compound statement body.

These helpers replace unsafe type casting (e.g., `node as VarNode`), preventing silent failures by throwing explicit errors if the structure of the AST is malformed.

In [None]:
function getVarName(node: AST): string {
    if (node instanceof VarNode) return node.name;
    throw new Error(`AST Error: Expected VarNode, got ${node.constructor.name}`);
}

function getBlockStmts(node: AST): AST[] {
    if (node instanceof BlockNode) return node.statements;
    throw new Error(`AST Error: Expected BlockNode, got ${node.constructor.name}`);
}
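
To illustrate the difference between a checked accessor and a bare cast, here is a minimal sketch. `VarLeaf`, `NumLeaf` and `varNameOf` are hypothetical stand-ins for `VarNode`, `NumNode` and `getVarName`, so that the example runs on its own.

```typescript
class VarLeaf { constructor(readonly ident: string) {} }
class NumLeaf { constructor(readonly value: number) {} }
type LeafNode = VarLeaf | NumLeaf;

// Checked accessor in the style of getVarName.
function varNameOf(node: LeafNode): string {
    if (node instanceof VarLeaf) return node.ident;
    throw new Error(`AST Error: Expected VarLeaf, got ${node.constructor.name}`);
}

// Happy path: the node really is a variable.
const lhsName = varNameOf(new VarLeaf("counter"));   // "counter"

// Malformed tree: the helper fails loudly with a descriptive message.
let failure = "";
try {
    varNameOf(new NumLeaf(42));
} catch (err) {
    failure = err instanceof Error ? err.message : String(err);
}
// failure is "AST Error: Expected VarLeaf, got NumLeaf"

// A cast compiles as well, but silently yields undefined at runtime.
const silentIdent = (new NumLeaf(42) as unknown as VarLeaf).ident;
```

The cast in the last line is typed as `string` by the compiler even though the value is `undefined`, which is exactly the kind of silent failure the helpers are meant to rule out.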

## Tree Visualization (`ast2dot`)

To verify the structural integrity of the AST, we implement a generator that serializes the tree into the **DOT** graph description language. This allows us to render the abstract structure as a visual diagram $G = (V, E)$.

**The Algorithm:**
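The function performs a **Depth-First Pre-Order Traversal** of the AST. We maintain a counter `idCounter`, local to `ast2dot` and shared with the nested `traverse` closure, to assign a unique integer ID to every visited node $v \in V$. IDs are handed out in pre-order, while each node's declaration line is emitted only after its subtrees have been processed.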

For every node $n$ visited:
1.  **Node Declaration**: A uniquely identified vertex `n{id}` is created. Visual attributes (color, shape, label) are assigned based on the node's class (e.g., `IfNode` is filled light red, while leaf nodes such as `NumNode` are drawn as white circles).
2.  **Edge Generation**: Directed edges $(n, c)$ are drawn to all children $c$.
    * Edges are labeled to indicate semantic roles (e.g., "cond" for the condition branch of a loop, "L" and "R" for binary operands).
3.  **Recursion**: The function calls itself recursively for every child node, returning the child's ID to link it to the parent.

**Helper: HTML Escaping**
Since DOT labels allow HTML-like formatting, special characters in operators (like `<` or `>`) must be escaped to their entity equivalents (`&lt;`, `&gt;`) to prevent syntax errors in the graph definition.

In [None]:
function escapeHtml(s: string): string {
    return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function ast2dot(root: AST): string {
    const lines: string[] = [
        'digraph AST {',
        '  node [fontname="Helvetica", fontsize=10, style="filled"];',
        '  edge [fontname="Helvetica", fontsize=9];',
        '  splines=false;',
        ''
    ];

    let idCounter = 0;

    function traverse(node: AST): number {
        const id = idCounter++;
        const myName = `n${id}`;
        
        let label = "";
        let color = "#ffffff";
        let shape = "box"; 
        let extraAttrs = ""; 
        let isBold = true;
        
        const edges: { target: number, label?: string }[] = [];

        if (node instanceof NumNode) {
            label = node.value.toString();
            color = "#ffffff"; 
            shape = "circle";
            extraAttrs = ', width=0.5, fixedsize=true';
            isBold = false;
        } 
        else if (node instanceof VarNode) {
            label = escapeHtml(node.name);
            color = "#ffffff";
            shape = "circle";
            extraAttrs = ', width=0.5, fixedsize=true';
            isBold = false;
        }
        else if (node instanceof NilNode) {
            label = "∅";
            color = "#ffffff";
            shape = "circle";
            extraAttrs = ', width=0.5, fixedsize=true';
            isBold = false;
        }

        else if (node instanceof BinaryExpr) {
            label = escapeHtml(node.op);
            color = "#d1e7dd"; 
            edges.push({ target: traverse(node.left), label: "L" });
            edges.push({ target: traverse(node.right), label: "R" });
        }
        else if (node instanceof AssignNode) {
            label = ":=";
            color = "#cfe2ff"; 
            const idLeaf = idCounter++;
            lines.push(`  n${idLeaf} [label=<${escapeHtml(node.id)}>, shape=circle, fillcolor="white", width=0.5, fixedsize=true];`);
            
            edges.push({ target: idLeaf, label: "id" });
            edges.push({ target: traverse(node.expr), label: "val" });
        }
        else if (node instanceof IfNode) {
            label = "IF";
            color = "#f8d7da"; 
            edges.push({ target: traverse(node.cond), label: "cond" });
            edges.push({ target: traverse(node.thenB), label: "then" });
            if (!(node.elseB instanceof NilNode)) {
                edges.push({ target: traverse(node.elseB), label: "else" });
            }
        }
        else if (node instanceof WhileNode) {
            label = "WHILE";
            color = "#f8d7da"; 
            edges.push({ target: traverse(node.cond), label: "cond" });
            edges.push({ target: traverse(node.body), label: "do" });
        }
        else if (node instanceof CallNode) {
            label = `${escapeHtml(node.fn)}()`;
            color = "#e0cffc"; 
            node.args.forEach((arg, i) => {
                edges.push({ target: traverse(arg), label: `${i}` });
            });
        }
        else if (node instanceof ExprStmtNode) {
            label = "expr";
            color = "#fdfdfe"; 
            edges.push({ target: traverse(node.expr) });
        }
        else if (node instanceof BlockNode) {
            label = "{...}";
            color = "#f8f9fa"; 
            node.statements.forEach((stmt, i) => {
                edges.push({ target: traverse(stmt), label: `${i}` });
            });
        }

        const finalLabel = isBold ? `<b>${label}</b>` : label;
        lines.push(`  ${myName} [label=<${finalLabel}>, fillcolor="${color}", shape="${shape}"${extraAttrs}];`);
        for (const e of edges) {
            const attr = e.label ? ` [label="${e.label}"]` : "";
            lines.push(`  ${myName} -> n${e.target}${attr};`);
        }

        return id;
    }

    traverse(root);
    lines.push('}');
    return lines.join('\n');
}
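
The numbering and escaping logic can be observed in isolation with a reduced version of the generator. The sketch below is not `ast2dot` itself: it works on a generic labelled tree (`TreeSketch`, a hypothetical type) and omits colors, shapes and edge labels.

```typescript
// Generic labelled tree; a stand-in for the AST classes.
interface TreeSketch { label: string; kids: TreeSketch[]; }

function escapeLabel(s: string): string {
    // "&" is replaced first; otherwise the "&" of "&lt;" would be escaped again.
    return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function treeToDot(root: TreeSketch): string {
    const out: string[] = ["digraph T {"];
    let nextId = 0;

    function visit(t: TreeSketch): number {
        const id = nextId++;                        // IDs are assigned in pre-order
        const childIds = t.kids.map((k) => visit(k)); // children are emitted first
        out.push(`  n${id} [label=<${escapeLabel(t.label)}>];`);
        for (const c of childIds) out.push(`  n${id} -> n${c};`);
        return id;
    }

    visit(root);
    out.push("}");
    return out.join("\n");
}

// Tree for the comparison `x <= 5`.
const comparison: TreeSketch = {
    label: "<=",
    kids: [{ label: "x", kids: [] }, { label: "5", kids: [] }],
};
const dotText = treeToDot(comparison);
// digraph T {
//   n1 [label=<x>];
//   n2 [label=<5>];
//   n0 [label=<&lt;=>];
//   n0 -> n1;
//   n0 -> n2;
// }
```

The root receives ID 0 because it is visited first, but its declaration appears after those of its children. DOT does not depend on declaration order, so the rendered graph is unaffected.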

## CST to AST Mapping

The `cstToAST` function acts as the bridge between the raw parsing stage and the semantic analysis stage. It transforms the **Concrete Syntax Tree (CST)**, which still contains parsing artifacts such as punctuation tokens and comments, into our clean **Abstract Syntax Tree (AST)**.

### Configuration Types
The transformation is data-driven, defined by a `ParserConfig` object:
* **`ignore`**: A set of CST node names to discard (e.g., `"Comment"`, `"("`, `")"`).
* **`rules`**: A mapping table $\mathcal{R}$ where keys are CST node names and values are transformation functions `TransformRule`.

$$
\text{Rule}: (\text{ChildNodes}[], \text{RawText}) \to \text{ASTNode}
$$

In [None]:
type TransformRule = (children: AST[], text: string) => AST;
interface ParserConfig {
    ignore: Set<string>;
    rules: Record<string, TransformRule>;
}

### The Recursive Algorithm
The transformation function $\mathcal{T}(cursor)$ proceeds as follows:

1.  **Traversal & Filtering**: The cursor iterates over the children of the current CST node. If a child's name is in the `ignore` set, it is skipped. Otherwise, $\mathcal{T}$ is called recursively.
2.  **Rule Application**:
    * If a rule exists for the current node type in $\mathcal{R}$, it is executed with the processed children.
    * *Example:* A `BinaryExpression` CST node is transformed into a `BinaryExpr` AST node using its three children (left, op, right).
3.  **Auto-Unwrapping (Fallback)**:
    * If no rule is defined and the node has exactly one child, the function returns that child directly. This effectively collapses unnecessary nesting (e.g., `Statement -> ExprStmt -> BinaryExpr` becomes just `BinaryExpr`).
    * If no rule matches and the node does not have exactly one surviving child (this includes leaves without any children), a fatal error is raised, indicating a gap in the `rules` configuration.

In [None]:
function cstToAST(
    cursor: TreeCursor,
    source: string,
    config: ParserConfig
): AST {
    // Capture the current node's name and text before the cursor moves.
    const kind = cursor.name;
    const text = source.slice(cursor.from, cursor.to);
    const children: AST[] = [];
    // Descend into the children, skipping ignored node types.
    if (cursor.firstChild()) {
        do {
            if (!config.ignore.has(cursor.name)) {
                children.push(cstToAST(cursor, source, config));
            }
        } while (cursor.nextSibling());
        cursor.parent(); // move the cursor back to the node we started on
    }
    // An explicit rule takes precedence over auto-unwrapping.
    const rule = config.rules[kind];
    if (rule) return rule(children, text);
    if (children.length === 1) return children[0];
    throw new Error(`Parsing failed: No rule or unwrap path for CST node '${kind}' ("${text}")`);
}
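
The interplay of `ignore`, `rules` and auto-unwrapping can be tried out without a Lezer grammar. The sketch below makes several simplifying assumptions: `CursorSketch` lists only the cursor members the algorithm touches, `ArrayCursor` walks a hand-written tree instead of a real parse, the result type is a plain string instead of `AST`, and the node names (`Program`, `ParenExpr`, `BinaryExpression`, `ArithOp`, ...) are hypothetical.

```typescript
// Only the cursor members the transformation uses (assumed subset).
interface CursorSketch {
    readonly name: string;
    readonly from: number;
    readonly to: number;
    firstChild(): boolean;
    nextSibling(): boolean;
    parent(): boolean;
}

// Hand-written CST used instead of a real parse.
interface CstSketch { kind: string; from: number; to: number; kids: CstSketch[]; }

class ArrayCursor implements CursorSketch {
    private path: { node: CstSketch; index: number }[] = [];
    constructor(private current: CstSketch) {}
    get name(): string { return this.current.kind; }
    get from(): number { return this.current.from; }
    get to(): number { return this.current.to; }
    firstChild(): boolean {
        if (this.current.kids.length === 0) return false;
        this.path.push({ node: this.current, index: 0 });
        this.current = this.current.kids[0];
        return true;
    }
    nextSibling(): boolean {
        const frame = this.path[this.path.length - 1];
        if (!frame || frame.index + 1 >= frame.node.kids.length) return false;
        frame.index++;
        this.current = frame.node.kids[frame.index];
        return true;
    }
    parent(): boolean {
        const frame = this.path.pop();
        if (!frame) return false;
        this.current = frame.node;
        return true;
    }
}

type RuleSketch = (children: string[], text: string) => string;
interface ConfigSketch { ignore: Set<string>; rules: Record<string, RuleSketch>; }

// Same control flow as cstToAST, producing strings instead of AST nodes.
function toAst(cursor: CursorSketch, source: string, config: ConfigSketch): string {
    const kind = cursor.name;
    const text = source.slice(cursor.from, cursor.to);
    const children: string[] = [];
    if (cursor.firstChild()) {
        do {
            if (!config.ignore.has(cursor.name)) children.push(toAst(cursor, source, config));
        } while (cursor.nextSibling());
        cursor.parent();
    }
    const rule = config.rules[kind];
    if (rule) return rule(children, text);
    if (children.length === 1) return children[0];
    throw new Error(`No rule or unwrap path for CST node '${kind}'`);
}

const leafOf = (kind: string, from: number, to: number): CstSketch =>
    ({ kind, from, to, kids: [] });

const sourceText = "(x+1)";
const cstRoot: CstSketch = {
    kind: "Program", from: 0, to: 5, kids: [{
        kind: "ParenExpr", from: 0, to: 5, kids: [
            leafOf("(", 0, 1),
            { kind: "BinaryExpression", from: 1, to: 4, kids: [
                leafOf("Identifier", 1, 2), leafOf("ArithOp", 2, 3), leafOf("Number", 3, 4)
            ] },
            leafOf(")", 4, 5)
        ]
    }]
};

const demoConfig: ConfigSketch = {
    ignore: new Set(["(", ")"]),
    rules: {
        Identifier: (_kids, text) => text,
        ArithOp: (_kids, text) => text,
        Number: (_kids, text) => text,
        BinaryExpression: ([lhs, op, rhs]) => `(${op} ${lhs} ${rhs})`
    }
};

const astText = toAst(new ArrayCursor(cstRoot), sourceText, demoConfig); // "(+ x 1)"
```

`Program` and `ParenExpr` have no rule, but after the parentheses are ignored each of them has exactly one surviving child, so both collapse into the result of `BinaryExpression`.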