In [None]:
import { display } from "tslab";
import { readFileSync } from "fs";

const css = readFileSync("../style.css", "utf8");
display.html(`<style>${css}</style>`);

# An EBNF-Based Parser for Arithmetic Expressions with AST Visualization

In this notebook we implement an <span style="font-variant:small-caps;">Ebnf</span> recursive-descent parser for arithmetic expressions. The parser implements the following <span style="font-variant:small-caps;">Ebnf</span> grammar:

$$
\begin{eqnarray*}
\mathrm{expr}    & \rightarrow & \mathrm{product}\;\;\bigl((\texttt{'+'}\;|\;\texttt{'-'})\;\; \mathrm{product}\bigr)^* \\[0.2cm]
\mathrm{product} & \rightarrow & \mathrm{factor} \;\;\bigl((\texttt{'*'}\;|\;\texttt{'/'})\;\; \mathrm{factor}\bigr)^* \\[0.2cm]   
\mathrm{factor}  & \rightarrow & \texttt{'('} \;\;\mathrm{expr} \;\;\texttt{')'}                                     \\
                 & \mid        & \texttt{NUMBER} \\
                 & \mid        & \texttt{VARIABLE}
\end{eqnarray*}
$$
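As an illustration of how the grammar encodes precedence, here is a leftmost derivation of the token sequence for `2 * x + y`. Because `expr` splits the input at `+` first and each operand is then expanded by `product`, the multiplication ends up as a subtree below the addition:

$$
\begin{eqnarray*}
\mathrm{expr} & \Rightarrow & \mathrm{product}\;\;\texttt{'+'}\;\;\mathrm{product} \\
              & \Rightarrow & \mathrm{factor}\;\;\texttt{'*'}\;\;\mathrm{factor}\;\;\texttt{'+'}\;\;\mathrm{product} \\
              & \Rightarrow & \texttt{NUMBER}\;\;\texttt{'*'}\;\;\texttt{VARIABLE}\;\;\texttt{'+'}\;\;\mathrm{product} \\
              & \Rightarrow & \texttt{NUMBER}\;\;\texttt{'*'}\;\;\texttt{VARIABLE}\;\;\texttt{'+'}\;\;\mathrm{factor} \\
              & \Rightarrow & \texttt{NUMBER}\;\;\texttt{'*'}\;\;\texttt{VARIABLE}\;\;\texttt{'+'}\;\;\texttt{VARIABLE}
\end{eqnarray*}
$$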

Unlike the previous notebook, which evaluated expressions immediately, this parser constructs an **Abstract Syntax Tree (AST)**.
To ensure compatibility with our visualization tool (`AST2Dot`), we build the tree from **typed objects** (class instances) rather than plain tuples.

## The Scanner

The scanner is extended to also recognize variables: names that start with a lowercase letter, followed by any number of lowercase letters or digits.

**Input:**
* `s`: A string containing the arithmetic expression.

**Output:**
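* Returns a `string[]` of tokens. Whitespace is skipped because the pattern never matches it; note that any other character the pattern does not cover (for example an uppercase letter or `%`) is silently dropped as well.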

In [None]:
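function tokenize(s: string): string[] {
    // Token classes: variables (a lowercase letter followed by lowercase letters
    // or digits), integers without leading zeros, the single digit 0, and the
    // operators and parentheses.
    const lexSpec = /[a-z][a-z0-9]*|[1-9][0-9]*|0|[-+*/()]/g;
    // match() returns null if nothing matches; unmatched characters are skipped.
    return s.match(lexSpec) ?? [];
}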
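Since `tokenize` silently ignores characters it does not recognize, a typo such as `2 $ 3` yields the tokens `2` and `3` without any warning. If that is not wanted, one option is a stricter variant that matches whitespace explicitly and reports everything else. The following is a sketch under that assumption; the name `tokenizeStrict` is hypothetical and not part of the original notebook. It uses a sticky (`y`) regular expression so that each match must start exactly at the current position.

```typescript
// Hypothetical variant of `tokenize`: same token classes, but whitespace is
// matched explicitly so that any other unmatched character raises an error.
function tokenizeStrict(s: string): string[] {
    // Alternative 1: whitespace (no capture). Alternative 2: a token (group 1).
    const lexSpec = /\s+|([a-z][a-z0-9]*|[1-9][0-9]*|0|[-+*/()])/y;
    const tokens: string[] = [];
    let pos = 0;
    while (pos < s.length) {
        lexSpec.lastIndex = pos;          // sticky: match must start at `pos`
        const m = lexSpec.exec(s);
        if (m === null) {
            throw new Error(`Unexpected character '${s[pos]}' at position ${pos}`);
        }
        if (m[1] !== undefined) {
            tokens.push(m[1]);            // keep tokens, drop whitespace
        }
        pos = lexSpec.lastIndex;
    }
    return tokens;
}

console.log(tokenizeStrict('12 * x + y1')); // [ '12', '*', 'x', '+', 'y1' ]
```

With this variant, `tokenizeStrict('2 $ 3')` throws `Unexpected character '$' at position 2` instead of returning `['2', '3']`.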

In [None]:
tokenize('12 * x + y1 * 4 / 6 - z3');

## Defining the Abstract Syntax Tree (AST)

We import the strictly typed AST definitions from our `AST2Dot` library.
Instead of primitive values or plain objects, every node is now an instance of a **class** that extends a common base class `ASTNode`, so each node carries a distinct type that code walking the tree can inspect.

The AST is defined as a recursive union of these classes:

1.  **Leaves**:
    * `NumNode` wraps a `number`.
    * `VarNode` wraps a `string` (the variable name).
2.  **Composites**:
    * `BinaryExpr` combines a left operand, an operator (`+`, `-`, `*`, or `/`), and a right operand, e.g. `Left + Right`.
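The class definitions themselves live in `AST2Dot` and are not shown in this notebook. Purely to illustrate the idea, here is a hypothetical sketch that is consistent with how the parser below calls the constructors (`new NumNode(value)`, `new VarNode(name)`, `new BinaryExpr(left, operator, right)`). The field names, the literal union for the operator, and the `Sketch` suffix (used to avoid clashing with the real imports) are all assumptions.

```typescript
// Hypothetical sketch only: the real classes in ./AST2Dot may differ.
type OperatorSketch = '+' | '-' | '*' | '/';

// Common base class of all AST nodes.
abstract class ASTNodeSketch {}

// Leaf: a numeric literal.
class NumNodeSketch extends ASTNodeSketch {
    readonly value: number;
    constructor(value: number) {
        super();
        this.value = value;
    }
}

// Leaf: a variable name.
class VarNodeSketch extends ASTNodeSketch {
    readonly name: string;
    constructor(name: string) {
        super();
        this.name = name;
    }
}

// Composite: left operand, operator, right operand.
class BinaryExprSketch extends ASTNodeSketch {
    readonly left: ASTSketch;
    readonly operator: OperatorSketch;
    readonly right: ASTSketch;
    constructor(left: ASTSketch, operator: OperatorSketch, right: ASTSketch) {
        super();
        this.left = left;
        this.operator = operator;
        this.right = right;
    }
}

// The recursive union of the node classes.
type ASTSketch = NumNodeSketch | VarNodeSketch | BinaryExprSketch;

// The tree for `2 * x`.
const sketchTree = new BinaryExprSketch(new NumNodeSketch(2), '*', new VarNodeSketch('x'));
```

Because every node is a class instance, a tree walker can distinguish node kinds with `instanceof`, e.g. `sketchTree.left instanceof NumNodeSketch`.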

In [None]:
import { AST, BinaryExpr, NumNode, VarNode, Operator } from "./AST2Dot";

In [None]:
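type TokenList = string[];
type ParseResult = [AST, TokenList];   // parsed tree and the tokens not yet consumed

// The three parsing functions are mutually recursive and are defined in
// separate cells below, so they are declared here and assigned later.
let parseExpr: (TL: TokenList) => ParseResult;
let parseProduct: (TL: TokenList) => ParseResult;
let parseFactor: (TL: TokenList) => ParseResult;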

The function `parse` serves as the entry point.

**Input:**
* `s`: The input string.

**Output:**
* Returns the root `AST` node.

It throws an error if any tokens remain after the expression has been parsed, which ensures that the entire input string is consumed.

In [None]:
function parse(s: string): AST {
    const TL = tokenize(s);
    const [result, rest] = parseExpr(TL);
    if (rest.length > 0)
        throw new Error(`Parse Error: could not parse remaining tokens: ${rest}`);
    return result;
}

The function `parseExpr` constructs the AST for expressions (addition/subtraction).

**Input:**
* `TL`: TokenList.

**Output:**
* `[AST, TokenList]`: The constructed tree node and remaining tokens.

**Logic:**
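It first parses a `product`. Then a **while loop**, which implements the repetition $(\dots)^*$ of the grammar, consumes each `+` or `-` together with the following `product`.
In every iteration it wraps the tree built so far into a new **`BinaryExpr` object** whose left operand is that tree. This yields a **left-associative** tree: `a - b - c` is parsed as `(a - b) - c`.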
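Associativity matters for `-` and `/`. The following hypothetical helpers (not part of the parser) compare combining a chain of subtractions from the left, as the while loop does, with combining it from the right. Both assume a non-empty array.

```typescript
// Hypothetical illustration: combine left to right, as the loop in parseExpr
// does, i.e. ((x0 - x1) - x2) - ...
function foldLeftMinus(xs: number[]): number {
    let acc = xs[0];
    for (let i = 1; i < xs.length; i++) {
        acc = acc - xs[i];
    }
    return acc;
}

// For comparison: combine right to left, i.e. x0 - (x1 - (x2 - ...)).
function foldRightMinus(xs: number[]): number {
    let acc = xs[xs.length - 1];
    for (let i = xs.length - 2; i >= 0; i--) {
        acc = xs[i] - acc;
    }
    return acc;
}

console.log(foldLeftMinus([8, 4, 2]));  // 2, i.e. (8 - 4) - 2
console.log(foldRightMinus([8, 4, 2])); // 6, i.e. 8 - (4 - 2)
```

Only the left-to-right version matches the usual reading of `8 - 4 - 2`, which is why the loop always puts the tree built so far on the left.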

In [None]:
parseExpr = function(TL: TokenList): ParseResult {
    let [result, rest] = parseProduct(TL);

    while (rest.length > 0 && (rest[0] === '+' || rest[0] === '-')) {
        const operator : Operator = rest[0];
        const [arg, nextRest] = parseProduct(rest.slice(1));
        result = new BinaryExpr(result, operator, arg);
        rest = nextRest;
    }

    return [result, rest];
};

The function `parseProduct` constructs the AST for terms (multiplication/division).

**Input:**
* `TL`: TokenList.

**Output:**
* `[AST, TokenList]`

**Logic:**
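It works like `parseExpr`, but one level lower in the grammar: because `parseExpr` calls `parseProduct` to parse its operands, `*` and `/` bind tighter than `+` and `-`. It consumes `*` or `/` operators and constructs `BinaryExpr` objects in the same way, so products are left-associative as well.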
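To make the effect of the two grammar levels visible, the following hypothetical, self-contained sketch mirrors the loop structure of `parseExpr`, `parseProduct`, and `parseFactor`, but returns a fully parenthesized string instead of an AST. The names `parenExpr`, `parenProduct`, `parenFactor`, and `Tokens` are illustrative only, and error handling is limited to two basic checks.

```typescript
type Tokens = string[];

// expr -> product (('+' | '-') product)*
function parenExpr(tl: Tokens): [string, Tokens] {
    let [result, rest] = parenProduct(tl);
    while (rest[0] === '+' || rest[0] === '-') {
        const op = rest[0];
        const [arg, next] = parenProduct(rest.slice(1));
        result = `(${result} ${op} ${arg})`;   // tree so far becomes the left operand
        rest = next;
    }
    return [result, rest];
}

// product -> factor (('*' | '/') factor)*
function parenProduct(tl: Tokens): [string, Tokens] {
    let [result, rest] = parenFactor(tl);
    while (rest[0] === '*' || rest[0] === '/') {
        const op = rest[0];
        const [arg, next] = parenFactor(rest.slice(1));
        result = `(${result} ${op} ${arg})`;
        rest = next;
    }
    return [result, rest];
}

// factor -> '(' expr ')' | NUMBER | VARIABLE
function parenFactor(tl: Tokens): [string, Tokens] {
    const [head, ...rest] = tl;
    if (head === undefined) throw new Error('unexpected end of input');
    if (head === '(') {
        const [inner, after] = parenExpr(rest);
        if (after[0] !== ')') throw new Error("')' expected");
        return [inner, after.slice(1)];
    }
    return [head, rest];   // a number or variable token
}

console.log(parenExpr(['2', '*', 'x', '+', 'y', '/', '3'])[0]); // ((2 * x) + (y / 3))
console.log(parenExpr(['8', '-', '4', '-', '2'])[0]);           // ((8 - 4) - 2)
```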

In [None]:
parseProduct = function(TL: TokenList): ParseResult {
    let [result, rest] = parseFactor(TL);
    while (rest.length > 0 && (rest[0] === '*' || rest[0] === '/')) {
        const operator : Operator = rest[0];
        const [arg, nextRest] = parseFactor(rest.slice(1));

        result = new BinaryExpr(result, operator, arg);
        rest = nextRest;
    }
    return [result, rest];
};

The function `parseFactor` handles atomic elements.

**Input:**
* `TL`: TokenList.

**Output:**
* `[AST, TokenList]`

**Logic:**
* `(`: recursively parses an expression, then expects and consumes the matching `)`.
* A number: wraps its value in a **`NumNode`**.
* A variable: wraps its name in a **`VarNode`**.

In [None]:
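parseFactor = function(TL: TokenList): ParseResult {
    const [head, ...RL] = TL;

    // Running out of tokens here means an operand is missing, e.g. in "1 +".
    if (head === undefined) {
        throw new Error("Parse Error: unexpected end of input");
    }
    if (head === '(') {
        const [expr, rest] = parseExpr(RL);
        if (rest[0] !== ')') {
            throw new Error(`Parse Error: ')' expected, got ${rest[0] ?? 'end of input'}`);
        }
        return [expr, rest.slice(1)];   // drop the closing ')'
    }
    // Number tokens start with a digit, variable tokens with a lowercase letter.
    if (/^[0-9]/.test(head)) {
        return [new NumNode(Number(head)), RL];
    }
    if (/^[a-z]/.test(head)) {
        return [new VarNode(head), RL];
    }
    throw new Error(`Parse Error: unexpected token '${head}'`);
};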

## Drawing Abstract Syntax Trees with GraphViz

We use `@viz-js/viz`, a WebAssembly build of Graphviz, to render the AST directly in the notebook, without writing files or invoking an external `dot` program.

The function `ast2dot` converts our AST (a tree of node objects) into a string in the DOT language.
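The internals of `ast2dot` are not shown here. A minimal sketch of the general technique, assuming a simplified tagged-object type `Tree` instead of the `AST2Dot` classes (both `Tree` and `treeToDot` are hypothetical names), is to walk the tree depth-first, give every node a fresh identifier, and emit one DOT node statement per tree node plus one edge per parent/child pair:

```typescript
// Hypothetical minimal tree type for this sketch (not the AST2Dot classes).
type Tree =
    | { kind: 'num'; value: number }
    | { kind: 'var'; name: string }
    | { kind: 'bin'; op: string; left: Tree; right: Tree };

// Convert a tree into a DOT digraph; node ids n0, n1, ... come from a counter.
function treeToDot(tree: Tree): string {
    const lines: string[] = [];
    let counter = 0;

    // Emit the node for `t` and its subtrees; return the id assigned to `t`.
    function visit(t: Tree): string {
        const id = `n${counter++}`;
        if (t.kind === 'num') {
            lines.push(`  ${id} [label="${t.value}"];`);
        } else if (t.kind === 'var') {
            lines.push(`  ${id} [label="${t.name}"];`);
        } else {
            lines.push(`  ${id} [label="${t.op}"];`);
            const l = visit(t.left);
            const r = visit(t.right);
            lines.push(`  ${id} -> ${l};`);
            lines.push(`  ${id} -> ${r};`);
        }
        return id;
    }

    visit(tree);
    return `digraph AST {\n${lines.join('\n')}\n}`;
}

const xTimesX: Tree = {
    kind: 'bin', op: '*',
    left: { kind: 'var', name: 'x' },
    right: { kind: 'var', name: 'x' },
};
console.log(treeToDot(xTimesX));
// digraph AST {
//   n0 [label="*"];
//   n1 [label="x"];
//   n2 [label="x"];
//   n0 -> n1;
//   n0 -> n2;
// }
```

Because identifiers come from a counter rather than from the labels, the two occurrences of `x` in `x * x` become two separate leaves. Labels are not escaped, which is acceptable here only because numbers, lowercase names, and operators never contain quotes.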

In [None]:
import { ast2dot } from "./AST2Dot";
import { instance } from "@viz-js/viz";

// instance() loads the Graphviz WebAssembly module asynchronously.
const viz = await instance();

## Testing

The `visualize` function parses an expression, converts the resulting AST to DOT, and displays the rendered SVG in the notebook.

In [None]:
function visualize(s: string): void {
    const tree: AST = parse(s);
    const dotString : string = ast2dot(tree);
    display.html(viz.renderString(dotString, { format: "svg" }));
}

In [None]:
visualize('12 * y * x + 14 * z / 6 - x');

In [None]:
visualize('2 * x + y * y - z / (x * x + y * y) - 3');