# EBNF Parser with AST Visualization

Before we dive into automated parser generators, it is crucial to understand how parsing works "under the hood".

In this notebook, we implement a **Recursive Descent Parser** for arithmetic expressions manually. This gives us full control over the parsing process and helps us understand the relationship between **Grammar Rules** and **Recursive Functions**.

We implement the following <span style="font-variant:small-caps;">Ebnf</span> grammar:

$$
\begin{eqnarray*}
\mathrm{expr}    & \rightarrow & \mathrm{product}\;\;\bigl((\texttt{'+'}\;|\;\texttt{'-'})\;\; \mathrm{product}\bigr)^* \\[0.2cm]
\mathrm{product} & \rightarrow & \mathrm{factor} \;\;\bigl((\texttt{'*'}\;|\;\texttt{'/'})\;\; \mathrm{factor}\bigr)^* \\[0.2cm]   
\mathrm{factor}  & \rightarrow & \texttt{'('} \;\;\mathrm{expr} \;\;\texttt{')'}                                     \\
                 & \mid        & \texttt{NUMBER} \\
                 & \mid        & \texttt{VARIABLE}
\end{eqnarray*}
$$

Instead of just calculating the result, this parser constructs an **Abstract Syntax Tree (AST)**.

## The Scanner

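The scanner is extended to also recognize variables: a lowercase letter followed by any number of lowercase letters or digits. Numbers are non-negative integers without leading zeros.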

**Input:**
* `s`: A string containing the arithmetic expression.

**Output:**
* Returns a `string[]` of tokens. Whitespace is skipped, as is any other character that matches none of the token patterns.

In [None]:
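function tokenize(s: string): string[] {
    // variable | positive integer | zero | operator or parenthesis
    const lexSpec = /[a-z][a-z0-9]*|[1-9][0-9]*|0|[-+*/()]/g;
    // `match` returns null if nothing matches. Whitespace is never part of
    // a match, so the result needs no further filtering.
    return s.match(lexSpec) || [];
}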

In [None]:
tokenize('12 * x + y1 * 4 / 6 - z3');
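
Because `match` only collects the substrings that fit the pattern, characters that fit no token class are dropped silently: with the regular expression above, `3.14` yields the tokens `3` and `14`, and `2 $ 3` yields `2` and `3`. The following is a minimal sketch of a stricter scanner that reports such characters instead. The name `tokenizeStrict` and the error message are assumptions for illustration and are not used elsewhere in this notebook.

```typescript
// Hypothetical stricter variant of `tokenize` (not part of the notebook).
function tokenizeStrict(s: string): string[] {
    // Sticky regex: either whitespace, or one token captured in group 1.
    const lexSpec = /\s+|([a-z][a-z0-9]*|[1-9][0-9]*|0|[-+*/()])/y;
    const tokens: string[] = [];
    let pos = 0;
    while (pos < s.length) {
        lexSpec.lastIndex = pos;          // match exactly at `pos`
        const m = lexSpec.exec(s);
        if (m === null) {
            throw new Error(`Lexical error at position ${pos}: '${s[pos]}'`);
        }
        if (m[1] !== undefined) tokens.push(m[1]); // skip whitespace matches
        pos = lexSpec.lastIndex;          // continue after the match
    }
    return tokens;
}

console.log(tokenizeStrict('12 * x + y1')); // five tokens: 12, *, x, +, y1
```

Calling `tokenizeStrict('2 $ 3')` throws an error that names position 2 instead of returning a shortened token list.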

## Defining the Universal AST Structure

To ensure compatibility with our visualization tools and future, more complex interpreters, we define a **universal AST type**.
Instead of using Classes, we use a functional approach based on **Arrays (Tuples)**.

This definition might look slightly "overpowered" for simple arithmetic (which only needs binary operators), but it is designed to support future language features like loops (`FOR`, `WHILE`) which require up to 4 child nodes.

*   **Leaf Nodes:** Numbers (`number`) and Variables (`string`).
*   **Inner Nodes:** Tuples starting with an `Operator` (Tag), followed by 1 to 4 child nodes.

In [None]:
type Operator = string;
type AST = 
    | string                          // Variable
    | number                          // Literal
    | [Operator, AST]                 // Unary (e.g. -x)
    | [Operator, AST, AST]            // Binary (e.g. x + y)
    | [Operator, AST, AST, AST]       // Ternary (e.g. IF/ELSE)
    | [Operator, AST, AST, AST, AST]; // Quaternary (e.g. FOR loops)
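
To illustrate how such tuples can be consumed, here is a minimal sketch of an evaluator for the binary arithmetic case. The function `evaluate` and the type `Env` are assumptions introduced for this example; the notebook itself only builds and draws the tree.

```typescript
type Operator = string;
type AST =
    | string
    | number
    | [Operator, AST]
    | [Operator, AST, AST]
    | [Operator, AST, AST, AST]
    | [Operator, AST, AST, AST, AST];

// Hypothetical environment: maps variable names to numeric values.
type Env = Record<string, number>;

// Hypothetical evaluator that handles leaves and binary nodes only.
function evaluate(t: AST, env: Env): number {
    if (typeof t === "number") return t;           // literal
    if (typeof t === "string") {                   // variable lookup
        const v = env[t];
        if (v === undefined) throw new Error(`Unbound variable: ${t}`);
        return v;
    }
    if (t.length !== 3) throw new Error(`Unsupported node: ${t[0]}`);
    const [op, left, right] = t;                   // binary node
    const a = evaluate(left, env);
    const b = evaluate(right, env);
    switch (op) {
        case "+": return a + b;
        case "-": return a - b;
        case "*": return a * b;
        case "/": return a / b;
        default: throw new Error(`Unknown operator: ${op}`);
    }
}

// Hand-written tree for 1 + 2 * x
const tree: AST = ["+", 1, ["*", 2, "x"]];
console.log(evaluate(tree, { x: 3 })); // 7
```

The leaf cases are distinguished with `typeof`, and the tuple length identifies the arity of an inner node, so no class hierarchy is needed.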

## The Recursive Parser

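We implement one function for each non-terminal in our grammar (`expr`, `product`, `factor`). Each function takes a list of tokens and returns a pair: the AST for the prefix of the tokens it has consumed and the list of tokens that remain.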

In [None]:
type TokenList = string[];
type ParseResult = [AST, TokenList];

let parseExpr: (TL: TokenList) => ParseResult;
let parseProduct: (TL: TokenList) => ParseResult;
let parseFactor: (TL: TokenList) => ParseResult;

In [None]:
function parse(s: string): AST {
    const TL = tokenize(s);
    // Empty input: return the empty string as a placeholder tree
    if (TL.length === 0) return "";
    const [result, rest] = parseExpr(TL);
    // A successful parse must consume every token
    if (rest.length > 0)
        throw new Error(`Parse Error: remaining tokens: ${rest.join(" ")}`);
    return result;
}


### The Product & Expression Parsers
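These functions handle binary operations.
When we find an operator, we wrap the result in a tuple `[op, left, right]`, which matches the `[Operator, AST, AST]` case of our universal type definition. Because each loop iteration uses the tree built so far as the left operand, the operators are left-associative: `8 - 3 - 2` is parsed as `(8 - 3) - 2`.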

In [None]:
parseProduct = function(TL: TokenList): ParseResult {
    let [result, rest] = parseFactor(TL);

    while (rest.length > 0 && (rest[0] === '*' || rest[0] === '/')) {
        const operator = rest[0];
        const [right, nextRest] = parseFactor(rest.slice(1));
        result = [operator, result, right];
        rest = nextRest;
    }
    return [result, rest];
};

parseExpr = function(TL: TokenList): ParseResult {
    let [result, rest] = parseProduct(TL);

    while (rest.length > 0 && (rest[0] === '+' || rest[0] === '-')) {
        const operator = rest[0];
        const [right, nextRest] = parseProduct(rest.slice(1));
        result = [operator, result, right];
        rest = nextRest;
    }
    return [result, rest];
};
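
The tree-building step inside the two `while` loops can be isolated in a few lines. The sketch below is an illustration only: the type `Tree` and the helper `foldLeft` are hypothetical names, and the operands are given directly instead of being parsed.

```typescript
// Simplified tree type for this illustration: numbers and binary nodes only.
type Tree = number | [string, Tree, Tree];

// Hypothetical helper mirroring the loop body `result = [operator, result, right]`.
function foldLeft(first: Tree, tail: [string, Tree][]): Tree {
    let result: Tree = first;
    for (const [op, right] of tail) {
        // The tree built so far becomes the LEFT child of the new node.
        result = [op, result, right];
    }
    return result;
}

// Corresponds to the input 8 - 3 - 2
const folded = foldLeft(8, [["-", 3], ["-", 2]]);
console.log(JSON.stringify(folded)); // ["-",["-",8,3],2]
```

The inner node `["-", 8, 3]` is evaluated first, so the result denotes `(8 - 3) - 2 = 3` and not `8 - (3 - 2) = 7`.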

### The Factor Parser (Atoms)
The `factor` rule handles the base cases: a parenthesized expression, a number, or a variable. Like the other parsing functions, it returns a pair consisting of the parsed `AST` and the remaining tokens.

In [None]:
parseFactor = function(TL: TokenList): ParseResult {
    const [head, ...RL] = TL;

    if (head === undefined) throw new Error("Unexpected end of input");

    // Case 1: Parentheses
    if (head === '(') {
        const [expr, rest] = parseExpr(RL);
        if (rest[0] !== ')') {
            throw new Error(`ERROR: ')' expected, got ${rest[0] ?? "end of input"}`);
        }
        return [expr, rest.slice(1)];
    }
    // Case 2: Number
    else if (!isNaN(Number(head))) {
        return [parseFloat(head), RL];
    }
    // Case 3: Variable (the scanner guarantees it starts with a lowercase letter)
    else if (/^[a-z]/.test(head)) {
        return [head, RL];
    }
    // Anything else (an operator or a stray ')') cannot start a factor
    else {
        throw new Error(`ERROR: unexpected token ${head}`);
    }
};

## Drawing Abstract Syntax Trees with GraphViz

We use `@viz-js/viz` to render the AST directly in the notebook without needing external file system calls.

The function `ast2dot` converts our recursive AST tuple into a DOT language string.

In [None]:
import { ast2dot } from "./AST2Dot";
import { instance } from "@viz-js/viz";
import { display } from "tslab";
const viz = await instance();
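
The source of `./AST2Dot` is not shown in this notebook. To give an idea of what such a conversion involves, here is a minimal sketch under our own assumptions: the name `astToDotSketch`, the node naming scheme `n0, n1, ...`, and the box shape for leaves are all hypothetical, and the real `ast2dot` may differ in every detail.

```typescript
type Operator = string;
type AST =
    | string
    | number
    | [Operator, AST]
    | [Operator, AST, AST]
    | [Operator, AST, AST, AST]
    | [Operator, AST, AST, AST, AST];

// Hypothetical stand-in for `ast2dot`; not the actual implementation.
function astToDotSketch(tree: AST): string {
    const lines: string[] = [];
    let counter = 0;

    // Emits the node for `t` and its subtree; returns the node's numeric id.
    function walk(t: AST): number {
        const id = counter++;
        if (typeof t === "number" || typeof t === "string") {
            // Leaves: tokens of this grammar contain no characters to escape.
            lines.push(`  n${id} [label="${t}", shape=box];`);
            return id;
        }
        const [op, ...children] = t;
        lines.push(`  n${id} [label="${op}"];`);
        for (const child of children) {
            const childId = walk(child);
            lines.push(`  n${id} -> n${childId};`);
        }
        return id;
    }

    walk(tree);
    return `digraph AST {\n${lines.join("\n")}\n}`;
}

console.log(astToDotSketch(["+", 1, "x"]));
```

Each node receives a unique id so that two occurrences of the same variable or operator are drawn as separate nodes.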

## Testing

The `visualize` function parses an expression and renders the resulting SVG immediately.

In [None]:
function visualize(s: string): void {
    try {
        const tree: AST = parse(s);
        const dotString : string = ast2dot(tree);
        display.html(viz.renderString(dotString, { format: "svg" }));
    } catch (e) {
        console.error(e);
    }
}

In [None]:
visualize('12 * y * x + 14 * z / 6 - x');

In [None]:
visualize('2 * x + y * y - z / (x * x + y * y) - 3');