In [None]:
import { display } from "tslab";
import { readFileSync } from "fs";

const css = readFileSync("../style.css", "utf8");
display.html(`<style>${css}</style>`);

# An EBNF-Based Parser for Arithmetic Expressions

In this notebook we implement an <span style="font-variant:small-caps;">Ebnf</span> recursive-descent parser for arithmetic expressions.  This parser implements the following <span style="font-variant:small-caps;">Ebnf</span> grammar:

$$
\begin{eqnarray*}
\mathrm{expr}    & \rightarrow & \mathrm{product}\;\;\bigl((\texttt{'+'}\;|\;\texttt{'-'})\;\; \mathrm{product}\bigr)^* \\[0.2cm]
\mathrm{product} & \rightarrow & \mathrm{factor} \;\;\bigl((\texttt{'*'}\;|\;\texttt{'/'})\;\; \mathrm{factor}\bigr)^* \\[0.2cm]   
\mathrm{factor}  & \rightarrow & \texttt{'('} \;\;\mathrm{expr} \;\;\texttt{')'}                                     \\
                 & \mid        & \texttt{NUMBER}
\end{eqnarray*}
$$

## The Scanner

We implement a scanner using standard Regular Expressions.

The function `tokenize` receives a string `s` as argument and returns a list of tokens.
The string `s` is supposed to represent an arithmetical expression.

**Note:**
 - We use a global regular expression to identify numbers, operators, and parentheses.
 - Whitespace never matches this regular expression and is therefore skipped; the final `filter` merely guards against empty tokens.

In [None]:
function tokenize(s: string): string[] {
    const lexSpec = /[1-9][0-9]*|0|[-+*/()]/g;
    const tokenList = s.match(lexSpec) || [];
    return tokenList.filter(t => t.trim() !== '');
}

In [None]:
tokenize('12 * 13 + 14 * 4 / 6 - 7');
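
Because `String.prototype.match` with a global regular expression returns only the matching substrings, characters that belong to no token are skipped silently. For example, `tokenize('1.5 + x')` yields `['1', '5', '+']`. The following is a minimal sketch of a stricter scanner that rejects such input. The name `tokenizeStrict` and its error message are our own assumptions and are not used elsewhere in this notebook.

```typescript
// Hypothetical stricter variant of `tokenize` (illustrative only).
function tokenizeStrict(s: string): string[] {
    const lexSpec = /[1-9][0-9]*|0|[-+*/()]/g;
    // Remove every token and all whitespace; anything left over is invalid input.
    const leftover = s.replace(lexSpec, '').replace(/\s+/g, '');
    if (leftover.length > 0) {
        throw new Error(`Lexical error: unexpected characters '${leftover}'`);
    }
    return s.match(lexSpec) || [];
}

tokenizeStrict('12 * (3 + 4)');   // ['12', '*', '(', '3', '+', '4', ')']
// tokenizeStrict('1.5 + x') throws, since '.' and 'x' are not part of any token.
```

This check only detects characters outside the token alphabet; a malformed sequence of valid tokens is still left for the parser to reject.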

## Implementing the Recursive Descent Parser

To leverage the TypeScript type system and handle the mutual recursion (specifically for parentheses where `factor` calls `expr`), we define our types and **forward declare** the function signatures first.

### Type Definitions & Forward Declarations
* **`TokenList`**: Represents the sequence of tokens yet to be consumed.
* **`ParseResult`**: A tuple `[value, Rest]` containing the numeric result and remaining tokens.

In [None]:
type TokenList = string[];
type ParseResult = [number, TokenList];

let parseExpr: (TL: TokenList) => ParseResult;
let parseProduct: (TL: TokenList) => ParseResult;
let parseFactor: (TL: TokenList) => ParseResult;
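
The forward-declaration pattern can be illustrated in isolation with a pair of mutually recursive functions. This is only a sketch: the names `isEven` and `isOdd` are hypothetical and unrelated to the parser. Each variable is first declared with its function type and only later assigned a body, so either body may refer to the other.

```typescript
// Declare the signatures first (illustrative names, not part of the parser).
let isEven: (n: number) => boolean;
let isOdd: (n: number) => boolean;

// Assign the bodies afterwards; each one may call the other.
isEven = function(n: number): boolean {
    return n === 0 ? true : isOdd(n - 1);
};
isOdd = function(n: number): boolean {
    return n === 0 ? false : isEven(n - 1);
};

isEven(10);  // true
isOdd(7);    // true
```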

The function `parse` takes a string `s` as input and parses this string according to the recursive grammar shown above.

**Input:**
* `s`: A string containing the arithmetic expression.

**Output:**
* Returns a `number` representing the evaluated result.

**Behavior:**
1. Tokenizes the string.
2. Parses the expression.
3. **Throws an error** if any tokens remain.

In [None]:
function parse(s: string): number {
    const TL = tokenize(s);
    const [result, rest] = parseExpr(TL);

    if (rest.length > 0) {
        throw new Error(`Parse Error: could not parse remaining tokens: ${rest}`);
    }

    return result;
}

The function `parseExpr` implements the EBNF grammar rule:
$$\mathrm{expr} \;\rightarrow\; \mathrm{product}\;\;\bigl((\texttt{'+'}\;|\;\texttt{'-'})\;\; \mathrm{product}\bigr)^*$$

**Input:**
* `TL`: The `TokenList` to be parsed.

**Output:**
* Returns a `ParseResult` tuple `[value, Rest]`.

**Logic:**
It parses an initial `product`. Then, using a **while loop**, it continuously checks if the next token is `+` or `-`. If so, it consumes the operator, parses the next `product`, and updates the running total.
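
Updating a running total inside the loop makes the operators left-associative: `10 - 4 - 3` is evaluated as `(10 - 4) - 3`. The following sketch shows this on a simplified grammar in which every operand is a plain number. The function `evalSum` is a hypothetical helper for illustration and assumes a well-formed token list.

```typescript
// Simplified illustration: every operand is a number token (no products, no parentheses).
function evalSum(tokens: string[]): number {
    let result = parseFloat(tokens[0]);
    let rest = tokens.slice(1);
    while (rest.length > 0 && (rest[0] === '+' || rest[0] === '-')) {
        const arg = parseFloat(rest[1]);
        // Fold the next operand into the running total, left to right.
        result = rest[0] === '+' ? result + arg : result - arg;
        rest = rest.slice(2);
    }
    return result;
}

evalSum(['10', '-', '4', '-', '3']);  // 3, i.e. (10 - 4) - 3
```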

In [None]:
parseExpr = function(TL: TokenList): ParseResult {
    let [result, rest] = parseProduct(TL);

    while (rest.length > 0 && (rest[0] === '+' || rest[0] === '-')) {
        const operator = rest[0];
        const [arg, nextRest] = parseProduct(rest.slice(1));

        if (operator === '+') {
            result += arg;
        } else {
            result -= arg;
        }

        rest = nextRest;
    }

    return [result, rest];
};

The function `parseProduct` implements the EBNF grammar rule:
$$\mathrm{product} \;\rightarrow\; \mathrm{factor} \;\;\bigl((\texttt{'*'}\;|\;\texttt{'/'})\;\; \mathrm{factor}\bigr)^*$$

**Input:**
* `TL`: The `TokenList` to be parsed.

**Output:**
* Returns a `ParseResult` tuple `[value, Rest]`.

**Logic:**
Similar to `parseExpr`, it parses an initial `factor`. Then, using a **while loop**, it consumes `*` or `/` operators and subsequent factors, updating the product accordingly.

In [None]:
parseProduct = function(TL: TokenList): ParseResult {
    let [result, rest] = parseFactor(TL);

    while (rest.length > 0 && (rest[0] === '*' || rest[0] === '/')) {
        const operator = rest[0];
        const [arg, nextRest] = parseFactor(rest.slice(1));

        if (operator === '*') {
            result *= arg;
        } else {
            result /= arg;
        }

        rest = nextRest;
    }

    return [result, rest];
};

The function `parseFactor` implements the atomic grammar rules:
$$
\begin{eqnarray*}
\mathrm{factor}       & \;\rightarrow\; & \texttt{'('} \;\;\mathrm{expr} \;\;\texttt{')'}                 \\
                      & \;\mid          & \;\texttt{NUMBER}
\end{eqnarray*}
$$

**Input:**
* `TL`: The `TokenList` to be parsed.

**Output:**
* Returns a `ParseResult` tuple `[value, Rest]`.

**Logic:**
* If the first token is `(`, it recurses back to `parseExpr` and expects a closing `)`.
* Otherwise, it converts the token to a number using `parseFloat`.

In [None]:
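parseFactor = function(TL: TokenList): ParseResult {
    const [head, ...RL] = TL;

    // An empty token list means the input ended where a factor was expected.
    if (head === undefined) {
        throw new Error("ERROR: unexpected end of input, factor expected");
    }

    if (head === '(') {
        const [expr, rest] = parseExpr(RL);

        if (rest[0] !== ')') {
            throw new Error(`ERROR: ')' expected, got ${rest[0]}`);
        }

        return [expr, rest.slice(1)];
    } else {
        const value = parseFloat(head);
        // Reject operators or ')' in a position where a number is required.
        if (isNaN(value)) {
            throw new Error(`ERROR: number expected, got ${head}`);
        }
        return [value, RL];
    }
};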
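
The functions above pass the remaining tokens along by copying them (`slice` and the rest pattern `[head, ...RL]`), which can cost quadratic time on long inputs. An alternative design keeps one shared token array and advances an index into it. The following is a minimal sketch of that idea for the same grammar. The class `CursorParser` and its method names are our own assumptions and not part of the notebook's implementation.

```typescript
// Hypothetical cursor-based variant of the parser (illustrative only).
class CursorParser {
    private pos = 0;
    private tokens: string[];

    constructor(tokens: string[]) {
        this.tokens = tokens;
    }

    // Look at the current token without consuming it.
    private peek(): string | undefined { return this.tokens[this.pos]; }
    // Consume and return the current token.
    private next(): string | undefined { return this.tokens[this.pos++]; }

    parse(): number {
        const value = this.expr();
        if (this.pos < this.tokens.length) {
            throw new Error(`Parse Error: unexpected token '${this.peek()}'`);
        }
        return value;
    }

    // expr -> product (('+' | '-') product)*
    private expr(): number {
        let result = this.product();
        while (this.peek() === '+' || this.peek() === '-') {
            const op = this.next();
            const arg = this.product();
            result = op === '+' ? result + arg : result - arg;
        }
        return result;
    }

    // product -> factor (('*' | '/') factor)*
    private product(): number {
        let result = this.factor();
        while (this.peek() === '*' || this.peek() === '/') {
            const op = this.next();
            const arg = this.factor();
            result = op === '*' ? result * arg : result / arg;
        }
        return result;
    }

    // factor -> '(' expr ')' | NUMBER
    private factor(): number {
        const head = this.next();
        if (head === undefined) {
            throw new Error("Parse Error: unexpected end of input");
        }
        if (head === '(') {
            const value = this.expr();
            if (this.next() !== ')') {
                throw new Error("Parse Error: ')' expected");
            }
            return value;
        }
        if (!/^[0-9]+$/.test(head)) {
            throw new Error(`Parse Error: number expected, got '${head}'`);
        }
        return parseFloat(head);
    }
}

new CursorParser(['2', '*', '(', '3', '+', '4', ')']).parse();  // 14
```

The grammar and the evaluation order are unchanged; only the bookkeeping of the remaining input differs.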

## Testing

We define a test function that compares our parser's result against the JavaScript `eval()` function.

In [None]:
function test(s: string): number {
    const r1 = parse(s);
    const r2 = eval(s);
    if (r1 !== r2) {
        throw new Error(`Assertion Failed: ${r1} != ${r2}`);
    }
    return r1;
}

In [None]:
test('12 * 13 + 14 * 4 / 6 - 7');

In [None]:
test('11+22*(33-44)/(5-10*5/(4-3))');

In [None]:
test('0*11+22*(33-44)/(5-10*5/(4-3))');