# A Recursive Parser for Arithmetic Expressions

In this notebook we implement a simple *recursive descent* parser for arithmetic expressions.
This parser will implement the following grammar:

$$
\begin{eqnarray*}
\mathrm{expr}         & \rightarrow & \mathrm{product}\;\;\mathrm{exprRest}           \\[0.2cm]
\mathrm{exprRest}     & \rightarrow & \texttt{'+'} \;\;\mathrm{product}\;\;\mathrm{exprRest}   \\
                      & \mid        & \texttt{'-'} \;\;\mathrm{product}\;\;\mathrm{exprRest}   \\
                      & \mid        & \lambda                                     \\[0.2cm]
\mathrm{product}      & \rightarrow & \mathrm{factor}\;\;\mathrm{productRest}           \\[0.2cm]
\mathrm{productRest} & \rightarrow & \texttt{'*'} \;\;\mathrm{factor}\;\;\mathrm{productRest} \\
                      & \mid        & \texttt{'/'} \;\;\mathrm{factor}\;\;\mathrm{productRest} \\
                      & \mid        & \lambda                                     \\[0.2cm]
\mathrm{factor}       & \rightarrow & \texttt{'('} \;\;\mathrm{expr} \;\;\texttt{')'}                 \\
                      & \mid        & \texttt{NUMBER}
\end{eqnarray*}
$$
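
As a worked example, the following leftmost derivation shows how the grammar generates the token sequence $\texttt{NUMBER}\;\texttt{'+'}\;\texttt{NUMBER}\;\texttt{'*'}\;\texttt{NUMBER}$, for instance `1 + 2 * 3`. The multiplication is derived inside the second $\mathrm{product}$, which is why `*` binds more tightly than `+`.

$$
\begin{eqnarray*}
\mathrm{expr} & \Rightarrow & \mathrm{product}\;\;\mathrm{exprRest} \\
 & \Rightarrow & \mathrm{factor}\;\;\mathrm{productRest}\;\;\mathrm{exprRest} \\
 & \Rightarrow & \texttt{NUMBER}\;\;\mathrm{productRest}\;\;\mathrm{exprRest} \\
 & \Rightarrow & \texttt{NUMBER}\;\;\mathrm{exprRest} \\
 & \Rightarrow & \texttt{NUMBER}\;\;\texttt{'+'}\;\;\mathrm{product}\;\;\mathrm{exprRest} \\
 & \Rightarrow & \texttt{NUMBER}\;\;\texttt{'+'}\;\;\mathrm{factor}\;\;\mathrm{productRest}\;\;\mathrm{exprRest} \\
 & \Rightarrow & \texttt{NUMBER}\;\;\texttt{'+'}\;\;\texttt{NUMBER}\;\;\mathrm{productRest}\;\;\mathrm{exprRest} \\
 & \Rightarrow & \texttt{NUMBER}\;\;\texttt{'+'}\;\;\texttt{NUMBER}\;\;\texttt{'*'}\;\;\mathrm{factor}\;\;\mathrm{productRest}\;\;\mathrm{exprRest} \\
 & \Rightarrow & \texttt{NUMBER}\;\;\texttt{'+'}\;\;\texttt{NUMBER}\;\;\texttt{'*'}\;\;\texttt{NUMBER}\;\;\mathrm{productRest}\;\;\mathrm{exprRest} \\
 & \Rightarrow & \texttt{NUMBER}\;\;\texttt{'+'}\;\;\texttt{NUMBER}\;\;\texttt{'*'}\;\;\texttt{NUMBER}\;\;\mathrm{exprRest} \\
 & \Rightarrow & \texttt{NUMBER}\;\;\texttt{'+'}\;\;\texttt{NUMBER}\;\;\texttt{'*'}\;\;\texttt{NUMBER}
\end{eqnarray*}
$$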

## Implementing a Scanner

We implement a scanner using standard Regular Expressions.

The function `tokenize` receives a string `s` as its argument and returns a list of string tokens. The string `s` is expected to represent an arithmetic expression.

**Note:**
1. We use a regular expression with the global flag `g` to identify unsigned integer literals, operators, parentheses, and whitespace.
2. After matching, we explicitly filter out the whitespace matches so that only proper tokens remain in the list.

In [None]:
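function tokenize(s: string): string[] {
    // Alternatives: whitespace | integer without leading zeros | zero | operator or parenthesis
    const lexSpec = /\s+|[1-9][0-9]*|0|[-+*/()]/g;
    // With the g flag, match returns every match, or null if there is none.
    const tokenList = s.match(lexSpec) || [];
    // Drop the whitespace matches; only proper tokens remain.
    return tokenList.filter(t => t.trim() !== '');
}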

In [None]:
tokenize('123 + (234 +  345 - 2**0)/7')
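
One property of `tokenize` is worth keeping in mind: `match` with the `g` flag skips any character that no alternative matches, so `tokenize('1 $ 2')` returns `['1', '2']` and the stray `$` goes unnoticed. The following is a minimal sketch of a stricter scanner that reports such characters instead. The name `tokenizeStrict` and the error message are assumptions made for illustration; they are not part of the notebook.

```typescript
// Hypothetical strict scanner: same token classes as `tokenize`, but it
// rejects characters that no alternative of the regular expression matches.
function tokenizeStrict(s: string): string[] {
    // Anchored with ^ so that a match must start at the current position.
    const lexSpec = /^(?:\s+|[1-9][0-9]*|0|[-+*/()])/;
    const tokens: string[] = [];
    let pos = 0;
    while (pos < s.length) {
        const m = lexSpec.exec(s.slice(pos));
        if (m === null) {
            throw new Error(`Scan Error: unexpected character '${s.charAt(pos)}' at position ${pos}`);
        }
        const lexeme = m[0];
        if (lexeme.trim() !== '') {
            tokens.push(lexeme);  // keep everything except whitespace
        }
        pos += lexeme.length;     // every alternative consumes at least one character
    }
    return tokens;
}

console.log(tokenizeStrict('12 + (3*4)'));
```

For well-formed input this sketch yields the same tokens as `tokenize`; for input such as `'1 $ 2'` it throws instead of dropping the `$`.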

## Implementing the Recursive Descent Parser

To leverage the TypeScript type system and ensure type safety across the notebook cells, we introduce strict type definitions and handle the recursion structure explicitly.

### Type Definitions
We define the following types to model the parser state:

* **`TokenList`**: Represents the sequence of tokens $T = [t_1, t_2, \dots, t_n]$ that are yet to be consumed.
* **`ParseResult`**: A tuple $(v, R)$ where:
    * $v \in \mathbb{R}$ is the numerical result of the parsed expression.
    * $R$ is the remaining `TokenList` after parsing.

In [None]:
type TokenList = string[];
type ParseResult = [number, TokenList];

### Forward Declarations for Mutual Recursion
The grammar rules defined above are **mutually recursive**. For example:
* `parseExpr` calls `parseProduct`
* `parseProduct` calls `parseFactor`
* `parseFactor` calls `parseExpr` (for parenthesized expressions)

In a Jupyter Notebook environment, code is executed cell-by-cell. To ensure that functions can call each other regardless of the cell order or definition time, we **forward declare** the function signatures. This informs the TypeScript compiler about the existence and type signature of these functions before their actual implementation is assigned.

In [None]:
let parseExpr: (TL: TokenList) => ParseResult;
let parseExprRest: (sum: number, TL: TokenList) => ParseResult;
let parseProduct: (TL: TokenList) => ParseResult;
let parseProductRest: (product: number, TL: TokenList) => ParseResult;
let parseFactor: (TL: TokenList) => ParseResult;
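
The forward-declaration pattern can be tried in isolation. The following minimal sketch uses two hypothetical functions, `isEven` and `isOdd`, that are not part of the parser: both are declared first and assigned afterwards, so each body may refer to the other.

```typescript
// Forward declarations: only the names and their types are known here.
let isEven: (n: number) => boolean;
let isOdd: (n: number) => boolean;

// The implementations are assigned later, possibly in separate cells.
isEven = function(n: number): boolean {
    return n === 0 ? true : isOdd(n - 1);
};

isOdd = function(n: number): boolean {
    return n === 0 ? false : isEven(n - 1);
};

console.log(isEven(10));  // both functions are assigned by now, so the call succeeds
```

The declarations only satisfy the type checker. A call made before the corresponding assignment has been executed still fails at run time, because the variable is `undefined` at that point.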

The function `parse` serves as the entry point for the parser. It takes a raw string `s` representing the arithmetic expression.

**Input:**
* `s`: A string containing the arithmetic expression (e.g., `"3 + 4 * 5"`).

**Output:**
* Returns a `number` representing the result of the evaluated expression.

**Behavior:**
1. Tokenizes the input string.
2. Calls `parseExpr` to evaluate the expression.
3. **Asserts** that the list of remaining tokens is empty (End-of-Input check). If tokens remain, it throws an error.

In [None]:
function parse(s: string): number {
    const TL = tokenize(s);
    const [result, rest] = parseExpr(TL);

    if (rest.length > 0) {
        throw new Error(`Parse Error: could not parse remaining tokens: ${rest}`);
    }

    return result;
}

The function `parseExpr` implements the grammar rule:
$$\mathrm{expr} \rightarrow \;\mathrm{product}\;\;\mathrm{exprRest}$$

**Input:**
* `TL`: The current `TokenList` (list of strings) to be parsed.

**Output:**
* Returns a `ParseResult` tuple: `[value, Rest]`, where:
    * `value`: The numerical result of the expression.
    * `Rest`: The list of tokens remaining after parsing the expression.

In [None]:
parseExpr = function(TL: TokenList): ParseResult {
    const [product, rest] = parseProduct(TL);
    return parseExprRest(product, rest);
};

The function `parseExprRest` processes the remainder of an expression, handling addition and subtraction. It implements:

$$
\begin{eqnarray*}
\mathrm{exprRest}     & \rightarrow & \texttt{'+'} \;\;\mathrm{product}\;\;\mathrm{exprRest}   \\
                      & \mid        & \texttt{'-'} \;\;\mathrm{product}\;\;\mathrm{exprRest}   \\
                      & \mid        & \;\lambda                                    \\[0.2cm]
\end{eqnarray*}
$$

**Input:**
* `sum`: The numerical value accumulated so far (from the left-hand side).
* `TL`: The `TokenList` of remaining tokens.

**Output:**
* Returns a `ParseResult` tuple `[value, Rest]`.

**Logic:**
* If the first token is `+` or `-`, it parses the next product, updates the `sum`, and recursively calls `parseExprRest`.
* If the first token is neither `+` nor `-` (or the list is empty), it returns the current `sum` and the unchanged list (Lambda production).

In [None]:
parseExprRest = function(sum: number, TL: TokenList): ParseResult {
    if (TL.length === 0) {
        return [sum, []];
    }

    const [head, ...RL] = TL;

    switch (head) {
        case '+': {
            const [product, rest] = parseProduct(RL);
            return parseExprRest(sum + product, rest);
        }
        case '-': {
            const [product, rest] = parseProduct(RL);
            return parseExprRest(sum - product, rest);
        }
        default:
            return [sum, TL];
    }
};
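
Passing the accumulated `sum` as an argument is what makes `+` and `-` left-associative. The following simplified sketch makes this visible by recording every call. The function `sumRest` is a hypothetical stand-in for `parseExprRest` in which every operand is a plain number token, so that the example runs without the rest of the parser.

```typescript
// Hypothetical, simplified stand-in for parseExprRest: every operand is a
// plain number token, so no call to parseProduct is needed.
function sumRest(sum: number, tokens: string[], trace: string[]): number {
    trace.push(`sumRest(${sum}, [${tokens.join(' ')}])`);
    const [op, operand, ...rest] = tokens;
    if (op === '+') {
        return sumRest(sum + Number(operand), rest, trace);
    }
    if (op === '-') {
        return sumRest(sum - Number(operand), rest, trace);
    }
    return sum;  // lambda production: nothing left to add or subtract
}

// Trace the evaluation of 5 - 3 + 2, where 5 has already been parsed.
const steps: string[] = [];
const total = sumRest(5, ['-', '3', '+', '2'], steps);
console.log(steps.join('\n'));
// sumRest(5, [- 3 + 2])
// sumRest(2, [+ 2])
// sumRest(4, [])
console.log(total);  // 4
```

The subtraction is applied before the addition, so the expression is evaluated as `(5 - 3) + 2 = 4`. A right-associative reading would give `5 - (3 + 2) = 0`.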

The function `parseProduct` parses a product, i.e. a sequence of factors joined by `*` or `/`. It implements:
$$\mathrm{product} \rightarrow \;\mathrm{factor}\;\;\mathrm{productRest}$$

**Input:**
* `TL`: The `TokenList` to be parsed.

**Output:**
* Returns a `ParseResult` tuple `[value, Rest]`.

It first parses a single factor and then delegates the rest to `parseProductRest`.

In [None]:
parseProduct = function(TL: TokenList): ParseResult {
    const [factor, rest] = parseFactor(TL);
    return parseProductRest(factor, rest);
};

The function `parseProductRest` handles the continuation of a product (multiplication and division). It implements:

$$
\begin{eqnarray*}
\mathrm{productRest} & \rightarrow & \texttt{'*'} \;\;\mathrm{factor}\;\;\mathrm{productRest} \\
                      & \mid        & \texttt{'/'} \;\;\mathrm{factor}\;\;\mathrm{productRest} \\
                      & \mid        & \;\lambda    \\
\end{eqnarray*}
$$

**Input:**
* `product`: The numerical value of the product calculated so far.
* `TL`: The `TokenList` of remaining tokens.

**Output:**
* Returns a `ParseResult` tuple `[value, Rest]`.

**Logic:**
* Matches `*` or `/`, updates the `product` with the next factor, and recurses.
* Returns current `product` if no operator matches (Lambda production).

In [None]:
parseProductRest = function(product: number, TL: TokenList): ParseResult {
    if (TL.length === 0) {
        return [product, []];
    }

    const [head, ...RL] = TL;

    switch (head) {
        case '*': {
            const [factor, rest] = parseFactor(RL);
            return parseProductRest(product * factor, rest);
        }
        case '/': {
            const [factor, rest] = parseFactor(RL);
            return parseProductRest(product / factor, rest);
        }
        default:
            return [product, TL];
    }
};
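
As a short worked example, consider the input `8/4/2`. After `parseFactor` has consumed the leading `8`, the calls unfold as follows:

$$
\begin{eqnarray*}
\texttt{parseProductRest}(8, [\texttt{'/'}, \texttt{'4'}, \texttt{'/'}, \texttt{'2'}]) & = & \texttt{parseProductRest}(8/4, [\texttt{'/'}, \texttt{'2'}]) \\
 & = & \texttt{parseProductRest}(2/2, [\,]) \\
 & = & [1, [\,]]
\end{eqnarray*}
$$

The input is therefore evaluated as $(8/4)/2 = 1$ and not as $8/(4/2) = 4$.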

The function `parseFactor` handles the atomic units of the expression: parenthesized sub-expressions or numbers. It implements:

$$
\begin{eqnarray*}
\mathrm{factor}       & \rightarrow & \texttt{'('} \;\;\mathrm{expr} \;\;\texttt{')'}                 \\
                      & \mid        & \;\texttt{NUMBER}
\end{eqnarray*}
$$

**Input:**
* `TL`: The `TokenList` to be parsed.

**Output:**
* Returns a `ParseResult` tuple `[value, Rest]`.

**Logic:**
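1. If the first token is `(`, it recursively calls `parseExpr` to evaluate the expression inside the parentheses and checks for the closing `)`.
2. Otherwise, it converts the token to a number with `parseFloat` and throws a parse error if the token is not a number or if the input ends prematurely.

In [None]:
parseFactor = function(TL: TokenList): ParseResult {
    if (TL.length === 0) {
        throw new Error('Parse Error: unexpected end of input');
    }

    const [head, ...RL] = TL;

    if (head === '(') {
        const [expr, rest] = parseExpr(RL);

        if (rest[0] !== ')') {
            throw new Error('Parse Error: expected ")"');
        }

        return [expr, rest.slice(1)];
    }

    // The scanner only yields integer literals, operators, and parentheses,
    // so a token that parseFloat rejects is a misplaced operator or ")".
    const value = parseFloat(head);
    if (Number.isNaN(value)) {
        throw new Error(`Parse Error: expected a number or "(", but found "${head}"`);
    }

    return [value, RL];
};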

## Testing

We define a helper function `test` that parses a string with our `parse` function and compares the result with the value computed by JavaScript's built-in `eval` function to verify correctness.

In [None]:
function test(s: string): number {
    const r1 : number = parse(s);
    const r2 : number = eval(s);

    if (r1 !== r2) {
        throw new Error(`Assertion failed: parsed ${r1} !== eval ${r2}`);
    }

    return r1;
}

In [None]:
test('11+22*(33-44)/(5-10*5/(4-3))')

In [None]:
test('0*11+22*(33-44)/(5-10*5/(4-3))')

In [None]:
test('5-3+2')
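
The tests above only cover well-formed input. To check the error paths as well, a helper along the following lines could be used. This is a minimal sketch: the name `expectError` is an assumption made for illustration, and the block uses a stand-in for the failing parser call so that it runs on its own.

```typescript
// Hypothetical helper: runs `f` and returns the message of the error it
// throws; fails if `f` returns normally.
function expectError(f: () => unknown): string {
    let message: string | null = null;
    try {
        f();
    } catch (e) {
        message = e instanceof Error ? e.message : String(e);
    }
    if (message === null) {
        throw new Error('Assertion failed: expected an error, but none was thrown');
    }
    return message;
}

// Stand-in for a failing parser call, so that this sketch runs on its own.
const msg = expectError(() => { throw new Error('Parse Error: expected ")"'); });
console.log(msg);
```

In the notebook, calls such as `expectError(() => parse('(1 + 2'))` and `expectError(() => parse('1 2'))` should exercise the missing-parenthesis check in `parseFactor` and the End-of-Input check in `parse`, respectively.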