In [None]:
import { display } from "tslab";
import { readFileSync } from "fs";

const css = readFileSync("../style.css", "utf8");
display.html(`<style>${css}</style>`);

# A Recursive Parser for Arithmetic Expressions

In this notebook, we implement a parser for arithmetic expressions. We use a **recursive-descent** approach built on *Chevrotain*, a parser building toolkit for JavaScript and TypeScript. The parser not only evaluates the expression but also collects all unique numbers that occur in it into a `RecursiveSet`.

## Architectural Overview

The parsing process follows three main steps:

1.  **Scanner (Lexer):** Tokenizes the input string into significant units (numbers, operators, parentheses).
2.  **Parser (CST):** Validates the token stream against the formal grammar and builds a **Concrete Syntax Tree (CST)**. We use an **LL(k)** grammar structure.
3.  **Visitor (Evaluator):** Traverses the CST to compute the numerical result and collect metadata (unique numbers).

## Grammar Specification

The grammar describes standard arithmetic expressions with addition, subtraction, multiplication, division, and parenthesized grouping.

Chevrotain builds LL(k) recursive-descent parsers, which cannot handle left recursion. The repetition that a textbook grammar would express with left-recursive rules is therefore written as iteration (Kleene star `*`):

$$
\begin{array}{lcl}
  \mathrm{expr}    & \rightarrow & \mathrm{product} \; \bigl( (\mathtt{'+'} \mid \mathtt{'-'}) \; \mathrm{product} \bigr)^* \\
  \mathrm{product} & \rightarrow & \mathrm{factor} \; \bigl( (\mathtt{'*'} \mid \mathtt{'/'}) \; \mathrm{factor} \bigr)^* \\
  \mathrm{factor}  & \rightarrow & \mathtt{'('} \; \mathrm{expr} \;\mathtt{')'} \\
                   & \mid        & \mathtt{NUMBER}
\end{array}
$$
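
To make the connection between the starred rules and loops concrete, here is a minimal hand-written sketch of the same grammar. It does not use Chevrotain; the names `MiniParser` and `evaluate` are hypothetical, and the input is assumed to be already split into token strings.

```typescript
// Hand-written recursive-descent evaluator for the grammar above.
// Illustration only: MiniParser and evaluate are hypothetical names,
// and the tokens are assumed to be pre-split strings.
class MiniParser {
  private tokens: string[];
  private pos: number = 0;

  constructor(tokens: string[]) {
    this.tokens = tokens;
  }

  private peek(): string | undefined {
    return this.tokens[this.pos];
  }

  private next(): string {
    const token = this.tokens[this.pos];
    if (token === undefined) {
      throw new Error("Unexpected end of input");
    }
    this.pos++;
    return token;
  }

  // expr -> product ( ('+' | '-') product )*
  expr(): number {
    let result = this.product();
    // The Kleene star becomes a while loop.
    while (this.peek() === "+" || this.peek() === "-") {
      const op = this.next();
      const rhs = this.product();
      result = op === "+" ? result + rhs : result - rhs;
    }
    return result;
  }

  // product -> factor ( ('*' | '/') factor )*
  product(): number {
    let result = this.factor();
    while (this.peek() === "*" || this.peek() === "/") {
      const op = this.next();
      const rhs = this.factor();
      result = op === "*" ? result * rhs : result / rhs;
    }
    return result;
  }

  // factor -> '(' expr ')' | NUMBER
  factor(): number {
    const token = this.next();
    if (token === "(") {
      const value = this.expr();
      if (this.next() !== ")") {
        throw new Error("Expected ')'");
      }
      return value;
    }
    if (!/^(?:0|[1-9][0-9]*)$/.test(token)) {
      throw new Error(`Expected a number, got '${token}'`);
    }
    return Number(token);
  }

  atEnd(): boolean {
    return this.pos === this.tokens.length;
  }
}

function evaluate(tokens: string[]): number {
  const p = new MiniParser(tokens);
  const value = p.expr();
  if (!p.atEnd()) {
    throw new Error("Unexpected trailing input");
  }
  return value;
}

console.log(evaluate(["5", "-", "3", "+", "2"]));           // 4
console.log(evaluate(["1", "+", "2", "*", "3"]));           // 7
console.log(evaluate(["(", "1", "+", "2", ")", "*", "3"])); // 9
```

Because each loop folds its operands from left to right, `5-3+2` evaluates as `(5-3)+2`, and because `expr` delegates to `product`, multiplication binds tighter than addition.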

## Importing Libraries

We use `Chevrotain` for parsing and lexing, and `RecursiveSet` to collect the unique numbers found in the expressions.

In [None]:
import { createToken, Lexer, CstParser, IToken, ILexingResult, TokenType, CstNode } from "chevrotain";
import { RecursiveSet } from "recursive-set";

## 1. Implementing the Scanner

We define the vocabulary of our language using tokens. The scanner converts the raw input string into a stream of these tokens.

**Token Definitions:**

| Token Name | Pattern | Description |
| :--- | :--- | :--- |
| `WhiteSpace` | `[ \t]+` | Spaces and tabs (skipped). |
| `NumberToken`| `0\|[1-9][0-9]*` | Integers (including 0). |
| `Plus`, `Minus`, `Multi`, `Div` | `+`, `-`, `*`, `/` | Basic arithmetic operators. |
| `LParen`, `RParen` | `(`, `)` | Grouping symbols. |
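
The following sketch illustrates how such patterns split an input string. It is a simplified stand-in for a lexer, not Chevrotain's implementation: the names `patterns` and `scan` are hypothetical, and the operators and parentheses are each merged into a single pattern for brevity.

```typescript
// Simplified stand-in for a lexer: try each pattern at the current
// offset, in order, and take the first match.
// The names (patterns, scan) are hypothetical.
const patterns: Array<{ name: string; regex: RegExp; skip: boolean }> = [
  { name: "WhiteSpace", regex: /^[ \t]+/, skip: true },
  { name: "NumberToken", regex: /^(?:0|[1-9][0-9]*)/, skip: false },
  { name: "Operator", regex: /^[-+*\/]/, skip: false },
  { name: "Paren", regex: /^[()]/, skip: false },
];

function scan(input: string): string[] {
  const images: string[] = [];
  let offset = 0;
  while (offset < input.length) {
    const rest = input.slice(offset);
    let matched = false;
    for (const p of patterns) {
      const m = p.regex.exec(rest);
      if (m !== null) {
        if (!p.skip) {
          images.push(m[0]); // keep the matched text, drop skipped whitespace
        }
        offset += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new Error(`Unexpected character '${input[offset]}' at offset ${offset}`);
    }
  }
  return images;
}

console.log(scan("12 + 0*(7-3)").join(" ")); // 12 + 0 * ( 7 - 3 )
console.log(scan("007").join(" "));          // 0 0 7
```

The number pattern never consumes a leading zero together with the digits after it, so in this sketch `007` splits into three separate tokens rather than one number.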

In [None]:
interface IResult {
  value: number;
  numbers: RecursiveSet<number>;
}

The `ArithmeticLexer` is initialized with our token definitions. We configure `positionTracking: "onlyOffset"` because we are parsing simple single-line strings and do not need full line/column tracking (which avoids warnings about missing line break definitions).

In [None]:
const WhiteSpace: TokenType = createToken({
  name: "WhiteSpace",
  pattern: /[ \t]+/,
  group: Lexer.SKIPPED,
});

const NumberToken: TokenType = createToken({
  name: "NumberToken",
  pattern: /[1-9][0-9]*|0/,
});

const Plus: TokenType = createToken({ name: "Plus", pattern: /\+/ });
const Minus: TokenType = createToken({ name: "Minus", pattern: /-/ });
const Multi: TokenType = createToken({ name: "Multi", pattern: /\*/ });
const Div: TokenType = createToken({ name: "Div", pattern: /\// });
const LParen: TokenType = createToken({ name: "LParen", pattern: /\(/ });
const RParen: TokenType = createToken({ name: "RParen", pattern: /\)/ });

const allTokens: TokenType[] = [
  WhiteSpace,
  NumberToken,
  Plus,
  Minus,
  Multi,
  Div,
  LParen,
  RParen,
];

const ArithmeticLexer = new Lexer(allTokens, {
  positionTracking: "onlyOffset",
});

### Helper Function `tokenize`

A simple wrapper around the lexer for debugging purposes.

**Input:**
* `s`: The input string ($\texttt{string}$).

**Output:**
* A list of token images ($\texttt{string[]}$).

**Error Handling:**
Throws an error if the input contains invalid characters.

In [None]:
function tokenize(s: string): string[] {
  const lexingResult: ILexingResult = ArithmeticLexer.tokenize(s);

  if (lexingResult.errors.length > 0) {
    throw new Error(`Lexing errors: ${lexingResult.errors[0].message}`);
  }

  return lexingResult.tokens.map((token: IToken) => token.image);
}

In [None]:
tokenize('123 + (234 +  345 - 2**0)/7');

## 2. Implementing the Recursive Descent Parser

We implement the parser by extending the `CstParser` class. The grammar rules are mapped to methods that construct the **Concrete Syntax Tree (CST)**.

**Implementation Details:**
* **`expr`**: Handles addition and subtraction. It parses a `product` followed by zero or more `+` or `-` operations.
* **`product`**: Handles multiplication and division. It parses a `factor` followed by zero or more `*` or `/` operations.
* **`factor`**: Handles atomic elements (numbers) and parenthesized expressions.

This structure enforces standard operator precedence (multiplication before addition).

**Input:**
The token array produced by the lexer, assigned to `parser.input`.

**Output:**
A `CstNode` representing the root of the parse tree.

In [None]:
class ArithmeticParser extends CstParser {
  constructor() {
    super(allTokens);
    this.performSelfAnalysis();
  }

  // expr -> product ( ('+'|'-') product )*
  public expr = this.RULE("expr", () => {
    this.SUBRULE(this.product);
    this.MANY(() => {
      this.OR([
        { ALT: () => this.CONSUME(Plus) },
        { ALT: () => this.CONSUME(Minus) },
      ]);
      this.SUBRULE2(this.product);
    });
  });

  // product -> factor ( ('*'|'/') factor )*
  public product = this.RULE("product", () => {
    this.SUBRULE(this.factor);
    this.MANY(() => {
      this.OR([
        { ALT: () => this.CONSUME(Multi) },
        { ALT: () => this.CONSUME(Div) },
      ]);
      this.SUBRULE2(this.factor);
    });
  });

  // factor -> '(' expr ')' | NUMBER
  public factor = this.RULE("factor", () => {
    this.OR([
      {
        ALT: () => {
          this.CONSUME(LParen);
          this.SUBRULE(this.expr);
          this.CONSUME(RParen);
        },
      },
      { ALT: () => this.CONSUME(NumberToken) },
    ]);
  });
}

const parser = new ArithmeticParser();
const BaseCstVisitor = parser.getBaseCstVisitorConstructor();

## 3. Visitor: Evaluation and Data Collection

The `ArithmeticVisitor` traverses the CST to compute the result. It implements the **Visitor Pattern**.

### Algorithm: Mixed Operator Evaluation

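The CST groups child tokens by type: within one `expr` node, all `Plus` tokens end up in one array and all `Minus` tokens in another (likewise `Multi` and `Div` in `product`). For mixed operators such as `1 + 2 - 3`, the original order must therefore be reconstructed to guarantee strict left-to-right evaluation.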

**Algorithm Sketch (for `expr` and `product`):**

Let $T = [t_0, t_1, \dots, t_n]$ be the operand nodes (e.g., products).
Let $Ops = [op_0, \dots, op_{n-1}]$ be the operator tokens in source order.

1.  **Collect & Sort:** Gather all operator tokens (e.g., `Plus` and `Minus`) and sort them by their textual position (`startOffset`) to reconstruct the original sequence.
2.  **Fold Left:** Initialize $result \leftarrow \text{visit}(t_0)$.
3.  **Iterate:** For $i$ from $0$ to $n-1$:
    * Let $op$ be the operator at sorted index $i$.
    * Let $operand$ be the result of visiting $t_{i+1}$.
    * Update $result$ by applying $op$ to $result$ and $operand$.
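
The steps above can be sketched in isolation with plain objects. The `OpToken` interface and the `foldLeft` helper are hypothetical stand-ins that carry only the two token fields needed here; the data models the expression `5-3+2`.

```typescript
// Plain-object stand-in for a token; only the fields used below.
// OpToken and foldLeft are hypothetical names for this illustration.
interface OpToken {
  image: string;
  startOffset: number;
}

// "5-3+2": operands in source order, operators grouped by type.
const operands: number[] = [5, 3, 2];
const pluses: OpToken[] = [{ image: "+", startOffset: 3 }];
const minuses: OpToken[] = [{ image: "-", startOffset: 1 }];

// Fold left: apply ops[i - 1] to the running result and values[i].
function foldLeft(values: number[], ops: OpToken[]): number {
  let result = values[0];
  for (let i = 1; i < values.length; i++) {
    result = ops[i - 1].image === "+" ? result + values[i] : result - values[i];
  }
  return result;
}

// Concatenating the groups puts "+" before "-", which is not the source order.
const unsorted: OpToken[] = [...pluses, ...minuses];
// Sorting by offset restores the original sequence "-", "+".
const sorted: OpToken[] = [...unsorted].sort((a, b) => a.startOffset - b.startOffset);

console.log(foldLeft(operands, unsorted)); // 6, computed as 5 + 3 - 2 (wrong order)
console.log(foldLeft(operands, sorted));   // 4, computed as 5 - 3 + 2
```

Without the sort, the operators would be applied in the order of their groups rather than their positions, which silently changes the result.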

**Output:**
* Returns the computed numerical value ($\mathbb{R}$).
* Side effect: Adds found numbers to the `foundNumbers` set.

In [None]:
class ArithmeticVisitor extends BaseCstVisitor {
  public foundNumbers: RecursiveSet<number>;

  constructor() {
    super();
    this.foundNumbers = new RecursiveSet<number>();
    this.validateVisitor();
  }

  public expr(ctx: {
    product: CstNode[];
    Plus?: IToken[];
    Minus?: IToken[];
  }): number {
    let result: number = this.visit(ctx.product[0]) as number;

    if (ctx.product.length > 1) {
      const pluses = ctx.Plus || [];
      const minuses = ctx.Minus || [];
      const allOps = [...pluses, ...minuses].sort((a, b) => a.startOffset - b.startOffset);

      for (let i = 1; i < ctx.product.length; i++) {
        const operand: number = this.visit(ctx.product[i]) as number;
        const operator = allOps[i - 1];

        if (operator.tokenType.name === "Plus") {
          result += operand;
        } else {
          result -= operand;
        }
      }
    }
    return result;
  }

  public product(ctx: {
    factor: CstNode[];
    Multi?: IToken[];
    Div?: IToken[];
  }): number {
    let result: number = this.visit(ctx.factor[0]) as number;

    if (ctx.factor.length > 1) {
      const multis = ctx.Multi || [];
      const divs = ctx.Div || [];
      const allOps = [...multis, ...divs].sort((a, b) => a.startOffset - b.startOffset);

      for (let i = 1; i < ctx.factor.length; i++) {
        const operand: number = this.visit(ctx.factor[i]) as number;
        const operator = allOps[i - 1];

        if (operator.tokenType.name === "Multi") {
          result *= operand;
        } else {
          result /= operand;
        }
      }
    }
    return result;
  }

  public factor(ctx: { expr?: CstNode[]; NumberToken?: IToken[] }): number {
    if (ctx.expr) {
      return this.visit(ctx.expr[0]) as number;
    }
    const token: IToken = ctx.NumberToken![0];
    const val: number = parseFloat(token.image);
    this.foundNumbers.add(val);
    return val;
  }
}

## 4. Main Parsing Interface

### Function `parse`

This function orchestrates the entire pipeline: Lexing $\rightarrow$ Parsing $\rightarrow$ Visiting.

**Input:**
* `s`: The arithmetic expression string ($\texttt{string}$).

**Output:**
* An `IResult` object containing:
    * `value`: The computed result ($\mathbb{R}$).
    * `numbers`: A set of all unique numbers found in the expression.

**Error Handling:**
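Throws an `Error` carrying the first reported message if lexing or parsing fails. Chevrotain collects errors in `lexingResult.errors` and `parser.errors` instead of throwing, so both arrays are checked explicitly.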

In [None]:
function parse(s: string): IResult {
  const lexingResult: ILexingResult = ArithmeticLexer.tokenize(s);

  if (lexingResult.errors.length > 0) {
    throw new Error(`Lexing Errors: ${lexingResult.errors[0].message}`);
  }

  parser.input = lexingResult.tokens;
  const cst: CstNode = parser.expr();

  if (parser.errors.length > 0) {
    throw new Error(`Parsing Errors: ${parser.errors[0].message}`);
  }

  const visitor = new ArithmeticVisitor();
  const value = visitor.visit(cst) as number;

  return {
    value,
    numbers: visitor.foundNumbers,
  };
}

## 5. Testing

### Function `test`

Helper function to run test cases and print results.

**Input:**
* `s`: Expression string to test.

**Output:**
* Prints the input, the calculated result, and the set of found numbers to the console.

In [None]:
function test(s: string): void {
  try {
    const result: IResult = parse(s);
    console.log(`Input: ${s}`);
    console.log(`Result: ${result.value}`);
    console.log(`Numbers: ${result.numbers.toString()}`);
    console.log("------------------------------------------------");
  } catch (e) {
    console.error(`Error parsing '${s}':`, e);
  }
}

In [None]:
test('11+22*(33-44)/(5-10*5/(4-3))')

In [None]:
test('0*11+22*(33-44)/(5-10*5/(4-3))')

In [None]:
test('5-3+2')
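
As a cross-check, the expected values of the three test expressions can be worked out by hand. This is a manual derivation under the usual precedence and left-to-right rules, not captured notebook output; floating-point printing may show more digits.

$$
\begin{array}{lcl}
  11 + 22 \cdot (33-44) / (5 - 10 \cdot 5 / (4-3)) & = & 11 + 22 \cdot (-11) / (5 - 50) \\
                                                   & = & 11 + (-242) / (-45) \\
                                                   & = & \frac{737}{45} \approx 16.378 \\[4pt]
  0 \cdot 11 + 22 \cdot (33-44) / (5 - 10 \cdot 5 / (4-3)) & = & 0 + \frac{242}{45} \approx 5.378 \\[4pt]
  5 - 3 + 2 & = & (5 - 3) + 2 = 4
\end{array}
$$

The first expression contains the eight unique numbers 11, 22, 33, 44, 5, 10, 4 and 3 (the value 5 occurs twice), the second adds 0 for a total of nine, and the third contains 5, 3 and 2.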