In [None]:
import { display } from "tslab";
import { readFileSync } from "fs";

const css = readFileSync("../style.css", "utf8");
display.html(`<style>${css}</style>`);

# A Recursive Parser for Arithmetic Expressions

In this notebook we implement a simple *recursive descent* parser for arithmetic expressions.
This parser will implement the following grammar:
$$
  \begin{eqnarray*}
  \mathrm{expr}        & \rightarrow & \mathrm{product}\;\;\mathrm{exprRest}            \\[0.2cm]
  \mathrm{exprRest}    & \rightarrow & \texttt{'+'} \;\;\mathrm{product}\;\;\mathrm{exprRest}   \\
                       & \mid        & \texttt{'-'} \;\;\mathrm{product}\;\;\mathrm{exprRest}   \\
                       & \mid        & \lambda                                      \\[0.2cm]
  \mathrm{product}     & \rightarrow & \mathrm{factor}\;\;\mathrm{productRest}          \\[0.2cm]
  \mathrm{productRest} & \rightarrow & \texttt{'*'} \;\;\mathrm{factor}\;\;\mathrm{productRest} \\
                       & \mid        & \texttt{'/'} \;\;\mathrm{factor}\;\;\mathrm{productRest} \\
                       & \mid        & \lambda                                      \\[0.2cm]
  \mathrm{factor}      & \rightarrow & \texttt{'('} \;\;\mathrm{expr} \;\;\texttt{')'}                \\
                       & \mid        & \texttt{NUMBER} 
  \end{eqnarray*}
$$

Additionally, we use a `RecursiveSet` to track all unique numbers encountered during evaluation.

**Note**: While the grammar is defined recursively above, Chevrotain allows us to implement the `Rest` rules more efficiently using iterative loops (`MANY`), which we will see in the parser implementation.
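
To make the grammar concrete before turning to Chevrotain, here is a minimal hand-written sketch that follows the rules literally, with one function per non-terminal. It is not the implementation used in this notebook. The function `evaluate` and its regex-based token splitting are illustrative assumptions, and characters that match no token are silently dropped for brevity.

```typescript
// Hypothetical hand-written evaluator that mirrors the grammar rule by rule.
function evaluate(input: string): number {
  // Simplified tokenization: digit runs and single-character operators.
  const tokens: string[] = input.match(/[0-9]+|[-+*/()]/g) ?? [];
  let pos = 0;

  // expr -> product exprRest
  function expr(): number {
    return exprRest(product());
  }

  // exprRest -> '+' product exprRest | '-' product exprRest | lambda
  // `left` carries the value computed so far, which keeps '-' left-associative.
  function exprRest(left: number): number {
    const op = tokens[pos];
    if (op === "+" || op === "-") {
      pos++;
      const right = product();
      return exprRest(op === "+" ? left + right : left - right);
    }
    return left; // lambda: nothing more to consume
  }

  // product -> factor productRest
  function product(): number {
    return productRest(factor());
  }

  // productRest -> '*' factor productRest | '/' factor productRest | lambda
  function productRest(left: number): number {
    const op = tokens[pos];
    if (op === "*" || op === "/") {
      pos++;
      const right = factor();
      return productRest(op === "*" ? left * right : left / right);
    }
    return left;
  }

  // factor -> '(' expr ')' | NUMBER
  function factor(): number {
    const tok = tokens[pos++];
    if (tok === "(") {
      const value = expr();
      if (tokens[pos++] !== ")") throw new Error("expected ')'");
      return value;
    }
    if (tok === undefined || !/^[0-9]+$/.test(tok)) {
      throw new Error(`unexpected token: ${tok}`);
    }
    return Number(tok);
  }

  const result = expr();
  if (pos !== tokens.length) throw new Error(`unexpected token: ${tokens[pos]}`);
  return result;
}

console.log(evaluate("5-3+2"));   // 4
console.log(evaluate("2*(3+4)")); // 14
```

Each `Rest` function calls itself as its last action, so the recursion can be replaced by a loop without changing the result.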

## Importing Libraries

We use `Chevrotain` for parsing and lexing, and `RecursiveSet` to collect the unique numbers found in the expressions.

In [None]:
import { createToken, Lexer, CstParser, IToken, ILexingResult, TokenType, CstNode } from "chevrotain";
import { RecursiveSet } from "recursive-set";

## Implementing a Scanner


The scanner (lexer) transforms the input string into a stream of tokens. In Chevrotain, we define tokens using `createToken`.

We define the following tokens:

- **WhiteSpace**: Matches spaces and tabs. The lexer skips these, so the parser never sees them.
- **NumberToken**: Matches non-negative integers without leading zeros (`0` itself is allowed).
- **Operators**: `+`, `-`, `*`, `/`, `(`, `)`.
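
The token patterns above can be tried out without Chevrotain. The following sketch is a hypothetical mini-scanner, not Chevrotain's lexer: the names `TokenSpec`, `specs`, and `scan` are assumptions for illustration only. It tries each pattern in order at the current offset using sticky regular expressions and takes the first match.

```typescript
// Hypothetical mini-scanner; these names are illustrative, not Chevrotain APIs.
interface TokenSpec {
  name: string;
  pattern: RegExp; // must carry the sticky flag "y"
  skip?: boolean;  // skipped tokens are matched but not emitted
}

const specs: TokenSpec[] = [
  { name: "WhiteSpace", pattern: /[ \t]+/y, skip: true },
  { name: "NumberToken", pattern: /[1-9][0-9]*|0/y },
  { name: "Plus", pattern: /\+/y },
  { name: "Minus", pattern: /-/y },
  { name: "Multi", pattern: /\*/y },
  { name: "Div", pattern: /\//y },
  { name: "LParen", pattern: /\(/y },
  { name: "RParen", pattern: /\)/y },
];

function scan(input: string): string[] {
  const images: string[] = [];
  let offset = 0;
  while (offset < input.length) {
    let matched = false;
    for (const spec of specs) {
      // A sticky regex only matches exactly at lastIndex.
      spec.pattern.lastIndex = offset;
      const m = spec.pattern.exec(input);
      if (m !== null) {
        if (!spec.skip) images.push(m[0]);
        offset += m[0].length;
        matched = true;
        break; // first matching pattern wins
      }
    }
    if (!matched) {
      throw new Error(`unexpected character '${input[offset]}' at offset ${offset}`);
    }
  }
  return images;
}

console.log(scan("12 + 007")); // [ '12', '+', '0', '0', '7' ]
```

In this sketch, the number pattern splits `007` into three separate tokens, so a number with leading zeros is not accepted as a single token.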

In [None]:
interface IResult {
  value: number;
  numbers: RecursiveSet<number>;
}

The `ArithmeticLexer` is initialized with our token definitions. We configure `positionTracking: "onlyOffset"` because we are parsing simple single-line strings and do not need full line/column tracking (which avoids warnings about missing line break definitions).

In [None]:
const WhiteSpace: TokenType = createToken({
  name: "WhiteSpace",
  pattern: /[ \t]+/,
  group: Lexer.SKIPPED,
  line_breaks: false, // the pattern matches only spaces and tabs, never a newline
});

// Matches: [1-9][0-9]*|0 (non-negative integers without leading zeros)
const NumberToken: TokenType = createToken({
  name: "NumberToken",
  pattern: /[1-9][0-9]*|0/,
});

const Plus: TokenType = createToken({ name: "Plus", pattern: /\+/ });
const Minus: TokenType = createToken({ name: "Minus", pattern: /-/ });
const Multi: TokenType = createToken({ name: "Multi", pattern: /\*/ });
const Div: TokenType = createToken({ name: "Div", pattern: /\// });
const LParen: TokenType = createToken({ name: "LParen", pattern: /\(/ });
const RParen: TokenType = createToken({ name: "RParen", pattern: /\)/ });

const allTokens: TokenType[] = [
  WhiteSpace, // recognized, but skipped
  NumberToken,
  Plus,
  Minus,
  Multi,
  Div,
  LParen,
  RParen,
];

const ArithmeticLexer = new Lexer(allTokens, {
  positionTracking: "onlyOffset",
});

In [None]:
function tokenize(s: string): string[] {
  const lexingResult: ILexingResult = ArithmeticLexer.tokenize(s);

  if (lexingResult.errors.length > 0) {
    throw new Error(`Lexing errors: ${lexingResult.errors[0].message}`);
  }

  return lexingResult.tokens.map((token: IToken) => token.image);
}

In [None]:
tokenize('123 + (234 +  345 - 2**0)/7');

## Implementing the Recursive Descent Parser

We implement the grammar using a `CstParser`.

The grammar rules are mapped to class properties using `this.RULE`.

- **`expr`**: Parses a product, followed by zero or more additions/subtractions.
- **`product`**: Parses a factor, followed by zero or more multiplications/divisions.
- **`factor`**: Parses a parenthesized expression or a number.

Unlike the Python implementation, which used recursive functions such as `exprRest`, we use Chevrotain's `this.MANY` to handle the repetition iteratively. The parser does not compute a value itself; it builds a Concrete Syntax Tree (CST) that is evaluated in a separate step.

In [None]:
class ArithmeticParser extends CstParser {
  constructor() {
    super(allTokens);
    this.performSelfAnalysis();
  }

  // expr -> product ( ('+'|'-') product )*
  public expr = this.RULE("expr", () => {
    this.SUBRULE(this.product);
    this.MANY(() => {
      this.OR([
        { ALT: () => this.CONSUME(Plus) },
        { ALT: () => this.CONSUME(Minus) },
      ]);
      this.SUBRULE2(this.product);
    });
  });

  // product -> factor ( ('*'|'/') factor )*
  public product = this.RULE("product", () => {
    this.SUBRULE(this.factor);
    this.MANY(() => {
      this.OR([
        { ALT: () => this.CONSUME(Multi) },
        { ALT: () => this.CONSUME(Div) },
      ]);
      this.SUBRULE2(this.factor);
    });
  });

  // factor -> '(' expr ')' | NUMBER
  public factor = this.RULE("factor", () => {
    this.OR([
      {
        ALT: () => {
          this.CONSUME(LParen);
          this.SUBRULE(this.expr);
          this.CONSUME(RParen);
        },
      },
      { ALT: () => this.CONSUME(NumberToken) },
    ]);
  });
}

const parser = new ArithmeticParser();
const BaseCstVisitor = parser.getBaseCstVisitorConstructor();

To compute the result, we use the **Visitor** pattern. The visitor traverses the CST created by the parser.

- **Evaluation**: It calculates the numeric result of the expression.
- **Data Collection**: It collects all unique numbers encountered in the expression into a `RecursiveSet`.
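
The CST groups child tokens by token name, so all `Plus` tokens and all `Minus` tokens of a rule arrive in separate arrays and their relative order is lost. The sketch below shows, with plain objects instead of real Chevrotain tokens, why the operators need to be merged and sorted by `startOffset` before folding. The `OpToken` interface and the `fold` helper are assumptions made for this illustration.

```typescript
// Simplified stand-in for a token; only the two fields used here.
interface OpToken {
  image: string;
  startOffset: number;
}

// For the input "8-3+2-1" the operators arrive grouped by type, not by position.
const plusTokens: OpToken[] = [{ image: "+", startOffset: 3 }];
const minusTokens: OpToken[] = [
  { image: "-", startOffset: 1 },
  { image: "-", startOffset: 5 },
];
const operands: number[] = [8, 3, 2, 1];

// Combine the operands from left to right; ops[i - 1] sits before operands[i].
function fold(ops: OpToken[]): number {
  let result = operands[0];
  for (let i = 1; i < operands.length; i++) {
    result = ops[i - 1].image === "+" ? result + operands[i] : result - operands[i];
  }
  return result;
}

const unsorted: OpToken[] = [...plusTokens, ...minusTokens];
const sorted: OpToken[] = [...unsorted].sort((a, b) => a.startOffset - b.startOffset);

console.log(fold(unsorted)); // 8 (computes 8+3-2-1, which is wrong)
console.log(fold(sorted));   // 6 (computes 8-3+2-1)
```

Sorting by offset restores the order in which the operators appeared in the input, which is all the visitor needs to pair each operator with its right-hand operand.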

In [None]:
class ArithmeticVisitor extends BaseCstVisitor {
  public foundNumbers: RecursiveSet<number>;

  constructor() {
    super();
    this.foundNumbers = new RecursiveSet<number>();
    this.validateVisitor();
  }

  public expr(ctx: {
    product: CstNode[];
    Plus?: IToken[];
    Minus?: IToken[];
  }): number {
    let result: number = this.visit(ctx.product[0]) as number;

    if (ctx.product.length > 1) {
      // The CST groups tokens by type: collect all operators and sort them by position
      const pluses = ctx.Plus || [];
      const minuses = ctx.Minus || [];
      const allOps = [...pluses, ...minuses].sort((a, b) => a.startOffset - b.startOffset);

      for (let i = 1; i < ctx.product.length; i++) {
        const operand: number = this.visit(ctx.product[i]) as number;
        const operator = allOps[i - 1];

        if (operator.tokenType.name === "Plus") {
          result += operand;
        } else {
          result -= operand;
        }
      }
    }
    return result;
  }

  public product(ctx: {
    factor: CstNode[];
    Multi?: IToken[];
    Div?: IToken[];
  }): number {
    let result: number = this.visit(ctx.factor[0]) as number;

    if (ctx.factor.length > 1) {
      // Sort the operators by position here as well
      const multis = ctx.Multi || [];
      const divs = ctx.Div || [];
      const allOps = [...multis, ...divs].sort((a, b) => a.startOffset - b.startOffset);

      for (let i = 1; i < ctx.factor.length; i++) {
        const operand: number = this.visit(ctx.factor[i]) as number;
        const operator = allOps[i - 1];

        if (operator.tokenType.name === "Multi") {
          result *= operand;
        } else {
          result /= operand;
        }
      }
    }
    return result;
  }

  public factor(ctx: { expr?: CstNode[]; NumberToken?: IToken[] }): number {
    if (ctx.expr) {
      return this.visit(ctx.expr[0]) as number;
    }
    const token: IToken = ctx.NumberToken![0];
    const val: number = parseInt(token.image, 10); // NumberToken only matches integers
    this.foundNumbers.add(val);
    return val;
  }
}

This function orchestrates the entire process:

1. **Lexing**: Converts string to tokens.
2. **Parsing**: Converts tokens to CST.
3. **Visiting**: Evaluates the CST and collects numbers.

It returns an `IResult` object containing both the calculated value and the set of numbers.

In [None]:
function parse(s: string): IResult {
  const lexingResult: ILexingResult = ArithmeticLexer.tokenize(s);

  if (lexingResult.errors.length > 0) {
    throw new Error(`Lexing Errors: ${lexingResult.errors[0].message}`);
  }

  parser.input = lexingResult.tokens;
  const cst: CstNode = parser.expr();

  if (parser.errors.length > 0) {
    throw new Error(`Parsing Errors: ${parser.errors[0].message}`);
  }

  const visitor = new ArithmeticVisitor();
  const value = visitor.visit(cst) as number;

  return {
    value,
    numbers: visitor.foundNumbers,
  };
}

## Testing

We test the parser with various expressions to verify correctness and see the collected numbers.
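
As a reference for the first test expression, here is the evaluation worked out by hand, assuming the usual precedence and left-to-right evaluation that the grammar encodes:

$$
  \begin{eqnarray*}
  11 + 22 \cdot (33-44) / (5 - 10 \cdot 5 / (4-3)) & = & 11 + 22 \cdot (-11) / (5 - 50) \\[0.2cm]
    & = & 11 + (-242) / (-45)                                                          \\[0.2cm]
    & = & \frac{737}{45} \approx 16.3778
  \end{eqnarray*}
$$

The second expression only prepends $0 \cdot 11$ in place of $11$, so its value should be $\frac{242}{45} \approx 5.3778$, with $0$ as an additional collected number. The third expression checks left associativity: $(5 - 3) + 2 = 4$, whereas the incorrect right-to-left grouping $5 - (3 + 2)$ would give $0$.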

In [None]:
function test(s: string): void {
  try {
    const result: IResult = parse(s);
    console.log(`Input: ${s}`);
    console.log(`Result: ${result.value}`);
    console.log(`Numbers: ${result.numbers.toString()}`);
    console.log("------------------------------------------------");
  } catch (e) {
    console.error(`Error parsing '${s}':`, e);
  }
}

In [None]:
test('11+22*(33-44)/(5-10*5/(4-3))')

In [None]:
test('0*11+22*(33-44)/(5-10*5/(4-3))')

In [None]:
test('5-3+2')