In [None]:
import { display } from "tslab";
import { readFileSync } from "fs";

const css = readFileSync("../style.css", "utf8");
display.html(`<style>${css}</style>`);

# An EBNF-Based Parser for Arithmetic Expressions

In this notebook we implement a recursive-descent parser for arithmetic expressions based on an <span style="font-variant:small-caps;">Ebnf</span> grammar. The parser implements the following <span style="font-variant:small-caps;">Ebnf</span> grammar:
$$
  \begin{eqnarray*}
  \mathrm{expr}    & \rightarrow & \mathrm{product}\;\;\bigl((\texttt{'+'}\;|\;\texttt{'-'})\;\; \mathrm{product}\bigr)^* \\[0.2cm]
  \mathrm{product} & \rightarrow & \mathrm{factor} \;\;\bigl((\texttt{'*'}\;|\;\texttt{'/'})\;\; \mathrm{factor}\bigr)^*  \\[0.2cm]   
  \mathrm{factor}  & \rightarrow & \texttt{'('} \;\;\mathrm{expr} \;\;\texttt{')'}                             \\
                   & \mid        & \texttt{NUMBER} 
  \end{eqnarray*}
$$

Unlike the simple recursive parser, which required helper functions such as `exprRest` to handle repetition, EBNF expresses repetitions like `+ product` or `* factor` directly with $(\ldots)^*$, and these can be implemented iteratively with loops. This maps directly onto Chevrotain's `MANY` construct.
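To see how each $(\ldots)^*$ turns into a loop without any parser library, here is a minimal hand-written sketch of the same grammar. It is an illustration, not the Chevrotain implementation used below: the function name `evaluate` is hypothetical, and it assumes the input has already been split into token strings such as `["1", "-", "2"]`.

```typescript
// Minimal sketch: each EBNF rule becomes a function, each (...)* a while loop.
// `evaluate` is a hypothetical helper, not part of this notebook's parser.
function evaluate(tokens: string[]): number {
  let pos = 0; // index of the next unread token

  const peek = (): string | undefined => tokens[pos];

  // expr → product (('+' | '-') product)*
  function expr(): number {
    let result = product();
    while (peek() === "+" || peek() === "-") {
      const op = tokens[pos++];
      const operand = product();
      result = op === "+" ? result + operand : result - operand;
    }
    return result;
  }

  // product → factor (('*' | '/') factor)*
  function product(): number {
    let result = factor();
    while (peek() === "*" || peek() === "/") {
      const op = tokens[pos++];
      const operand = factor();
      result = op === "*" ? result * operand : result / operand;
    }
    return result;
  }

  // factor → '(' expr ')' | NUMBER
  function factor(): number {
    const token = tokens[pos++];
    if (token === "(") {
      const value = expr();
      if (tokens[pos++] !== ")") throw new Error("expected ')'");
      return value;
    }
    if (token !== undefined && /^(?:[1-9][0-9]*|0)$/.test(token)) {
      return Number(token);
    }
    throw new Error(`unexpected token: ${token}`);
  }

  const value = expr();
  // Reject trailing input such as a stray ')'.
  if (pos !== tokens.length) throw new Error(`unexpected token: ${tokens[pos]}`);
  return value;
}

console.log(evaluate(["1", "-", "2", "+", "3"])); // 2
console.log(evaluate(["2", "*", "(", "3", "+", "4", ")"])); // 14
```

The loop keeps a running `result`, which is exactly the role the arrays produced by `MANY` play once the CST is visited.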

## 1. Imports and Setup

We use `Chevrotain` for lexing and parsing, and `RecursiveSet` to collect all distinct numbers encountered during evaluation.

In [None]:
import {
  createToken,
  Lexer,
  CstParser,
  IToken,
  ILexingResult,
  TokenType,
  CstNode,
} from "chevrotain";
import { RecursiveSet } from "recursive-set";

## 2. The Scanner

The scanner transforms the input string into a list of tokens. We define the tokens using regular expressions.

**Note:** We use `Lexer.SKIPPED` for whitespace to automatically filter it out, simplifying our grammar.
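The effect of the token patterns and of skipping whitespace can be reproduced with a small hand-rolled scanner. The sketch below is a hypothetical `scan` function, not Chevrotain's lexer: it tries the same patterns in the same order as the `allTokens` list defined next, discards whitespace the way `Lexer.SKIPPED` does, and throws on a character that no pattern matches.

```typescript
// Hypothetical stand-in for ArithmeticLexer, for illustration only.
// Each regex mirrors a token definition, anchored with ^ so it must match at
// the current position; `skip` plays the role of Lexer.SKIPPED.
interface ScanPattern {
  name: string;
  regex: RegExp;
  skip?: boolean;
}

const scanPatterns: ScanPattern[] = [
  { name: "WhiteSpace", regex: /^[ \t]+/, skip: true },
  { name: "NumberToken", regex: /^(?:[1-9][0-9]*|0)/ }, // group so ^ covers both alternatives
  { name: "Plus", regex: /^\+/ },
  { name: "Minus", regex: /^-/ },
  { name: "Multi", regex: /^\*/ },
  { name: "Div", regex: /^\// },
  { name: "LParen", regex: /^\(/ },
  { name: "RParen", regex: /^\)/ },
];

// Returns the images of all non-skipped tokens, trying patterns in list order.
function scan(input: string): string[] {
  const images: string[] = [];
  let offset = 0;
  while (offset < input.length) {
    const rest = input.slice(offset);
    let matched = false;
    for (const { regex, skip } of scanPatterns) {
      const m = regex.exec(rest);
      if (m !== null) {
        if (!skip) images.push(m[0]);
        offset += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new Error(`unexpected character '${input[offset]}' at offset ${offset}`);
    }
  }
  return images;
}

console.log(scan("12 * (3 - 4)").join(" ")); // 12 * ( 3 - 4 )
console.log(scan("007").join(" "));          // 0 0 7
```

The second call shows that the `NumberToken` pattern does not accept leading zeros: `007` is split into three separate tokens instead of being read as one number.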

In [None]:
const WhiteSpace: TokenType = createToken({
  name: "WhiteSpace",
  pattern: /[ \t]+/,
  group: Lexer.SKIPPED,
});

const NumberToken: TokenType = createToken({
  name: "NumberToken",
  pattern: /[1-9][0-9]*|0/,
});

const Plus: TokenType = createToken({ name: "Plus", pattern: /\+/ });
const Minus: TokenType = createToken({ name: "Minus", pattern: /-/ });
const Multi: TokenType = createToken({ name: "Multi", pattern: /\*/ });
const Div: TokenType = createToken({ name: "Div", pattern: /\// });
const LParen: TokenType = createToken({ name: "LParen", pattern: /\(/ });
const RParen: TokenType = createToken({ name: "RParen", pattern: /\)/ });

const allTokens: TokenType[] = [
  WhiteSpace,
  NumberToken,
  Plus,
  Minus,
  Multi,
  Div,
  LParen,
  RParen,
];

const ArithmeticLexer = new Lexer(allTokens, { positionTracking: "onlyOffset" });

To mimic the behavior of the original notebook and for debugging purposes, we provide a helper function `tokenize` that returns the token images, i.e. the substrings of the input matched by each token.

In [None]:
function tokenize(s: string): string[] {
  const lexingResult: ILexingResult = ArithmeticLexer.tokenize(s);

  if (lexingResult.errors.length > 0) {
    throw new Error(`Lexing errors: ${lexingResult.errors[0].message}`);
  }

  return lexingResult.tokens.map((token: IToken) => token.image);
}

In [None]:
console.log(tokenize('12 * 13 + 14 * 4 / 6 - 7'))

## 3. Implementing the Recursive-Descent Parser

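We now implement the parser class. Chevrotain lets us translate each EBNF rule almost one-to-one into a parser method: a sequence becomes consecutive `CONSUME`/`SUBRULE` calls, an alternative `|` becomes `this.OR`, and a repetition $(\ldots)^*$ becomes `this.MANY`.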

### The `expr` Rule

**EBNF:** `expr → product (('+' | '-') product)*`

This rule parses a `product`, followed by zero or more occurrences of an addition or subtraction operation and another `product`.
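Because the repetition is a loop that updates a running result, the operators are applied from left to right, so `-` (and likewise `/` in `product`) is left-associative. The sketch below makes that explicit with a hypothetical `foldLeft` over the (operator, operand) pairs matched by the $(\ldots)^*$ part, and contrasts it with a hypothetical `foldRight` that groups from the right instead.

```typescript
// Hypothetical helper: apply (operator, operand) pairs left to right,
// the way the loop for expr → product (('+' | '-') product)* does.
type AddOp = "+" | "-";

function foldLeft(first: number, rest: [AddOp, number][]): number {
  let result = first;
  for (const [op, operand] of rest) {
    result = op === "+" ? result + operand : result - operand;
  }
  return result;
}

// Contrast: grouping from the right, i.e. a - (b - c). This is what a naive
// right-recursive rule such as expr → product ('-' expr)? computes when
// evaluated directly.
function foldRight(first: number, rest: [AddOp, number][]): number {
  if (rest.length === 0) return first;
  const [op, next] = rest[0];
  const tail = foldRight(next, rest.slice(1));
  return op === "+" ? first + tail : first - tail;
}

console.log(foldLeft(10, [["-", 3], ["-", 2]]));  // 5 = (10 - 3) - 2
console.log(foldRight(10, [["-", 3], ["-", 2]])); // 9 = 10 - (3 - 2)
```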

### The `product` Rule

**EBNF:** `product → factor (('*' | '/') factor)*`

Similar to `expr`, this rule parses a `factor`, followed by zero or more occurrences of a multiplication or division and another `factor`.

### The `factor` Rule

**EBNF:** `factor → '(' expr ')' | NUMBER`

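A factor is either a parenthesized expression, which recurses back into `expr`, or a single number. Chevrotain's `OR` chooses the alternative by looking at the next token: `(` selects the first alternative, a `NumberToken` the second.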

In [None]:
class EbnfArithmeticParser extends CstParser {
  constructor() {
    super(allTokens);
    this.performSelfAnalysis();
  }
  // ----- expr Rule -----
  // EBNF: expr → product (('+' | '-') product)*
  public expr = this.RULE("expr", () => {
    this.SUBRULE(this.product);
    this.MANY(() => {
      this.OR([
        { ALT: () => this.CONSUME(Plus) },
        { ALT: () => this.CONSUME(Minus) },
      ]);
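      // SUBRULE2: product is invoked a second time in this rule, and Chevrotain
      // requires each repeated invocation of a subrule to use a distinct index.
      this.SUBRULE2(this.product);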
    });
  });
  // ----- product Rule -----
  // EBNF: product -> factor ( ('*'|'/') factor )*
  public product = this.RULE("product", () => {
    this.SUBRULE(this.factor);
    this.MANY(() => {
      this.OR([
        { ALT: () => this.CONSUME(Multi) },
        { ALT: () => this.CONSUME(Div) },
      ]);
      this.SUBRULE2(this.factor);
    });
  });
  // ----- factor Rule -----
  // EBNF: factor -> '(' expr ')' | NUMBER
  public factor = this.RULE("factor", () => {
    this.OR([
      {
        ALT: () => {
          this.CONSUME(LParen);
          this.SUBRULE(this.expr);
          this.CONSUME(RParen);
        },
      },
      { ALT: () => this.CONSUME(NumberToken) },
    ]);
  });
}

const parser = new EbnfArithmeticParser();
const BaseCstVisitor = parser.getBaseCstVisitorConstructor();

## 4. The Visitor (Evaluation)

Since the parser creates a **CST (Concrete Syntax Tree)**, we need a Visitor to traverse this tree and compute the result.

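Unlike the simple recursive parser, where evaluation happened during parsing, this visitor runs after parsing is complete. Every iteration of `MANY` adds another operand node and another operator token to the children of the enclosing CST node, so evaluating a rule means iterating over these arrays.

The visitor also maintains a `RecursiveSet` that collects every distinct number occurring in the expression.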
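A detail worth seeing in isolation: Chevrotain groups a rule's child tokens by token type, so for `1 - 2 + 3` the `expr` node has one `Plus` array and a separate `Minus` array. The position of a token inside `ctx.Plus` alone therefore does not tell which operator sits between two operands; the source order has to be recovered from each token's `startOffset`, which the lexer records because of `positionTracking: "onlyOffset"`. The sketch below uses a simplified, hypothetical `Tok` type in place of Chevrotain's `IToken` and hard-codes the offsets for that input.

```typescript
// Simplified, hypothetical token shape (Chevrotain's IToken has more fields).
interface Tok {
  image: string;
  startOffset: number;
}

// Merge per-type token arrays (any may be absent) and restore source order.
function orderedOperators(...groups: (Tok[] | undefined)[]): Tok[] {
  const all: Tok[] = [];
  for (const group of groups) {
    if (group) all.push(...group);
  }
  return all.sort((a, b) => a.startOffset - b.startOffset);
}

// Children of the top-level expr node for "1 - 2 + 3" (offsets 2 and 6).
const examplePlus: Tok[] = [{ image: "+", startOffset: 6 }];
const exampleMinus: Tok[] = [{ image: "-", startOffset: 2 }];
const exampleOperands: number[] = [1, 2, 3];

// Operator i-1 sits between operand i-1 and operand i.
const exampleOps = orderedOperators(examplePlus, exampleMinus);
let exampleValue = exampleOperands[0];
for (let i = 1; i < exampleOperands.length; i++) {
  exampleValue =
    exampleOps[i - 1].image === "+"
      ? exampleValue + exampleOperands[i]
      : exampleValue - exampleOperands[i];
}

console.log(exampleOps.map((t) => t.image).join(" ")); // - +
console.log(exampleValue); // 2
```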

In [None]:
interface IResult {
  value: number;
  numbers: RecursiveSet<number>;
}

class ArithmeticVisitor extends BaseCstVisitor {
  public foundNumbers: RecursiveSet<number>;

  constructor() {
    super();
    this.foundNumbers = new RecursiveSet<number>();
    this.validateVisitor();
  }

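  public expr(ctx: {
    product: CstNode[];
    Plus?: IToken[];
    Minus?: IToken[];
  }): number {
    let result: number = this.visit(ctx.product[0]) as number;

    // The CST stores Plus and Minus tokens in separate arrays, so merge them
    // and sort by offset to recover the operators in source order.
    const operators: IToken[] = [...(ctx.Plus ?? []), ...(ctx.Minus ?? [])].sort(
      (a, b) => a.startOffset - b.startOffset,
    );

    // Iterate over the arrays created by MANY: operator i-1 sits between
    // product i-1 and product i.
    for (let i = 1; i < ctx.product.length; i++) {
      const operand: number = this.visit(ctx.product[i]) as number;
      if (operators[i - 1].tokenType === Plus) {
        result += operand;
      } else {
        result -= operand;
      }
    }
    return result;
  }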

  public product(ctx: {
    factor: CstNode[];
    Multi?: IToken[];
    Div?: IToken[];
  }): number {
    let result: number = this.visit(ctx.factor[0]) as number;

    // Merge Multi and Div tokens and restore their source order by offset.
    const operators: IToken[] = [...(ctx.Multi ?? []), ...(ctx.Div ?? [])].sort(
      (a, b) => a.startOffset - b.startOffset,
    );

    for (let i = 1; i < ctx.factor.length; i++) {
      const operand: number = this.visit(ctx.factor[i]) as number;
      if (operators[i - 1].tokenType === Multi) {
        result *= operand;
      } else {
        result /= operand;
      }
    }
    return result;
  }

  public factor(ctx: { expr?: CstNode[]; NumberToken?: IToken[] }): number {
    if (ctx.expr) {
      return this.visit(ctx.expr[0]) as number;
    } else {
      const token: IToken = ctx.NumberToken![0];
      const val: number = parseFloat(token.image);
      this.foundNumbers.add(val);
      return val;
    }
  }
}

## 5. The Main `parse` Function

This function ties everything together:

1. Tokenize input.
2. Parse tokens into CST.
3. Visit CST to calculate result and collect numbers.

In [None]:
function parse(s: string): IResult {
  const lexingResult: ILexingResult = ArithmeticLexer.tokenize(s);

  if (lexingResult.errors.length > 0) {
    throw new Error(`Lexing Errors: ${lexingResult.errors[0].message}`);
  }

  parser.input = lexingResult.tokens;
  const cst: CstNode = parser.expr();

  if (parser.errors.length > 0) {
    throw new Error(`Parsing Errors: ${parser.errors[0].message}`);
  }

  const visitor = new ArithmeticVisitor();
  const value: number = visitor.visit(cst) as number;

  return {
    value,
    numbers: visitor.foundNumbers,
  };
}

## 6. Testing
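Working two of the test inputs by hand gives reference values for the printed results (division uses ordinary floating-point `/`, so the output shows decimals):

$$
  \begin{eqnarray*}
  12 \cdot 13 + 14 \cdot 4 / 6 - 7 & = & 156 + \tfrac{56}{6} - 7 \;=\; \tfrac{475}{3} \;\approx\; 158.333 \\[0.2cm]
  11 + 22 \cdot (33 - 44) / \bigl(5 - 10 \cdot 5 / (4 - 3)\bigr) & = & 11 + \tfrac{22 \cdot (-11)}{5 - 50} \;=\; 11 + \tfrac{242}{45} \;=\; \tfrac{737}{45} \;\approx\; 16.378
  \end{eqnarray*}
$$

The third test input differs from the second only in its leading `0*`, so it should evaluate to $\tfrac{242}{45} \approx 5.378$. Inputs that mix operators at the same level, such as `8/4*2` (expected $4$) or `1-2+3` (expected $2$), are useful extra checks that operators are applied in source order.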

In [None]:
function test(s: string): void {
  try {
    // Check tokenization output
    console.log(`Tokens: [${tokenize(s).join(", ")}]`);

    // Parse and evaluate
    const result: IResult = parse(s);

    console.log(`Input: ${s}`);
    console.log(`Result: ${result.value}`);
    console.log(`Numbers: ${result.numbers.toString()}`);
    console.log("------------------------------------------------");
  } catch (e) {
    console.error(`Error processing '${s}':`, e);
  }
}

In [None]:
parse('12 * 13 + 14 * 4 / 6 - 7')

In [None]:
test('11+22*(33-44)/(5-10*5/(4-3))')

In [None]:
test('0*11+22*(33-44)/(5-10*5/(4-3))')