In [None]:
import { display } from "tslab";
import { readFileSync } from "fs";

const css = readFileSync("../style.css", "utf8");
display.html(`<style>${css}</style>`);

## Imports and Library Setup

In [None]:
import {
  createToken,
  Lexer,
  CstParser,
  CstNode,
  IToken,
  ILexingResult,
  TokenType,
} from "chevrotain";

# A Simple Calculator

In this notebook, we implement a **symbolic calculator** that can parse assignments and arithmetic expressions, evaluate them, and maintain a variable environment.

## Architectural Overview

The interpreter works in a pipeline of four stages:

1.  **Lexing (Scanning):** Transforms the raw input string into a stream of **Tokens** (e.g., `NUMBER`, `PLUS`, `IDENTIFIER`).
2.  **Parsing (CST Generation):** Organizes tokens into a **Concrete Syntax Tree (CST)** based on the grammar rules. We use the *Chevrotain* library, which implements an **LL(k)** recursive-descent parser.
3.  **AST Transformation:** A visitor traverses the CST and transforms it into a clean **Abstract Syntax Tree (AST)**. Operator precedence is already encoded in the layering of the grammar rules; this step makes the left-associativity of the operators explicit in the tree.
4.  **Evaluation:** A recursive evaluator traverses the AST and computes the result based on a variable environment.

## The Grammar

The grammar is written in a form suitable for an LL(k) parser: left recursion is replaced by iteration. It is defined as follows:

$$
\begin{array}{lcl}
  \texttt{stmnt}   & \rightarrow & \;\texttt{IDENTIFIER}\; \texttt{':='} \; \;\texttt{expr}\; \texttt{';'} \\
                   & \mid        & \;\texttt{expr}\; \texttt{';'} \\[0.2cm]   
  \texttt{expr}    & \rightarrow & \;\texttt{product}\; \bigl( (\texttt{'+'} \mid \texttt{'-'}) \; \texttt{product} \bigr)^* \\[0.2cm]
  \texttt{product} & \rightarrow & \;\texttt{factor}\; \bigl( (\texttt{'*'} \mid \texttt{'/'}) \; \texttt{factor} \bigr)^* \\[0.2cm]
  \texttt{factor}  & \rightarrow &    \texttt{'('} \; \texttt{expr} \;\texttt{')'}       \\
                   & \mid        & \;\texttt{NUMBER}                                    \\
                   & \mid        & \;\texttt{IDENTIFIER}                                
\end{array}
$$

**Note:** Unlike the left-recursive definition in pure BNF (e.g. `expr -> expr + product`), we use iterative EBNF notation, where the Kleene star `*` denotes repetition: an expression consists of a product followed by zero or more additions or subtractions. This maps directly to the `MANY` construct used in the Chevrotain implementation.
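To illustrate how iteration replaces left recursion, here is a minimal hand-written sketch. It is not the Chevrotain parser used in this notebook: the function name `parseExprSketch` is hypothetical, the input is assumed to be pre-split into token strings, `factor` is restricted to numbers, and the sketch evaluates directly instead of building a tree.

```typescript
// Hand-written sketch of  expr    -> product (('+' | '-') product)*
// and                     product -> factor  (('*' | '/') factor)*
// with factor restricted to NUMBER (an assumption made for brevity).
function parseExprSketch(toks: string[]): number {
  let pos = 0;

  function parseFactor(): number {
    const tok = toks[pos++];
    const value = Number(tok);
    if (tok === undefined || Number.isNaN(value)) {
      throw new Error(`Expected a number but found: ${tok}`);
    }
    return value;
  }

  function parseProduct(): number {
    let result = parseFactor();
    // The Kleene star becomes a loop; there is no recursive call on the left.
    while (toks[pos] === "*" || toks[pos] === "/") {
      const op = toks[pos++];
      const rhs = parseFactor();
      result = op === "*" ? result * rhs : result / rhs;
    }
    return result;
  }

  let result = parseProduct();
  while (toks[pos] === "+" || toks[pos] === "-") {
    const op = toks[pos++];
    const rhs = parseProduct();
    result = op === "+" ? result + rhs : result - rhs;
  }
  if (pos < toks.length) {
    throw new Error(`Unexpected token: ${toks[pos]}`);
  }
  return result;
}

console.log(parseExprSketch(["5", "-", "2", "+", "3"])); // 6
console.log(parseExprSketch(["3", "+", "4", "*", "2"])); // 11
```

Because each loop folds its operands from left to right, the sketch is left-associative, and because `parseProduct` is called from within the `expr` loop, multiplication binds more tightly than addition.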

## 1. Specification of the Scanner

We define the tokens of the calculator language using regular expressions. The scanner (lexer) reads the input string and produces a stream of tokens.

The following table summarizes the token definitions:

| Token Name | Regular Expression | Description |
| :--- | :--- | :--- |
| `NUMBER` | `(0\|[1-9][0-9]*)(\.[0-9]*)?` | Floating point or integer numbers |
| `IDENTIFIER` | `[a-zA-Z][a-zA-Z0-9_]*` | Variable names |
| `ASSIGN` | `:=` | Assignment operator |
| `PLUS`, `MINUS`, `MUL`, `DIV` | `\+`, `-`, `\*`, `\/` | Arithmetic operators |
| `LPAREN`, `RPAREN` | `\(`, `\)` | Grouping symbols |
| `SEMICOLON` | `;` | Statement terminator |
| `WhiteSpace` | `[ \t\r]+` | Skipped (ignored) characters |
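The two non-trivial patterns can be checked in isolation with plain regular expressions. The following is a small sketch, independent of Chevrotain; the constant names `numberPattern` and `identifierPattern` are hypothetical, and the `^`/`$` anchors are added only so that each test applies to the whole string.

```typescript
// Anchored copies of the patterns from the table. The ^ and $ anchors are
// added for this check only; the lexer uses the unanchored forms.
const numberPattern = /^(0|[1-9][0-9]*)(\.[0-9]*)?$/;
const identifierPattern = /^[a-zA-Z][a-zA-Z0-9_]*$/;

console.log(numberPattern.test("0"));    // true
console.log(numberPattern.test("42"));   // true
console.log(numberPattern.test("3.14")); // true
console.log(numberPattern.test("3."));   // true: digits after the point are optional
console.log(numberPattern.test("007"));  // false: no leading zeros
console.log(numberPattern.test(".5"));   // false: an integer part is required

console.log(identifierPattern.test("x_1")); // true
console.log(identifierPattern.test("_x"));  // false: must start with a letter
```

Since the lexer matches without anchors, an input such as `007` is not a lexing error: it is split into the three `NUMBER` tokens `0`, `0`, `7`, and the resulting token sequence is rejected later by the parser.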

In [None]:
interface CalcTokens {
  NumberTok: TokenType;
  Identifier: TokenType;
  Assign: TokenType;
  Plus: TokenType;
  Minus: TokenType;
  Mul: TokenType;
  Div: TokenType;
  LParen: TokenType;
  RParen: TokenType;
  Semicolon: TokenType;
  WhiteSpace: TokenType;
}

### Function `createCalcLexer`

**Description:**
Constructs the Chevrotain Lexer instance and explicitly defines all token types.

**Output:**
An object containing:
* `allTokens`: An array of all defined `TokenType` objects.
* `tokens`: A dictionary mapping names to `TokenType` for easy access.
* `lexer`: The initialized `Lexer` instance ready to tokenize input strings.

In [None]:
function createCalcLexer(): {
  allTokens: TokenType[];
  tokens: CalcTokens;
  lexer: Lexer;
} {
  const NumberTok: TokenType = createToken({
    name: "NUMBER",
    pattern: /(0|[1-9][0-9]*)(\.[0-9]*)?/,
  });
  const Identifier: TokenType = createToken({
    name: "IDENTIFIER",
    pattern: /[a-zA-Z][a-zA-Z0-9_]*/,
  });
  const Assign: TokenType = createToken({ name: "ASSIGN", pattern: /:=/ });
  const Plus: TokenType = createToken({ name: "PLUS", pattern: /\+/ });
  const Minus: TokenType = createToken({ name: "MINUS", pattern: /-/ });
  const Mul: TokenType = createToken({ name: "MUL", pattern: /\*/ });
  const Div: TokenType = createToken({ name: "DIV", pattern: /\// });
  const LParen: TokenType = createToken({ name: "LPAREN", pattern: /\(/ });
  const RParen: TokenType = createToken({ name: "RPAREN", pattern: /\)/ });
  const Semicolon: TokenType = createToken({
    name: "SEMICOLON",
    pattern: /;/,
  });
  const WhiteSpace: TokenType = createToken({
    name: "WhiteSpace",
    pattern: /[ \t\r]+/,
    group: Lexer.SKIPPED,
  });

  const allTokens: TokenType[] = [
    WhiteSpace,
    Assign,
    Plus,
    Minus,
    Mul,
    Div,
    LParen,
    RParen,
    Semicolon,
    NumberTok,
    Identifier,
  ];

  return {
    allTokens,
    tokens: {
      NumberTok,
      Identifier,
      Assign,
      Plus,
      Minus,
      Mul,
      Div,
      LParen,
      RParen,
      Semicolon,
      WhiteSpace,
    },
    lexer: new Lexer(allTokens, { positionTracking: "onlyOffset" }),
  };
}

const {
  allTokens,
  tokens,
  lexer: CalcLexer,
}: {
  allTokens: TokenType[];
  tokens: CalcTokens;
  lexer: Lexer;
} = createCalcLexer();

### Helper Function `tokenizeCalc`

This function serves as a wrapper around the lexer for debugging and testing purposes.

**Input:**
* `input`: A source code string ($\texttt{string}$).

**Output:**
* A list of strings ($\texttt{string[]}$), representing the images of the recognized tokens.

**Error Handling:**
Throws an error if the input contains characters that do not match any token definition.

In [None]:
function tokenizeCalc(input: string): string[] {
  const lexingResult: ILexingResult = CalcLexer.tokenize(input);

  if (lexingResult.errors.length > 0) {
    throw new Error(`Lexing errors: ${lexingResult.errors[0].message}`);
  }

  return lexingResult.tokens.map((t: IToken) => t.image);
}

In [None]:
// Example:
console.log(tokenizeCalc("x := 3 + 4 * 2;"));

## 2. Specification of the Parser

We implement the parser by extending the `CstParser` class. The grammar rules are mapped to methods of this class. The parser constructs a **Concrete Syntax Tree (CST)** where every node represents a grammar rule derivation.

### Grammar Implementation

The grammar rules are implemented using the following methods:

* **`stmnt`**: Parses an assignment or an expression statement.
* **`expr`**: Parses addition and subtraction. It uses `MANY` to handle multiple operators iteratively (e.g., `a + b - c`).
* **`product`**: Parses multiplication and division. Its operators bind more tightly than those of `expr`.
* **`factor`**: Parses atomic units: parenthesized expressions, numbers, or identifiers.

**Input (Implicit):**
The token vector produced by the Lexer.

**Output:**
A `CstNode` representing the root of the parse tree for the respective rule.
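To make the shape of such a node concrete, here is a hand-built mock of the CST for the expression `1 + 2 - 3` at the `expr` level. The types `MockToken` and `MockCstNode` and the helper `mockProduct` are simplified, hypothetical stand-ins that keep only the fields used in this notebook; they are not Chevrotain's own `IToken` and `CstNode` definitions.

```typescript
// Simplified stand-ins for a token and a CST node (hypothetical types).
interface MockToken {
  image: string;       // the matched text
  startOffset: number; // position of the token in the input
}
interface MockCstNode {
  name: string; // the grammar rule that produced this node
  children: Record<string, (MockCstNode | MockToken)[]>;
}

// Helper that builds a `product` node wrapping a single NUMBER factor.
const mockProduct = (image: string, startOffset: number): MockCstNode => ({
  name: "product",
  children: {
    factor: [{ name: "factor", children: { NUMBER: [{ image, startOffset }] } }],
  },
});

// CST for "1 + 2 - 3": children are grouped by rule or token name.
const exprCst: MockCstNode = {
  name: "expr",
  children: {
    product: [mockProduct("1", 0), mockProduct("2", 4), mockProduct("3", 8)],
    PLUS: [{ image: "+", startOffset: 2 }],
    MINUS: [{ image: "-", startOffset: 6 }],
  },
};

console.log(Object.keys(exprCst.children)); // [ 'product', 'PLUS', 'MINUS' ]
```

Note that the `+` and `-` tokens end up in separate arrays. Their relative order in the source is only recoverable from `startOffset`, which is why the lexer is configured to track offsets.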

In [None]:
class CalcParser extends CstParser {
  constructor() {
    super(allTokens);
    this.performSelfAnalysis();
  }

  // stmnt -> IDENTIFIER ASSIGN expr SEMICOLON
  //        | expr SEMICOLON
  public stmnt = this.RULE("stmnt", () => {
    this.OR([
      {
        ALT: () => {
          this.CONSUME(tokens.Identifier);
          this.CONSUME(tokens.Assign);
          this.SUBRULE(this.expr);
          this.CONSUME(tokens.Semicolon);
        },
      },
      {
        ALT: () => {
          this.SUBRULE1(this.expr);
          this.CONSUME1(tokens.Semicolon);
        },
      },
    ]);
  });

  // expr -> product (('+' | '-') product)*
  public expr = this.RULE("expr", () => {
    this.SUBRULE(this.product);
    this.MANY(() => {
      this.OR([
        { ALT: () => this.CONSUME(tokens.Plus) },
        { ALT: () => this.CONSUME(tokens.Minus) },
      ]);
      this.SUBRULE2(this.product);
    });
  });

  // product -> factor (('*' | '/') factor)*
  public product = this.RULE("product", () => {
    this.SUBRULE(this.factor);
    this.MANY(() => {
      this.OR([
        { ALT: () => this.CONSUME(tokens.Mul) },
        { ALT: () => this.CONSUME(tokens.Div) },
      ]);
      this.SUBRULE2(this.factor);
    });
  });

  // factor -> '(' expr ')' | NUMBER | IDENTIFIER
  public factor = this.RULE("factor", () => {
    this.OR([
      {
        ALT: () => {
          this.CONSUME(tokens.LParen);
          this.SUBRULE(this.expr);
          this.CONSUME(tokens.RParen);
        },
      },
      { ALT: () => this.CONSUME(tokens.NumberTok) },
      { ALT: () => this.CONSUME(tokens.Identifier) },
    ]);
  });
}

const calcParser: CalcParser = new CalcParser();
const CalcBaseVisitor = calcParser.getBaseCstVisitorConstructor();

## 3. Abstract Syntax Tree (AST) Definitions

While the CST represents the syntactic structure (including parentheses and semicolons), the **Abstract Syntax Tree (AST)** represents the logical structure of the program. We define the AST using **Discriminated Unions** in TypeScript to ensure type safety.

**Type Definitions:**

* **`Expr`**: Represents an arithmetic expression. It can be:
    * `num`: A numeric literal ($v \in \mathbb{R}$).
    * `var`: A variable reference ($x$, a variable name).
    * `binop`: A binary operation ($e_1 \oplus e_2$ where $\oplus \in \{+, -, *, /\}$).
* **`Stmnt`**: Represents a statement. It can be:
    * `assign`: An assignment $x := e$.
    * `expr`: A standalone expression $e$.
* **`Env`**: The environment $\sigma$ mapping variable names to values ($\sigma: \text{string} \to \mathbb{R}$).

In [None]:
type Expr =
  | { kind: "num"; value: number }
  | { kind: "var"; name: string }
  | { kind: "binop"; op: "+" | "-" | "*" | "/"; left: Expr; right: Expr };

type Stmnt =
  | { kind: "assign"; name: string; expr: Expr }
  | { kind: "expr"; expr: Expr };

type Env = Record<string, number>;
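As an illustration of these types, the following sketch writes out by hand the AST for `5 - 2 + 3` and prints it with explicit parentheses. The names `DemoExpr`, `demoAst`, and `showExpr` are hypothetical; `DemoExpr` is a local copy of `Expr` so that the snippet stands alone.

```typescript
// Local copy of the expression type (named DemoExpr to avoid clashing
// with the notebook's Expr).
type DemoExpr =
  | { kind: "num"; value: number }
  | { kind: "var"; name: string }
  | { kind: "binop"; op: "+" | "-" | "*" | "/"; left: DemoExpr; right: DemoExpr };

// AST for "5 - 2 + 3": left-associative, so (5 - 2) is the left child of '+'.
const demoAst: DemoExpr = {
  kind: "binop",
  op: "+",
  left: {
    kind: "binop",
    op: "-",
    left: { kind: "num", value: 5 },
    right: { kind: "num", value: 2 },
  },
  right: { kind: "num", value: 3 },
};

// Switching on `kind` narrows the union, giving type-safe field access.
function showExpr(e: DemoExpr): string {
  switch (e.kind) {
    case "num":
      return String(e.value);
    case "var":
      return e.name;
    case "binop":
      return `(${showExpr(e.left)} ${e.op} ${showExpr(e.right)})`;
  }
}

console.log(showExpr(demoAst)); // ((5 - 2) + 3)
```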

## 4. Visitor: CST to AST Transformation

The `CalcAstVisitor` traverses the CST and constructs the simplified AST.

### Algorithm: Handling Operator Precedence and Associativity

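Chevrotain groups the children of a CST node by name, so the `PLUS` and `MINUS` tokens of an `expr` node arrive in two separate arrays and their relative order is not directly visible. A naive implementation of the `expr` and `product` methods might therefore process the operator types separately (e.g., all `+` before all `-`), which leads to incorrect associativity (e.g., `5 - 3 + 2` becoming `5 - (3 + 2)`).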

To solve this, we implement a **position-aware folding algorithm** in the `expr` and `product` methods:

**Algorithm Sketch:**

Let $T = [t_0, t_1, \dots, t_n]$ be the list of operand nodes (e.g., products).
Let $Op = [op_0, op_1, \dots, op_{n-1}]$ be the operator tokens found between the operands, so that $op_i$ stands between $t_i$ and $t_{i+1}$.

1.  **Collect**: Gather all operator tokens (e.g., both `PLUS` and `MINUS`) into a single list $L_{op}$.
2.  **Sort**: Sort $L_{op}$ based on their textual position (`startOffset`). This reconstructs the original order of operations.
3.  **Fold**: Iterate $i$ from $0$ to $n-1$:
    * Let $lhs$ be the current accumulated result (initialized with transformed $t_0$).
    * Let $rhs$ be the transformed $t_{i+1}$.
    * Let $op$ be the operator at index $i$ in the sorted list $L_{op}$.
    * Update $lhs \leftarrow \mathtt{binop}(op, lhs, rhs)$.
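
The three steps above can be sketched in isolation as follows. This is a simplified illustration rather than the visitor itself: the `OpToken` type and the `foldLeft` function are hypothetical, and the operands are plain numbers that are combined arithmetically instead of being turned into `binop` nodes.

```typescript
// Minimal stand-in for a lexer token; only the two fields the algorithm needs.
// (Hypothetical type for this sketch.)
interface OpToken {
  image: string;       // the operator text, "+" or "-"
  startOffset: number; // position of the operator in the input
}

// Fold operands left to right, using operators that arrive grouped by type
// (as in a CST context) rather than in source order.
function foldLeft(operands: number[], pluses: OpToken[], minuses: OpToken[]): number {
  // Steps 1 and 2: collect all operators, then sort by position.
  const ops = [...pluses, ...minuses].sort((a, b) => a.startOffset - b.startOffset);

  // Step 3: combine the accumulator with each following operand.
  let acc = operands[0];
  for (let i = 0; i < ops.length; i++) {
    const rhs = operands[i + 1];
    acc = ops[i].image === "+" ? acc + rhs : acc - rhs;
  }
  return acc;
}

// Input "5 - 3 + 2": '-' is at offset 2 and '+' is at offset 6.
const plusTokens: OpToken[] = [{ image: "+", startOffset: 6 }];
const minusTokens: OpToken[] = [{ image: "-", startOffset: 2 }];
console.log(foldLeft([5, 3, 2], plusTokens, minusTokens)); // 4, i.e. (5 - 3) + 2
```

Without the sort, the concatenated list would be `[+, -]`, and the fold would compute `5 + 3 - 2 = 6` instead.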

**Input:**
A CST Context object containing arrays of child nodes (operands) and tokens (operators).

**Output:**
A strict `Expr` or `Stmnt` AST node.

In [None]:
class CalcAstVisitor extends CalcBaseVisitor {
  constructor() {
    super();
    this.validateVisitor();
  }

  public stmnt(ctx: {
    IDENTIFIER?: IToken[];
    expr?: CstNode[];
  }): Stmnt {
    if (ctx.IDENTIFIER && ctx.IDENTIFIER.length > 0) {
      const name: string = ctx.IDENTIFIER[0].image;
      const exprNode: Expr = this.visit(ctx.expr![0]) as Expr;
      return { kind: "assign", name, expr: exprNode };
    }
    const exprNode: Expr = this.visit(ctx.expr![0]) as Expr;
    return { kind: "expr", expr: exprNode };
  }

  public expr(ctx: {
    product: CstNode[];
    PLUS?: IToken[];
    MINUS?: IToken[];
  }): Expr {
    let result: Expr = this.visit(ctx.product[0]) as Expr;

    if (ctx.product.length > 1) {
      const pluses = ctx.PLUS || [];
      const minuses = ctx.MINUS || [];
      const allOps = [...pluses, ...minuses].sort((a, b) => a.startOffset - b.startOffset);

      for (let i = 1; i < ctx.product.length; i++) {
        const right: Expr = this.visit(ctx.product[i]) as Expr;
        const operator = allOps[i - 1];

        if (operator.tokenType.name === "PLUS") {
          result = { kind: "binop", op: "+", left: result, right };
        } else {
          result = { kind: "binop", op: "-", left: result, right };
        }
      }
    }

    return result;
  }

  public product(ctx: {
    factor: CstNode[];
    MUL?: IToken[];
    DIV?: IToken[];
  }): Expr {
    let result: Expr = this.visit(ctx.factor[0]) as Expr;

    if (ctx.factor.length > 1) {
      const muls = ctx.MUL || [];
      const divs = ctx.DIV || [];
      const allOps = [...muls, ...divs].sort((a, b) => a.startOffset - b.startOffset);

      for (let i = 1; i < ctx.factor.length; i++) {
        const right: Expr = this.visit(ctx.factor[i]) as Expr;
        const operator = allOps[i - 1];

        if (operator.tokenType.name === "MUL") {
          result = { kind: "binop", op: "*", left: result, right };
        } else {
          result = { kind: "binop", op: "/", left: result, right };
        }
      }
    }

    return result;
  }

  public factor(ctx: {
    expr?: CstNode[];
    NUMBER?: IToken[];
    IDENTIFIER?: IToken[];
  }): Expr {
    if (ctx.expr && ctx.expr.length > 0) {
      return this.visit(ctx.expr[0]) as Expr;
    }
    if (ctx.NUMBER && ctx.NUMBER.length > 0) {
      const value: number = parseFloat(ctx.NUMBER[0].image);
      return { kind: "num", value };
    }
    const name: string = ctx.IDENTIFIER![0].image;
    return { kind: "var", name };
  }
}

## 5. Evaluation

We define the operational semantics of our language using two functions: `evalExpr` for expressions and `execStmnt` for statements.

### Function `evalExpr`

Evaluates an arithmetic expression tree to a numerical value.

**Input:**
* `e`: An expression node ($\mathtt{Expr}$).
* `env`: The current variable environment ($\mathtt{Env}$).

**Output:**
* A number ($\mathbb{R}$).

**Mathematical Definition:**
$$
\text{eval}(e, \sigma) = \begin{cases} 
v & \text{if } e = \text{num}(v) \\
\sigma(x) & \text{if } e = \text{var}(x) \\
\text{eval}(l, \sigma) \oplus \text{eval}(r, \sigma) & \text{if } e = \text{binop}(\oplus, l, r)
\end{cases}
$$
where $\sigma$ is the environment and $\oplus$ is the arithmetic operation corresponding to the operator.
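
As a worked example, consider the expression `x * 40` under an assumed environment $\sigma = \{x \mapsto 6\}$. Unfolding the definition case by case gives:

$$
\begin{aligned}
\text{eval}(\text{binop}(*, \text{var}(x), \text{num}(40)), \sigma)
  &= \text{eval}(\text{var}(x), \sigma) \cdot \text{eval}(\text{num}(40), \sigma) \\
  &= \sigma(x) \cdot 40 \\
  &= 6 \cdot 40 = 240
\end{aligned}
$$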

In [None]:
function evalExpr(e: Expr, env: Env): number {
  switch (e.kind) {
    case "num":
      return e.value;
    case "var": {
      const v: number | undefined = env[e.name];
      if (v === undefined) {
        throw new Error(`Undefined variable: ${e.name}`);
      }
      return v;
    }
    case "binop": {
      const left: number = evalExpr(e.left, env);
      const right: number = evalExpr(e.right, env);
      switch (e.op) {
        case "+":
          return left + right;
        case "-":
          return left - right;
        case "*":
          return left * right;
        case "/":
          return left / right;
      }
    }
  }
}

### Function `execStmnt`

Executes a statement, potentially modifying the environment.

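**Input:**
* `s`: A statement node ($\mathtt{Stmnt}$).
* `env`: The environment ($\mathtt{Env}$). The object is shared with the caller, so updates made here are visible outside the function.

**Output:**
* The value of the evaluated expression ($\mathbb{R}$); for an assignment, this is the assigned value. The declared return type also admits `undefined`, but the implementation below always returns a number.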

**Side Effects:**
If $s$ is an assignment `assign(x, e)`, the environment is updated: $\sigma' = \sigma[x \mapsto \text{eval}(e, \sigma)]$.

In [None]:
function execStmnt(s: Stmnt, env: Env): number | undefined {
  if (s.kind === "assign") {
    const value: number = evalExpr(s.expr, env);
    env[s.name] = value;
    return value;
  }
  // Expression statement: compute and return the value.
  return evalExpr(s.expr, env);
}
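
The update $\sigma' = \sigma[x \mapsto v]$ can be realized either by mutating the environment in place, as `execStmnt` does, or by returning a new environment. The sketch below contrasts the two; the names `DemoEnv`, `assignInPlace`, and `assignPure` are hypothetical and not part of the calculator.

```typescript
type DemoEnv = Record<string, number>;

// In-place update: the caller's object is modified.
function assignInPlace(env: DemoEnv, name: string, value: number): void {
  env[name] = value;
}

// Non-mutating alternative: returns a new environment and leaves
// the original unchanged.
function assignPure(env: DemoEnv, name: string, value: number): DemoEnv {
  return { ...env, [name]: value };
}

const sigma: DemoEnv = { x: 6 };
assignInPlace(sigma, "y", 240);
console.log(sigma); // { x: 6, y: 240 }

const sigmaPrime: DemoEnv = assignPure(sigma, "y", 1);
console.log(sigma.y, sigmaPrime.y); // 240 1
```

The in-place variant keeps the main loop simple, since a single environment object can be threaded through all statements.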

## 6. Parsing Pipeline Interface

### Function `parseCalc`

This function orchestrates the parsing pipeline for a single statement string.

**Input:**
* `input`: A single line of source code ($\texttt{string}$).

**Output:**
* An AST node ($\texttt{Stmnt}$).

**Process:**
1.  **Tokenize** the input using `CalcLexer`. Check for lexical errors.
2.  **Parse** the tokens into a CST using `calcParser.stmnt()`. Check for syntax errors.
3.  **Visit** the CST using `CalcAstVisitor` to produce the AST.

In [None]:
function parseCalc(input: string): Stmnt {
  const lexingResult: ILexingResult = CalcLexer.tokenize(input);

  if (lexingResult.errors.length > 0) {
    throw new Error(`Lexing errors: ${lexingResult.errors[0].message}`);
  }

  calcParser.input = lexingResult.tokens;
  const cst: CstNode = calcParser.stmnt();

  if (calcParser.errors.length > 0) {
    throw new Error(`Parsing errors: ${calcParser.errors[0].message}`);
  }

  const visitor: CalcAstVisitor = new CalcAstVisitor();
  return visitor.visit(cst) as Stmnt;
}

## 7. Main Execution Loop

### Function `calc`

Processes the statements in `inputProgram` one line at a time, in the manner of a REPL (Read-Eval-Print Loop) running in batch mode.

**Input:**
* Implicitly uses the global `inputProgram` string.

**Output:**
* Prints the result of each statement to the console.
* Prints the final state of the environment.

**Algorithm:**
1.  Initialize an empty environment $\sigma = \{\}$.
2.  Split the input into lines, trim each line, and discard empty lines.
3.  For each line:
    * Call `parseCalc` to get the AST.
    * Call `execStmnt` to evaluate and update $\sigma$.
    * Print the result.
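
The line-splitting step can be tried on its own. The following sketch uses a hypothetical two-statement program `demoProgram` and the same `split`/`trim`/`filter` chain:

```typescript
// Hypothetical program, written as a template literal with surrounding
// newlines, indentation, and a blank line between the statements.
const demoProgram: string = `
  a := 1;

  a + 1;
`;

const demoLines: string[] = demoProgram
  .split("\n")                   // one entry per physical line
  .map((l) => l.trim())          // remove indentation and trailing spaces
  .filter((l) => l.length > 0);  // drop blank lines

console.log(demoLines); // [ 'a := 1;', 'a + 1;' ]
```

One consequence of this design is that every statement must fit on a single line: a statement such as `x :=` followed by `5;` on the next line would be split into two fragments, and the first fragment would be rejected by the parser.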

In [None]:
// Example input program for the calculator.
// You can change this string and re-run calc().
const inputProgram: string = `
x := 5 - 2 + 3;
y := x * 40;
y + 1;
y - 31;
y := 1;
`;

In [None]:
function calc(): void {
  const env: Env = {};

  const lines: string[] = inputProgram
    .split("\n")
    .map((l: string): string => l.trim())
    .filter((l: string): boolean => l.length > 0);

  for (const line of lines) {
    try {
      const stmnt: Stmnt = parseCalc(line);
      const value: number | undefined = execStmnt(stmnt, env);
      console.log(`Input: ${line}`);
      if (value !== undefined) {
        console.log(`Result: ${value}`);
      }
    } catch (e) {
      console.error(`Error processing '${line}':`, e);
    }
  }

  console.log("Final environment:", env);
}

Run the calculator once on the given `inputProgram`.

In [None]:
calc();