In [None]:
import { display } from "tslab";
import { readFileSync } from "fs";

const css = readFileSync("../style.css", "utf8");
display.html(`<style>${css}</style>`);

In [None]:
import { createToken, Lexer, CstParser, IToken, CstNode, TokenType } from "chevrotain";
import { instance } from "@viz-js/viz";
// Optional: import RecursiveSet in case you want to use it explicitly here
// (not required for the parser example, but kept for consistency).
import { RecursiveSet, Tuple } from "recursive-set";

# Dealing with Conflicts

In this notebook, we analyze **parsing conflicts** using a simple arithmetic grammar. Specifically, we explore how an **LL(k) parser** (like Chevrotain) handles ambiguous grammars where operator precedence is not explicitly defined.

We use **Chevrotain**, a Parser Building Toolkit for TypeScript, which parses top-down (Recursive Descent). Unlike bottom-up parsers (LR) that report "Shift-Reduce" conflicts, LL parsers resolve ambiguities based on the order of alternatives.

The following grammar is *ambiguous* because it specifies neither the precedence nor the associativity of the arithmetic operators:
```
    expr : expr '+' expr
         | expr '*' expr
         | NUMBER      
         ;
```
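
To make the ambiguity concrete, the following dependency-free sketch enumerates every parse that the grammar above allows for a token list. The helper `allParses` is hypothetical and only serves as an illustration; it is not part of Chevrotain. Each operator position is tried as the root of the tree, which is exactly the freedom the ambiguous grammar leaves open.

```typescript
// Enumerate every parse of a token list under the ambiguous grammar
//   expr : expr '+' expr | expr '*' expr | NUMBER
// Each parse is rendered as a fully parenthesized string.
function allParses(tokens: string[]): string[] {
  if (tokens.length === 1) return [tokens[0]];
  const results: string[] = [];
  // Operators sit at the odd positions; each of them may be the root.
  for (let i = 1; i < tokens.length; i += 2) {
    for (const left of allParses(tokens.slice(0, i))) {
      for (const right of allParses(tokens.slice(i + 1))) {
        results.push(`(${left} ${tokens[i]} ${right})`);
      }
    }
  }
  return results;
}

const parses = allParses(["1", "*", "2", "+", "3"]);
console.log(parses); // [ '(1 * (2 + 3))', '((1 * 2) + 3)' ]
```

The input `1 * 2 + 3` has two distinct parse trees, and only the second one matches the usual arithmetic conventions.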

## Specification of the Scanner

We implement a minimal scanner for arithmetic expressions.

In [None]:
const NumberTok: TokenType = createToken({ 
  name: "NUMBER", 
  pattern: /0|[1-9][0-9]*/ 
});

const Plus: TokenType = createToken({ name: "Plus", pattern: /\+/ });
const Mult: TokenType = createToken({ name: "Mult", pattern: /\*/ });

const WhiteSpace: TokenType = createToken({
  name: "WhiteSpace",
  pattern: /[ \t\n\r]+/,
  group: Lexer.SKIPPED,
});

const allTokens: TokenType[] = [WhiteSpace, Plus, Mult, NumberTok];
const ArithmeticLexer = new Lexer(allTokens);
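
As a rough illustration of what the scanner does, here is a dependency-free sketch that applies the same four patterns in the same order. The names `simpleTokenize` and `SimpleToken` are made up for this example, and the loop is a simplification that does not reflect how Chevrotain implements its lexer.

```typescript
type TokenName = "WhiteSpace" | "Plus" | "Mult" | "NUMBER";

interface SimpleToken {
  type: Exclude<TokenName, "WhiteSpace">;
  image: string;
}

// Same patterns and order as allTokens, anchored at the start of the rest.
const rules: [TokenName, RegExp][] = [
  ["WhiteSpace", /^[ \t\n\r]+/],
  ["Plus", /^\+/],
  ["Mult", /^\*/],
  ["NUMBER", /^(?:0|[1-9][0-9]*)/],
];

function simpleTokenize(input: string): SimpleToken[] {
  const tokens: SimpleToken[] = [];
  let pos = 0;
  while (pos < input.length) {
    const rest = input.slice(pos);
    let matched = false;
    for (const [type, re] of rules) {
      const m = re.exec(rest);
      if (m) {
        // Whitespace is skipped, everything else becomes a token.
        if (type !== "WhiteSpace") tokens.push({ type, image: m[0] });
        pos += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error(`Unexpected character at offset ${pos}`);
  }
  return tokens;
}

console.log(simpleTokenize("1 * 2 + 3").map((t) => t.image));
```

Note that the `NUMBER` pattern does not allow leading zeros, so an input such as `01` is split into the two tokens `0` and `1`.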

## Grammar Definition

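The grammar above cannot be used directly, because it is left-recursive and a top-down parser cannot handle left recursion. We therefore define a grammar that accepts the same language but uses two right-recursive rules, `expr` and `rest`: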

- `expr` consumes a number and optionally continues with `rest`.
- `rest` consumes an operator (`+` or `*`) and recursively calls `expr`.

This structure makes every operator **right-associative** and gives `+` and `*` the same precedence, because everything that follows an operator becomes its right operand. As a result, `1 * 2 + 3` is parsed as `1 * (2 + 3)`.
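
The effect of the two rules can be sketched with a hand-written recursive-descent function that does not depend on Chevrotain. The function `parseExpr` and the type `Tree` are hypothetical names chosen for this illustration; the input is assumed to be an already tokenized list of strings.

```typescript
type Tree = number | [string, number, Tree];

// expr -> NUMBER rest?
// rest -> ('+' | '*') expr
function parseExpr(tokens: string[], pos = 0): { tree: Tree; next: number } {
  const value = parseInt(tokens[pos], 10);
  if (isNaN(value)) throw new Error(`Expected NUMBER at token ${pos}`);
  const op = tokens[pos + 1];
  if (op === "+" || op === "*") {
    // rest: the whole remainder of the input becomes the right operand.
    const right = parseExpr(tokens, pos + 2);
    return { tree: [op, value, right.tree], next: right.next };
  }
  return { tree: value, next: pos + 1 };
}

const result = parseExpr(["1", "*", "2", "+", "3"]);
console.log(JSON.stringify(result.tree)); // ["*",1,["+",2,3]]
```

Because `rest` always recurses into `expr`, the first operator ends up at the root of the tree, regardless of whether it is `+` or `*`.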

In [None]:
class AmbiguousExprParser extends CstParser {
  constructor() {
    super(allTokens);
    this.performSelfAnalysis();
  }

  // expr -> NUMBER rest?
  public expr = this.RULE("expr", () => {
    this.CONSUME(NumberTok);
    this.OPTION(() => {
      this.SUBRULE(this.rest);
    });
  });

  // rest -> ( '+' | '*' ) expr
  public rest = this.RULE("rest", () => {
    // Choose the operator, then parse the remainder as the right operand.
    this.OR([
      { ALT: () => this.CONSUME(Plus) },
      { ALT: () => this.CONSUME(Mult) },
    ]);
    this.SUBRULE(this.expr);
  });
}

const parser = new AmbiguousExprParser();

To work with the concrete syntax tree (CST) safely, we define strict TypeScript interfaces for the children of each rule. These replace the loose tuple definitions from the Python version.

We also define a helper function `cstToTuple` that converts the CST into a simplified nested tuple format `[Operator, LeftOperand, RightOperand]` for visualization purposes.

In [None]:
// --- CST Interfaces ---
interface ExprCtx {
    NUMBER: IToken[];
    rest?: CstNode[];
}

interface RestCtx {
    Plus?: IToken[];
    Mult?: IToken[];
    expr: CstNode[];
}

// --- AST Types ---
type BinaryOp = "+" | "*";

interface NumberNode {
    kind: "Number";
    value: number;
}

interface BinaryExpNode {
    kind: "BinaryExp";
    op: BinaryOp;
    left: ASTNode; 
    right: ASTNode; 
}

type ExprResult = number | [BinaryOp, ExprResult]; 

type ASTNode = number | [string, ASTNode] | [string, number, ASTNode]; 

interface ExprTree {
    operator?: string;
    left?: number | ExprTree;
    right?: number | ExprTree;
    value?: number; // Leaf
}

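type ASTTuple = number | [string, number, ASTTuple];

// Converts a CST node into nested tuples: `expr` yields a number or
// [op, left, right]; `rest` yields the intermediate pair [op, right].
function cstToTuple(node: CstNode): ASTTuple | [string, ASTTuple] | null {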
  if (!node || !node.name) return null;

  switch (node.name) {
    case "expr": {
        // expr children: NUMBER, rest?
        // Fix: Double cast via unknown
        const children = node.children as unknown as ExprCtx;
        
        // Safety check (optional but good for strictness)
        if (!children.NUMBER || children.NUMBER.length === 0) {
            throw new Error("Invalid CST: Missing NUMBER in expr");
        }

        const numStr = children.NUMBER[0].image;
        const numVal = parseInt(numStr, 10);
        
        if (children.rest && children.rest.length > 0) {
            // rest returns [op, rightExpr]
            const restRes = cstToTuple(children.rest[0]) as [string, ASTTuple];
            // We construct [op, numVal, rightExpr]; the nesting shows the right-associativity
            return [restRes[0], numVal, restRes[1]];
        }
        return numVal;
    }

    case "rest": {
        // rest children: (Plus|Mult), expr
        // Fix: Double cast via unknown
        const children = node.children as unknown as RestCtx;

        // Safety check
        if (!children.expr || children.expr.length === 0) {
             throw new Error("Invalid CST: Missing expr in rest");
        }

        const op = children.Plus ? "+" : (children.Mult ? "*" : "?");
        const rightRes = cstToTuple(children.expr[0]) as ASTTuple;
        return [op, rightRes];
    }
  }
  return null;
}

## Grammar Inspection

The function below inspects the generated parser. Chevrotain is an LL(k) parser, so it does not build an LR-style conflict table and never reports shift-reduce conflicts. Within an `OR`, alternatives are considered in the order in which they are listed. The rewritten grammar poses no problem for the parser: `rest` can choose between `+` and `*` by looking at a single token, so we expect no definition errors to be reported.

In [None]:
function showChevrotainParserOut(p: any) {
  const productions = p.getGAstProductions();
  // allTokens is available globally

  let out = "";
  out += "Created by Chevrotain (LL Recursive-Descent Parser)\n\n";
  out += "Grammar\n\n";

  let ruleId = 0;
  for (const [name, rule] of Object.entries(productions)) {
    // Extract the definition defensively
    const def = Array.isArray((rule as any).definition)
      ? (rule as any).definition.map((d: any) => d.constructor.name).join(" ")
      : JSON.stringify((rule as any).definition);
      
    out += `Rule ${ruleId++} ${name} -> ${def}\n`;
  }

  out += "\nTerminals, with token patterns\n\n";
  for (const tok of allTokens) {
    const pat = typeof tok.PATTERN === "string" 
        ? tok.PATTERN 
        : tok.PATTERN?.toString().replace(/\n/g, "");
    out += `${tok.name.padEnd(20)} : ${pat}\n`;
  }

  out += "\nNonterminals\n\n";
  for (const [name] of Object.entries(productions)) {
    out += `${name}\n`;
  }

  out += "\nParsing method: LL(k) recursive-descent (Chevrotain)\n";
  out += "\n--------------------------------------------\n";

  const errors = p.definitionErrors || [];
  if (errors.length > 0) {
    out += "⚠️ Grammar Definition Warnings:\n";
    for (const e of errors) out += ` - ${e.message}\n`;
  } else {
    out += "✅ No grammar definition conflicts detected.\n";
  }

  display.text(out);
}

showChevrotainParserOut(parser);

The function `test(s)` parses a string and visualizes the resulting Abstract Syntax Tree (AST). If the parsing is successful, we generate a DOT graph.

We will see that for `1 * 2 + 3`, the parser produces a **right-associative** tree due to the grammar structure, meaning it treats it as `1 * (2 + 3)`. This demonstrates that without explicit precedence handling (or different grammar rules), the parser does not respect standard arithmetic precedence (`*` before `+`).

In [None]:
function tuple2dot(t: ASTTuple): string {
  let dot = "digraph G {\n node [shape=circle];\n";
  let counter = 0;

  function walk(node: ASTTuple | [string, ASTTuple], parent?: string): string {
    const id = "n" + counter++;
    
    // Label logic
    let label: string;
    if (Array.isArray(node)) {
        // node is [op, number, right] or [op, right]
        // In the 'expr' case, node[0] is the operator.
        label = node[0];
    } else {
        label = String(node);
    }
    
    dot += ` ${id} [label="${label}"];\n`;
    if (parent) dot += ` ${parent} -> ${id};\n`;

    if (Array.isArray(node)) {
      // Recurse into the children
      // Case 1: [op, number, right] (from expr)
      if (node.length === 3) {
           walk(node[1] as number, id); // number
           walk(node[2] as ASTTuple, id); // right subtree
      }
      // Case 2: [op, right] (from rest); normally flattened into expr
      else if (node.length === 2) {
          walk(node[1] as ASTTuple, id);
      }
    }
    return id;
  }

  walk(t);
  dot += "}";
  return dot;
}

async function test(s: string) {
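  const lexResult = ArithmeticLexer.tokenize(s);
  // Stop early if the scanner could not tokenize the whole input.
  if (lexResult.errors.length > 0) {
    console.error("Lexical error:", lexResult.errors);
    return;
  }
  parser.input = lexResult.tokens;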
  
  // Strict call
  const cst = parser.expr();

  if (parser.errors.length > 0) {
    console.error("Syntax error:", parser.errors);
    return;
  }

  // CST -> Tuple (strict type cast)
  // cstToTuple may return [string, ASTTuple] for `rest`, but `expr` always yields an ASTTuple
  const ast = cstToTuple(cst) as ASTTuple;
  console.log(JSON.stringify(ast));

  const dot = tuple2dot(ast);
  const viz = await instance();
  const svg = await viz.renderString(dot, { format: "svg" });

  display.html(svg);
}

Parsing `1 * 2 + 3`. Expected mathematically: `(1 * 2) + 3`.
Actual result due to grammar: `1 * (2 + 3)`.

In [None]:
await test('1*2+3')
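
The wrong grouping does not always change the numeric result. The sketch below evaluates the right-nested tuples directly; `evaluate` and `RightTuple` are hypothetical names introduced for this illustration and mirror the `[op, number, right]` shape produced by `cstToTuple`.

```typescript
type RightTuple = number | [string, number, RightTuple];

// Evaluate a right-nested tuple exactly as the tree is shaped.
function evaluate(t: RightTuple): number {
  if (typeof t === "number") return t;
  const [op, left, right] = t;
  const r = evaluate(right);
  return op === "*" ? left * r : left + r;
}

console.log(evaluate(["*", 1, ["+", 2, 3]])); // 5
console.log(evaluate(["*", 2, ["+", 3, 4]])); // 14
```

For `1 * 2 + 3` both groupings happen to yield 5, so the error is only visible in the tree. For `2 * 3 + 4` the tree shape `2 * (3 + 4)` yields 14, whereas the conventional reading `(2 * 3) + 4` yields 10.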

Parsing `2 + 3 + 4`. The tree grows to the right: `2 + (3 + 4)`.

In [None]:
await test('2+3+4')
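
As noted above, respecting the usual precedence requires different grammar rules. One common way to do this is to layer the grammar, with one rule per precedence level. The following dependency-free sketch assumes the layered grammar `expr -> term ('+' term)*` and `term -> NUMBER ('*' NUMBER)*`; the names `parseLayered` and `Ast` are hypothetical, and this is only one possible remedy, not necessarily the one used elsewhere in this series of notebooks.

```typescript
type Ast = number | [string, Ast, Ast];

// expr -> term ('+' term)*
// term -> NUMBER ('*' NUMBER)*
function parseLayered(tokens: string[]): Ast {
  let pos = 0;

  function atom(): Ast {
    const value = parseInt(tokens[pos], 10);
    if (isNaN(value)) throw new Error(`Expected NUMBER at token ${pos}`);
    pos++;
    return value;
  }

  function term(): Ast {
    let left = atom();
    // '*' binds tighter: it is handled in the inner rule.
    while (tokens[pos] === "*") {
      pos++;
      left = ["*", left, atom()];
    }
    return left;
  }

  function expr(): Ast {
    let left = term();
    // Folding to the left makes '+' left-associative.
    while (tokens[pos] === "+") {
      pos++;
      left = ["+", left, term()];
    }
    return left;
  }

  const tree = expr();
  if (pos !== tokens.length) throw new Error(`Unexpected token at ${pos}`);
  return tree;
}

console.log(JSON.stringify(parseLayered(["1", "*", "2", "+", "3"]))); // ["+",["*",1,2],3]
```

With this layering, `1 * 2 + 3` is grouped as `(1 * 2) + 3`, and a chain such as `2 + 3 + 4` grows to the left instead of to the right.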