In [None]:
import { display } from "tslab";
import { readFileSync } from "fs";

const css = readFileSync("../style.css", "utf8");
display.html(`<style>${css}</style>`);

In [None]:
import { createToken, Lexer, CstParser, IToken, CstNode, TokenType } from "chevrotain";
import { instance } from "@viz-js/viz";

# Dealing with Conflicts

In this notebook, we analyze **parsing conflicts** using a simple arithmetic grammar. Specifically, we explore how an **LL(k) parser** (like *Chevrotain*) behaves when confronted with an ambiguous grammar structure compared to a bottom-up **LALR parser** (like *PLY*).

## Architectural Overview

1.  **Scanner:** Tokenizes the input into numbers and operators.
2.  **Parser:** Implements a grammar that accepts expressions like `1 + 2 * 3`.
3.  **Ambiguity Analysis:** We demonstrate that without explicit precedence rules, the parser produces a **right-associative** tree: the input `1 * 2 + 3` is grouped as `1 * (2 + 3)` instead of `(1 * 2) + 3`, so all operators are effectively treated as having the same precedence.

## Grammar Specification

### Original Ambiguous Grammar (BNF)
This grammar (used in the Python/PLY example) is ambiguous because it specifies neither operator precedence nor associativity:
```text
expr : expr '+' expr
     | expr '*' expr
     | NUMBER
```
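
To make the ambiguity concrete, the following sketch enumerates every parse tree this grammar admits for a flat operand/operator sequence. It is a plain TypeScript illustration and not part of the notebook's parser; the names `Tree`, `allParses`, and `evalTree` are hypothetical.

```typescript
// A parse tree is either a number or [operator, left, right].
type Tree = number | [string, Tree, Tree];

// Enumerate all trees for operands[0] ops[0] operands[1] ... by trying
// every operator as the root (expr -> expr op expr).
function allParses(operands: number[], ops: string[]): Tree[] {
  if (operands.length === 1) return [operands[0]];
  const result: Tree[] = [];
  for (let i = 0; i < ops.length; i++) {
    const lefts = allParses(operands.slice(0, i + 1), ops.slice(0, i));
    const rights = allParses(operands.slice(i + 1), ops.slice(i + 1));
    for (const l of lefts) {
      for (const r of rights) result.push([ops[i], l, r]);
    }
  }
  return result;
}

// Evaluate a tree; only '+' and '*' occur in this grammar.
function evalTree(t: Tree): number {
  if (typeof t === "number") return t;
  const [op, l, r] = t;
  return op === "+" ? evalTree(l) + evalTree(r) : evalTree(l) * evalTree(r);
}

// 1 + 2 * 3 has two parse trees with different values.
const trees = allParses([1, 2, 3], ["+", "*"]);
for (const t of trees) console.log(JSON.stringify(t), "=", evalTree(t));
```

The two trees are `1 + (2 * 3)` and `(1 + 2) * 3`, which evaluate to 7 and 9. The grammar alone gives a parser no reason to prefer one over the other.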

### LL(k) Adapted Grammar
Recursive-descent parsers cannot handle left recursion (`expr -> expr + ...`): the rule would call itself without consuming a token and never terminate. Therefore, we must rewrite the grammar to be right-recursive. This structure, while parseable, leads to the right-associativity issue we want to demonstrate:
$$
\begin{array}{lcl}
  \mathrm{expr} & \rightarrow & \mathtt{NUMBER} \; \mathrm{rest}^? \\
  \mathrm{rest} & \rightarrow & (\mathtt{'+'} \mid \mathtt{'*'}) \; \mathrm{expr}
\end{array}
$$
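
The adapted grammar can also be read as a hand-written recursive-descent parser. The sketch below is a Chevrotain-free approximation, assuming a simplified regex tokenizer that silently skips unknown characters; `RTree` and `parseRightRecursive` are hypothetical names.

```typescript
// Result shape: a number, or [operator, leftNumber, rightSubtree].
type RTree = number | [string, number, RTree];

function parseRightRecursive(src: string): RTree {
  // Simplified tokenizer: numbers and the two operators only.
  const tokens: string[] = src.match(/\d+|[+*]/g) || [];
  let pos = 0;

  // expr -> NUMBER rest?   with   rest -> ('+' | '*') expr
  function expr(): RTree {
    const tok = tokens[pos++];
    if (tok === undefined || !/^\d+$/.test(tok)) {
      throw new Error("Expected NUMBER");
    }
    const n = parseInt(tok, 10);
    const next = tokens[pos];
    if (next === "+" || next === "*") {
      pos++; // consume the operator
      return [next, n, expr()]; // everything after the operator becomes the right operand
    }
    return n;
  }

  const tree = expr();
  if (pos !== tokens.length) throw new Error("Unexpected trailing input");
  return tree;
}

console.log(JSON.stringify(parseRightRecursive("1*2+3"))); // ["*",1,["+",2,3]]
```

Because `rest` always hands the remaining input back to `expr`, the left operand is a single number and the tree can only grow to the right.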

## 1. Specification of the Scanner

We implement a minimal scanner for arithmetic expressions.

**Token Definitions:**

| Token Name | Pattern | Description |
| :--- | :--- | :--- |
| `WhiteSpace` | `[ \t\n\r]+` | Skipped. |
| `NUMBER` | `0\|[1-9][0-9]*` | Integers. |
| `Plus` | `+` | Addition operator. |
| `Mult` | `*` | Multiplication operator. |
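
As a rough illustration of what this table specifies, here is a minimal first-match scanner in plain TypeScript. It is only a sketch of the patterns above, not Chevrotain's lexer; `tokenTable` and `scanSketch` are hypothetical names.

```typescript
// Token table in the same order as the table above.
// Each pattern is anchored with ^ so it only matches at the current offset.
const tokenTable: [string, RegExp][] = [
  ["WhiteSpace", /^[ \t\n\r]+/],
  ["Plus", /^\+/],
  ["Mult", /^\*/],
  ["NUMBER", /^(?:0|[1-9][0-9]*)/],
];

function scanSketch(input: string): string[] {
  const out: string[] = [];
  let pos = 0;
  while (pos < input.length) {
    const rest = input.slice(pos);
    let matched = false;
    for (const [name, pattern] of tokenTable) {
      const m = pattern.exec(rest);
      if (m !== null) {
        // WhiteSpace is matched but not emitted.
        if (name !== "WhiteSpace") out.push(`${name}(${m[0]})`);
        pos += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error(`Unexpected character at offset ${pos}`);
  }
  return out;
}

console.log(scanSketch("1 + 20*3"));
console.log(scanSketch("007"));
```

Note that the `NUMBER` pattern has no notion of leading zeros: `007` is split into three separate tokens `0`, `0`, `7`, which the grammar then rejects because two numbers may not follow each other.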

In [None]:
const NumberTok: TokenType = createToken({ 
  name: "NUMBER", 
  pattern: /0|[1-9][0-9]*/ 
});

const Plus: TokenType = createToken({ name: "Plus", pattern: /\+/ });
const Mult: TokenType = createToken({ name: "Mult", pattern: /\*/ });

const WhiteSpace: TokenType = createToken({
  name: "WhiteSpace",
  pattern: /[ \t\n\r]+/,
  group: Lexer.SKIPPED,
});

const allTokens: TokenType[] = [WhiteSpace, Plus, Mult, NumberTok];
const ArithmeticLexer = new Lexer(allTokens);

## 2. Grammar Definition (Parser)

We implement the right-recursive grammar defined above using `CstParser`.

**Rules:**
* **`expr`**: Consumes a number. Optionally, it looks ahead to see if a `rest` part follows.
* **`rest`**: Consumes an operator (`+` or `*`) and then recursively calls `expr`.

**Associativity Note:**
With this grammar, `1 + 2 + 3` is parsed as `1 + (2 + 3)`: `expr` consumes `1`, then `rest` consumes `+` and calls `expr` for the remainder `2 + 3`. The grammar itself is unambiguous; it simply forces **right-associativity** for every operator.

In [None]:
class AmbiguousExprParser extends CstParser {
  constructor() {
    super(allTokens);
    this.performSelfAnalysis();
  }

  // expr -> NUMBER rest?
  public expr = this.RULE("expr", () => {
    this.CONSUME(NumberTok);
    this.OPTION(() => {
      this.SUBRULE(this.rest);
    });
  });

  // rest -> ( '+' | '*' ) expr
  public rest = this.RULE("rest", () => {
    // Choose the operator, then parse the remaining expression recursively.
    this.OR([
      { ALT: () => this.CONSUME(Plus) },
      { ALT: () => this.CONSUME(Mult) },
    ]);
    this.SUBRULE(this.expr);
  });
}

const parser = new AmbiguousExprParser();

## 3. CST Inspection Helper

To visualize the parse tree, we convert the Concrete Syntax Tree (CST) into a simplified tuple format.

To work with the CST safely, we first define strict TypeScript interfaces for its nodes. They replace the loose tuple definitions from the Python version.

In [None]:
// --- CST Interfaces ---
interface ExprCtx {
    NUMBER: IToken[];
    rest?: CstNode[];
}

interface RestCtx {
    Plus?: IToken[];
    Mult?: IToken[];
    expr: CstNode[];
}

// --- AST Types ---
type BinaryOp = "+" | "*";

interface NumberNode {
    kind: "Number";
    value: number;
}

interface BinaryExpNode {
    kind: "BinaryExp";
    op: BinaryOp;
    left: ASTNode; 
    right: ASTNode; 
}

type ExprResult = number | [BinaryOp, ExprResult]; 

type ASTNode = number | [string, ASTNode] | [string, number, ASTNode]; 

interface ExprTree {
    operator?: string;
    left?: number | ExprTree;
    right?: number | ExprTree;
    value?: number; // Leaf
}

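// Tuple shape produced by cstToTuple and consumed by tuple2dot:
// a bare number, or [operator, leftNumber, rightSubtree].
type ASTTuple = number | [string, number, ASTTuple];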

### Function `cstToTuple`

**Input:**
* `node`: A `CstNode` from the parser.

**Output:**
* A recursive tuple structure: `[Operator, LeftOperand, RightOperand]` or a simple `number`.

**Logic:**
* Converts the textual token image of `NUMBER` to a JavaScript `number`.
* If a `rest` node exists, it constructs a tuple `[op, number, rightSide]`, effectively visualizing the right-associative structure.


In [None]:
function cstToTuple(node: CstNode): ASTTuple | [string, ASTTuple] | null {
  if (!node || !node.name) return null;

  switch (node.name) {
    case "expr": {
        // expr children: NUMBER, rest?
        const children = node.children as unknown as ExprCtx;
        
        // Safety check (optional but good for strictness)
        if (!children.NUMBER || children.NUMBER.length === 0) {
            throw new Error("Invalid CST: Missing NUMBER in expr");
        }

        const numStr = children.NUMBER[0].image;
        const numVal = parseInt(numStr, 10);
        
        if (children.rest && children.rest.length > 0) {
            // rest returns [op, rightExpr]
            const restRes = cstToTuple(children.rest[0]) as [string, ASTTuple];
            // We construct [op, numVal, rightExpr], which shows the right-associativity
            return [restRes[0], numVal, restRes[1]];
        }
        return numVal;
    }

    case "rest": {
        // rest children: (Plus|Mult), expr
        const children = node.children as unknown as RestCtx;

        // Safety check
        if (!children.expr || children.expr.length === 0) {
             throw new Error("Invalid CST: Missing expr in rest");
        }

        const op = children.Plus ? "+" : (children.Mult ? "*" : "?");
        const rightRes = cstToTuple(children.expr[0]) as ASTTuple;
        return [op, rightRes];
    }
  }
  return null;
}

## 4. Grammar Inspection

LALR parser generators (like Yacc/Bison/PLY) build a state machine table and report shift-reduce and reduce-reduce conflicts in a "conflict report". **LL parsers do not have shift-reduce conflicts**.

Instead, ambiguity in LL parsers manifests as **definition errors** (e.g., ambiguous alternatives).
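
What "ambiguous alternatives" means can be sketched with a toy check on one token of lookahead: two alternatives conflict if the same token can start both. This is only an illustration, not how Chevrotain implements its validation; `FirstSets` and `findFirstConflicts` are hypothetical names, and the sets of starting tokens are written out by hand.

```typescript
// Hypothetical description of a rule: alternative name -> tokens that may start it.
type FirstSets = { [alt: string]: string[] };

// Report every token that can start more than one alternative (an LL(1) conflict).
function findFirstConflicts(alts: FirstSets): string[] {
  const seenIn: { [token: string]: string } = {};
  const conflicts: string[] = [];
  for (const alt of Object.keys(alts)) {
    for (const tok of alts[alt]) {
      if (Object.prototype.hasOwnProperty.call(seenIn, tok)) {
        conflicts.push(`${tok}: ${seenIn[tok]} vs ${alt}`);
      } else {
        seenIn[tok] = alt;
      }
    }
  }
  return conflicts;
}

// rest -> '+' expr | '*' expr: the alternatives start with different tokens.
const restConflicts = findFirstConflicts({ plusAlt: ["Plus"], multAlt: ["Mult"] });

// Hypothetical rule sum -> NUMBER '+' NUMBER | NUMBER '*' NUMBER:
// both alternatives start with NUMBER.
const sumConflicts = findFirstConflicts({ addAlt: ["NUMBER"], mulAlt: ["NUMBER"] });

console.log(restConflicts); // []
console.log(sumConflicts);  // ["NUMBER: addAlt vs mulAlt"]
```

In the hypothetical `sum` rule the alternatives differ at the second token, so a parser with two tokens of lookahead could still tell them apart. The notebook's `rest` rule needs no such help, since its alternatives already differ at the first token.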

### Function `showChevrotainParserOut`

Introspects the Chevrotain parser instance and prints the generated grammar rules and any definition errors.

**Output:**
* Prints the grammar structure.
* Prints "No grammar definition conflicts detected" because the rewritten grammar (`expr -> NUMBER rest?`) is unambiguous to the parser; it merely produces the "wrong" mathematical tree.

In [None]:
function showChevrotainParserOut(p: any) {
  const productions = p.getGAstProductions();

  let out = "";
  out += "Created by Chevrotain (LL Recursive-Descent Parser)\n\n";
  out += "Grammar\n\n";

  let ruleId = 0;
  for (const [name, rule] of Object.entries(productions)) {
    const def = Array.isArray((rule as any).definition)
      ? (rule as any).definition.map((d: any) => d.constructor.name).join(" ")
      : JSON.stringify((rule as any).definition);
      
    out += `Rule ${ruleId++} ${name} -> ${def}\n`;
  }

  out += "\nTerminals, with token patterns\n\n";
  for (const tok of allTokens) {
    const pat = typeof tok.PATTERN === "string" 
        ? tok.PATTERN 
        : tok.PATTERN?.toString().replace(/\n/g, "");
    out += `${tok.name.padEnd(20)} : ${pat}\n`;
  }

  out += "\nNonterminals\n\n";
  for (const [name] of Object.entries(productions)) {
    out += `${name}\n`;
  }

  out += "\nParsing method: LL(k) recursive-descent (Chevrotain)\n";
  out += "\n--------------------------------------------\n";

  const errors = p.definitionErrors || [];
  if (errors.length > 0) {
    out += "⚠️ Grammar Definition Warnings:\n";
    for (const e of errors) out += ` - ${e.message}\n`;
  } else {
    out += "✅ No grammar definition conflicts detected.\n";
  }

  display.text(out);
}

showChevrotainParserOut(parser);

## 5. Visualization & Testing

We generate a DOT graph to inspect the resulting AST.

### Function `test`

Parses a string and renders the AST.

**Input:**
* `s`: Expression string.

**Output:**
* An SVG visualization of the parse tree.

**Observation:**
We expect to see trees growing to the **right**. For example, `1 * 2 + 3` will be grouped as `1 * (2 + 3)`, which is mathematically incorrect for standard arithmetic (where multiplication binds stronger), but syntactically correct for our right-recursive grammar.
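
The effect of the wrong grouping on the computed value can be checked by evaluating the tuples directly. The evaluator below is a small sketch for the `[operator, number, rightSubtree]` shape used in this notebook; `RightTuple` and `evalTuple` are hypothetical names.

```typescript
// Same shape as the notebook's tuples: number or [operator, leftNumber, rightSubtree].
type RightTuple = number | [string, number, RightTuple];

function evalTuple(t: RightTuple): number {
  if (typeof t === "number") return t;
  const [op, left, right] = t;
  const r = evalTuple(right); // the right operand is evaluated first
  return op === "+" ? left + r : left * r;
}

// 2 * 3 + 4 grouped as 2 * (3 + 4):
const wrong = evalTuple(["*", 2, ["+", 3, 4]]); // 14, whereas (2 * 3) + 4 = 10

// 1 * 2 + 3 grouped as 1 * (2 + 3):
const coincidence = evalTuple(["*", 1, ["+", 2, 3]]); // 5, the same as (1 * 2) + 3

console.log(wrong, coincidence);
```

For `1 * 2 + 3` both groupings happen to yield 5, so the error is visible only in the tree, not in the value. An input such as `2 * 3 + 4` exposes it numerically.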

In [None]:
function tuple2dot(t: ASTTuple): string {
  let dot = "digraph G {\n node [shape=circle];\n";
  let counter = 0;

  function walk(node: ASTTuple | [string, ASTTuple], parent?: string): string {
    const id = "n" + counter++;
    
    // Label logic
    let label: string;
    if (Array.isArray(node)) {
        label = node[0];
    } else {
        label = String(node);
    }
    
    dot += ` ${id} [label="${label}"];\n`;
    if (parent) dot += ` ${parent} -> ${id};\n`;

    if (Array.isArray(node)) {

      if (node.length === 3) {
           walk(node[1] as number, id); // number (left operand)
           walk(node[2] as ASTTuple, id); // right subtree
      }
      else if (node.length === 2) {
          walk(node[1] as ASTTuple, id);
      }
    }
    return id;
  }

  walk(t);
  dot += "}";
  return dot;
}

async function test(s: string) {
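  const lexResult = ArithmeticLexer.tokenize(s);
  // Report characters the scanner could not tokenize instead of parsing a partial token stream.
  if (lexResult.errors.length > 0) {
    console.error("Lexing error:", lexResult.errors);
    return;
  }
  parser.input = lexResult.tokens;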
  
  // Strict call
  const cst = parser.expr();

  if (parser.errors.length > 0) {
    console.error("Syntax error:", parser.errors);
    return;
  }

  // CST -> Tuple (Strict Type Cast)
  const ast = cstToTuple(cst) as ASTTuple; 
  console.log(JSON.stringify(ast));

  const dot = tuple2dot(ast);
  const viz = await instance();
  const svg = await viz.renderString(dot, { format: "svg" });

  display.html(svg);
}

Parsing `1 * 2 + 3`. Expected mathematically: `(1 * 2) + 3`.
Actual result due to grammar: `1 * (2 + 3)`.
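
For contrast, the expected grouping could be obtained from a stratified grammar with one rule per precedence level, `expr -> term ('+' term)*` and `term -> NUMBER ('*' NUMBER)*`. This is not the grammar used in this notebook; the sketch below is a hand-written, Chevrotain-free illustration with hypothetical names (`LTree`, `parseWithPrecedence`) and a simplified regex tokenizer.

```typescript
// Result shape: a number, or [operator, leftSubtree, rightSubtree].
type LTree = number | [string, LTree, LTree];

function parseWithPrecedence(src: string): LTree {
  // Simplified tokenizer: numbers and the two operators only.
  const tokens: string[] = src.match(/\d+|[+*]/g) || [];
  let pos = 0;

  function numberTok(): LTree {
    const tok = tokens[pos++];
    if (tok === undefined || !/^\d+$/.test(tok)) throw new Error("Expected NUMBER");
    return parseInt(tok, 10);
  }

  // term -> NUMBER ('*' NUMBER)*   ('*' binds tighter, folded to the left)
  function term(): LTree {
    let left: LTree = numberTok();
    while (tokens[pos] === "*") {
      pos++;
      left = ["*", left, numberTok()];
    }
    return left;
  }

  // expr -> term ('+' term)*   ('+' binds weaker, folded to the left)
  function expr(): LTree {
    let left: LTree = term();
    while (tokens[pos] === "+") {
      pos++;
      left = ["+", left, term()];
    }
    return left;
  }

  const tree = expr();
  if (pos !== tokens.length) throw new Error("Unexpected trailing input");
  return tree;
}

console.log(JSON.stringify(parseWithPrecedence("1*2+3"))); // ["+",["*",1,2],3]
```

The loops replace the recursion on the right, which is what makes the resulting trees grow to the left.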

In [None]:
await test('1*2+3')

Parsing `2 + 3 + 4`. The tree grows to the right: `2 + (3 + 4)`.

In [None]:
await test('2+3+4')