In [None]:
import { display } from "tslab";
import { readFileSync } from "fs";

const css = readFileSync("../style.css", "utf8");
display.html(`<style>${css}</style>`);

# Term Rewriting System for Regular Expressions

In this notebook, we implement a **Term Rewriting System** (TRS) to simplify complex regular expressions. The expressions generated by algorithms like *State Elimination* (see Module 05) often contain redundancies (e.g., $R + \emptyset$, $\varepsilon \cdot R$).

We define an algebraic simplification engine based on the axioms of **Regular Algebra** (Kleene Algebra).

## Data Structures: Type Extension Architecture

To ensure seamless integration with our existing toolchain (Parser $\to$ NFA $\to$ DFA), we **reuse the strict `RegExp` type** defined in `03-RegExp-2-NFA`.

However, a rewriting system introduces a new concept: **Variables**.
A rule like $R + \emptyset \to R$ uses $R$ as a placeholder for *any* sub-expression. Since our strict `RegExp` type only allows characters from the alphabet $\Sigma$, we define an **Extended Type System**:

1.  **`RegExp` (Strict):** The concrete type used by NFAs (Imported from Module 03).
2.  **`PatternRegExp` (Extended):** A superset that allows both concrete `RegExp` nodes AND **Variables** (strings representing placeholders like "R", "S").

In [None]:
import { RecursiveSet, Tuple } from "recursive-set";
import { RegExp, UnaryOp, BinaryOp, EmptySet, Epsilon } from "./03-RegExp-2-NFA";

type PatternRegExp = 
  | RegExp 
  | string
  | Tuple<[PatternRegExp, UnaryOp]> 
  | Tuple<[PatternRegExp, BinaryOp, PatternRegExp]>;

type Subst = Map<string, PatternRegExp>;
type Rule = [PatternRegExp, PatternRegExp];
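
To make the difference between a concrete term and a pattern tangible, here is a minimal sketch. It uses plain nested arrays as a stand-in for the `Tuple` class so that it runs without external imports; the names `SketchPattern`, `SketchRule`, and `hasVariable` are hypothetical and not part of the notebook's toolchain. It also relies on the single-uppercase-letter convention for variables described in the next section.

```typescript
// Assumption: plain readonly arrays replace the notebook's Tuple class.
type SketchPattern =
  | 0                                                    // empty set
  | string                                               // 'ε', a character, or a variable
  | readonly [SketchPattern, '*']
  | readonly [SketchPattern, '+' | '⋅', SketchPattern];

type SketchRule = readonly [SketchPattern, SketchPattern];

// Rule  R + ∅ → R : 'R' is a variable, 0 is the empty set.
const unionIdentity: SketchRule = [['R', '+', 0], 'R'];

// Concrete term  (a ⋅ b) + ∅ : no variable occurs in it.
const concreteTerm: SketchPattern = [['a', '⋅', 'b'], '+', 0];

// A substitution under which the rule's left-hand side equals concreteTerm.
const binding = new Map<string, SketchPattern>([['R', ['a', '⋅', 'b']]]);

// A pattern is usable as a strict RegExp only if it contains no variable.
function hasVariable(p: SketchPattern): boolean {
    if (p === 0) return false;
    if (typeof p === 'string') return p.length === 1 && p >= 'A' && p <= 'Z';
    return p.length === 2
        ? hasVariable(p[0])
        : hasVariable(p[0]) || hasVariable(p[2]);
}

const lhsHasVariable = hasVariable(unionIdentity[0]);   // true
const termHasVariable = hasVariable(concreteTerm);      // false
```

Only terms for which `hasVariable` is false could be handed back to the strict Parser/NFA pipeline.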

### The Extended View Pattern

We adapt the **View Pattern** to handle this extended type hierarchy.
Instead of manually checking if a node is a concrete `RegExp` or a variable, we use a helper function `getPatternView(r)`.

This function returns a **Discriminated Union** (`PatternView`) that classifies every node as exactly one of the following kinds:

* **Concrete Atoms:** `EmptySet` ($\emptyset$), `Epsilon` ($\varepsilon$), `Char` ($a, b, \dots$).
* **Variables:** `Variable` ($R, S, T, \dots$), detected by convention: every single uppercase letter `A` to `Z` is a variable, so uppercase letters cannot also serve as alphabet characters.
* **Composites:** `Star`, `Concat`, `Union` (recursively containing `PatternRegExp`).

$$
\begin{array}{lcl}
  \text{Pattern} & : & \text{Concrete} \mid \text{Variable} \mid \text{Composite} \\
  \text{Concrete} & : & \emptyset \mid \varepsilon \mid \text{Char} \\
  \text{Variable} & : & \text{"R"} \mid \text{"S"} \mid \dots \\
  \text{Composite} & : & [P, \text{'*'}] \mid [P, \text{Op}, P]
\end{array}
$$

In [None]:
type PatternView = 
  | { kind: 'EmptySet' }
  | { kind: 'Epsilon' }
  | { kind: 'Char',     value: string }
  | { kind: 'Variable', name: string }
  | { kind: 'Star',     inner: PatternRegExp }
  | { kind: 'Concat',   left: PatternRegExp, right: PatternRegExp }
  | { kind: 'Union',    left: PatternRegExp, right: PatternRegExp };

function getPatternView(r: PatternRegExp): PatternView {
    // 1. Primitives & Variables
    if (r === 0)   return { kind: 'EmptySet' };
    if (r === 'ε') return { kind: 'Epsilon' };
    
    if (typeof r === 'string') {
        // Convention: Uppercase single letter = Variable
        if (r.length === 1 && r >= 'A' && r <= 'Z') {
            return { kind: 'Variable', name: r };
        }
        return { kind: 'Char', value: r };
    }

    // 2. Tuples
    if (r instanceof Tuple) {
        const raw = r.raw;
        
        if (raw.length === 2 && raw[1] === '*') {
            return { kind: 'Star', inner: raw[0] as PatternRegExp };
        }

        if (raw.length === 3) {
            const left = raw[0] as PatternRegExp;
            const op   = raw[1];
            const right = raw[2] as PatternRegExp;
            
            if (op === '⋅') return { kind: 'Concat', left, right };
            if (op === '+') return { kind: 'Union', left, right };
        }
    }
    
    throw new Error(`Unknown Pattern Structure: ${r}`);
}

// Helper to build tuples easily
function T(...args: any[]): PatternRegExp {
    return new Tuple(...args) as unknown as PatternRegExp;
}
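
The atom branch of the classification can be illustrated in isolation. The following sketch mirrors only the primitive checks of `getPatternView`; the helper `classifyAtom` and the type `AtomKind` are hypothetical names introduced here for illustration.

```typescript
type AtomKind = 'EmptySet' | 'Epsilon' | 'Variable' | 'Char';

// Hypothetical helper: the same order of checks as the primitive branch above.
function classifyAtom(r: 0 | string): AtomKind {
    if (r === 0) return 'EmptySet';      // the number 0, not the character '0'
    if (r === 'ε') return 'Epsilon';
    if (r.length === 1 && r >= 'A' && r <= 'Z') return 'Variable';
    return 'Char';
}

const kinds = [0, '0', 'ε', 'a', 'R', 'AB'].map(classifyAtom);
// kinds: ['EmptySet', 'Char', 'Epsilon', 'Char', 'Variable', 'Char']
```

Note that the multi-letter string `'AB'` falls through to `Char`, because only strings of length one are treated as variables.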

## Pattern Matching Engine

The core of a rewriting system is **Pattern Matching**. We need to determine if a specific concrete term (e.g., `(a + 0)`) matches a defined rule pattern (e.g., `(R + 0)`).

The engine consists of four main functions, all operating on `PatternRegExp`:

1.  **`deepEquals`**: Recursively checks if two ASTs are structurally identical. It compares the views returned by `getPatternView`, so two nodes are equal only if they have the same kind and equal children (e.g., the empty set `0` never equals the character `'0'`).
2.  **`match`**: Checks if a `term` matches a `pattern`.
    * If the pattern node is a **Variable**, it binds the corresponding sub-tree of the term to that variable in the substitution map (`Subst`). If the variable is already bound, the sub-tree must be `deepEquals` to the earlier binding, which is what makes a rule like $R + R \to R$ work.
    * If the pattern is a composite or an atom, it verifies that the term has the same kind at that node and recursively matches the children.
3.  **`apply`**: Reconstructs the term by replacing variables in the Right-Hand-Side (RHS) of a rule with their bound values from the substitution map.
4.  **`rewrite`**: Combines matching and application to perform a single transformation step.
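
The interplay of matching and application can be sketched on a small scale. The code below is an illustrative stand-in, not the notebook's implementation: it uses plain nested arrays instead of `Tuple`, and the names `matchSketch`, `applySketch`, `sameTerm`, and `isVariableName` are hypothetical.

```typescript
// Assumption: plain readonly arrays replace the notebook's Tuple class.
type P = 0 | string | readonly [P, '*'] | readonly [P, '+' | '⋅', P];
type Bindings = Map<string, P>;

// Convention from above: a single uppercase letter is a variable.
function isVariableName(s: string): boolean {
    return s.length === 1 && s >= 'A' && s <= 'Z';
}

// Structural equality; JSON comparison is enough for plain nested arrays.
function sameTerm(a: P, b: P): boolean {
    return JSON.stringify(a) === JSON.stringify(b);
}

function matchSketch(pattern: P, term: P, env: Bindings): boolean {
    if (typeof pattern === 'string' && isVariableName(pattern)) {
        const bound = env.get(pattern);
        if (bound === undefined) { env.set(pattern, term); return true; }
        return sameTerm(bound, term);            // repeated variable must agree
    }
    if (typeof pattern !== 'object' || typeof term !== 'object') {
        return pattern === term;                 // atoms: 0, 'ε', characters
    }
    if (pattern.length === 2) {                  // star node
        return term.length === 2 && matchSketch(pattern[0], term[0], env);
    }
    return term.length === 3 && pattern[1] === term[1]
        && matchSketch(pattern[0], term[0], env)
        && matchSketch(pattern[2], term[2], env);
}

function applySketch(p: P, env: Bindings): P {
    if (typeof p === 'string') return isVariableName(p) ? (env.get(p) ?? p) : p;
    if (p === 0) return p;
    if (p.length === 2) return [applySketch(p[0], env), '*'];
    return [applySketch(p[0], env), p[1], applySketch(p[2], env)];
}

// Rule R + 0 → R applied to (a ⋅ b) + 0.
const env: Bindings = new Map();
const matched = matchSketch(['R', '+', 0], [['a', '⋅', 'b'], '+', 0], env); // true
const rewritten = applySketch('R', env);                                    // ['a', '⋅', 'b']

// Rule R + R → R does not match a + b: 'R' cannot be both 'a' and 'b'.
const nonLinear = matchSketch(['R', '+', 'R'], ['a', '+', 'b'], new Map()); // false
```

The second example shows why an already bound variable has to be compared against its earlier binding instead of being overwritten.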

In [None]:
function deepEquals(a: PatternRegExp, b: PatternRegExp): boolean {
    if (a === b) return true;

    const vA = getPatternView(a);
    const vB = getPatternView(b);

    if (vA.kind !== vB.kind) return false;

    switch (vA.kind) {
        case 'Char':     return vA.value === (vB as any).value;
        case 'Variable': return vA.name === (vB as any).name;
        case 'Star':     return deepEquals(vA.inner, (vB as any).inner);
        case 'Concat':   
        case 'Union':
            return deepEquals(vA.left, (vB as any).left) && 
                   deepEquals(vA.right, (vB as any).right);
        default: return true;
    }
}

function match(pattern: PatternRegExp, term: PatternRegExp, substitution: Subst): boolean {
    const vPat = getPatternView(pattern);

    // 1. Variable Match (The core of rewriting)
    if (vPat.kind === 'Variable') {
        const varName = vPat.name;
        if (substitution.has(varName)) {
            return deepEquals(substitution.get(varName)!, term);
        } else {
            substitution.set(varName, term);
            return true;
        }
    }

    const vTerm = getPatternView(term);

    // 2. Structure Match
    if (vPat.kind !== vTerm.kind) return false;

    switch (vPat.kind) {
        case 'Char': return vPat.value === (vTerm as any).value;
        case 'Star': return match(vPat.inner, (vTerm as any).inner, substitution);
        case 'Concat':
        case 'Union':
            return match(vPat.left, (vTerm as any).left, substitution) &&
                   match(vPat.right, (vTerm as any).right, substitution);
        default: return true;
    }
}

function apply(term: PatternRegExp, substitution: Subst): PatternRegExp {
    const v = getPatternView(term);

    if (v.kind === 'Variable') {
        return substitution.has(v.name) ? substitution.get(v.name)! : term;
    }

    if (v.kind === 'Star') {
        return new Tuple(apply(v.inner, substitution), '*') as unknown as PatternRegExp;
    }

    if (v.kind === 'Concat' || v.kind === 'Union') {
        const left = apply(v.left, substitution);
        const right = apply(v.right, substitution);
        const op = v.kind === 'Concat' ? '⋅' : '+';
        return new Tuple(left, op, right) as unknown as PatternRegExp;
    }

    return term; // Atoms
}

function rewrite(term: PatternRegExp, rule: Rule): { simplified: boolean, result: PatternRegExp } {
    const [lhs, rhs] = rule;
    const substitution: Subst = new Map();

    if (match(lhs, term, substitution)) {
        return { simplified: true, result: apply(rhs, substitution) };
    }
    return { simplified: false, result: term };
}

## Algebraic Rules (Axioms)

We encode axioms and derived identities of **Regular Algebra** (Kleene Algebra) as a list of directed rewrite rules `LHS -> RHS`.

**Common Simplifications:**
* **Identity:** $R + 0 \to R$, $\varepsilon \cdot R \to R$
* **Annihilation:** $R \cdot 0 \to 0$
* **Idempotence:** $R + R \to R$
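* **Kleene Star:** $\varepsilon + R \cdot R^* \to R^*$ (the unfolding identity $R^* = \varepsilon + R \cdot R^*$, applied from right to left)
* **Associativity:** $R + (S + T) \to (R + S) + T$ and $R \cdot (S \cdot T) \to (R \cdot S) \cdot T$ (normalizing nested unions and concatenations to the left, as implemented in the rule list)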
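
As a brief justification of the Kleene star rule, assume the usual definition $R^* = R^0 + R^1 + R^2 + \cdots$ with $R^0 = \varepsilon$ (this definition is not spelled out in this notebook). Factoring one copy of $R$ out of all terms except the first gives:

$$
\begin{array}{lcl}
  R^* & = & R^0 + R^1 + R^2 + \cdots \\
      & = & \varepsilon + R \cdot (R^0 + R^1 + \cdots) \\
      & = & \varepsilon + R \cdot R^*
\end{array}
$$

Reading the last line from right to left yields the rewrite rule $\varepsilon + R \cdot R^* \to R^*$.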

We use the helper `T(...)` to define these rules compactly.

In [None]:
// === THE RULES ===

function getRules(): Rule[] {
    const rules: Rule[] = [
        // Addition (Identity & Idempotence)
        [T('R', '+', 0), 'R'],
        [T(0, '+', 'R'), 'R'],
        [T('R', '+', 'R'), 'R'],

        // Kleene Star & Epsilon Simplifications
        [T('ε', '+', T('R', '*')), T('R', '*')],
        [T(T('R', '*'), '+', 'ε'), T('R', '*')],
        [T('ε', '+', T('R', '⋅', T('R', '*'))), T('R', '*')],
        [T('ε', '+', T(T('R', '*'), '⋅', 'R')), T('R', '*')],
        [T(T('R', '⋅', T('R', '*')), '+', 'ε'), T('R', '*')],
        [T(T(T('R', '*'), '⋅', 'R'), '+', 'ε'), T('R', '*')],

        // Distributive Laws (Arden's Rule specifics)
        [T('S', '+', T('S', '⋅', 'T')), T('S', '⋅', T('ε', '+', 'T'))],
        [T('S', '+', T('T', '⋅', 'S')), T(T('ε', '+', 'T'), '⋅', 'S')],

        // Multiplication (Annihilator & Identity)
        [T(0, '⋅', 'R'), 0],
        [T('R', '⋅', 0), 0],
        [T('ε', '⋅', 'R'), 'R'],
        [T('R', '⋅', 'ε'), 'R'],

        // Absorption
        [T(T('ε', '+', 'R'), '⋅', T('R', '*')), T('R', '*')],
        [T(T('R', '+', 'ε'), '⋅', T('R', '*')), T('R', '*')],
        [T(T('R', '*'), '⋅', T('R', '+', 'ε')), T('R', '*')],
        [T(T('R', '*'), '⋅', T('ε', '+', 'R')), T('R', '*')],

        // Constant Kleene Stars
        [T(0, '*'), 'ε'],
        [T('ε', '*'), 'ε'],
        
        // Nested Kleene Stars
        [T(T('ε', '+', 'R'), '*'), T('R', '*')],
        [T(T('R', '+', 'ε'), '*'), T('R', '*')],

        // Associativity (rebalancing to the left: R op (S op T) -> (R op S) op T)
        [T('R', '+', T('S', '+', 'T')), T(T('R', '+', 'S'), '+', 'T')],
        [T('R', '⋅', T('S', '⋅', 'T')), T(T('R', '⋅', 'S'), '⋅', 'T')],
        
        // Complex Absorption
        [T(T('R', '⋅', T('S', '*')), '⋅', T('ε', '+', 'S')), T('R', '⋅', T('S', '*'))]
    ];
    return rules;
}

## Main Simplification Algorithm

The simplification process uses a **Fixpoint Iteration** strategy combined with recursive descent.

### Algorithm `simplifyOnce`
This function performs a single pass over the AST:
1.  **Check Current Node:** It tries to apply every rule in the catalogue to the current term. If a rule matches (e.g., $R \cdot \varepsilon \to R$), it returns the transformed result immediately.
2.  **Recurse:** If no rule matches at the top level, it recurses into the children (`left`, `right`, or `inner`) to simplify sub-expressions.

### Algorithm `simplify`
Repeatedly calls `simplifyOnce` until the term stabilizes (i.e., `deepEquals(current, next)` holds) or a limit of 1000 passes is exceeded. Iterating ensures that simplifications propagate up the tree: simplifying a sub-term $0^* \to \varepsilon$ may enable a subsequent simplification $R \cdot \varepsilon \to R$ at its parent.
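
The need for more than one pass can be seen in a toy version. The sketch below hard-codes just two rules ($0^* \to \varepsilon$ and $R \cdot \varepsilon \to R$) over plain nested arrays; `toyPass` and the type `E` are hypothetical names, and the pass follows the same "rewrite at the top, otherwise descend" strategy described above.

```typescript
// Assumption: plain readonly arrays replace the notebook's Tuple class.
type E = 0 | string | readonly [E, '*'] | readonly [E, '+' | '⋅', E];

// One pass: rewrite at the current node if possible, otherwise descend.
function toyPass(e: E): E {
    if (typeof e !== 'object') return e;              // atoms are already simple
    if (e.length === 2) {
        return e[0] === 0 ? 'ε' : [toyPass(e[0]), '*'];   // 0* → ε
    }
    if (e[1] === '⋅' && e[2] === 'ε') return e[0];    // R ⋅ ε → R
    return [toyPass(e[0]), e[1], toyPass(e[2])];
}

// Fixpoint iteration on  a ⋅ 0*  with a small cap in the role of MAX.
const trace: string[] = [];
let current: E = ['a', '⋅', [0, '*']];
for (let i = 0; i < 10; i++) {
    trace.push(JSON.stringify(current));
    const next = toyPass(current);
    if (JSON.stringify(next) === JSON.stringify(current)) break;
    current = next;
}
// trace holds three entries: the original term, then a ⋅ ε, then a.
```

The first pass can only simplify the right child, because $R \cdot \varepsilon \to R$ does not yet apply at the root; the second pass then removes the $\varepsilon$, and the third pass confirms that nothing changes anymore.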

In [None]:
function simplify(t: PatternRegExp): PatternRegExp {
    const rules = getRules();
    let current = t;
    let iterations = 0;
    const MAX = 1000;

    // Fixed-Point Iteration
    while (true) {
        const next = simplifyOnce(current, rules);
        if (deepEquals(current, next)) return next;
        
        current = next;
        if (++iterations > MAX) {
            console.warn("Rewrite limit reached");
            return current;
        }
    }
}

function simplifyOnce(term: PatternRegExp, rules: Rule[]): PatternRegExp {
    // Try top-level rewrite
    for (const rule of rules) {
        const { simplified, result } = rewrite(term, rule);
        if (simplified) return result;
    }

    // Recurse
    const v = getPatternView(term);
    if (v.kind === 'Star') {
        return new Tuple(simplifyOnce(v.inner, rules), '*') as unknown as PatternRegExp;
    }
    if (v.kind === 'Concat' || v.kind === 'Union') {
        const op = v.kind === 'Concat' ? '⋅' : '+';
        return new Tuple(
            simplifyOnce(v.left, rules), 
            op, 
            simplifyOnce(v.right, rules)
        ) as unknown as PatternRegExp;
    }
    return term;
}

## Pretty Printing

Finally, we convert the internal AST back into a human-readable string format.
The function `regexpToString` leverages the **View Pattern** to smartly render:
* **Atoms:** $\emptyset$ for `0`, $\varepsilon$ for `'ε'`.
* **Variables:** The variable name directly.
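* **Precedence:** Unions are always wrapped in parentheses and concatenation is written by juxtaposition. A starred operand is parenthesized when it is a composite (e.g., `(ab)*`) but not when it is a single character or variable (e.g., `a*`).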
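
An alternative way to decide on parentheses is to thread a precedence level through the recursion, so that a pair is emitted only when a child binds more weakly than its context requires. This is a sketch of that general technique, not the notebook's `regexpToString`; it uses plain nested arrays instead of `Tuple`, and `prec`, `show`, and `Re` are hypothetical names.

```typescript
// Assumption: plain readonly arrays replace the notebook's Tuple class.
type Re = 0 | string | readonly [Re, '*'] | readonly [Re, '+' | '⋅', Re];

// Binding strength: union (1) < concatenation (2) < star (3) < atoms (4).
function prec(r: Re): number {
    if (typeof r !== 'object') return 4;
    if (r.length === 2) return 3;
    return r[1] === '+' ? 1 : 2;
}

// `parent` is the minimum strength the context requires of this sub-term.
function show(r: Re, parent = 0): string {
    let s: string;
    if (r === 0) s = '∅';
    else if (typeof r === 'string') s = r;
    else if (r.length === 2) s = show(r[0], 4) + '*';
    else if (r[1] === '+') s = show(r[0], 1) + '+' + show(r[2], 1);
    else s = show(r[0], 2) + show(r[2], 2);
    return prec(r) < parent ? `(${s})` : s;
}

const starOfUnion = show([['a', '+', 'b'], '*']);        // "(a+b)*"
const concatOfUnion = show(['a', '⋅', ['b', '+', 0]]);   // "a(b+∅)"
const nestedUnion = show([['a', '+', 'b'], '+', 'c']);   // "a+b+c"
const starOfAtom = show(['a', '*']);                     // "a*"
```

Compared with always parenthesizing unions, this variant drops the outer pair of a top-level union at the cost of one extra parameter.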

In [None]:
function regexpToString(r: PatternRegExp): string {
    const v = getPatternView(r);

    switch (v.kind) {
        case 'EmptySet': return "∅";
        case 'Epsilon':  return "ε";
        case 'Char':     return v.value;
        case 'Variable': return v.name;
        case 'Star': {
            const inner = regexpToString(v.inner);
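            const vInner = getPatternView(v.inner);
            // A union already prints its own parentheses, so wrapping it again
            // would yield "((a+b))*". Only a concatenation or a nested star
            // needs an extra pair before the postfix '*'.
            const needsParens = vInner.kind === 'Concat' || vInner.kind === 'Star';
            return needsParens ? `(${inner})*` : `${inner}*`;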
        }
        case 'Concat':
            return regexpToString(v.left) + regexpToString(v.right);
        case 'Union':
            return `(${regexpToString(v.left)}+${regexpToString(v.right)})`;
    }
    return "?";
}