In [None]:
import { display } from "tslab";
import { readFileSync } from "fs";

const css = readFileSync("../style.css", "utf8");
display.html(`<style>${css}</style>`);

# Term Rewriting System for Regular Expressions

In this notebook, we implement a **Term Rewriting System** (TRS) to simplify complex regular expressions.

The expressions generated by algorithms like *State Elimination* (conversion from DFA/NFA to RegExp) often contain redundancies (e.g., $R + \emptyset$, $\varepsilon \cdot R$, or $(a+\emptyset)^*$).

We define an algebraic simplification engine based on the axioms of **Regular Algebra** (Kleene Algebra).

## Data Structures: Concrete vs. Pattern

To ensure seamless integration with our existing toolchain (Parser $\to$ NFA $\to$ DFA), we reuse the class hierarchy defined in Module 03.

However, a rewriting system introduces a concept that strictly concrete regular expressions do not have: **Variables**.
A rule like $R + \emptyset \to R$ uses $R$ as a placeholder for *any* sub-expression (a tree).

1.  **Concrete RegExp:** Contains only `CharNode`, `Union`, `Concat`, `Star`, `EmptySet` ($\emptyset$), and `Epsilon` ($\varepsilon$) nodes. (Used for NFAs).
2.  **Pattern RegExp:** Can also contain `Variable` nodes (e.g., "R", "S"). These serve as placeholders in our algebraic rules.
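
The distinction can be made concrete with a small check. The sketch below is only an illustration: it uses a hypothetical tagged-union type `Term` as a stand-in for the Module 03 classes so that it runs on its own, and `isConcrete` is an assumed helper name, not part of the notebook.

```typescript
// Hypothetical stand-in for the Module 03 class hierarchy, written as a
// tagged union so that the example is self-contained.
type Term =
    | { kind: "empty" }                          // ∅
    | { kind: "eps" }                            // ε
    | { kind: "char"; value: string }            // terminal symbol
    | { kind: "var"; id: string }                // pattern variable
    | { kind: "star"; inner: Term }
    | { kind: "concat"; left: Term; right: Term }
    | { kind: "union"; left: Term; right: Term };

// A term is concrete when no variable node occurs anywhere in the tree.
function isConcrete(t: Term): boolean {
    switch (t.kind) {
        case "var":
            return false;
        case "star":
            return isConcrete(t.inner);
        case "concat":
        case "union":
            return isConcrete(t.left) && isConcrete(t.right);
        default:
            return true; // ∅, ε, and characters are leaves without variables
    }
}

const charA: Term = { kind: "char", value: "a" };
// a + ∅ is concrete, R + ∅ is a pattern.
const concreteTerm: Term = { kind: "union", left: charA, right: { kind: "empty" } };
const patternTerm: Term = { kind: "union", left: { kind: "var", id: "R" }, right: { kind: "empty" } };

console.log(isConcrete(concreteTerm), isConcrete(patternTerm));
```

Only terms for which such a check succeeds would be suitable input for the NFA construction; rule patterns are expected to fail it.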

In [2]:
import { 
    RegExp, Variable, EmptySet, Epsilon, CharNode, Star, Concat, Union, RegExpNode 
} from "./03-RegExp-2-NFA";
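// A substitution maps variable names to the sub-terms bound to them.
type Subst = Map<string, RegExp>;
// A rewrite rule is a pair [LHS pattern, RHS pattern].
type Rule = [RegExp, RegExp];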

## Pattern Matching Engine

The core of a rewriting system is **Pattern Matching**. We need to determine if a specific concrete term (e.g., `(a + 0)`) matches a defined rule pattern (e.g., `(R + 0)`).



The engine consists of four main functions:

1.  **`deepEquals`**: Recursively checks if two ASTs are structurally identical.
2.  **`match`**: Checks if a `term` matches a `pattern`.
    * If the pattern node is a **Variable**, it binds the corresponding sub-tree of the term to that variable in the substitution map (type `Subst`).
    * If the variable was already bound, it checks for consistency (e.g., for $R + R$, both sides must be identical).
3.  **`apply`**: Reconstructs the term by substituting variables in the Right-Hand-Side (RHS) of a rule with their bound values.
4.  **`rewrite`**: Combines matching and application to perform a single transformation step.

In [6]:
function deepEquals(a: RegExp, b: RegExp): boolean {
    if (a === b) return true;
    if (a.constructor !== b.constructor) return false;

    if (a instanceof CharNode && b instanceof CharNode) return a.value === b.value;
    if (a instanceof Variable && b instanceof Variable) return a.name === b.name;
    if (a instanceof Star && b instanceof Star) return deepEquals(a.inner, b.inner);
    if ((a instanceof Concat && b instanceof Concat) || (a instanceof Union && b instanceof Union))
        return deepEquals(a.left, b.left) && deepEquals(a.right, b.right);
    return true; // EmptySet and Epsilon carry no fields, so equal constructors suffice
}

function match(pattern: RegExp, term: RegExp, substitution: Subst): boolean {
    if (pattern instanceof Variable) {
        const name = pattern.name;
        if (substitution.has(name)) {
            return deepEquals(substitution.get(name)!, term);
        } else {
            substitution.set(name, term);
            return true;
        }
    }
    if (pattern.constructor !== term.constructor) return false;
    if (pattern instanceof Star && term instanceof Star) return match(pattern.inner, term.inner, substitution);  
    if ((pattern instanceof Concat && term instanceof Concat) || (pattern instanceof Union && term instanceof Union))
        return match(pattern.left, term.left, substitution) && match(pattern.right, term.right, substitution);
    if (pattern instanceof CharNode && term instanceof CharNode)
        return pattern.value === term.value;
    return true; // remaining case: two EmptySet or two Epsilon nodes
}

function apply(term: RegExp, substitution: Subst): RegExp {
    if (term instanceof Variable) return substitution.has(term.name) ? substitution.get(term.name)! : term;
    if (term instanceof Star) return new Star(apply(term.inner, substitution));
    if (term instanceof Concat)
        return new Concat( apply(term.left, substitution), apply(term.right, substitution) );
    if (term instanceof Union)
        return new Union( apply(term.left, substitution), apply(term.right, substitution) );
    return term;
}

function rewrite(term: RegExp, rule: Rule): { simplified: boolean, result: RegExp } {
    const [lhs, rhs] = rule;
    const substitution: Subst = new Map();
    if (match(lhs, term, substitution))
        return { simplified: true, result: apply(rhs, substitution) };
    return { simplified: false, result: term };
}
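
To see variable binding and the consistency check in isolation, here is a condensed standalone sketch. It is not the notebook's code: `Term` is a hypothetical tagged-union stand-in for the Module 03 classes, and `sameTerm` and `matchTerm` are assumed names that mirror `deepEquals` and `match`.

```typescript
// Hypothetical stand-in for the Module 03 classes.
type Term =
    | { kind: "empty" }
    | { kind: "eps" }
    | { kind: "char"; value: string }
    | { kind: "var"; id: string }
    | { kind: "star"; inner: Term }
    | { kind: "concat"; left: Term; right: Term }
    | { kind: "union"; left: Term; right: Term };
type Bindings = Map<string, Term>;

// Structural equality of two trees.
function sameTerm(a: Term, b: Term): boolean {
    if (a.kind === "char" && b.kind === "char") return a.value === b.value;
    if (a.kind === "var" && b.kind === "var") return a.id === b.id;
    if (a.kind === "star" && b.kind === "star") return sameTerm(a.inner, b.inner);
    if ((a.kind === "concat" && b.kind === "concat") || (a.kind === "union" && b.kind === "union"))
        return sameTerm(a.left, b.left) && sameTerm(a.right, b.right);
    return a.kind === b.kind; // ∅ and ε carry no data
}

function matchTerm(pattern: Term, term: Term, bound: Bindings): boolean {
    if (pattern.kind === "var") {
        const earlier = bound.get(pattern.id);
        if (earlier !== undefined) return sameTerm(earlier, term); // consistency check
        bound.set(pattern.id, term);                               // first occurrence: bind
        return true;
    }
    if (pattern.kind === "star" && term.kind === "star")
        return matchTerm(pattern.inner, term.inner, bound);
    if ((pattern.kind === "concat" && term.kind === "concat") ||
        (pattern.kind === "union" && term.kind === "union"))
        return matchTerm(pattern.left, term.left, bound) && matchTerm(pattern.right, term.right, bound);
    return sameTerm(pattern, term); // leaves must agree exactly
}

const ch = (value: string): Term => ({ kind: "char", value });
const plus = (left: Term, right: Term): Term => ({ kind: "union", left, right });
const varR: Term = { kind: "var", id: "R" };
const idempotenceLhs = plus(varR, varR); // R + R

const firstTry: Bindings = new Map();
const matchesSame = matchTerm(idempotenceLhs, plus(ch("a"), ch("a")), firstTry);      // a + a

const secondTry: Bindings = new Map();
const matchesDifferent = matchTerm(idempotenceLhs, plus(ch("a"), ch("b")), secondTry); // a + b

console.log(matchesSame, firstTry.get("R"));
console.log(matchesDifferent);
```

In the second attempt `R` is first bound to `a`, and the later occurrence of `R` then fails against `b`. A failed match can leave partial bindings behind, which is one reason to start every attempt with a fresh map, as `rewrite` does.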

## Algebraic Rules (Axioms) & DSL

We define the **Axioms of Regular Algebra** (Kleene Algebra) as a list of rewrite rules `LHS -> RHS`.



**Common Simplifications:**
* **Identity:** $R + \emptyset \to R$, $\varepsilon \cdot R \to R$
* **Annihilation:** $R \cdot \emptyset \to \emptyset$
* **Idempotence:** $R + R \to R$
* **Kleene Star:** $\varepsilon + R \cdot R^* \to R^*$ (star unfolding, the identity behind Arden's Rule)
* **Associativity:** $R + (S + T) \to (R + S) + T$ and $R \cdot (S \cdot T) \to (R \cdot S) \cdot T$ (Standardizing structure to the left)

### Domain Specific Language (DSL)

To define these rules compactly without writing verbose constructor calls like `new Union(new Variable("R"), new EmptySet())`, we implement a tiny helper function `T(...)`.

**DSL Syntax Mapping:**
* **Variables:** `T("R")` $\to$ `Variable("R")` (a single uppercase letter from `A` to `Z`)
* **Terminals:** `T("a")` $\to$ `CharNode("a")` (any other string except `"ε"`)
* **Constants:** `T(0)` $\to$ `EmptySet`, `T("ε")` $\to$ `Epsilon`
* **Operations:**
    * `T(A, "*")` $\to$ `Star(A)`
    * `T(A, "+", B)` $\to$ `Union(A, B)`
    * `T(A, "⋅", B)` $\to$ `Concat(A, B)`

In [9]:
type DSLInput = RegExp | 0 | string;

function T(arg: DSLInput): RegExp;
function T(inner: DSLInput, op: '*'): RegExp;
function T(left: DSLInput, op: '+' | '⋅', right: DSLInput): RegExp;

function T(arg0: DSLInput, arg1?: string, arg2?: DSLInput): RegExp {
    if (arg1 === undefined) {
        if (arg0 instanceof RegExpNode) return arg0;
        if (arg0 === 0) return new EmptySet();
        if (arg0 === "ε") return new Epsilon();
        if (typeof arg0 === "string") {
            return (arg0.length === 1 && arg0 >= "A" && arg0 <= "Z") 
                ? new Variable(arg0) : new CharNode(arg0);
        }
        throw new Error(`Invalid Atom: ${arg0}`);
    }
    if (arg1 === '*') return new Star(T(arg0));
    if (arg2 !== undefined) {
        if (arg1 === '+') return new Union(T(arg0), T(arg2));
        if (arg1 === '⋅') return new Concat(T(arg0), T(arg2));
    }
    throw new Error(`Invalid Rule: ${arg0}, ${arg1}, ${arg2}`);
}
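
As a quick illustration of the mapping, the following standalone sketch re-creates a reduced version of the helper. It is an assumption-laden miniature, not the function above: it produces plain tagged objects of a hypothetical `Term` type instead of the Module 03 class instances, and it has no overload signatures.

```typescript
// Hypothetical stand-in for the Module 03 classes.
type Term =
    | { kind: "empty" }
    | { kind: "eps" }
    | { kind: "char"; value: string }
    | { kind: "var"; id: string }
    | { kind: "star"; inner: Term }
    | { kind: "concat"; left: Term; right: Term }
    | { kind: "union"; left: Term; right: Term };
type Atom = Term | 0 | string;

function T(arg0: Atom, op?: "*" | "+" | "⋅", arg2?: Atom): Term {
    if (op === undefined) {
        if (typeof arg0 === "object") return arg0;      // already a term
        if (arg0 === 0) return { kind: "empty" };
        if (arg0 === "ε") return { kind: "eps" };
        // A single uppercase letter is a variable, anything else a terminal.
        return /^[A-Z]$/.test(arg0) ? { kind: "var", id: arg0 } : { kind: "char", value: arg0 };
    }
    if (op === "*") return { kind: "star", inner: T(arg0) };
    if (arg2 === undefined) throw new Error(`Binary operator ${op} needs a right operand`);
    const left = T(arg0);
    const right = T(arg2);
    return op === "+" ? { kind: "union", left, right } : { kind: "concat", left, right };
}

// The DSL call and the hand-written tree describe the same pattern R + ∅.
const viaDsl = T("R", "+", 0);
const byHand: Term = { kind: "union", left: { kind: "var", id: "R" }, right: { kind: "empty" } };
console.log(JSON.stringify(viaDsl) === JSON.stringify(byHand));

// Nested calls build larger patterns, here ε + R ⋅ R*.
const unfoldLhs = T("ε", "+", T("R", "⋅", T("R", "*")));
console.log(JSON.stringify(unfoldLhs));
```

Because already-built terms pass through unchanged, calls can be nested freely, which is what keeps the rule list compact.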

In [10]:
function getRules(): Rule[] {
    const rules: Rule[] = [
        [T("R", "+", 0), T("R")], 
        [T(0, "+", "R"), T("R")],
        [T("R", "+", "R"), T("R")],

        [T("ε", "+", T("R", "*")), T("R", "*")],
        [T(T("R", "*"), "+", "ε"), T("R", "*")],
        [T("ε", "+", T("R", "⋅", T("R", "*"))), T("R", "*")],
        [T("ε", "+", T(T("R", "*"), "⋅", "R")), T("R", "*")],
        [T(T("R", "⋅", T("R", "*")), "+", "ε"), T("R", "*")],
        [T(T(T("R", "*"), "⋅", "R"), "+", "ε"), T("R", "*")],

        [T("S", "+", T("S", "⋅", "T")), T("S", "⋅", T("ε", "+", "T"))],
        [T("S", "+", T("T", "⋅", "S")), T(T("ε", "+", "T"), "⋅", "S")],

        [T(0, "⋅", "R"), T(0)],
        [T("R", "⋅", 0), T(0)],
        [T("ε", "⋅", "R"), T("R")],
        [T("R", "⋅", "ε"), T("R")],

        [T(T("ε", "+", "R"), "⋅", T("R", "*")), T("R", "*")],
        [T(T("R", "+", "ε"), "⋅", T("R", "*")), T("R", "*")],
        [T(T("R", "*"), "⋅", T("R", "+", "ε")), T("R", "*")],
        [T(T("R", "*"), "⋅", T("ε", "+", "R")), T("R", "*")],

        [T(0, "*"), T("ε")],
        [T("ε", "*"), T("ε")],

        [T(T("ε", "+", "R"), "*"), T("R", "*")],
        [T(T("R", "+", "ε"), "*"), T("R", "*")],

        [T("R", "+", T("S", "+", "T")), T(T("R", "+", "S"), "+", "T")],
        [T("R", "⋅", T("S", "⋅", "T")), T(T("R", "⋅", "S"), "⋅", "T")],

        [
            T(T("R", "⋅", T("S", "*")), "⋅", T("ε", "+", "S")),
            T("R", "⋅", T("S", "*")),
        ],
    ];
    return rules;
}
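
When a rule list grows, a simple sanity check can catch typos early: every variable on the right-hand side of a rule should also occur on its left-hand side, since otherwise substitution would leave an unbound variable in the result. The sketch below is an optional addition, not part of the notebook; `Term` is a hypothetical stand-in for the Module 03 classes, and `variablesOf` and `unboundVariables` are assumed helper names.

```typescript
// Hypothetical stand-in for the Module 03 classes.
type Term =
    | { kind: "empty" }
    | { kind: "eps" }
    | { kind: "char"; value: string }
    | { kind: "var"; id: string }
    | { kind: "star"; inner: Term }
    | { kind: "concat"; left: Term; right: Term }
    | { kind: "union"; left: Term; right: Term };
type RewriteRule = [Term, Term];

// Collect the names of all variables occurring in a term.
function variablesOf(t: Term, acc: Set<string> = new Set<string>()): Set<string> {
    switch (t.kind) {
        case "var":
            acc.add(t.id);
            break;
        case "star":
            variablesOf(t.inner, acc);
            break;
        case "concat":
        case "union":
            variablesOf(t.left, acc);
            variablesOf(t.right, acc);
            break;
    }
    return acc;
}

// Variables used on the RHS that the LHS never binds.
function unboundVariables([lhs, rhs]: RewriteRule): string[] {
    const bound = variablesOf(lhs);
    return Array.from(variablesOf(rhs)).filter(v => !bound.has(v));
}

const varR: Term = { kind: "var", id: "R" };
const varS: Term = { kind: "var", id: "S" };
const rPlusEmpty: Term = { kind: "union", left: varR, right: { kind: "empty" } };

const goodRule: RewriteRule = [rPlusEmpty, varR]; // R + ∅ → R
const typoRule: RewriteRule = [rPlusEmpty, varS]; // R + ∅ → S (S is never bound)

console.log(unboundVariables(goodRule), unboundVariables(typoRule));
```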

## Main Simplification Algorithm

The simplification process uses a **Fixpoint Iteration** strategy combined with recursive descent.

### Algorithm `simplifyOnce`
This function performs a single pass over the AST:
1.  **Check Current Node:** It tries to apply every rule in the catalogue to the current term. If a rule matches (e.g., $R \cdot \varepsilon \to R$), it returns the transformed result immediately.
2.  **Recurse:** If no rule matches at the top level, it recurses into the children (`left`, `right`, or `inner`) to simplify sub-expressions.

### Algorithm `simplify`
Repeatedly calls `simplifyOnce` until the term stabilizes, i.e., until `deepEquals(current, next)` holds. Iterating is necessary because one simplification can enable another further up the tree: rewriting a sub-expression $\emptyset^* \to \varepsilon$ might expose a parent simplification $R \cdot \varepsilon \to R$. A cap of 1000 passes guards against rule sets that never stabilize.
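
The need for several passes can be traced on $(a + \emptyset)^* \cdot \varepsilon$. The sketch below is a deliberately reduced model, not the notebook's implementation: it uses a hypothetical tagged-union `Term`, hard-codes just two rules as functions instead of patterns, and compares terms through their JSON form in place of `deepEquals`. All names in it are assumptions.

```typescript
// Hypothetical stand-in for the Module 03 classes.
type Term =
    | { kind: "empty" }
    | { kind: "eps" }
    | { kind: "char"; value: string }
    | { kind: "var"; id: string }
    | { kind: "star"; inner: Term }
    | { kind: "concat"; left: Term; right: Term }
    | { kind: "union"; left: Term; right: Term };
type StepRule = (t: Term) => Term | null; // null when the rule does not apply

const rightUnit: StepRule = t =>          // R ⋅ ε → R
    t.kind === "concat" && t.right.kind === "eps" ? t.left : null;
const unionEmpty: StepRule = t =>         // R + ∅ → R
    t.kind === "union" && t.right.kind === "empty" ? t.left : null;
const stepRules: StepRule[] = [rightUnit, unionEmpty];

// One pass: rewrite at the current node if possible, otherwise descend.
function onePass(t: Term): Term {
    for (const rule of stepRules) {
        const out = rule(t);
        if (out !== null) return out;     // a rule fired: stop here for this pass
    }
    if (t.kind === "star") return { kind: "star", inner: onePass(t.inner) };
    if (t.kind === "concat") return { kind: "concat", left: onePass(t.left), right: onePass(t.right) };
    if (t.kind === "union") return { kind: "union", left: onePass(t.left), right: onePass(t.right) };
    return t;
}

// Repeat passes until nothing changes; count the passes that changed the term.
function toFixpoint(start: Term, maxPasses = 1000): { result: Term; passes: number } {
    let current = start;
    for (let passes = 0; passes < maxPasses; passes++) {
        const next = onePass(current);
        if (JSON.stringify(next) === JSON.stringify(current)) return { result: current, passes };
        current = next;
    }
    return { result: current, passes: maxPasses };
}

const charA: Term = { kind: "char", value: "a" };
const startTerm: Term = {                 // (a + ∅)* ⋅ ε
    kind: "concat",
    left: { kind: "star", inner: { kind: "union", left: charA, right: { kind: "empty" } } },
    right: { kind: "eps" },
};
const outcome = toFixpoint(startTerm);
console.log(JSON.stringify(outcome.result), outcome.passes);
```

The first pass rewrites the root with $R \cdot \varepsilon \to R$ and stops there, so the inner $a + \emptyset$ is only reached in the second pass. A third pass changes nothing and ends the loop with $a^*$.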

In [11]:
function simplifyOnce(term: RegExp, rules: Rule[]): RegExp {
    for (const rule of rules) {
        const { simplified, result } = rewrite(term, rule);
        if (simplified) return result;
    }
    if (term instanceof Star) 
        return new Star(simplifyOnce(term.inner, rules));
    if (term instanceof Concat) 
        return new Concat(simplifyOnce(term.left, rules), simplifyOnce(term.right, rules));
    if (term instanceof Union) 
        return new Union(simplifyOnce(term.left, rules), simplifyOnce(term.right, rules));
    return term;
}

function simplify(t: RegExp): RegExp {
    const rules = getRules();
    let current = t;
    let i = 0;
    while (i++ < 1000) {
        const next = simplifyOnce(current, rules);
        if (deepEquals(current, next)) return next;
        current = next;
    }
    console.warn("Rewrite limit reached");
    return current;
}

## Pretty Printing

Finally, we convert the internal AST back into a human-readable string format.
The function `regexpToString` dispatches on the node type with `instanceof` checks and renders:
* **Atoms:** $\emptyset$ for `EmptySet`, $\varepsilon$ for `Epsilon`, and the character itself for a `CharNode`.
* **Variables:** The variable name directly.
* **Parentheses:** A `Union` is always wrapped in parentheses, and a starred `Concat` is parenthesized (e.g., `(ab)*`), whereas a starred atom is not (e.g., `a*`).

In [12]:
function regexpToString(r: RegExp): string {
    if (r instanceof EmptySet) return "∅";
    if (r instanceof Epsilon) return "ε";
    if (r instanceof CharNode) return r.value;
    if (r instanceof Variable) return r.name;
    if (r instanceof Star) {
        const inner = regexpToString(r.inner);
        // A Union already prints its own parentheses, so only a Concat needs wrapping here.
        const needsParens = r.inner instanceof Concat;
        return needsParens ? `(${inner})*` : `${inner}*`;
    }
    if (r instanceof Concat)
        return regexpToString(r.left) + regexpToString(r.right);
    if (r instanceof Union)
        return `(${regexpToString(r.left)}+${regexpToString(r.right)})`;
    return "?";
}
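
An alternative way to decide where parentheses go is to compare binding strengths: a sub-term is parenthesized only when it binds more weakly than its context. The sketch below illustrates that idea; it is not the notebook's `regexpToString`, it works on a hypothetical tagged-union `Term`, and the names `precedence`, `renderBare`, and `render` are assumptions.

```typescript
// Hypothetical stand-in for the Module 03 classes.
type Term =
    | { kind: "empty" }
    | { kind: "eps" }
    | { kind: "char"; value: string }
    | { kind: "var"; id: string }
    | { kind: "star"; inner: Term }
    | { kind: "concat"; left: Term; right: Term }
    | { kind: "union"; left: Term; right: Term };

// Binding strength: union < concat < star < atoms.
function precedence(t: Term): number {
    switch (t.kind) {
        case "union": return 1;
        case "concat": return 2;
        case "star": return 3;
        default: return 4;
    }
}

// Text of a node without parentheses around the node itself.
function renderBare(t: Term): string {
    switch (t.kind) {
        case "empty": return "∅";
        case "eps": return "ε";
        case "char": return t.value;
        case "var": return t.id;
        case "star": return render(t.inner, 3) + "*";
        case "concat": return render(t.left, 2) + render(t.right, 2);
        case "union": return render(t.left, 1) + "+" + render(t.right, 1);
    }
}

// Wrap in parentheses only if the node binds more weakly than its context.
function render(t: Term, contextPrec = 0): string {
    const text = renderBare(t);
    return precedence(t) < contextPrec ? `(${text})` : text;
}

const ch = (value: string): Term => ({ kind: "char", value });
const starOfUnion: Term = { kind: "star", inner: { kind: "union", left: ch("a"), right: ch("b") } };
const unionOfConcat: Term = {
    kind: "union",
    left: ch("a"),
    right: { kind: "concat", left: ch("b"), right: { kind: "star", inner: ch("c") } },
};
const concatOfUnion: Term = { kind: "concat", left: { kind: "union", left: ch("a"), right: ch("b") }, right: ch("c") };

console.log(render(starOfUnion));   // (a+b)*
console.log(render(unionOfConcat)); // a+bc*
console.log(render(concatOfUnion)); // (a+b)c
```

Since union and concatenation are associative, this variant prints nested unions and nested concatenations without inner parentheses, which hides how the tree is grouped. Whether that is acceptable depends on what the output is used for.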