In [None]:
import { display } from "tslab";
import { readFileSync } from "fs";

const css = readFileSync("../style.css", "utf8");
display.html(`<style>${css}</style>`);

# Checking the Equivalence of Regular Expressions

In this notebook, we implement an algorithm that checks whether two regular expressions $r_1$ and $r_2$ describe the same language, i.e., whether $L(r_1) = L(r_2)$.

The algorithm proceeds in three steps:
1.  **Conversion to NFA:** Convert $r_1$ and $r_2$ into non-deterministic finite state machines (NFAs).
2.  **Determinization:** Convert these into deterministic finite state machines (DFAs) $D_1$ and $D_2$.
3.  **Difference Check:** Verify that the symmetric difference of their languages is empty.
    $$L(D_1) = L(D_2) \iff (L(D_1) \setminus L(D_2) = \emptyset) \land (L(D_2) \setminus L(D_1) = \emptyset)$$
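
As a small illustration of why both differences have to be checked, consider two languages chosen only for this example, $L_1 = \{a\}$ and $L_2 = \{a, b\}$:

$$L_1 \setminus L_2 = \emptyset, \qquad L_2 \setminus L_1 = \{b\} \neq \emptyset$$

The first difference on its own only establishes $L_1 \subseteq L_2$. The second difference is what reveals that the languages are not equal.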

`NFA-2-DFA.ts` contains the function `nfa2dfa`, which converts a non-deterministic <span style="font-variant:small-caps;">Fsm</span> into an equivalent deterministic <span style="font-variant:small-caps;">Fsm</span>.

The notebook `RegExp-2-NFA.ipynb` contains the function `RegExp2NFA.toNFA`, which computes a non-deterministic <span style="font-variant:small-caps;">Fsm</span> that accepts the language described by a given regular expression.

In [None]:
import { RecursiveSet, RecursiveMap, Tuple, Structural } from "recursive-set";
import {
    State,
    Char,
    DFA,
    DFAState,
    nfa2dfa,
} from "./01-NFA-2-DFA";
import { RegExp, RegExp2NFA } from "./03-RegExp-2-NFA";
import { GenericDFA } from "./FSM-2-Dot";

## Type Definitions: Generic & Product DFA

To ensure our algorithms work with **any** kind of deterministic automaton (whether states are simple numbers, sets of numbers, or complex nested structures like in the Minimized DFA), we use **Generics**.

### 1. The Generic DFA
We rely on a flexible interface `GenericDFA<S>`, where `S` represents the type of a state. The only constraint is that `S` must be **Structural** (i.e., it supports deep equality comparisons via `recursive-set`).

Although we import this interface from our library, structurally it looks like this:

```typescript
interface GenericDFA<S extends Structural> {
    Q: RecursiveSet<S>;                  // Set of States
    Σ: RecursiveSet<Char>;               // Alphabet
    δ: RecursiveMap<Tuple<[S, Char]>, S> // Transition Function
    q0: S;                               // Start State
    A: RecursiveSet<S>;                  // Accepting States
}
```

### 2. The Product DFA
For the equivalence check, we construct a product automaton where states are pairs $(p, q)$ from the two input automata.

Since the input automata might have different state types $S_1$ and $S_2$ (e.g., one is a standard DFA, the other is minimized), we define `ProductDFA` as a `GenericDFA` whose states are **Tuples** of $[S_1, S_2]$.

* **StatePair:** A tuple $(p, q)$ where $p \in S_1$ and $q \in S_2$.
* **ProductDFA:** A Generic DFA operating on these pairs.

In [None]:
type StatePair<S1 extends Structural, S2 extends Structural> = Tuple<[S1, S2]>;
type ProductDFA<S1 extends Structural, S2 extends Structural> = GenericDFA<StatePair<S1, S2>>;

## Constructing the Difference Automaton

The function `fsmComplement(F1, F2)` constructs a **Product Automaton** $P$ that recognizes the difference language $L(F_1) \setminus L(F_2)$, i.e., the set of strings accepted by $F_1$ but **rejected** by $F_2$.

The construction assumes that both automata share the alphabet $\Sigma$ and that $\delta_2$ is total. Otherwise, a word on which $F_2$ gets stuck would be missing from the difference language.

**Construction Definition:**
Given $F_1 = (Q_1, \Sigma, \delta_1, q_{01}, A_1)$ and $F_2 = (Q_2, \Sigma, \delta_2, q_{02}, A_2)$, we define $P = (Q', \Sigma, \delta', q'_0, A')$ where:

1.  **States:** $Q' = Q_1 \times Q_2$ (Pairs of states).
2.  **Start State:** $q'_0 = (q_{01}, q_{02})$.
3.  **Transitions:** The machine simulates both $F_1$ and $F_2$ in parallel:
    $$\delta'((p, q), c) = (\delta_1(p, c), \delta_2(q, c))$$
4.  **Accepting States:** We accept if $F_1$ accepts AND $F_2$ does **not** accept:
    $$A' = A_1 \times (Q_2 \setminus A_2)$$
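
The construction can be traced on a tiny example. The following sketch is self-contained and deliberately avoids `recursive-set`: the type `SimpleDFA`, the helpers `key` and `differenceDFA`, and the two example automata are assumptions made for illustration, with states encoded as plain strings. It is not the notebook's implementation, only a minimal instance of the four rules above.

```typescript
// Hypothetical plain-data DFA, used only in this sketch.
type SimpleDFA = {
    states: string[];
    sigma: string[];
    delta: Map<string, string>; // key has the form "state,char"
    start: string;
    accepting: Set<string>;
};

const key = (q: string, c: string): string => `${q},${c}`;

// F1 accepts the words over {a, b} that end in "a".
const endsInA: SimpleDFA = {
    states: ["p0", "p1"],
    sigma: ["a", "b"],
    delta: new Map<string, string>([
        [key("p0", "a"), "p1"], [key("p0", "b"), "p0"],
        [key("p1", "a"), "p1"], [key("p1", "b"), "p0"],
    ]),
    start: "p0",
    accepting: new Set(["p1"]),
};

// F2 accepts the words over {a, b} that contain at least one "a".
const containsA: SimpleDFA = {
    states: ["q0", "q1"],
    sigma: ["a", "b"],
    delta: new Map<string, string>([
        [key("q0", "a"), "q1"], [key("q0", "b"), "q0"],
        [key("q1", "a"), "q1"], [key("q1", "b"), "q1"],
    ]),
    start: "q0",
    accepting: new Set(["q1"]),
};

// Product automaton for L(F1) \ L(F2); pairs are encoded as "(p|q)".
function differenceDFA(F1: SimpleDFA, F2: SimpleDFA): SimpleDFA {
    const pair = (p: string, q: string): string => `(${p}|${q})`;
    const states: string[] = [];
    const delta = new Map<string, string>();
    const accepting = new Set<string>();
    for (const p of F1.states) {
        for (const q of F2.states) {
            states.push(pair(p, q));
            // Rule 4: F1 accepts and F2 does not.
            if (F1.accepting.has(p) && !F2.accepting.has(q)) {
                accepting.add(pair(p, q));
            }
            // Rule 3: advance both components on the same character.
            for (const c of F1.sigma) {
                const p2 = F1.delta.get(key(p, c));
                const q2 = F2.delta.get(key(q, c));
                if (p2 !== undefined && q2 !== undefined) {
                    delta.set(key(pair(p, q), c), pair(p2, q2));
                }
            }
        }
    }
    return { states, sigma: F1.sigma, delta, start: pair(F1.start, F2.start), accepting };
}

const P = differenceDFA(endsInA, containsA);
console.log(P.states);                 // the four pairs of Q1 x Q2
console.log(Array.from(P.accepting));  // the single pair (p1|q0)
```

In this example the only accepting pair `(p1|q0)` cannot be reached from the start pair `(p0|q0)`, because reading an `a` moves both components at once. This reflects that every word ending in `a` also contains an `a`, so the difference language is empty.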

In [None]:
function fsmComplement<S1 extends Structural, S2 extends Structural>(
    F1: GenericDFA<S1>, F2: GenericDFA<S2>
): ProductDFA<S1, S2> {
    const Q = F1.Q.cartesianProduct(F2.Q);
    const δ = new RecursiveMap<Tuple<[StatePair<S1, S2>, Char]>, StatePair<S1, S2>>();
    for (const q of Q) {
        for (const c of F1.Σ) {
            const n1 = F1.δ.get(new Tuple(q.get(0), c));
            const n2 = F2.δ.get(new Tuple(q.get(1), c));
            // Compare against undefined: a falsy state such as 0 is still a valid target.
            if (n1 !== undefined && n2 !== undefined) δ.set(new Tuple(q, c), new Tuple(n1, n2));
        }
    }
    // Accept exactly the pairs (p, q) where p is accepting in F1 and q is not accepting in F2.
    const A = F1.A.cartesianProduct(F2.Q.difference(F2.A));
    return { Q, Σ: F1.Σ, δ, q0: new Tuple(F1.q0, F2.q0), A };
}

## Conversion Wrapper

The function `regexp2DFA` is a convenience wrapper that combines the pipeline from previous notebooks:
1.  `RegExp` $\to$ `NFA` (Thompson's Construction)
2.  `NFA` $\to$ `DFA` (Subset Construction)

In [None]:
function regexp2DFA(r: RegExp, Sigma: RecursiveSet<Char>): DFA {
    const converter = new RegExp2NFA(Sigma);
    const nfa = converter.toNFA(r);
    return nfa2dfa(nfa);
}

## Emptiness Check

The function `isEmpty(F)` checks whether the language accepted by a generic DFA $F$ is empty, i.e., $L(F) = \emptyset$.
This is true if and only if **no accepting state is reachable** from the start state.

**Algorithm (Reachability Analysis):**
We compute the set $R$ of reachable states by a fixed-point iteration that adds one layer of successors per round:
1.  Initialize $R_0 = \{q_0\}$.
2.  Iteratively add successors: $R_{i+1} = R_i \cup \{ \delta(q, c) \mid q \in R_i, c \in \Sigma \}$.
3.  Stop when $R_{i+1} = R_i$, i.e., when a round adds no new state, and set $R = R_i$.

Finally, we check if $R \cap A = \emptyset$.
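
The iteration can be sketched on a small example. The code below is a self-contained illustration that uses plain `Map` and `Set` instead of `recursive-set`; the automaton and the names `sigma`, `delta`, `accepting`, and `reachableFrom` are hypothetical and exist only for this sketch.

```typescript
// Hypothetical four-state DFA over {a, b}; all names are illustrative only.
const sigma = ["a", "b"];
const delta = new Map<string, string>([
    ["s0,a", "s1"], ["s0,b", "s0"],
    ["s1,a", "s1"], ["s1,b", "s2"],
    ["s2,a", "s2"], ["s2,b", "s2"],
    ["s3,a", "s3"], ["s3,b", "s3"], // s3 is accepting but has no incoming edge
]);
const accepting = new Set(["s3"]);

// Fixed-point iteration: grow the set until a round adds nothing new.
function reachableFrom(start: string): Set<string> {
    let reachable = new Set([start]);
    while (true) {
        const next = new Set(reachable);
        for (const q of reachable) {
            for (const c of sigma) {
                const target = delta.get(`${q},${c}`);
                if (target !== undefined) next.add(target);
            }
        }
        if (next.size === reachable.size) return reachable; // R_{i+1} = R_i
        reachable = next;
    }
}

const R = reachableFrom("s0");
// The language is empty iff no reachable state is accepting.
const languageIsEmpty = Array.from(R).every((q) => !accepting.has(q));
console.log(Array.from(R), languageIsEmpty);
```

Here the rounds produce $\{s_0\}$, $\{s_0, s_1\}$, and $\{s_0, s_1, s_2\}$, after which nothing changes. The accepting state `s3` is never reached, so the language of this example automaton is empty even though its set of accepting states is not.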

In [None]:
function isEmpty<S extends Structural>(F: GenericDFA<S>): boolean {
    let reachable = new RecursiveSet(F.q0);
    while (true) {
        const next = new RecursiveSet<S>();
        for (const q of reachable) for (const c of F.Σ) {
            const target = F.δ.get(new Tuple(q, c));
            // Compare against undefined so that a falsy state such as 0 is not skipped.
            if (target !== undefined) next.add(target);
        }
        if (next.isSubset(reachable)) break;
        reachable = reachable.union(next);
    }
    return reachable.intersection(F.A).isEmpty();
}

## Finding a Counter-Example (Witness)

While `isEmpty` tells us **if** the languages differ, it doesn't tell us **how**. To debug non-equivalent expressions, we need to find a specific string $w$ that is in the language of the Difference Automaton (i.e., accepted by one original DFA but rejected by the other).

The function `findWitness(F)` searches for such a string.

**Implementation Note on Generics:**
Like `isEmpty`, `findWitness` is generic in the state type (`GenericDFA<S>`), so it is not tied to the `ProductDFA`. This makes it a general debugging tool for:
* **Product Automata** (finding counter-examples).
* **Minimized DFAs** (inspecting shortest accepted words).
* **Standard DFAs** (general verification).

**Algorithm (Breadth-First Search):**
Instead of computing the full reachable set, we perform a **Breadth-First Search (BFS)** starting from $q_0$.
* We use a queue (a plain array that is scanned by index) to store pairs of `(State, Word)`.
* We explore the state space layer by layer, so words are tried in order of increasing length.
* The first time we hit an accepting state, the associated `Word` is therefore a **shortest witness**. If no accepting state is reachable, the function returns `null`.
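
The search can be followed step by step on a hand-written automaton. The sketch below is self-contained and uses plain `Map` and `Set`; the automaton and the names `alphabet`, `transitions`, `acceptingStates`, and `shortestWitness` are assumptions for this illustration. The three states can be read as the reachable part of a product automaton for "contains an `a`" minus "ends in `a`".

```typescript
// Hypothetical hand-written DFA over {a, b}; not the notebook's GenericDFA.
const alphabet = ["a", "b"];
const transitions = new Map<string, string>([
    ["r0,a", "r1"], ["r0,b", "r0"],
    ["r1,a", "r1"], ["r1,b", "r2"],
    ["r2,a", "r1"], ["r2,b", "r2"],
]);
const acceptingStates = new Set(["r2"]);

// BFS over (state, word) pairs; the array doubles as a FIFO queue.
function shortestWitness(start: string): string | null {
    const queue: Array<{ state: string; word: string }> = [{ state: start, word: "" }];
    const visited = new Set([start]);
    for (let i = 0; i < queue.length; i++) {
        const { state, word } = queue[i];
        if (acceptingStates.has(state)) return word;
        for (const c of alphabet) {
            const next = transitions.get(`${state},${c}`);
            if (next !== undefined && !visited.has(next)) {
                visited.add(next); // mark on enqueue so each state is queued once
                queue.push({ state: next, word: word + c });
            }
        }
    }
    return null; // no accepting state is reachable
}

const witness = shortestWitness("r0");
console.log(witness); // "ab"
```

The queue evolves as `(r0, "")`, `(r1, "a")`, `(r2, "ab")`, and the search stops at `r2`. The word `ab` indeed contains an `a` without ending in one. Unlike the notebook's `findWitness`, this sketch returns the empty string rather than `"ε"` when the start state is accepting.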

In [None]:
function findWitness<S extends Structural>(F: GenericDFA<S>): string | null {
    const queue = [{ s: F.q0, w: "" }];
    const visited = new RecursiveSet(F.q0);
    for (let i = 0; i < queue.length; i++) {
        const { s, w } = queue[i];
        if (F.A.has(s)) return w || "ε"; // the empty word is rendered as "ε"
        for (const c of F.Σ) {
            const next = F.δ.get(new Tuple(s, c));
            if (next !== undefined && !visited.has(next)) {
                visited.add(next);
                queue.push({ s: next, w: w + c });
            }
        }
    }
    return null;
}

## Equivalence Check

The function `regExpEquiv` puts everything together.
To prove $r_1 \equiv r_2$, we check mutual inclusion:

1.  Construct DFAs $F_1$ and $F_2$ for $r_1$ and $r_2$.
2.  Check if $L(F_1) \setminus L(F_2) = \emptyset$ (i.e., $F_1$ accepts nothing that $F_2$ rejects).
3.  Check if $L(F_2) \setminus L(F_1) = \emptyset$ (i.e., $F_2$ accepts nothing that $F_1$ rejects).

If both difference languages are empty, the automata accept exactly the same strings.
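
Both inclusion checks can also be viewed as one pass over the reachable state pairs: a pair $(p, q)$ is accepting in one of the two difference automata exactly when one component accepts and the other does not. The sketch below uses this on-the-fly variant, which is a different formulation from the two separate `isEmpty` calls used in this notebook. The type `CounterDFA` and the helpers `step` and `equivalent` are hypothetical, and the alphabet is restricted to the single letter `a` to keep the example short.

```typescript
// Hypothetical counter DFA over the one-letter alphabet {a}:
// state i means "number of a's read so far, modulo n"; the start state is 0.
type CounterDFA = { n: number; accepting: Set<number> };

const step = (F: CounterDFA, q: number): number => (q + 1) % F.n;

// Explore all reachable pairs; fail as soon as the components disagree.
function equivalent(F1: CounterDFA, F2: CounterDFA): boolean {
    const queue: Array<[number, number]> = [[0, 0]];
    const visited = new Set<string>(["0,0"]);
    for (let i = 0; i < queue.length; i++) {
        const [p, q] = queue[i];
        // Exactly one side accepts: one of the difference languages is non-empty.
        if (F1.accepting.has(p) !== F2.accepting.has(q)) return false;
        const next: [number, number] = [step(F1, p), step(F2, q)];
        const id = `${next[0]},${next[1]}`;
        if (!visited.has(id)) {
            visited.add(id);
            queue.push(next);
        }
    }
    return true;
}

const evenMod2: CounterDFA = { n: 2, accepting: new Set([0]) };    // even number of a's
const evenMod4: CounterDFA = { n: 4, accepting: new Set([0, 2]) }; // same language, more states
const multipleOf3: CounterDFA = { n: 3, accepting: new Set([0]) }; // length divisible by 3

console.log(equivalent(evenMod2, evenMod4));    // true
console.log(equivalent(evenMod2, multipleOf3)); // false
```

The second comparison fails at the pair reached after reading `aa`: the first automaton accepts this word, the third one rejects it.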

In [None]:
function regExpEquiv(r1: RegExp, r2: RegExp, Σ: RecursiveSet<Char>): boolean {
    // Build one DFA per expression with the regexp2DFA wrapper.
    const F1 = regexp2DFA(r1, Σ);
    const F2 = regexp2DFA(r2, Σ);
    // Equivalent iff neither language contains a word that the other lacks.
    return isEmpty(fsmComplement(F1, F2)) && isEmpty(fsmComplement(F2, F1));
}

The notebook `Test-Equivalence.ipynb` can be used to test this function.