In [None]:
import { display } from "tslab";
import { readFileSync } from "fs";

const css = readFileSync("../style.css", "utf8");
display.html(`<style>${css}</style>`);

# Checking the Equivalence of Regular Expressions

In this notebook, we implement an algorithm that checks whether two regular expressions $r_1$ and $r_2$ describe the same language, i.e., whether $L(r_1) = L(r_2)$.

The algorithm proceeds in three steps:
1.  **Conversion to NFA:** Convert $r_1$ and $r_2$ into non-deterministic finite state machines (NFAs).
2.  **Determinization:** Convert these into deterministic finite state machines (DFAs) $D_1$ and $D_2$.
3.  **Difference Check:** Verify that the symmetric difference of their languages is empty.
    $$L(D_1) = L(D_2) \iff (L(D_1) \setminus L(D_2) = \emptyset) \land (L(D_2) \setminus L(D_1) = \emptyset)$$

The module `01-NFA-2-DFA` provides the function `nfa2dfa`, which converts a non-deterministic
<span style="font-variant:small-caps;">Fsm</span> into an equivalent deterministic <span style="font-variant:small-caps;">Fsm</span>.

The module `03-RegExp-2-NFA` provides the class `RegExp2NFA`. Its method `toNFA` computes a non-deterministic
<span style="font-variant:small-caps;">Fsm</span> that accepts the language described by a given regular expression.

In [None]:
import { RecursiveSet, Tuple } from "recursive-set";
import {
    State,
    Char,
    DFA,
    DFAState,
    nfa2dfa,
    key,
    TransitionKey,
} from "./01-NFA-2-DFA";
import { RegExp, RegExp2NFA } from "./03-RegExp-2-NFA";

## Type Definitions

We define a `ProductDFA` to represent the combination of two DFAs.
* **StatePair:** A tuple $(p, q)$ where $p$ is a state from the first DFA and $q$ from the second.
* **ProductDFA:** A structure where the set of states $Q$ consists of `StatePair`s.
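
A `ProductDFA` needs pairs with value semantics. Two pairs with equal components must be the same element of `Q` and the same key in `delta`. This is presumably why the notebook uses `Tuple` from `recursive-set` and the `key` function instead of plain arrays. The sketch below uses only built-in JavaScript collections to show the problem and one common workaround, which encodes each pair as a JSON string. The helper `pairKey` is hypothetical and is not part of the notebook.

```typescript
// Built-in Set and Map compare object keys by identity, so two
// structurally equal arrays count as two distinct elements.
const byReference = new Set<[string, string]>();
byReference.add(["p0", "q0"]);
byReference.add(["p0", "q0"]);
console.log(byReference.size); // 2

// Hypothetical helper: encode a pair as a primitive string key.
// JSON encoding keeps ("a", "b") and ("ab", "") apart.
function pairKey(p: string, q: string): string {
    return JSON.stringify([p, q]);
}

const byValue = new Set<string>();
byValue.add(pairKey("p0", "q0"));
byValue.add(pairKey("p0", "q0"));
console.log(byValue.size); // 1
```

`Map` and `Set` use SameValueZero equality, which means identity for objects. A string key restores equality by value, at the cost of an encoding step.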

In [None]:
type StatePair = Tuple<[DFAState, DFAState]>;

type ProductDFA = {
    Q: RecursiveSet<StatePair>;
    Sigma: RecursiveSet<Char>;
    delta: Map<TransitionKey, StatePair>;
    q0: StatePair;
    A: RecursiveSet<StatePair>;
};

**Implementation Note:**

To keep the main algorithms free of type assertions, we use a helper function `unwrapPair`. It unpacks a `StatePair` into a typed `[DFAState, DFAState]` array, which confines the necessary `as` casts to a single place.

In [None]:
function unwrapPair(pair: StatePair): [DFAState, DFAState] {
    const raw = pair.raw;
    return [raw[0] as DFAState, raw[1] as DFAState];
}

## Cartesian Product

Given two sets $A$ and $B$, the method call `A.cartesianProduct(B)` (provided by `RecursiveSet`) computes:
$$A \times B := \{ (x, y) \mid x \in A \wedge y \in B \}.$$

This is the foundation for constructing the state space of our Product Automaton.
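
The same definition can be written over plain arrays. The function below is an illustrative sketch, not the `RecursiveSet` implementation. It also shows that $|A \times B| = |A| \cdot |B|$, so the product automaton built in the next section has $|Q_1| \cdot |Q_2|$ states.

```typescript
// Cartesian product of two arrays: every x in xs paired with every y in ys.
function cartesianProduct<X, Y>(xs: readonly X[], ys: readonly Y[]): [X, Y][] {
    const result: [X, Y][] = [];
    for (const x of xs) {
        for (const y of ys) {
            result.push([x, y]);
        }
    }
    return result;
}

const pairs = cartesianProduct([1, 2], ["a", "b"]);
console.log(JSON.stringify(pairs)); // [[1,"a"],[1,"b"],[2,"a"],[2,"b"]]
```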

In [None]:
// Two small sets with different element types.
const testSetA = new RecursiveSet(1, 2);
const testSetB = new RecursiveSet('a', 'b');

const cp = testSetA.cartesianProduct(testSetB);
cp;

## Constructing the Difference Automaton

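The function `fsm_complement(F1, F2)` constructs a **Product Automaton** $P$ that recognizes the difference language $L(F_1) \setminus L(F_2)$. Despite its name, it computes a difference rather than a complement. The complement of $L(F_2)$ enters only through the choice of accepting states.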

This corresponds to the set of strings accepted by $F_1$ but **rejected** by $F_2$.

**Construction Definition:**
Given DFAs $F_1 = (Q_1, \Sigma, \delta_1, q_{01}, A_1)$ and $F_2 = (Q_2, \Sigma, \delta_2, q_{02}, A_2)$ over the same alphabet $\Sigma$, with total transition functions $\delta_1$ and $\delta_2$, we define $P = (Q', \Sigma, \delta', q'_0, A')$ where:

1.  **States:** $Q' = Q_1 \times Q_2$ (Pairs of states).
2.  **Start State:** $q'_0 = (q_{01}, q_{02})$.
3.  **Transitions:** The machine simulates both $F_1$ and $F_2$ in parallel:
    $$\delta'((p, q), c) = (\delta_1(p, c), \delta_2(q, c))$$
4.  **Accepting States:** We accept if $F_1$ accepts AND $F_2$ does **not** accept:
    $$A' = A_1 \times (Q_2 \setminus A_2)$$
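
The construction can be sketched without `recursive-set` by using string state names and a `Map` keyed by the JSON encoding of a (state, character) pair. The type `PlainDFA` and the helpers `tkey`, `pairName`, `makeDFA`, `differenceDFA`, and `accepts` are hypothetical. The sketch throws if a transition is missing, because the definition of $\delta'$ assumes that $\delta_1$ and $\delta_2$ are total. The example uses $F_1$ = "words over $\{a, b\}$ ending in $a$" and $F_2$ = "words containing at least one $b$". For these automata, $L(F_1) \setminus L(F_2)$ consists of the non-empty words $a, aa, aaa, \dots$

```typescript
// Simplified stand-in for the notebook's DFA type (assumption): string
// states, delta keyed by the JSON encoding of [state, char].
type PlainDFA = {
    Q: string[];
    Sigma: string[];
    delta: Map<string, string>;
    q0: string;
    A: Set<string>;
};

const tkey = (q: string, c: string): string => JSON.stringify([q, c]);
const pairName = (p: string, q: string): string => JSON.stringify([p, q]);

// Build a DFA from a list of [state, char, target] triples.
function makeDFA(
    Q: string[], Sigma: string[], table: [string, string, string][],
    q0: string, A: string[],
): PlainDFA {
    const delta = new Map<string, string>();
    for (const [q, c, r] of table) delta.set(tkey(q, c), r);
    return { Q, Sigma, delta, q0, A: new Set(A) };
}

// Product automaton for L(F1) \ L(F2): states Q1 x Q2, componentwise
// transitions, accepting pairs A1 x (Q2 \ A2).
function differenceDFA(F1: PlainDFA, F2: PlainDFA): PlainDFA {
    const Q: string[] = [];
    const delta = new Map<string, string>();
    const A = new Set<string>();
    for (const p of F1.Q) {
        for (const q of F2.Q) {
            const pq = pairName(p, q);
            Q.push(pq);
            if (F1.A.has(p) && !F2.A.has(q)) A.add(pq);
            for (const c of F1.Sigma) {
                const p2 = F1.delta.get(tkey(p, c));
                const q2 = F2.delta.get(tkey(q, c));
                if (p2 === undefined || q2 === undefined) {
                    throw new Error(`transition missing for ${pq} on ${c}`);
                }
                delta.set(tkey(pq, c), pairName(p2, q2));
            }
        }
    }
    return { Q, Sigma: F1.Sigma, delta, q0: pairName(F1.q0, F2.q0), A };
}

// Run a DFA on a word (used here only to check the example).
function accepts(F: PlainDFA, w: string): boolean {
    let q = F.q0;
    for (const c of w) {
        const next = F.delta.get(tkey(q, c));
        if (next === undefined) return false;
        q = next;
    }
    return F.A.has(q);
}

// F1: words over {a, b} ending in "a"; F2: words containing at least one "b".
const endsInA = makeDFA(["s0", "s1"], ["a", "b"], [
    ["s0", "a", "s1"], ["s0", "b", "s0"], ["s1", "a", "s1"], ["s1", "b", "s0"],
], "s0", ["s1"]);
const hasB = makeDFA(["t0", "t1"], ["a", "b"], [
    ["t0", "a", "t0"], ["t0", "b", "t1"], ["t1", "a", "t1"], ["t1", "b", "t1"],
], "t0", ["t1"]);

const D = differenceDFA(endsInA, hasB);
console.log(D.Q.length);       // 4
console.log(accepts(D, "aa")); // true  (ends in a, no b)
console.log(accepts(D, "ba")); // false (contains a b)
console.log(accepts(D, ""));   // false (does not end in a)
```

Failing loudly on a missing transition is a design choice of this sketch. If a transition were silently skipped, the product would lose every word that $F_2$ cannot read at all, even though $F_2$ rejects those words.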

In [None]:
function fsm_complement(F1: DFA, F2: DFA): ProductDFA {
    // States: all pairs (p1, p2) with p1 in Q1 and p2 in Q2.
    const newStates = F1.Q.cartesianProduct(F2.Q);

    const newDelta = new Map<TransitionKey, StatePair>();

    for (const pair of newStates) {
        const [p1, p2] = unwrapPair(pair);

        // Simulate both automata in lockstep on every character c.
        for (const c of F1.Sigma) {
            const next1 = F1.delta.get(key(p1, c));
            const next2 = F2.delta.get(key(p2, c));

            // Both lookups succeed when delta_1 and delta_2 are total.
            if (next1 && next2) {
                const nextPair = new Tuple(next1, next2);
                newDelta.set(key(pair, c), nextPair);
            }
        }
    }

    const startPair = new Tuple(F1.q0, F2.q0);

    // Accepting pairs: A_1 x (Q_2 \ A_2).
    const diffSet = F2.Q.difference(F2.A);
    const newAccepting = F1.A.cartesianProduct(diffSet);

    return {
        Q: newStates,
        Sigma: F1.Sigma,
        delta: newDelta,
        q0: startPair,
        A: newAccepting,
    };
}

## Conversion Wrapper

The function `regexp2DFA` is a convenience wrapper that combines the pipeline from previous notebooks:
1.  `RegExp` $\to$ `NFA` (Thompson's Construction)
2.  `NFA` $\to$ `DFA` (Subset Construction)

In [None]:
function regexp2DFA(r: RegExp, Sigma: RecursiveSet<Char>): DFA {
    const converter = new RegExp2NFA(Sigma);
    const nfa = converter.toNFA(r);
    return nfa2dfa(nfa);
}

## Emptiness Check

The function `is_empty(F)` checks whether the language accepted by a product DFA $F$ is empty, i.e., whether $L(F) = \emptyset$.
This is true if and only if **no accepting state is reachable** from the start state.

**Algorithm (Reachability Analysis):**
We perform a fixed-point iteration to compute the set $R$ of reachable states. The iteration explores the automaton level by level, much like a breadth-first search:
1.  Initialize $R_0 = \{q_0\}$.
2.  Iteratively add successors: $R_{i+1} = R_i \cup \{ \delta(q, c) \mid q \in R_i, c \in \Sigma \}$.
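3.  Stop when $R_{i+1} \subseteq R_i$, i.e., when $R_{i+1} = R_i$. Because $R_i \subseteq R_{i+1} \subseteq Q$ and $Q$ is finite, this happens after at most $|Q|$ iterations.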

Finally, we check if $R \cap A = \emptyset$.
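
The fixed-point iteration can be sketched in plain TypeScript without `RecursiveSet`. The `SimpleDFA` type, the `tkey` encoding, and the name `isEmpty` are assumptions made for this illustration and are not part of the notebook's API. Each round starts from a copy of $R_i$, so the new set always contains the old one. Comparing the sizes of the two sets is therefore enough to detect $R_{i+1} \subseteq R_i$.

```typescript
// Simplified DFA (assumption): string states, delta keyed by the JSON
// encoding of [state, char]; a missing key means "no transition".
type SimpleDFA = {
    Sigma: string[];
    delta: Map<string, string>;
    q0: string;
    A: Set<string>;
};

const tkey = (q: string, c: string): string => JSON.stringify([q, c]);

// L(F) is empty iff no accepting state is reachable from q0.
function isEmpty(F: SimpleDFA): boolean {
    let reachable = new Set<string>([F.q0]);    // R_0 = {q0}
    while (true) {
        const next = new Set<string>(reachable); // R_{i+1} starts as R_i
        for (const q of reachable) {
            for (const c of F.Sigma) {
                const target = F.delta.get(tkey(q, c));
                if (target !== undefined) next.add(target);
            }
        }
        // next always contains reachable, so equal sizes mean a fixed point.
        if (next.size === reachable.size) break;
        reachable = next;
    }
    for (const q of reachable) {
        if (F.A.has(q)) return false;            // R ∩ A is non-empty
    }
    return true;
}

// States "0" and "1" alternate on "a"; state "2" is unreachable from "0".
const delta = new Map<string, string>([
    [tkey("0", "a"), "1"],
    [tkey("1", "a"), "0"],
    [tkey("2", "a"), "2"],
]);
const onlyUnreachableAccepts: SimpleDFA = { Sigma: ["a"], delta, q0: "0", A: new Set(["2"]) };
const reachableAccepts: SimpleDFA = { ...onlyUnreachableAccepts, A: new Set(["1"]) };

console.log(isEmpty(onlyUnreachableAccepts)); // true
console.log(isEmpty(reachableAccepts));       // false
```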

In [None]:
function is_empty(F: ProductDFA): boolean {
    // R_0 = {q0}
    let reachable = new RecursiveSet<StatePair>(F.q0);

    while (true) {
        // Collect the successors delta(q, c) of all states found so far.
        const newFoundArr: StatePair[] = [];

        for (const q of reachable) {
            for (const c of F.Sigma) {
                const target = F.delta.get(key(q, c));
                if (target) {
                    newFoundArr.push(target);
                }
            }
        }

        const newFound = RecursiveSet.fromArray(newFoundArr);

        // Fixed point reached: no new state was discovered.
        if (newFound.isSubset(reachable)) {
            break;
        }

        // R_{i+1} = R_i ∪ successors(R_i)
        reachable = reachable.union(newFound);
    }

    // L(F) is empty iff no reachable state is accepting.
    return reachable.intersection(F.A).isEmpty();
}

## Equivalence Check

The function `regExpEquiv` puts everything together.
To decide whether $r_1 \equiv r_2$, we check the mutual inclusions $L(r_1) \subseteq L(r_2)$ and $L(r_2) \subseteq L(r_1)$:

1.  Construct DFAs $F_1$ and $F_2$ for $r_1$ and $r_2$.
2.  Check if $L(F_1) \setminus L(F_2) = \emptyset$ (i.e., $F_1$ accepts nothing that $F_2$ rejects).
3.  Check if $L(F_2) \setminus L(F_1) = \emptyset$ (i.e., $F_2$ accepts nothing that $F_1$ rejects).

If both difference languages are empty, the automata accept exactly the same strings.
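
The notebook builds two product automata and runs the emptiness check twice. The sketch below shows a common alternative in plain TypeScript; it is not the notebook's implementation. A depth-first search visits only the pairs reachable from the start pair. It returns `false` as soon as it finds a pair $(p, q)$ where exactly one of $p \in A_1$ and $q \in A_2$ holds, because such a pair witnesses a word in one of the two difference languages. The `PlainDFA` type, the JSON key encoding, and the names `dfaEquiv` and `cycleDFA` are assumptions made for illustration. Both DFAs must share the alphabet and have total transition functions.

```typescript
// Simplified DFA (assumption): string states, delta keyed by the JSON
// encoding of [state, char]; both DFAs share Sigma and have total delta.
type PlainDFA = {
    Sigma: string[];
    delta: Map<string, string>;
    q0: string;
    A: Set<string>;
};

const tkey = (q: string, c: string): string => JSON.stringify([q, c]);

// Depth-first search over the pairs reachable from (q0_1, q0_2). A pair in
// which exactly one DFA accepts witnesses a word in L(F1) \ L(F2) or in
// L(F2) \ L(F1), so the two languages differ.
function dfaEquiv(F1: PlainDFA, F2: PlainDFA): boolean {
    const seen = new Set<string>();
    const stack: [string, string][] = [[F1.q0, F2.q0]];
    while (stack.length > 0) {
        const [p, q] = stack.pop()!;
        const id = JSON.stringify([p, q]);
        if (seen.has(id)) continue;
        seen.add(id);
        if (F1.A.has(p) !== F2.A.has(q)) return false;
        for (const c of F1.Sigma) {
            const p2 = F1.delta.get(tkey(p, c));
            const q2 = F2.delta.get(tkey(q, c));
            if (p2 === undefined || q2 === undefined) {
                throw new Error(`transition missing on ${c}`);
            }
            stack.push([p2, q2]);
        }
    }
    return true; // both difference languages are empty
}

// Hypothetical test automata over {a}: a cycle of n states "0".."n-1".
function cycleDFA(n: number, accepting: number[]): PlainDFA {
    const delta = new Map<string, string>();
    for (let i = 0; i < n; i++) {
        delta.set(tkey(String(i), "a"), String((i + 1) % n));
    }
    return { Sigma: ["a"], delta, q0: "0", A: new Set(accepting.map(String)) };
}

const even2 = cycleDFA(2, [0]);    // even length, 2 states
const even4 = cycleDFA(4, [0, 2]); // even length, 4 states
const odd2 = cycleDFA(2, [1]);     // odd length

console.log(dfaEquiv(even2, even4)); // true
console.log(dfaEquiv(even2, odd2));  // false
```

This variant visits only reachable pairs, so it never materializes all of $Q_1 \times Q_2$, and one search covers both difference languages. The notebook's two-product formulation stays closer to the mathematical definition.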

In [None]:
function regExpEquiv(
    r1: RegExp,
    r2: RegExp,
    Sigma: RecursiveSet<Char>,
): boolean {
    const F1 = regexp2DFA(r1, Sigma);
    const F2 = regexp2DFA(r2, Sigma);

    const r1MinusR2 = fsm_complement(F1, F2);
    const r2MinusR1 = fsm_complement(F2, F1);

    return is_empty(r1MinusR2) && is_empty(r2MinusR1);
}

The notebook `Test-Equivalence.ipynb` can be used to test this function.