In [None]:
import { display } from "tslab";
import { readFileSync } from "fs";

const css = readFileSync("../style.css", "utf8");
display.html(`<style>${css}</style>`);

# Converting a Deterministic <span style="font-variant:small-caps;">Fsm</span> into a Regular Expression

This notebook implements an algorithm that converts a given DFA into an equivalent regular expression. The algorithm follows the idea of the **State Elimination** method, which is expressed here through the recursive path function `rpq`.

## Data Structures and Imports

To maintain strict type safety, we import the core definitions from our previous modules.

**Crucially**, we use `TransRelDet` because in a DFA, the transition relation maps to a single resulting state (which itself is a `DFAState` / `RecursiveSet`).

- **`01-NFA-2-DFA`**: For `DFA`, `DFAState` (the sets), and `TransRelDet`.
- **`03-RegExp-2-NFA`**: For the `RegExp` AST and aliases.

In [None]:
import { RecursiveSet, RecursiveMap, Tuple } from "recursive-set";
import { DFA, DFAState, State, Char, TransRelDet } from "./01-NFA-2-DFA";
import {
    RegExp,
    EmptySet,
    Epsilon,
    CharNode,
    Star,
    Concat,
    Union
} from "./03-RegExp-2-NFA";

The function `regexpSum` takes a collection $S = \{ r_1, \cdots, r_n \}$ of regular expressions
as its argument, given either as a `RecursiveSet` or as an array. It returns the regular expression
$$ r_1 + \cdots + r_n. $$
If $S$ is empty, the result is the regular expression $\emptyset$. The result is represented as a `RegExp` AST: alternatives are `Union` nodes, while concatenations and repetitions are represented by `Concat` and `Star` nodes.
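
The folding of a collection into a right-nested sum can be tried out in isolation. The following is only a minimal sketch: `regexpSumSketch` is a hypothetical helper that is not part of this notebook, and it uses plain strings in place of the `RegExp` AST, with `"∅"` standing in for the empty-set expression.

```typescript
// Hypothetical stand-in: regular expressions are plain strings here,
// and "∅" plays the role of the empty-set expression.
function regexpSumSketch(rs: string[]): string {
    const [head, ...tail] = rs;
    // An empty sum denotes the empty language.
    if (head === undefined) return "∅";
    // A single summand is returned unchanged.
    if (tail.length === 0) return head;
    // Otherwise nest to the right: r1 + (r2 + (... + rn)).
    return `(${head} + ${regexpSumSketch(tail)})`;
}

console.log(regexpSumSketch([]));              // ∅
console.log(regexpSumSketch(["a"]));           // a
console.log(regexpSumSketch(["a", "b", "c"])); // (a + (b + c))
```

Nesting to the right mirrors the recursion on the tail of the collection; since `+` is associative, the grouping does not change the language that is described.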


In [None]:
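function regexpSum(S: RecursiveSet<RegExp> | RegExp[]): RegExp {
    // Split the collection into its first element and the remaining elements.
    const [head, ...tail] = S instanceof RecursiveSet ? [...S] : S;
    // An empty sum denotes the empty language.
    if (!head) return new EmptySet();
    // A sum with a single summand is the summand itself.
    if (tail.length === 0) return head;
    // Otherwise build the right-nested union r1 + (r2 + (... + rn)).
    return new Union(head, regexpSum(tail));
}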

### State Elimination Algorithm

The function `rpq` assumes there is some <span style="font-variant:small-caps;">Fsm</span>
$$F = \langle \texttt{States}, \Sigma, \delta, \texttt{q0}, \texttt{Accepting} \rangle$$
given and takes five arguments:
* `p1` and `p2` are states of the <span style="font-variant:small-caps;">Fsm</span> $F$,
* $\Sigma$ is the alphabet of the <span style="font-variant:small-caps;">Fsm</span>,
* $\delta$ is the transition function of the <span style="font-variant:small-caps;">Fsm</span> $F$, and
* `Allowed` is a subset of the set `States`. The implementation represents it as an array of states, so that a state `q` can be split off on each recursive call.

The function `rpq` computes a regular expression that describes those strings that take the <span style="font-variant:small-caps;">Fsm</span> $F$ from the state `p1` to state `p2`.
When $F$ switches states from `p1` to `p2`, only states in the set `Allowed` may be visited in-between the states `p1` and `p2`.



The function is defined by recursion on the set `Allowed`. There are two cases:

#### 1. Base Case: $\texttt{Allowed} = \emptyset$
Define `AllChars` as the set of all characters that when read by $F$ in the state `p1` cause $F$ to enter the state `p2`:
$$\texttt{AllChars} = \{ c \in \Sigma \mid \delta(p_1, c) = p_2 \}$$

Then we need a further case distinction:
* **If $p_1 = p_2$:** In this case we have:
  $$ \texttt{rpq}(p_1, p_2, \Sigma, \delta, \emptyset) := \varepsilon + \sum\limits_{c\in\texttt{AllChars}} c $$
  The $\varepsilon$ accounts for the empty string, which takes $F$ from $p_1$ to $p_1$ without reading a character.
  If $\texttt{AllChars} = \emptyset$, the sum is to be interpreted as the regular expression $\emptyset$.
  Otherwise, if $\texttt{AllChars} = \{c_1,\cdots,c_n\}$, we have $\sum\limits_{c\in\texttt{AllChars}} c = c_1 + \cdots + c_n$.

* **If $p_1 \neq p_2$:** In this case we have:
  $$ \texttt{rpq}(p_1, p_2, \Sigma, \delta, \emptyset) := \sum\limits_{c\in\texttt{AllChars}} c $$
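
As a small worked example of the base case, assume a DFA that is not part of this notebook: it has the states $0$ and $1$, the alphabet $\Sigma = \{a, b\}$, and the transitions $\delta(0, a) = 1$, $\delta(0, b) = 0$, $\delta(1, a) = 0$, and $\delta(1, b) = 1$. Under this assumption, the definition above yields:
$$
\begin{aligned}
\texttt{rpq}(0, 0, \Sigma, \delta, \emptyset) &= b + \varepsilon, &\quad \texttt{rpq}(0, 1, \Sigma, \delta, \emptyset) &= a, \\
\texttt{rpq}(1, 0, \Sigma, \delta, \emptyset) &= a, &\quad \texttt{rpq}(1, 1, \Sigma, \delta, \emptyset) &= b + \varepsilon.
\end{aligned}
$$
For instance, $b$ is the only character with $\delta(0, b) = 0$, so $\texttt{AllChars} = \{b\}$ for the pair $(0, 0)$, and $\varepsilon$ is added because both states coincide.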

#### 2. Recursive Step: $\texttt{Allowed} = \{ q \} \cup \texttt{RestAllowed}$
In this case we recursively define the following variables:
1. $\texttt{rp1p2} := \texttt{rpq}(p_1, p_2, \Sigma, \delta, \texttt{RestAllowed})$
2. $\texttt{rp1q} := \texttt{rpq}(p_1, q, \Sigma, \delta, \texttt{RestAllowed})$
3. $\texttt{rqq} := \texttt{rpq}(q, q, \Sigma, \delta, \texttt{RestAllowed})$
4. $\texttt{rqp2} := \texttt{rpq}(q, p_2, \Sigma, \delta, \texttt{RestAllowed})$

Then we can define:
$$\texttt{rpq}(p_1, p_2, \Sigma, \delta, \texttt{Allowed}) := \texttt{rp1p2} + \texttt{rp1q} \cdot \texttt{rqq}^* \cdot \texttt{rqp2}$$

This formula can be understood as follows: If a string $w$ is read in state $p_1$ and reading this string takes the <span style="font-variant:small-caps;">Fsm</span> $F$ from state $p_1$ to state $p_2$ while only visiting states from `Allowed` in-between, then there are two cases:
* Reading $w$ **does not visit** the state $q$ in-between. Hence the string $w$ can be described by `rp1p2`.
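* Reading $w$ **visits** the state $q$ at least once in-between. Then the string $w$ can be written as $w = t u_1 \cdots u_n v$ with $n \geq 0$ where:
    * reading $t$ takes $F$ from $p_1$ to $q$,
    * reading each $u_i$ takes $F$ from $q$ back to $q$ (loop), and
    * reading $v$ takes $F$ from $q$ to $p_2$.

  None of these segments visits $q$ in-between, hence $w$ can be described by $\texttt{rp1q} \cdot \texttt{rqq}^* \cdot \texttt{rqp2}$.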
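
The recursion can be tried out on a simplified model before turning to the typed implementation. The following is only a sketch under stated assumptions: `rpqSketch`, `St`, `Delta`, and the two-state example DFA are hypothetical and not part of this notebook. States are plain numbers, and regular expressions are emitted as JavaScript regex source strings so that the result can be checked directly with the built-in `RegExp` engine.

```typescript
// Hypothetical stand-ins (not the notebook's types): states are numbers and
// regular expressions are JavaScript regex source strings.
type St = number;
type Delta = Map<string, St>; // keys have the form "state,char"

const EMPTY = "[]";  // empty character class: matches no string at all
const EPS = "(?:)";  // empty group: matches only the empty string
const union = (a: string, b: string): string => `(?:${a}|${b})`;
const concat = (a: string, b: string): string => `(?:${a}${b})`;
const star = (a: string): string => `(?:${a})*`;

// Assumes that the characters in sigma need no escaping inside a regex.
function rpqSketch(p1: St, p2: St, sigma: string[], delta: Delta, allowed: St[]): string {
    if (allowed.length === 0) {
        // Base case: only direct transitions from p1 to p2 count.
        const chars = sigma.filter(c => delta.get(`${p1},${c}`) === p2);
        const sum = chars.length === 0
            ? EMPTY
            : chars.reduceRight((acc, c) => union(c, acc));
        // The empty string also leads from a state to itself.
        return p1 === p2 ? union(EPS, sum) : sum;
    }
    // Recursive step: split off q and distinguish paths by whether they visit q.
    const q = allowed[0] as St;
    const rest = allowed.slice(1);
    const rp1p2 = rpqSketch(p1, p2, sigma, delta, rest);
    const rp1q = rpqSketch(p1, q, sigma, delta, rest);
    const rqq = rpqSketch(q, q, sigma, delta, rest);
    const rqp2 = rpqSketch(q, p2, sigma, delta, rest);
    return union(rp1p2, concat(concat(rp1q, star(rqq)), rqp2));
}

// Assumed example DFA: state 0 is reached again after an even number of a's.
const sigma = ["a", "b"];
const delta: Delta = new Map<string, St>([
    ["0,a", 1], ["0,b", 0],
    ["1,a", 0], ["1,b", 1],
]);

// Strings that take the DFA from state 0 back to state 0.
const r = rpqSketch(0, 0, sigma, delta, [0, 1]);
const matcher = new RegExp(`^${r}$`);
console.log(matcher.test("abba")); // true
console.log(matcher.test("ab"));   // false
```

Each recursive call spawns four further calls, so the size of the resulting expression grows exponentially with the number of allowed states. The sketch is therefore only meant for very small automata.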

In [None]:
function rpq(
    p1: DFAState,
    p2: DFAState,
    Sigma: RecursiveSet<Char>,
    delta: TransRelDet,
    Allowed: readonly DFAState[]
): RegExp {
    if (Allowed.length === 0) {
        // Base case: collect all characters that lead directly from p1 to p2.
        const allChars = [...Sigma]
            .filter(c => delta.get(new Tuple(p1, c))?.equals(p2))
            .map(c => new CharNode(c));
        const r = regexpSum(allChars);
        // The empty string also leads from p1 to p1.
        return p1.equals(p2) ? new Union(new Epsilon(), r) : r;
    }
    // Recursive step: split off a state q and distinguish paths by whether they visit q.
    const [q, ...rest] = Allowed;
    const rp1p2 = rpq(p1, p2, Sigma, delta, rest);
    const rp1q  = rpq(p1, q,  Sigma, delta, rest);
    const rqq   = rpq(q,  q,  Sigma, delta, rest);
    const rqp2  = rpq(q,  p2, Sigma, delta, rest);
    return new Union(rp1p2, new Concat(new Concat(rp1q, new Star(rqq)), rqp2));
}

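The function `dfa2regexp` takes a deterministic <span style="font-variant:small-caps;">Fsm</span> $F$ and computes a regular expression $r$ that describes the same language as $F$, i.e. we have
$$ L(F) = L(r). $$
To this end, it sums the expressions computed by `rpq` for the paths from the start state to each accepting state, where every state of $F$ is allowed in-between.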

In [None]:
function dfa2regexp(F: DFA): RegExp {
    // One summand per accepting state p: all strings leading from q0 to p.
    return regexpSum([...F.A].map(p => rpq(F.q0, p, F.Σ, F.δ, [...F.Q])));
}

The notebook `06-Test-DFA-2-RegExp.ipynb` provides a test for the function `dfa2regexp`.