In [1]:
from IPython.display import HTML
with open("style.css", "r") as file:
    css = file.read()
HTML(css)

# Converting a Deterministic <span style="font-variant:small-caps;">Fsm</span> into a Regular Expression

Given a set `S`, the function `arb(S)` returns an arbitrary member of `S`.  If `S` is empty, the loop body is never executed and the function returns `None`.

In [2]:
def arb(S):
    for x in S:
        return x

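The function `regexp_sum` takes a set $S = \{ r_1, \cdots, r_n \}$ of regular expressions
as its argument.  It returns the regular expression
$$ r_1 + \cdots + r_n. $$
If $S$ is empty, it returns `0`, which represents the regular expression $\emptyset$ that denotes the empty language.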

In [None]:
def regexp_sum(S):
    n = len(S)
    if n == 0:
        return 0
    elif n == 1:
        return arb(S)
    else:
        r = arb(S)
        return ('+', r, regexp_sum(S - { r }))
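
The nested tuples built by `regexp_sum` are hard to read once they grow.  The following is a minimal sketch of a pretty-printer for this representation.  The name `regexp_to_str` is a hypothetical helper that is not part of the notebook.  It assumes the encoding used in this notebook: `0` for $\emptyset$, `''` for $\varepsilon$, a one-character string for a letter, `('+', r1, r2)` for a sum, and the tuples `('&', r1, r2)` and `('*', r)` that `rpq` produces for concatenation and Kleene star.

```python
def regexp_to_str(r):
    """Render a nested-tuple regular expression as a string (hypothetical helper)."""
    if r == 0:                      # the empty language
        return '∅'
    if r == '':                     # the empty string
        return 'ε'
    if isinstance(r, str):          # a single character
        return r
    if r[0] == '+':                 # sums are always parenthesized
        return '(' + regexp_to_str(r[1]) + '+' + regexp_to_str(r[2]) + ')'
    if r[0] == '&':                 # concatenation is written by juxtaposition
        return regexp_to_str(r[1]) + regexp_to_str(r[2])
    if r[0] == '*':
        return '(' + regexp_to_str(r[1]) + ')*'
    raise ValueError(f'unknown regular expression: {r!r}')

print(regexp_to_str(('+', 'a', ('&', 'b', ('*', 'c')))))   # prints (a+b(c)*)
```

The output is deliberately over-parenthesized so that no operator precedence has to be assumed.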

The function `rpq` assumes there is some <span style="font-variant:small-caps;">Fsm</span>
$$ F = \langle \texttt{States}, \Sigma, \delta, \texttt{q0}, \texttt{Accepting} \rangle $$
given and takes five arguments:
- `p1` and `p2` are states of the <span style="font-variant:small-caps;">Fsm</span> $F$,
- $\Sigma$ is the alphabet of the <span style="font-variant:small-caps;">Fsm</span>,
- $\delta$ is the transition function of the <span style="font-variant:small-caps;">Fsm</span> $F$, and
- `Allowed` is a subset of the set `States`.

The function `rpq` computes a regular expression that describes those strings that take the 
<span style="font-variant:small-caps;">Fsm</span> $F$ from the state `p1` to state `p2`.
While $F$ moves from `p1` to `p2`, only states from the set `Allowed` may be visited in-between.  The states `p1` and `p2` themselves need not be members of `Allowed`.

The function is defined by recursion on the set `Allowed`.  There are two cases:
- $\texttt{Allowed} = \{\}$.  
  Define `AllChars` as the set of all characters that, when read by $F$ in the state `p1`, cause $F$ to enter the state `p2`:
  $$ \texttt{AllChars} = \{ c \in \Sigma \mid \delta(p_1, c) = p_2 \} $$
  Then we need a further case distinction:
  - $p_1 = p_2$: In this case we have:
    $$ \texttt{rpq}(p_1, p_2, \{\}) := \sum\limits_{c\in\texttt{AllChars}} c \quad + \varepsilon$$
    If $\texttt{AllChars} = \{\}$ the sum $\sum\limits_{c\in\texttt{AllChars}} c$ is to be interpreted as the
    regular expression $\emptyset$ that denotes the empty language. 
    
    Otherwise, if $\texttt{AllChars} = \{c_1,\cdots,c_n\}$ we have
    $\sum\limits_{c\in\texttt{AllChars}} c \quad = c_1 + \cdots + c_n$.
  - $p_1 \not= p_2$: In this case we have:
    $$ \texttt{rpq}(p_1, p_2, \{\}) := \sum\limits_{c\in\texttt{AllChars}} c \quad$$
- $\texttt{Allowed} = \{ q \} \cup \texttt{RestAllowed}$ where $q \notin \texttt{RestAllowed}$.  In this case we first compute the following regular expressions recursively:
  1. $\texttt{rp1p2} := \texttt{rpq}(p_1, p_2, \texttt{RestAllowed})$,
  2. $\texttt{rp1q} := \texttt{rpq}(p_1, q, \texttt{RestAllowed})$,
  3. $\texttt{rqq} := \texttt{rpq}(q, q, \texttt{RestAllowed})$,
  4. $\texttt{rqp2} := \texttt{rpq}(q, p_2, \texttt{RestAllowed})$.

  As in the base case, the arguments $\Sigma$ and $\delta$ are not written in these formulas because they never change.  Then we can define:
  $$ \texttt{rpq}(p_1, p_2, \texttt{Allowed}) := \texttt{rp1p2} + \texttt{rp1q} \cdot \texttt{rqq}^* \cdot \texttt{rqp2} $$
  This formula can be understood as follows:  If reading a string $w$ takes the 
  <span style="font-variant:small-caps;">Fsm</span> $F$ from the state $p_1$ to the state $p_2$ and only states from the set 
  `Allowed` are visited in-between, then there are two cases:
  - Reading $w$ does not visit the state $q$ in-between.  Hence the string $w$ is described by the regular expression
    `rp1p2`.
  - Reading $w$ visits the state $q$ at least once in-between.  Then $w$ can be written as $w = t u_1 \cdots u_n v$ with $n \geq 0$ where:
    - reading $t$ in the state $p_1$ takes the <span style="font-variant:small-caps;">Fsm</span> $F$ into the state $q$,
    - for all $i \in \{1,\cdots,n\}$ reading $u_i$ in the state $q$ takes the <span style="font-variant:small-caps;">Fsm</span> $F$ from $q$ back to $q$, and
    - reading $v$ in the state $q$ takes the <span style="font-variant:small-caps;">Fsm</span> $F$ into the state $p_2$.

    Since none of these pieces visits $q$ in-between, $t$ is described by `rp1q`, every $u_i$ by `rqq`, and $v$ by `rqp2`.  Hence $w$ is described by $\texttt{rp1q} \cdot \texttt{rqq}^* \cdot \texttt{rqp2}$.
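
A small worked instance may help to see the recursion at work.  Assume a hypothetical two-state <span style="font-variant:small-caps;">Fsm</span> (an illustration only, not taken from the notebook) with states $0$ and $1$, alphabet $\{a, b\}$, and transitions $\delta(0,a) = 1$, $\delta(0,b) = 0$, $\delta(1,a) = 1$, $\delta(1,b) = 0$.  For $p_1 = 0$, $p_2 = 1$, and $\texttt{Allowed} = \{1\}$ the only choice is $q = 1$, so $\texttt{RestAllowed} = \{\}$ and the base case yields:
$$
\begin{aligned}
\texttt{rp1p2} &= \texttt{rpq}(0, 1, \{\}) = a, \\
\texttt{rp1q}  &= \texttt{rpq}(0, 1, \{\}) = a, \\
\texttt{rqq}   &= \texttt{rpq}(1, 1, \{\}) = a + \varepsilon, \\
\texttt{rqp2}  &= \texttt{rpq}(1, 1, \{\}) = a + \varepsilon.
\end{aligned}
$$
Substituting these into the recursive formula gives
$$ \texttt{rpq}(0, 1, \{1\}) = a + a \cdot (a + \varepsilon)^* \cdot (a + \varepsilon). $$
This expression denotes the same language as $a \cdot a^*$: the strings that lead from state $0$ to state $1$ while visiting only state $1$ in-between are exactly the non-empty strings consisting solely of the letter $a$.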

In [3]:
def rpq(p1, p2, Σ, 𝛿, Allowed):
    if Allowed == set():
        AllChars = { c for c in Σ 
                       if 𝛿.get((p1, c)) == p2 
                   }
        r = regexp_sum(AllChars)
        if p1 == p2:
            # The empty string also takes F from p1 to p1, hence ε (encoded as '') is added.
            if AllChars == set():
                return ''
            else:
                return ('+', '', r)
        else:
            return r
    else:
        q = arb(Allowed)
        RestAllowed = Allowed - { q }
        rp1p2 = rpq(p1, p2, Σ, 𝛿, RestAllowed)
        rp1q  = rpq(p1,  q, Σ, 𝛿, RestAllowed)
        rqq   = rpq( q,  q, Σ, 𝛿, RestAllowed)
        rqp2  = rpq( q, p2, Σ, 𝛿, RestAllowed)
        return ('+', rp1p2, ('&', ('&', rp1q, ('*', rqq)), rqp2))
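
One way to gain confidence in a result of `rpq` is to compare it with a direct simulation of the automaton on all short strings.  The following is a minimal, self-contained sketch under stated assumptions: `matches` and `takes` are hypothetical helpers that are not part of the notebook, the two-state automaton is made up for illustration, and the tuple `r` is the value that `rpq(0, 1, Σ, 𝛿, {1})` evaluates to for this automaton, written out by hand so that the snippet runs on its own.

```python
from itertools import product

def matches(r, s):
    """Return True if the string s is in the language of the regular expression r."""
    if r == 0:                      # 0 encodes the empty language
        return False
    if isinstance(r, str):          # '' encodes ε, otherwise r is a single character
        return s == r
    if r[0] == '+':
        return matches(r[1], s) or matches(r[2], s)
    if r[0] == '&':                 # try every split of s into two parts
        return any(matches(r[1], s[:i]) and matches(r[2], s[i:])
                   for i in range(len(s) + 1))
    if r[0] == '*':                 # peel off a non-empty prefix, so the recursion terminates
        return s == '' or any(matches(r[1], s[:i]) and matches(r, s[i:])
                              for i in range(1, len(s) + 1))
    raise ValueError(f'unknown regular expression: {r!r}')

def takes(delta, p1, p2, allowed, s):
    """Return True if reading s leads from p1 to p2 via in-between states from allowed."""
    state = p1
    for i, c in enumerate(s):
        if i > 0 and state not in allowed:   # state was reached in-between
            return False
        state = delta.get((state, c))
        if state is None:                    # no transition defined
            return False
    return state == p2

# Hypothetical automaton: delta is a dictionary keyed by (state, character).
delta = {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 1, (1, 'b'): 0}
# Hand-computed value of rpq(0, 1, Σ, 𝛿, {1}) for this automaton.
r = ('+', 'a', ('&', ('&', 'a', ('*', ('+', '', 'a'))), ('+', '', 'a')))

words = [''.join(w) for n in range(5) for w in product('ab', repeat=n)]
agree = all(matches(r, w) == takes(delta, 0, 1, {1}, w) for w in words)
print(agree)   # prints True
```

Such a bounded comparison is no proof of correctness, but it catches most slips in the base case or in the recursive formula.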

The function `dfa_2_regexp` takes a deterministic <span style="font-variant:small-caps;">Fsm</span> $F$ and computes a regular expression $r$ that describes the same language as $F$, i.e. we have
$$ L(F) = L(r). $$
The version shown here returns the regular expression $r$ exactly as constructed.  It does not apply any algebraic simplification rules, so the result can be large.

In [4]:
def dfa_2_regexp(F):
    States, Σ, 𝛿, q0, Accepting = F
    r = regexp_sum({ rpq(q0, p, Σ, 𝛿, States) for p in Accepting })
    return r
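
To illustrate the input format, here is a minimal sketch of how a deterministic <span style="font-variant:small-caps;">Fsm</span> can be written as the 5-tuple that `dfa_2_regexp` unpacks.  The automaton is a made-up example, and `accepts` is a hypothetical helper that is not part of the notebook.  Following the call `𝛿.get((p1, c))` in `rpq`, the transition function is assumed to be a dictionary keyed by pairs of a state and a character.

```python
def accepts(F, s):
    """Return True if the deterministic FSM F accepts the string s."""
    States, Sigma, delta, q0, Accepting = F
    state = q0
    for c in s:
        state = delta.get((state, c))   # None if no transition is defined
        if state is None:
            return False
    return state in Accepting

# Hypothetical example: all strings over {a, b} that end in the letter a.
F = ({0, 1},                                                  # States
     {'a', 'b'},                                              # alphabet
     {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 1, (1, 'b'): 0},    # transition function
     0,                                                       # start state
     {1})                                                     # accepting states

print(accepts(F, 'ba'))   # prints True
print(accepts(F, 'ab'))   # prints False
```

A tuple of this shape can be passed to `dfa_2_regexp` directly.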

The notebook `Test-DFA-2-RegExp.ipynb` provides a test for the function `dfa_2_regexp`.