In [1]:
from IPython.display import HTML
with open('../style.css', 'r') as file:
    css = file.read()
HTML(css)

In [2]:
%load_ext nb_mypy

Version 1.0.5


In [3]:
from typing import TypeVar
from collections.abc import Iterable

# Checking the Equivalence of Regular Expressions

In order to check whether two regular expressions $r_1$ and $r_2$ are *equivalent*, i.e. whether $L(r_1) = L(r_2)$,
we perform the following steps:
- convert the regular expressions $r_1$ and $r_2$ into non-deterministic *FSMs*
  $F_1$ and $F_2$ such that $L(r_1) = L(F_1)$ and $L(r_2) = L(F_2)$,
- convert the non-deterministic *FSMs* $F_1$ and $F_2$ into deterministic *FSMs*
  $D_1$ and $D_2$ such that $L(D_1) = L(F_1)$ and $L(D_2) = L(F_2)$,
- check whether both $L(D_1) \backslash L(D_2)$ and $L(D_2) \backslash L(D_1)$ are empty.
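The last step rests on a simple set identity: $L_1 = L_2$ holds iff $L_1 \subseteq L_2$ and $L_2 \subseteq L_1$, i.e. iff neither set difference contains a word. The following minimal illustration uses small finite sets of strings as stand-ins for languages. The function name `same_language` and the sample sets are made up for this sketch; real regular languages are usually infinite, which is why the notebook works with automata instead of explicit sets.

```python
def same_language(L1: set[str], L2: set[str]) -> bool:
    """Return True iff L1 == L2, decided via the two set differences."""
    # L1 - L2 is empty iff L1 ⊆ L2, and L2 - L1 is empty iff L2 ⊆ L1;
    # both inclusions together mean that L1 and L2 are equal.
    return L1 - L2 == set() and L2 - L1 == set()

# Finite toy "languages", chosen only to illustrate the identity.
words = {"a", "ab", "abb"}
print(same_language(words, {"abb", "a", "ab"}))  # True
print(same_language(words, {"a", "ab"}))         # False: "abb" lies in the first difference
```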

The notebook `Regexp-2-NFA.ipynb` defines the class `RegExp2NFA`; its method `toNFA` computes a non-deterministic 
*FSM* that accepts the language described by a given regular expression.

In [4]:
%run Regexp-2-NFA.ipynb


The notebook `NFA-2-DFA.ipynb` contains the function `nfa2dfa`, which converts a non-deterministic 
*FSM* into an equivalent deterministic *FSM*.
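For readers without `NFA-2-DFA.ipynb` at hand, here is a minimal sketch of the classic subset (powerset) construction on which such a conversion is typically based. It is not necessarily how `nfa2dfa` is implemented. The NFA encoding is an assumption made only for this sketch: a 5-tuple whose transition relation maps `(state, char)` to a *set* of successor states, with the character `''` marking ε-transitions. The function names `eps_closure` and `subset_construction` are hypothetical. If a set of states has no successors on some character, the empty `frozenset` becomes a rejecting trap state, so the resulting DFA has a transition for every state and character.

```python
def eps_closure(delta, S: frozenset) -> frozenset:
    """All NFA states reachable from S using only ε-transitions (encoded as '')."""
    result = set(S)
    todo = list(S)
    while todo:
        q = todo.pop()
        for p in delta.get((q, ''), set()):
            if p not in result:
                result.add(p)
                todo.append(p)
    return frozenset(result)

def subset_construction(nfa):
    """Sketch: each DFA state is the frozenset of NFA states the NFA could be in."""
    _, sigma, delta, q0, accepting = nfa
    start = eps_closure(delta, frozenset({q0}))
    dfa_delta = {}
    dfa_states = {start}
    todo = [start]
    while todo:
        M = todo.pop()
        for c in sigma:
            # Successors of every NFA state in M on c, then closed under ε-transitions.
            step = frozenset(p for q in M for p in delta.get((q, c), set()))
            N = eps_closure(delta, step)
            dfa_delta[M, c] = N
            if N not in dfa_states:
                dfa_states.add(N)
                todo.append(N)
    # A set of NFA states is accepting iff it contains an accepting NFA state.
    dfa_accepting = {M for M in dfa_states if M & accepting}
    return dfa_states, sigma, dfa_delta, start, dfa_accepting

# NFA for the words over {'a', 'b'} that end in 'b': state 0 loops, state 1 accepts.
nfa = ({0, 1}, {'a', 'b'}, {(0, 'a'): {0}, (0, 'b'): {0, 1}}, 0, {1})
states, sigma, delta, start, accepting = subset_construction(nfa)
print(len(states))  # 2, namely {0} and {0, 1}
```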

In [5]:
%run NFA-2-DFA.ipynb


In [6]:
S = TypeVar('S')
T = TypeVar('T')

Given two sets `A` and `B`, the function `cartesian_product(A, B)` computes the 
<em style="color:blue">Cartesian product</em> $A \times B$, which is defined as
$$ A \times B := \{ (x, y) \mid x \in A \wedge y \in B \}. $$

In [7]:
def cartesian_product(A: Iterable[S], B: Iterable[T]) -> set[tuple[S, T]]:
    return { (x, y) for x in A
                    for y in B
           }

In [8]:
cartesian_product({1, 2}, {'a', 'b'})

{(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')}
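The same set can also be built with `itertools.product` from the standard library. The sketch below (the name `cartesian_product_itertools` is hypothetical, introduced only for comparison) checks that both formulations yield the same pairs and that $|A \times B| = |A| \cdot |B|$.

```python
from itertools import product

def cartesian_product_itertools(A: set, B: set) -> set:
    # itertools.product(A, B) yields every pair (x, y) with x from A and y from B.
    return set(product(A, B))

A = {1, 2, 3}
B = {'a', 'b'}
pairs = cartesian_product_itertools(A, B)
# Same pairs as the set comprehension used by cartesian_product.
assert pairs == {(x, y) for x in A for y in B}
print(len(pairs))  # 6 == len(A) * len(B)
```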

In [9]:
State     = TypeVar('State')
TransRel  = dict[tuple[State, str], State]
StatePair = tuple[State, State]
TransRel1 = dict[tuple[State, str], State]
TransRel2 = dict[tuple[StatePair, str], StatePair]
DFA1      = tuple[set[State], set[str], TransRel1, State, set[State]]
DFA2      = tuple[set[StatePair], set[str], TransRel2, StatePair, set[StatePair]]
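A value of type `DFA1` is a 5-tuple (states, alphabet, transition dictionary, start state, accepting states). As an illustration, here is a hand-written DFA over $\{a, b\}$ that accepts the words containing an even number of `a`s; the state names and the helper `accepts` are hypothetical. The transition dictionary contains an entry for every pair of state and character, which is what `fsm_complement` relies on when it looks up `𝛿1[p1, c]` and `𝛿2[p2, c]`.

```python
# Words over {'a', 'b'} with an even number of 'a's, written as a 5-tuple
# (states, alphabet, transitions, start state, accepting states).
even_as = (
    {'even', 'odd'},
    {'a', 'b'},
    {('even', 'a'): 'odd',  ('even', 'b'): 'even',
     ('odd',  'a'): 'even', ('odd',  'b'): 'odd'},
    'even',
    {'even'},
)

def accepts(F, word: str) -> bool:
    """Hypothetical helper: run the DFA F on word and report acceptance."""
    _, _, delta, q, accepting = F
    for c in word:
        q = delta[q, c]   # the transition function is total, so this lookup never fails
    return q in accepting

print(accepts(even_as, 'abab'))  # True  (two 'a's)
print(accepts(even_as, 'ab'))    # False (one 'a')
```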

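Given two deterministic *FSMs* `F1` and `F2` over the same alphabet, the expression `fsm_complement(F1, F2)` computes a deterministic 
*FSM* that recognizes the language $L(F_1)\backslash L(F_2)$. It uses the *product construction*: the states of the new *FSM* are the pairs
$(p_1, p_2)$ where $p_1$ is a state of `F1` and $p_2$ is a state of `F2`, both components react to each input character simultaneously,
and a pair is accepting iff $p_1$ is an accepting state of `F1` while $p_2$ is not an accepting state of `F2`.
The transition functions of `F1` and `F2` must be defined for every state and every character of the alphabet, since every such combination is looked up.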

In [10]:
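def fsm_complement(F1: DFA1, F2: DFA1) -> DFA2:
    States1, Σ, 𝛿1, q1, A1 = F1
    States2, _, 𝛿2, q2, A2 = F2
    # the states of the product automaton are pairs of states of F1 and F2
    States = cartesian_product(States1, States2)
    𝛿 = {}
    for p1, p2 in States:
        for c in Σ:
            # both automata read the character c simultaneously
            𝛿[(p1, p2), c] = (𝛿1[p1, c], 𝛿2[p2, c])
    # (p1, p2) is accepting iff p1 is accepting in F1 and p2 is not accepting in F2
    return States, Σ, 𝛿, (q1, q2), cartesian_product(A1, States2 - A2)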
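To see the product construction at work without the regular-expression notebooks, here is a self-contained sketch on two hand-written DFAs. `fsm_difference` is a hypothetical restatement of `fsm_complement` so that the block runs on its own; the two example automata and their state names are made up for illustration.

```python
def fsm_difference(F1, F2):
    """Product DFA accepting L(F1) minus L(F2); mirrors fsm_complement."""
    States1, Sigma, d1, q1, A1 = F1
    States2, _, d2, q2, A2 = F2
    States = {(p1, p2) for p1 in States1 for p2 in States2}
    # The product runs F1 and F2 in lockstep, one component each.
    delta = {((p1, p2), c): (d1[p1, c], d2[p2, c])
             for (p1, p2) in States for c in Sigma}
    # A pair is accepting iff F1 accepts in p1 and F2 rejects in p2.
    Accepting = {(p1, p2) for p1 in A1 for p2 in States2 - A2}
    return States, Sigma, delta, (q1, q2), Accepting

# F1: words over {a, b} with an even ('e') or odd ('o') number of 'a's; accepts even.
even_as = ({'e', 'o'}, {'a', 'b'},
           {('e', 'a'): 'o', ('e', 'b'): 'e', ('o', 'a'): 'e', ('o', 'b'): 'o'},
           'e', {'e'})
# F2: words that end in 'b' ('B' = last character was 'b', 'N' = otherwise).
ends_b = ({'N', 'B'}, {'a', 'b'},
          {('N', 'a'): 'N', ('N', 'b'): 'B', ('B', 'a'): 'N', ('B', 'b'): 'B'},
          'N', {'B'})

D = fsm_difference(even_as, ends_b)
print(len(D[0]))     # 4 product states
print(sorted(D[4]))  # [('e', 'N')]: even number of 'a's and not ending in 'b'
```

For instance, the word `aa` drives the product from `('e', 'N')` to `('o', 'N')` and back to `('e', 'N')`, so it is accepted: it has an even number of `a`s and does not end in `b`.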

Given a regular expression $r$ and an alphabet $\Sigma$, the function $\texttt{regexp2DFA}(r, \Sigma)$
computes a deterministic *FSM* that accepts
the language specified by $r$.

In [11]:
# recursive type alias: a regular expression is an int, a string, or a tuple of regular expressions
RegExp = int | str | tuple['RegExp', ...]

In [12]:
def regexp2DFA(r: RegExp, Σ: set[str]) -> DFA1:
    converter = RegExp2NFA(Σ)       # type: ignore
    nfa       = converter.toNFA(r)
    dfa       = nfa2dfa(nfa)        # type: ignore
    return dfa # type: ignore

Given a deterministic *FSM* $F$, the function 
`is_empty(F)` checks whether the language accepted by $F$ is empty.
In this function, the variable `Reachable` is the set of those states that are already known to be reachable
from the start state `q0`, and `NewFound` is the set of states that can be reached in one step from a state in
`Reachable`. When no new reachable states are found, i.e. when `NewFound` is a subset of `Reachable`, the iteration stops.
The language is then non-empty iff some state is both reachable and accepting, because every accepted word leads from `q0` to an accepting state.

In [14]:
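def is_empty(F: DFA2) -> bool:
    States, Σ, δ, q0, Accepting = F
    Reachable = { q0 }
    while True:
        # all states reachable in one step from a state known to be reachable
        NewFound = { δ[q, c] for q in Reachable for c in Σ }
        if NewFound <= Reachable:   # fixed point: nothing new was found
            break
        Reachable |= NewFound
    # the language is empty iff no reachable state is accepting
    return (Reachable & Accepting) == set()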
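The fixed-point loop in `is_empty` recomputes the successors of all reachable states in every round. A common alternative, sketched below under the same 5-tuple encoding, is a breadth-first search with a worklist: each state is expanded at most once, and the search can stop as soon as it meets an accepting state. The name `is_empty_bfs` and the toy automaton are hypothetical; the result is the same as that of `is_empty`.

```python
from collections import deque

def is_empty_bfs(F) -> bool:
    """Breadth-first variant of is_empty: each state is expanded at most once."""
    _, Sigma, delta, q0, Accepting = F
    seen = {q0}
    queue = deque([q0])
    while queue:
        q = queue.popleft()
        if q in Accepting:
            return False          # a reachable accepting state: some word is accepted
        for c in Sigma:
            p = delta[q, c]
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return True                   # no accepting state is reachable from q0

# Toy DFA: state 2 is accepting, but it cannot be reached from the start state 0.
F = ({0, 1, 2}, {'a'}, {(0, 'a'): 1, (1, 'a'): 0, (2, 'a'): 2}, 0, {2})
print(is_empty_bfs(F))  # True
```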

The function `regExpEquiv` takes three arguments:
- $r_1$ and $r_2$ are regular expressions,
- $\Sigma$ is the alphabet used in these regular expressions.

The function returns `True` iff $r_1 \doteq r_2$, i.e. iff $r_1$ and $r_2$ are equivalent, meaning that $L(r_1) = L(r_2)$.

In [15]:
def regExpEquiv(r1: RegExp, r2: RegExp, Σ: set[str]) -> bool:
    F1 = regexp2DFA(r1, Σ)
    F2 = regexp2DFA(r2, Σ)
    r1_minus_r2 = fsm_complement(F1, F2)   # accepts L(r1) \ L(r2)
    r2_minus_r1 = fsm_complement(F2, F1)   # accepts L(r2) \ L(r1)
    # r1 and r2 are equivalent iff both differences are empty
    return is_empty(r1_minus_r2) and is_empty(r2_minus_r1)

The notebook `Test-Equivalence.ipynb` can be used to test this function.
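Since $L(F_1) = L(F_2)$ holds iff the symmetric difference $\bigl(L(F_1) \backslash L(F_2)\bigr) \cup \bigl(L(F_2) \backslash L(F_1)\bigr)$ is empty, the two products built in `regExpEquiv` can also be folded into a single search over the reachable pairs of states: the automata disagree iff some reachable pair has exactly one accepting component. The sketch below works directly on two hand-written DFAs, because `RegExp2NFA` and `nfa2dfa` live in other notebooks; the name `dfa_equivalent` and the example automata are hypothetical.

```python
from collections import deque

def dfa_equivalent(F1, F2) -> bool:
    """Sketch: L(F1) == L(F2) iff no reachable pair is accepted by exactly one DFA."""
    _, Sigma, d1, q1, A1 = F1
    _, _, d2, q2, A2 = F2
    start = (q1, q2)
    seen = {start}
    queue = deque([start])
    while queue:
        p1, p2 = queue.popleft()
        if (p1 in A1) != (p2 in A2):  # this pair lies in the symmetric difference
            return False
        for c in Sigma:
            nxt = (d1[p1, c], d2[p2, c])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True

# Two DFAs for "even number of 'a's" over {'a', 'b'}: a 2-state and a redundant 3-state one.
small = ({0, 1}, {'a', 'b'},
         {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 0, (1, 'b'): 1}, 0, {0})
big = ({'x', 'y', 'z'}, {'a', 'b'},
       {('x', 'a'): 'y', ('x', 'b'): 'z', ('y', 'a'): 'z', ('y', 'b'): 'y',
        ('z', 'a'): 'y', ('z', 'b'): 'x'}, 'x', {'x', 'z'})
print(dfa_equivalent(small, big))  # True
```

The symmetric-difference test explores each reachable pair once instead of building two product automata; both approaches decide the same question.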