In [1]:
import { display } from "tslab";
import { readFileSync } from "fs";

const css = readFileSync("../style.css", "utf8");
display.html(`<style>${css}</style>`);

# From Regular Expressions to <span style="font-variant:small-caps;">Fsm</span>s

This notebook shows how a given regular expression $r$ can be transformed into an equivalent non-deterministic finite state machine. It implements the theory outlined in Section 4.4 of the lecture notes.

## Declaring the Necessary Types

First, we import the necessary libraries. We continue to use `RecursiveSet` to manage sets of states and alphabets efficiently.

In [2]:
import { RecursiveSet } from "recursive-set";

In [3]:
type BinaryOp = '⋅' | '+';
type UnaryOp  = '*';

The type `RegExp` describes the parse tree of a regular expression. This will be the input of the program we develop in this notebook.

- The number `0` denotes the empty set $\emptyset$. No other number is a valid regular expression.
- The string `'ε'` denotes the empty word $\varepsilon$.
- Any other string of length 1 denotes a character.
- `[r, '*']` denotes the Kleene star $r^*$.
- `[r1, '⋅', r2]` denotes concatenation $r_1 \cdot r_2$.
- `[r1, '+', r2]` denotes union $r_1 + r_2$.
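
As an illustration of this encoding, the following sketch builds the parse tree of $(a + b)^* \cdot c$ and renders it back into infix notation. The snippet is self-contained, so it repeats the operator types. The type name `RegExpTree` and the helper `show` are hypothetical and not part of this notebook.

```typescript
type BinaryOp = '⋅' | '+';
type UnaryOp = '*';
// Named RegExpTree here (hypothetical) so that the snippet also compiles
// outside a notebook, where the name RegExp refers to the built-in class.
type RegExpTree = number | string | [RegExpTree, UnaryOp] | [RegExpTree, BinaryOp, RegExpTree];

// Parse tree of (a + b)* ⋅ c.
const tree: RegExpTree = [[['a', '+', 'b'], '*'], '⋅', 'c'];

// Hypothetical helper: renders a parse tree back into infix notation.
function show(r: RegExpTree): string {
    if (typeof r === 'number') return '∅';        // only 0 is a valid number
    if (typeof r === 'string') return r;          // 'ε' or a single character
    if (r.length === 2) return `${show(r[0])}*`;  // Kleene star
    return `(${show(r[0])} ${r[1]} ${show(r[2])})`; // concatenation or union
}

console.log(show(tree));
```

The nesting of the tuples mirrors the operator precedence, so no parentheses have to be stored in the tree itself.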

In [4]:
type State = number;
type Char = string;
type RegExp = number | string | [RegExp, UnaryOp] | [RegExp, BinaryOp, RegExp];

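The type `Delta` represents the transition function of a non-deterministic finite automaton (NFA) as a `Map`: the pair $\langle q, c \rangle$ is encoded as the string `key(q, c)` and mapped to the set of successor states, and a missing entry stands for the empty set. The type `NFA` represents the automaton itself. Its elements are 5-tuples of the form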
$$ \langle Q, \Sigma, \delta, q_0, A \rangle $$
where
- $Q$ is the set of states,
- $\Sigma$ is the alphabet,
- $\delta: Q \times (\Sigma \cup \{\varepsilon\}) \rightarrow 2^Q$ is the transition function,
- $q_0 \in Q$ is the start state, and
- $A \subseteq Q$ is the set of accepting states.
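
The following sketch illustrates how $\delta$ can be stored in a `Map` with string keys. It is a minimal sketch under two assumptions: the built-in `Set` stands in for `RecursiveSet` so that the snippet is self-contained, and the lookup helper `step` is hypothetical.

```typescript
type State = number;
type Char = string;

// Stand-in for the notebook's Delta: the built-in Set replaces RecursiveSet (assumption).
type Delta = Map<string, Set<State>>;

// Encodes the pair <q, c> as a string, because a Map compares array keys by identity.
function key(q: State, c: Char): string {
    return `${q},${c}`;
}

// Hypothetical helper: a missing entry stands for the empty set of successors.
function step(delta: Delta, q: State, c: Char): Set<State> {
    const targets = delta.get(key(q, c));
    return targets === undefined ? new Set<State>() : targets;
}

const delta: Delta = new Map();
delta.set(key(1, 'a'), new Set([2]));     // 1 --a--> 2
delta.set(key(2, 'ε'), new Set([1, 3]));  // 2 --ε--> 1 and 2 --ε--> 3

console.log(Array.from(step(delta, 1, 'a'))); // successors of state 1 on 'a'
console.log(Array.from(step(delta, 3, 'a'))); // no entry, hence no successors
```

Since states are numbers, the part of the key before the first comma always identifies the state, so the encoding stays unambiguous even if the character itself is a comma.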

In [5]:
function key(q: State, c: Char): string {
  return `${q},${c}`;
}

In [6]:
type Delta = Map<string, RecursiveSet<State>>;

In [7]:
type NFA = {
    Q: RecursiveSet<State>;
    Sigma: RecursiveSet<Char>;
    delta: Delta;
    q0: State;
    A: RecursiveSet<State>;
};

### State Generator

Since we need to generate unique integer states for each new NFA component we build, we use a simple helper class `StateGenerator`.


In [8]:
class StateGenerator {
    private stateCount: number = 0;

    getNewState(): State {
        this.stateCount += 1;
        return this.stateCount;
    }
}

## NFA Construction Functions

The NFA `genEmptyNFA()` is defined as
$$ \langle \{q_0, q_1\}, \Sigma, \{\}, q_0, \{q_1\} \rangle. $$
Since this NFA has no transitions at all, the accepting state $q_1$ is unreachable and the NFA accepts no string.


In [9]:
function genEmptyNFA(gen: StateGenerator, Sigma: RecursiveSet<Char>): NFA {
    const q0 = gen.getNewState();
    const q1 = gen.getNewState();
    
    return {
        Q: new RecursiveSet(q0, q1),
        Sigma: Sigma,
        delta: new Map(),
        q0: q0,
        A: new RecursiveSet(q1)
    };
}

The NFA `genEpsilonNFA()` is defined as
$$ \langle \{q_0, q_1\}, \Sigma, \{ \langle q_0, \varepsilon \rangle \mapsto \{q_1\} \}, q_0, \{q_1\} \rangle. $$


In [10]:
function genEpsilonNFA(gen: StateGenerator, Sigma: RecursiveSet<Char>): NFA {
    const q0 = gen.getNewState();
    const q1 = gen.getNewState();
    
    const delta: Delta = new Map();
    delta.set(key(q0, 'ε'), new RecursiveSet(q1));
    
    return {
        Q: new RecursiveSet(q0, q1),
        Sigma: Sigma,
        delta: delta,
        q0: q0,
        A: new RecursiveSet(q1)
    };
}

For a letter $c \in \Sigma$, the NFA `genCharNFA(c)` is defined as
$$ \langle \{q_0, q_1\}, \Sigma, \{ \langle q_0, c \rangle \mapsto \{q_1\} \}, q_0, \{q_1\} \rangle. $$

In [11]:
function genCharNFA(gen: StateGenerator, Sigma: RecursiveSet<Char>, c: Char): NFA {
    const q0 = gen.getNewState();
    const q1 = gen.getNewState();
    
    const delta: Delta = new Map();
    delta.set(key(q0, c), new RecursiveSet(q1));
    
    return {
        Q: new RecursiveSet(q0, q1),
        Sigma: Sigma,
        delta: delta,
        q0: q0,
        A: new RecursiveSet(q1)
    };
}

### Helper: Merging Deltas

When combining NFAs, we often need to merge two transition functions $\delta_1$ and $\delta_2$.
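
A plain key-wise merge is sufficient only because the NFAs being combined have disjoint sets of states. The sketch below shows what would happen otherwise. It assumes the built-in `Set` as a stand-in for `RecursiveSet`, and the name `mergeDeltas` is hypothetical.

```typescript
type State = number;
// Built-in Set as a stand-in for RecursiveSet (assumption).
type Delta = Map<string, Set<State>>;

// Hypothetical helper performing a key-wise copy:
// entries of d2 are written over entries of d1 that share the same key.
function mergeDeltas(d1: Delta, d2: Delta): Delta {
    const merged: Delta = new Map(d1);
    d2.forEach((targets, k) => merged.set(k, targets));
    return merged;
}

// Disjoint states: all transitions survive the merge.
const left: Delta = new Map([['1,a', new Set([2])]]);
const right: Delta = new Map([['3,b', new Set([4])]]);
console.log(mergeDeltas(left, right).size); // 2

// Shared state 1: the transition 1 --a--> 2 is replaced, not united.
const clash: Delta = new Map([['1,a', new Set([5])]]);
console.log(Array.from(mergeDeltas(left, clash).get('1,a')!)); // only state 5 remains
```

Drawing all states from a single `StateGenerator` rules out such collisions.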


In [12]:
// Returns a new map with all entries of d1 and d2; the arguments are not modified.
function copyDelta(d1: Delta, d2: Delta): Delta {
    const newDelta = new Map(d1);
    for (const [k, v] of d2) {
        newDelta.set(k, v);
    }
    return newDelta;
}

### Concatenation

Given two NFAs $f_1$ and $f_2$, the function `catenate(f1, f2)` creates an NFA that recognizes a string $s$ if and only if $s$ can be written in the form $s = s_1 s_2$ where $s_1$ is recognized by $f_1$ and $s_2$ is recognized by $f_2$.

Assume that $f_1 = \langle Q_1, \Sigma, \delta_1, q_1, \{q_2\} \rangle$ and $f_2 = \langle Q_2, \Sigma, \delta_2, q_3, \{q_4\} \rangle$ with disjoint states.
Then `catenate(f1, f2)` is defined as:
$$ \langle Q_1 \cup Q_2, \Sigma, \{ \langle q_2, \varepsilon \rangle \mapsto \{q_3\} \} \cup \delta_1 \cup \delta_2, q_1, \{q_4\} \rangle. $$
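
As a worked instance of this definition, assume that $f_1$ is the NFA for the letter $a$ with states $1$ and $2$ and that $f_2$ is the NFA for the letter $b$ with states $3$ and $4$ (the concrete state numbers are chosen for illustration only). Then $q_1 = 1$, $q_2 = 2$, $q_3 = 3$, $q_4 = 4$, and `catenate(f1, f2)` is the NFA
$$ \langle \{1, 2, 3, 4\}, \Sigma, \{ \langle 1, a \rangle \mapsto \{2\}, \langle 2, \varepsilon \rangle \mapsto \{3\}, \langle 3, b \rangle \mapsto \{4\} \}, 1, \{4\} \rangle, $$
which accepts exactly the string $ab$ by following the path $1 \xrightarrow{a} 2 \xrightarrow{\varepsilon} 3 \xrightarrow{b} 4$.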


In [13]:
function catenate(gen: StateGenerator, f1: NFA, f2: NFA): NFA {
    const Q1 = f1.Q;
    const Sigma = f1.Sigma;
    const delta1 = f1.delta;
    const q1 = f1.q0;
    const A1 = f1.A;
    
    const Q2 = f2.Q;
    const delta2 = f2.delta;
    const q3 = f2.q0;
    const A2 = f2.A;
    
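    // Every NFA built in this notebook has exactly one accepting state.
    const q2 = Array.from(A1)[0] as State;
    
    // The states of f1 and f2 are disjoint, so the merged map keeps all transitions.
    const delta = copyDelta(delta1, delta2);
    
    // Link the accepting state of f1 to the start state of f2.
    delta.set(key(q2, 'ε'), new RecursiveSet(q3));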
    
    return {
        Q: Q1.union(Q2),
        Sigma: Sigma,
        delta: delta,
        q0: q1,
        A: A2
    };
}

### Disjunction

Given two NFAs $f_1$ and $f_2$, the function `disjunction(f1, f2)` creates an NFA that recognizes a string $s$ if and only if $s$ is recognized by $f_1$ or by $f_2$.

The construction introduces a new start state $q_0$ and a new accepting state $q_5$. It connects $q_0$ via $\varepsilon$-transitions to the start states of $f_1$ and $f_2$, and it connects the accepting states of $f_1$ and $f_2$ via $\varepsilon$-transitions to $q_5$.
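
In the style of the definition of `catenate`, this construction can be written out as a tuple. The tuple below is read off from the implementation and uses its variable names. Assume that $f_1 = \langle Q_1, \Sigma, \delta_1, q_1, \{q_3\} \rangle$ and $f_2 = \langle Q_2, \Sigma, \delta_2, q_2, \{q_4\} \rangle$ have disjoint sets of states. Then `disjunction(f1, f2)` is
$$ \langle \{q_0, q_5\} \cup Q_1 \cup Q_2, \Sigma, \{ \langle q_0, \varepsilon \rangle \mapsto \{q_1, q_2\}, \langle q_3, \varepsilon \rangle \mapsto \{q_5\}, \langle q_4, \varepsilon \rangle \mapsto \{q_5\} \} \cup \delta_1 \cup \delta_2, q_0, \{q_5\} \rangle. $$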


In [14]:
function disjunction(gen: StateGenerator, f1: NFA, f2: NFA): NFA {
    const Q1 = f1.Q;
    const Sigma = f1.Sigma;
    const delta1 = f1.delta;
    const q1 = f1.q0;
    const A1 = f1.A;

    const Q2 = f2.Q;
    const delta2 = f2.delta;
    const q2 = f2.q0;
    const A2 = f2.A;
    
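    // Every NFA built in this notebook has exactly one accepting state.
    const q3 = Array.from(A1)[0] as State;
    const q4 = Array.from(A2)[0] as State;
    
    // Fresh start state q0 and fresh accepting state q5.
    const q0 = gen.getNewState();
    const q5 = gen.getNewState();
    
    const delta = copyDelta(delta1, delta2);
    
    // q0 branches into both automata; both old accepting states lead to q5.
    delta.set(key(q0, 'ε'), new RecursiveSet(q1, q2));
    delta.set(key(q3, 'ε'), new RecursiveSet(q5));
    delta.set(key(q4, 'ε'), new RecursiveSet(q5));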
    
    return {
        Q: new RecursiveSet(q0, q5).union(Q1).union(Q2),
        Sigma: Sigma,
        delta: delta,
        q0: q0,
        A: new RecursiveSet(q5)
    };
}

### Kleene Star

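Given an NFA $f$ that recognizes the language of a regular expression $r$, the function `kleene(f)` creates an NFA that recognizes the language of $r^*$.
It introduces a new start state $q_0$ and a new accepting state $q_3$. With $q_1$ denoting the start state and $q_2$ the accepting state of $f$, it adds the following $\varepsilon$-transitions:

- $q_0 \xrightarrow{\varepsilon} q_1$ (enter $f$ at its old start state)
- $q_0 \xrightarrow{\varepsilon} q_3$ (skip $f$, which accepts the empty word)
- $q_2 \xrightarrow{\varepsilon} q_1$ (loop back for another iteration)
- $q_2 \xrightarrow{\varepsilon} q_3$ (exit)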
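
Written as a tuple in the style of the definition of `catenate`, and read off from the implementation: assume that $f = \langle Q, \Sigma, \delta, q_1, \{q_2\} \rangle$. Then `kleene(f)` is
$$ \langle \{q_0, q_3\} \cup Q, \Sigma, \{ \langle q_0, \varepsilon \rangle \mapsto \{q_1, q_3\}, \langle q_2, \varepsilon \rangle \mapsto \{q_1, q_3\} \} \cup \delta, q_0, \{q_3\} \rangle. $$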


In [15]:
function kleene(gen: StateGenerator, f: NFA): NFA {
    const M = f.Q;
    const Sigma = f.Sigma;
    const delta0 = f.delta;
    const q1 = f.q0;
    const A = f.A;
    
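    // Every NFA built in this notebook has exactly one accepting state.
    const q2 = Array.from(A)[0] as State;
    
    // Fresh start state q0 and fresh accepting state q3.
    const q0 = gen.getNewState();
    const q3 = gen.getNewState();
    
    const delta = new Map(delta0);
    
    // From q0: enter f or skip it. From q2: repeat f or leave.
    delta.set(key(q0, 'ε'), new RecursiveSet(q1, q3));
    delta.set(key(q2, 'ε'), new RecursiveSet(q1, q3));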
    
    return {
        Q: new RecursiveSet(q0, q3).union(M),
        Sigma: Sigma,
        delta: delta,
        q0: q0,
        A: new RecursiveSet(q3)
    };
}

## Main Class: RegExp2NFA

Now we bundle everything into the main class that recursively processes the `RegExp` tree.
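
The recursion follows the shape of the parse tree, so the size of the resulting NFA can be predicted without building it: each base case allocates two fresh states, `catenate` allocates none, and `disjunction` and `kleene` allocate two each. The following sketch computes this number. The function `countStates` and the type name `RegExpTree` are hypothetical and only serve as an illustration.

```typescript
type BinaryOp = '⋅' | '+';
type UnaryOp = '*';
// Hypothetical name, chosen to avoid a clash with the built-in RegExp class.
type RegExpTree = number | string | [RegExpTree, UnaryOp] | [RegExpTree, BinaryOp, RegExpTree];

// Hypothetical helper: number of states of the NFA built for the parse tree r.
function countStates(r: RegExpTree): number {
    // Base cases (empty set, ε, single character): two fresh states each.
    if (typeof r === 'number' || typeof r === 'string') return 2;
    // Kleene star: states of the operand plus a new start and accepting state.
    if (r.length === 2) return countStates(r[0]) + 2;
    // Union adds two fresh states, concatenation adds none.
    const extra = r[1] === '+' ? 2 : 0;
    return countStates(r[0]) + countStates(r[2]) + extra;
}

// Parse tree of (a + b)* ⋅ c.
const tree: RegExpTree = [[['a', '+', 'b'], '*'], '⋅', 'c'];
console.log(countStates(tree)); // 10
```

For the example, $a$, $b$, and $c$ contribute two states each, the union adds two, and the star adds two more, which gives ten states in total.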

In [16]:
class RegExp2NFA {
    private gen: StateGenerator;
    private sigma: RecursiveSet<Char>;

    constructor(sigma: RecursiveSet<Char>) {
        this.sigma = sigma;
        this.gen = new StateGenerator();
    }

    public toNFA(r: RegExp): NFA {
        if (r === 0) {
            return genEmptyNFA(this.gen, this.sigma);
        }
        
        if (r === 'ε') {
            return genEpsilonNFA(this.gen, this.sigma);
        }
        
        if (typeof r === 'string' && r.length === 1) {
            return genCharNFA(this.gen, this.sigma, r);
        }
        
        if (Array.isArray(r)) {
            if (r.length === 2 && r[1] === '*') {
                return kleene(this.gen, this.toNFA(r[0]));
            }
            
            if (r.length === 3 && r[1] === '⋅') {
                return catenate(this.gen, this.toNFA(r[0]), this.toNFA(r[2]));
            }
            
            if (r.length === 3 && r[1] === '+') {
                return disjunction(this.gen, this.toNFA(r[0]), this.toNFA(r[2]));
            }
        }
        
        throw new Error(`${JSON.stringify(r)} is not a proper regular expression.`);
    }
}

The notebook `04-Test-Regexp-2-NFA.ipynb` can be used to test the functions implemented in this notebook.