A deterministic finite-state automaton (DFA) is one of the simplest computing machines, consisting of a finite number of states and bounded read/write memory. Such machines are very limited in what they can compute. However, the tasks they can perform, such as reducing an integer modulo $m$, are carried out extremely fast.

Associated with each DFA is a minimal-state DFA: the unique one (up to renumbering of states) that accepts the same language and has the smallest possible number of states. Thus, every regular language can be represented by a unique minimal-state DFA. From here on, an $(n, k)$-DFA is one with $n$ states and an alphabet of size $k$.

There are many well-known algorithms that, given any DFA, produce the minimal DFA associated with it. We shall use the table-filling algorithm, which marks pairs of distinguishable states; it is simpler, though slower, than Hopcroft's partition-refinement algorithm. We aim to count all the minimal $(n, k)$-DFAs for sufficiently small $(n, k)$, and we investigate which of these define finite languages.

There are many symmetry properties of DFAs that we can exploit to make our programs faster (and, in some cases, more elegant and concise). Moreover, we can verify some of the small cases by hand, which is a useful check that our programs are running correctly.

---

Without loss of generality, we will assume that our states are always labelled by integers, and that any DFA with $n$ states has these labelled by $\{1, \dots, n\}$. We will also always assume that the state labelled $1$ is the start state. Finally, we will assume that any DFA on $k$ letters has alphabet $\{1, \dots, k\}$.

An $(n, k)$-transition table is a transition table for an $(n, k)$-DFA. We need a way to store transition tables. As we have adopted the convention that state $1$ is the start state, this does not need to be recorded in the table. We store an $(n, k)$-transition table as an $n \times (k + 1)$ matrix: row $i$ describes state $i$, its first $k$ entries give the target state for each letter, and its final entry is $1$ if the state is accepting and $0$ otherwise. Conventions vary, but we take the convention that the start state can be an accept state.
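As a small illustration of this storage convention, the sketch below encodes a $(3, 2)$-DFA that accepts the binary strings whose value is divisible by $3$, and runs it on a few words. The names `DIV_BY_3`, `accepts` and `to_letters`, and the encoding of bit $0$ as letter $1$ and bit $1$ as letter $2$, are assumptions made for this example only.

```python
# Hypothetical example table: state r + 1 stands for "value so far is r mod 3".
# Columns: target on letter 1 (bit 0), target on letter 2 (bit 1), accepting flag.
DIV_BY_3 = [
    [1, 2, 1],  # residue 0 (start state, accepting)
    [3, 1, 0],  # residue 1
    [2, 3, 0],  # residue 2
]

def accepts(table, word):
    """Run the DFA stored in `table` on `word`, a sequence of letters in 1..k."""
    state = 1  # by convention, state 1 is the start state
    for letter in word:
        state = table[state - 1][letter - 1]
    return table[state - 1][-1] == 1

def to_letters(value):
    """Binary digits of `value`, most significant first, shifted to letters 1 and 2."""
    return [int(bit) + 1 for bit in format(value, "b")]

print(accepts(DIV_BY_3, to_letters(6)))  # 6 = 110 in binary
print(accepts(DIV_BY_3, to_letters(7)))  # 7 = 111 in binary
```

Each letter costs a single table lookup, which is why the reduction modulo $m$ mentioned earlier runs in time linear in the length of the input.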

In [25]:
# @title Data

Table_1 = """
2  5  1
10 10 1
1  2  0
8  3  0
9  2  0
9  2  0
1  9  1
4  6  0
3  6  0
9  2  1
"""

Table_2 = """
6 4 7 1
3 6 5 0
2 6 2 0
6 2 2 0
7 7 3 0
3 7 6 0
5 4 6 0
"""

Table_3 = """
9 5 5 1
7 2 6 0
7 4 1 0
1 2 8 0
8 1 6 0
9 9 8 0
9 3 4 0
8 3 5 0
8 3 1 0
"""

Table_4 = """
18  65  1
67  33  1
34  66  1
90  75  1
12  59  1
99  75  1
54  24  1
71  74  1
100 98  1
29  87  1
42  9   1
47  37  1
77  37  1
82  69  1
11  60  1
18  79  1
36  37  1
6   21  1
53  9   1
34  78  1
18  21  1
21  39  1
91  56  1
68  23  1
47  65  1
92  49  1
11  16  1
75  79  1
74  11  1
57  30  1
19  24  1
60  54  1
30  10  1
14  41  1
22  11  1
90  12  1
8   79  1
25  30  1
6   61  1
45  97  1
2   44  1
90  70  1
20  76  1
10  44  1
31  66  1
46  11  1
11  94  1
100 19  1
34  27  1
30  80  1
7   49  1
30  77  1
5   40  1
51  28  1
77  4   1
64  68  1
9   43  1
9   46  1
78  61  1
91  6   1
54  32  1
11  78  1
83  70  1
34  13  1
30  14  1
75  10  1
2   1   1
5   43  1
67  66  1
61  73  1
53  54  1
73  11  1
71  64  1
79  13  1
29  14  1
70  10  1
56  15  1
40  17  1
7   20  1
79  32  1
34  32  1
61  22  1
75  26  1
11  90  1
13  71  1
55  56  1
49  19  1
90  22  1
80  8   1
74  92  1
6   71  1
8   56  1
9   32  1
80  17  1
95  63  1
69  99  1
14  18  1
73  26  1
12  40  1
12  8   1
"""

The number of $(n, k)$-DFAs can be vast, even for relatively small $n$ and $k$. However, the number of distinct languages they define is far smaller than the total number of DFAs, since many DFAs recognise the same language.

The number of DFAs with $n$ states and an alphabet of size $k$ is given by the formula
\begin{equation}
    N = n^{nk} 2^n.
\end{equation}
This can be seen as follows: For every state (of which there are $n$) and every alphabet symbol (of which there are $k$), there must be exactly one transition to a next state. Since there are $n$ possible target states for each of the $nk$ entries in the transition table, there are $n^{nk}$ possible transition functions. Each of the $n$ states can independently be either an accepting state or a non-accepting state. This results in $2^n$ possible subsets of accepting states.
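The counting argument can be sanity-checked by exhaustive enumeration for very small $n$ and $k$. The following is a minimal sketch; the function names `count_dfas_formula` and `count_dfas_by_enumeration` are introduced here for illustration only.

```python
from itertools import product

def count_dfas_formula(n, k):
    """Closed form N = n^(nk) * 2^n."""
    return n ** (n * k) * 2 ** n

def count_dfas_by_enumeration(n, k):
    """Count every (transition function, accepting set) pair one at a time."""
    states = range(1, n + 1)
    count = 0
    # One target state for each of the n*k transition-table entries
    for _transitions in product(states, repeat=n * k):
        # One accept/reject flag for each of the n states
        for _accepting in product((0, 1), repeat=n):
            count += 1
    return count

for n, k in [(1, 2), (2, 1), (2, 2), (3, 1)]:
    print(n, k, count_dfas_by_enumeration(n, k), count_dfas_formula(n, k))
```

The enumeration is only feasible for tiny parameters, which is exactly the growth the formula describes.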

In [26]:
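import pandas as pd
from IPython.display import display  # explicit import so the cell also runs outside a notebook kernel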

def number_of_DFAs_table(n, k):
    df = pd.DataFrame(columns=range(1, k+1), index=range(1, n+1))
    df.columns.name = 'n \\ k'
    for i in range(1, n+1):
        for j in range(1, k+1):
            val = i**(i*j) * 2**i
            df.loc[i, j] = val
    return df

display(number_of_DFAs_table(6, 4).style.format(formatter="{:.1e}"))

n \ k | 1       | 2       | 3       | 4
---------------------------------------------
1     | 2.0e+00 | 2.0e+00 | 2.0e+00 | 2.0e+00
2     | 1.6e+01 | 6.4e+01 | 2.6e+02 | 1.0e+03
3     | 2.2e+02 | 5.8e+03 | 1.6e+05 | 4.3e+06
4     | 4.1e+03 | 1.0e+06 | 2.7e+08 | 6.9e+10
5     | 1.0e+05 | 3.1e+08 | 9.8e+11 | 3.1e+15
6     | 3.0e+06 | 1.4e+11 | 6.5e+15 | 3.0e+20


A state in a DFA is said to be accessible if it can be reached by starting at the start state and following a path in the transition diagram labelled by some word $w$. Otherwise, it is inaccessible. Clearly, a minimal DFA can have no inaccessible states. When measuring running time below, we count as a single operation the addition, subtraction, multiplication or division of two integers, or the reading or rewriting of a matrix entry.

To determine the set of accessible states, we can perform a graph traversal such as DFS or BFS starting from the start state $1$. There are $n$ states which correspond to $n$ vertices. Since the alphabet size is $k$, every state has exactly $k$ outgoing transitions so the total number of edges is $n \times k$.

A standard BFS/DFS algorithm visits every accessible vertex and traverses every reachable edge exactly once. The time complexity is $O(V + E) = O(n + nk) = O(nk)$. In terms of matrix operations, we read the transition entries for every accessible state. In the worst case (all states accessible), we perform $nk$ matrix reads. Thus, the overall time complexity is $O(nk)$.

We now count the labelled DFAs in which all states are accessible from the start state $1$. We subtract the transition structures with at least one inaccessible state from the total number of transition structures, then multiply by the $2^n$ possible accept/reject configurations.

In [27]:
from collections import deque

import numpy as np

def parse_table(table_str):
    '''
    Parses a whitespace-delimited string into a 2D integer NumPy array.
    Handles multi-line strings with potential extra whitespace.
    '''
    # Filter out empty lines to avoid parsing errors
    lines = [line for line in table_str.strip().splitlines() if line.strip()]
    # ndmin=2 keeps a single-row table two-dimensional
    return np.loadtxt(lines, dtype=int, ndmin=2)

def get_accessible_states(transition_table):
    '''
    Determines the set of accessible states by BFS from state 1.
    Args:
        transition_table: n x (k+1) matrix.
    Returns:
        Sorted NumPy array of accessible state labels (1-based).
    '''
    # Handle empty case
    if transition_table.size == 0:
        return np.array([], dtype=int)

    n_states, cols = transition_table.shape
    k = cols - 1  # Number of alphabet columns

    # Boolean array to keep track of visited states (0-indexed)
    visited = np.zeros(n_states, dtype=bool)

    # Start at state 1 (index 0); a deque gives O(1) pops from the front,
    # which keeps the traversal within the O(nk) bound
    queue = deque([0])
    visited[0] = True

    while queue:
        curr_idx = queue.popleft()

        # Get all target states for the current state across all inputs
        neighbors = transition_table[curr_idx, :k] - 1

        # Enqueue each target that has not been seen yet
        for next_idx in neighbors:
            if not visited[next_idx]:
                visited[next_idx] = True
                queue.append(next_idx)

    # Convert boolean mask to indices, then back to 1-based labels
    accessible_indices = np.where(visited)[0]
    return accessible_indices + 1

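Let $A_n$ be the number of transition structures on $n$ states with alphabet size $k$ in which every state is accessible from state $1$; we call these connected. Suppose instead that exactly $i < n$ states are accessible. The accessible set contains state $1$, so it can be chosen in $\binom{n-1}{i-1}$ ways; it carries one of $A_i$ connected structures; and each of the remaining $n - i$ states may transition anywhere, giving $n^{(n-i)k}$ choices. Subtracting these cases from the total gives the recurrence
\begin{equation}
    A_n = n^{nk} - \sum_{i=1}^{n-1} \binom{n-1}{i-1} A_i n^{(n-i)k}.
\end{equation}
The total number of DFAs in which every state is accessible is $A_n \times 2^n$. For $k=2$: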

*   $n=1$:
    *   Total transition structures: $1^{1\times2} = 1$ (state $1$ goes to itself on both inputs).
    *   Connected structures ($A_1$): $1$.
    *   Total DFAs: $1 \times 2^1 = 2$.

*   $n=2$:
    *   Total transition structures: $2^{2\times2} = 16$.
    *   Disconnected structures: state $2$ is inaccessible only if state $1$ transitions to itself on both inputs. State $2$ can then go anywhere ($2^2 = 4$ options), giving $1 \times 4 = 4$ disconnected structures.
    *   Connected structures ($A_2$): $16 - 4 = 12$.
    *   Total DFAs: $12 \times 2^2 = 48$.

*   $n=3$:
    *   Total transition structures: $3^{3\times2} = 3^6 = 729$.
    *   Using the recurrence, $A_3 = 729 - (1 \cdot 1 \cdot 3^4 + 2 \cdot 12 \cdot 3^2) = 729 - 297 = 432$.
    *   Total DFAs: $432 \times 2^3 = 3{,}456$.

*   $n=4$:
    *   Total transition structures: $4^{4\times2} = 4^8 = 65{,}536$.
    *   Using the recurrence, $A_4 = 65{,}536 - 34{,}048 = 31{,}488$.
    *   Total DFAs: $31{,}488 \times 2^4 = 503{,}808$.
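The recurrence is straightforward to evaluate directly. The sketch below is one possible way to do so; the function name `connected_structures` is an assumption made for this example, and it reproduces the $k=2$ values worked out above.

```python
from math import comb

def connected_structures(n, k):
    """Evaluate A_n for alphabet size k, filling in A_1, ..., A_n in order."""
    a = [0] * (n + 1)  # a[m] will hold A_m; index 0 is unused
    for m in range(1, n + 1):
        total = m ** (m * k)
        # Structures on m states whose accessible part has exactly i < m states
        disconnected = sum(
            comb(m - 1, i - 1) * a[i] * m ** ((m - i) * k)
            for i in range(1, m)
        )
        a[m] = total - disconnected
    return a[n]

for n in range(1, 5):
    a_n = connected_structures(n, 2)
    print(n, a_n, a_n * 2 ** n)
```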

In [28]:
raw_tables = [Table_1, Table_2, Table_3, Table_4]
table_names = ["Table 1", "Table 2", "Table 3", "Table 4"]
matrices = [parse_table(raw_str) for raw_str in raw_tables]
accessible = [get_accessible_states(matrix) for matrix in matrices]
print(f"{'Table':<10} | {'Total States':<18} | {'Accessible States':<18}")
print("-" * 55)
for i, matrix in enumerate(matrices):
    print(f"{table_names[i]:<10} | {matrix.shape[0]:<18} | {len(accessible[i]):<18}")

Table      | Total States       | Accessible States 
-------------------------------------------------------
Table 1    | 10                 | 7                 
Table 2    | 7                  | 7                 
Table 3    | 9                  | 9                 
Table 4    | 100                | 76                


Two states $p$ and $q$ of a DFA are said to be equivalent if, for every word $w$, the two (unique) paths in the transition diagram labelled by $w$ that start at $p$ and at $q$ either both end at accepting states or both end at non-accepting states. The table-filling algorithm is a simple way to determine which states of a DFA are equivalent.
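For small tables, the definition can be checked directly by trying words, which gives an independent cross-check of any faster method. The sketch below relies on the standard fact that two inequivalent states of an $n$-state DFA are already distinguished by some word of length less than $n$, so only finitely many words need testing. The names `run_from` and `equivalent_brute_force` are hypothetical, and `TABLE_2` is a copy of Table 2 from the data cell.

```python
from itertools import product

# Copy of Table 2: three letter columns followed by the accepting flag
TABLE_2 = [
    [6, 4, 7, 1],
    [3, 6, 5, 0],
    [2, 6, 2, 0],
    [6, 2, 2, 0],
    [7, 7, 3, 0],
    [3, 7, 6, 0],
    [5, 4, 6, 0],
]

def run_from(table, state, word):
    """Follow the path labelled by `word` starting at `state` (1-based labels)."""
    for letter in word:
        state = table[state - 1][letter - 1]
    return state

def equivalent_brute_force(table, p, q):
    """Compare p and q on every word of length 0, 1, ..., n - 1."""
    n = len(table)
    k = len(table[0]) - 1
    for length in range(n):
        for word in product(range(1, k + 1), repeat=length):
            end_p = run_from(table, p, word)
            end_q = run_from(table, q, word)
            if table[end_p - 1][k] != table[end_q - 1][k]:
                return False  # `word` distinguishes p from q
    return True

print(equivalent_brute_force(TABLE_2, 1, 2))
print(equivalent_brute_force(TABLE_2, 2, 5))
```

This check takes time exponential in $n$, so it is only suitable for verifying small cases by machine.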

In [29]:
def remove_inaccessible_states(table):
    '''
    Remove unreachable rows and renumber the remaining states sequentially.
    '''
    # Get the list of accessible state labels (1-based)
    acc_labels = get_accessible_states(table)
    # Convert to 0-based indices
    acc_indices = np.array(acc_labels) - 1

    # Extract only the accessible rows
    sub_table = table[acc_indices].copy()

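    # Build a lookup array (old label -> new label), indexed by old label.
    # Entries for removed states stay 0; they are never read, because
    # accessible states only transition to accessible states.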
    mapping = np.zeros(len(table) + 1, dtype=int)
    for new_label, old_label in enumerate(acc_labels, start=1):
        mapping[old_label] = new_label

    # Update the transition columns (0 to k-1) with new labels
    k = sub_table.shape[1] - 1

    # Apply mapping to all transition cells
    sub_table[:, :k] = mapping[sub_table[:, :k]]

    return sub_table

def minimise_dfa(table):
    '''
    Minimises the DFA using the table-filling algorithm.
    '''
    if table.size == 0:
        return table

    # Remove inaccessible states
    clean_table = remove_inaccessible_states(table)
    n, cols = clean_table.shape
    k = cols - 1

    # Table filling algorithm
    # distinct[i, j] is True if states i and j are distinguishable
    distinct = np.zeros((n, n), dtype=bool)

    # Base Case: Mark pairs where one is accepting and the other is not
    is_accepting = clean_table[:, k] == 1
    for i in range(n):
        for j in range(i + 1, n):
            if is_accepting[i] != is_accepting[j]:
                distinct[i, j] = True
                distinct[j, i] = True # Symmetric

    # Iteration: Propagate distinguishability
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for j in range(i + 1, n):
                if not distinct[i, j]:
                    # Check transitions for all alphabet symbols
                    for char_idx in range(k):
                        # Convert 1-based table values to 0-based indices
                        u = clean_table[i, char_idx] - 1
                        v = clean_table[j, char_idx] - 1

                        if distinct[u, v]:
                            distinct[i, j] = True
                            distinct[j, i] = True
                            changed = True
                            break

    # Construct minimal DFA and group equivalent states
    # state_mapping: old_state_index -> new_group_id
    state_mapping = np.full(n, -1, dtype=int)
    group_count = 0

    # Determine groups
    for i in range(n):
        if state_mapping[i] == -1:
            state_mapping[i] = group_count
            # Find all equivalent states j > i
            for j in range(i + 1, n):
                if not distinct[i, j]:
                    state_mapping[j] = group_count
            group_count += 1

    # Ensure start state (old index 0) maps to new index 0 (label 1)
    start_group = state_mapping[0]
    if start_group != 0:
        # Swap group IDs in the mapping
        state_mapping = np.where(state_mapping == 0, -2, state_mapping)
        state_mapping = np.where(state_mapping == start_group, 0, state_mapping)
        state_mapping = np.where(state_mapping == -2, start_group, state_mapping)

    # Build minimal table
    min_table = np.zeros((group_count, cols), dtype=int)

    # Pick one representative per group to read transitions from
    visited_groups = set()

    for i in range(n):
        g = state_mapping[i]
        if g not in visited_groups:
            visited_groups.add(g)
            # Set accepting status
            min_table[g, k] = clean_table[i, k]
            # Set transitions
            for c in range(k):
                old_target = clean_table[i, c] - 1
                new_target_group = state_mapping[old_target]
                min_table[g, c] = new_target_group + 1 # Store as 1-based

    return min_table

In [30]:
raw_tables = [Table_1, Table_2, Table_3]
table_names = ["Table 1", "Table 2", "Table 3"]
matrices = [parse_table(raw_str) for raw_str in raw_tables]
original_n_states = [matrix.shape[0] for matrix in matrices]
accessible_states = [get_accessible_states(matrix) for matrix in matrices]
minimal_matrices = [minimise_dfa(matrix) for matrix in matrices]
minimal_n_states = [matrix.shape[0] for matrix in minimal_matrices]

print(f"{'Table':<10} | {'Original States':<18} | {'Accessible States':<18} | {'Minimal States':<18}")
print("-" * 70)
for i, matrix in enumerate(matrices):
    print(f"{table_names[i]:<10} | {original_n_states[i]:<18} | {len(accessible_states[i]):<18} | {minimal_n_states[i]:<18}")

Table      | Original States    | Accessible States  | Minimal States    
----------------------------------------------------------------------
Table 1    | 10                 | 7                  | 6                 
Table 2    | 7                  | 7                  | 2                 
Table 3    | 9                  | 9                  | 9                 


It is often useful in mathematics to compute and analyse some special cases of a mathematical object, to gain some intuition for how the object behaves as a whole. We will try to understand properties of minimal DFAs and the languages they define by looking at all the minimal $(n, k)$-DFAs for small $n$ and $k$.

We want to count the number of distinct regular languages that require exactly $n$ states to be recognised, for $k=2$. This is equivalent to finding the number of minimal $(n, 2)$-DFAs that are distinct up to isomorphism (renumbering of states).

To solve this, we can use a generate and filter approach:

1.  Iterate through all possible transition tables ($n^{nk}$ combinations).
2.  Discard any table where not all $n$ states are accessible from the start state.
3.  For each valid transition structure, iterate through all $2^n$ possible acceptance configurations (which states are accepting vs non-accepting).
4.  Run the minimisation algorithm. If the resulting minimal DFA has fewer than $n$ states, then discard it. (This means the language could have been defined by a smaller DFA).
5.  Filter isomorphisms: Different transition tables might represent the same graph, just with state labels swapped. To prevent double counting, convert every valid minimal DFA into a canonical form. Renumber the states based on a standard BFS traversal order starting from state $1$. Add the canonical representation of the DFA to a set.
6.  Count the size of the set.


In [31]:
import itertools

def canonicalise(table):
    '''
    Converts a DFA table into a standard 'canonical' form.
    '''
    n, cols = table.shape
    k = cols - 1

    # Build the mapping from Old State -> New State
    mapping = np.full(n, -1, dtype=int)

    # Table entries are 1-based labels, but we work with 0-based indices
    # internally, so the start state (label 1) has index 0
    start_node = 0
    mapping[start_node] = 0
    next_label = 1

    queue = [start_node]

    while queue:
        curr = queue.pop(0)
        # Check transitions in fixed order
        for c in range(k):
            # Table values are 1-based, convert to index
            target_idx = table[curr, c] - 1

            if mapping[target_idx] == -1:
                mapping[target_idx] = next_label
                next_label += 1
                queue.append(target_idx)

    # Create the new table
    new_table = np.zeros((n, cols), dtype=int)

    for old_idx in range(n):
        new_idx = mapping[old_idx]

        # Copy accepting status
        new_table[new_idx, k] = table[old_idx, k]

        # Remap transitions
        for c in range(k):
            old_target = table[old_idx, c] - 1
            new_target = mapping[old_target]
            new_table[new_idx, c] = new_target + 1 # Store as 1-based

    return tuple(new_table.flatten())

def get_distinct_languages(n, k=2):
    distinct_languages = set()

    # Generate all transition structures: n states, k inputs -> n^(n*k) combinations
    states = list(range(1, n + 1))

    # Create iterator for one row
    row_options = list(itertools.product(states, repeat=k))

    # Iterate through all possible transition tables
    for transitions_flat in itertools.product(row_options, repeat=n):
        # Construct the transition part of the matrix
        trans_matrix = np.array(transitions_flat, dtype=int)

        # Check accessibility immediately
        # We temporarily append a dummy accept column to satisfy function signature
        dummy_table = np.hstack((trans_matrix, np.zeros((n, 1), dtype=int)))

        accessible = get_accessible_states(dummy_table)

        # If not all n states are accessible, then this cannot be a minimal DFA
        if len(accessible) != n:
            continue

        # Iterate through all 2^n accept/reject configurations
        for accept_config in itertools.product([0, 1], repeat=n):
            accept_col = np.array(accept_config).reshape(n, 1)

            # Full table: [Transitions | Accept]
            full_table = np.hstack((trans_matrix, accept_col))

            # Minimise the DFA
            minimal_table = minimise_dfa(full_table)

            # We only keep minimal tables that have exactly n states
            if minimal_table.shape[0] == n:
                # Canonicalise to handle isomorphisms
                signature = canonicalise(minimal_table)
                distinct_languages.add(signature)

    return distinct_languages

In [33]:
distinct_languages = [get_distinct_languages(n, k=2) for n in range(1, 5)]
counts = [len(languages) for languages in distinct_languages]
print(f"{'n':<5} | {'Count'}")
print("-" * 15)
for n, count in enumerate(counts, start=1):
    print(f"{n:<5} | {count}")

n     | Count
---------------
1     | 2
2     | 24
3     | 1028
4     | 56014
