# Basic Operations

In [19]:
import gtn
import nb_utils
nb_utils.init()

An operation on a transducer (or acceptor) takes one or more transducers as input and outputs a transducer. You can think of these operations as functions on graphs. We'll typically denote graphs by upper case variables, so the variable $A$ for example can represent a graph. Functions will be denoted by lower case variables. So $f(A)$ is a function which takes as input a single graph and outputs a graph.

## Closure

The closure, sometimes called the Kleene star, is a unary function (takes a single input) which can operate on either an acceptor or transducer. If the string $x$ is accepted by $A$, then zero or more copies of $x$ are accepted by the closure of $A$. More generally, the closure accepts any concatenation of zero or more strings accepted by $A$. Formally, if the language of an acceptor is $\mathcal{L}(A)$, then the language of the closure of $A$ is $\{x_1 x_2 \cdots x_n \mid x_i \in \mathcal{L}(A),\;\; n = 0, 1, \ldots\}$. When every $x_i$ is the same string $x$ we write $x^n$, meaning $x$ concatenated $n$ times. So $x^2$ is $xx$ and $x^0$ is the empty string. Usually the closure of an acceptor is denoted by $^*$, as in $A^*$. This is the same notation used in regular expressions.
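Because the closure shares its notation with regular expressions, one quick way to build intuition is with Python's standard `re` module. The sketch below is only an illustration and does not use `gtn`; the single-string language $\{ab\}$ and the helper name `in_closure` are assumptions made for this example.

```python
import re

# Closure of the language {"ab"}, written as the regular expression (ab)*
CLOSURE_OF_AB = re.compile(r"(?:ab)*")

def in_closure(s: str) -> bool:
    """Return True if s consists of zero or more copies of "ab"."""
    return CLOSURE_OF_AB.fullmatch(s) is not None

for s in ["", "ab", "abab", "aba", "ba"]:
    print(repr(s), in_closure(s))
```

The empty string and any whole number of copies of `ab` match, while `aba` and `ba` do not.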

In [20]:
# Define the mapping from integer ids to arc symbol labels
symbols = {0: 'a', 1: 'b', 2: 'c'}

fsa = gtn.Graph()
fsa.add_node(start=True)
fsa.add_node()
fsa.add_node()
fsa.add_node(accept=True)
fsa.add_arc(src_node=0, dst_node=1, label=0)
fsa.add_arc(src_node=1, dst_node=2, label=1)
fsa.add_arc(src_node=2, dst_node=3, label=0)

gtn.draw(fsa, "figures/nb/fsa_pre_closure.svg", isymbols=symbols)

The closure of a graph is easy to construct with the use of $\epsilon$ transitions. The language of the graph below is the string $aba$. 

![fsa pre closure](figures/nb/fsa_pre_closure.svg)

The closure of the graph needs to accept an arbitrary number of copies of $aba$ including the empty string. To accept the empty string we make the start state an accept state as well. To accept one or more copies of $aba$ we simply wire up the old accept states to the new start state with $\epsilon$ transitions.

In [21]:
# Recreate the old graph but without any start states
fsa = gtn.Graph()
old_start = fsa.add_node()
fsa.add_node()
fsa.add_node()
accept = fsa.add_node(accept=True)
fsa.add_arc(src_node=0, dst_node=1, label=0)
fsa.add_arc(src_node=1, dst_node=2, label=1)
fsa.add_arc(src_node=2, dst_node=3, label=0)

# New start state which is also an accept state
start = fsa.add_node(start=True, accept=True)

# Connect the new start state to the old start states with an ϵ transition
fsa.add_arc(src_node=start, dst_node=old_start, label=gtn.epsilon)

# Connect the accept states to the new start state with an ϵ transition
fsa.add_arc(src_node=accept, dst_node=start, label=gtn.epsilon)

gtn.draw(fsa, "figures/nb/fsa_closure.svg", isymbols=symbols)

The closed graph is shown below.

![fsa closure](figures/nb/fsa_closure.svg)
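The same construction can be written as a small function over a plain description of a graph. This is a minimal sketch in pure Python, not the `gtn` implementation; the tuple representation, the `EPSILON` stand-in label, and the function name `closure` are all assumptions made for illustration.

```python
EPSILON = -1  # stand-in label for an epsilon transition (assumption for this sketch)

def closure(num_nodes, starts, accepts, arcs):
    """Build the closure using the construction described above.

    Nodes are integers 0..num_nodes-1 and arcs are (src, dst, label) tuples.
    Returns (num_nodes, starts, accepts, arcs) for the closed graph.
    """
    new_start = num_nodes  # id of the node we are adding
    new_arcs = list(arcs)
    # Connect the new start state to every old start state
    new_arcs += [(new_start, s, EPSILON) for s in sorted(starts)]
    # Connect every old accept state back to the new start state
    new_arcs += [(a, new_start, EPSILON) for a in sorted(accepts)]
    return num_nodes + 1, {new_start}, set(accepts) | {new_start}, new_arcs

# The graph for "aba": 0 -a-> 1 -b-> 2 -a-> 3, with labels 0 = a and 1 = b
n, starts, accepts, arcs = closure(
    4, {0}, {3}, [(0, 1, 0), (1, 2, 1), (2, 3, 0)]
)
print(n, starts, accepts)
print(arcs)
```

For the $aba$ graph this yields five nodes, with node $4$ as the only start state and nodes $3$ and $4$ as accept states, matching the hand-built graph.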

---
### Example

You might think that the state $4$ in the above graph is superfluous. Consider an alternate construction for computing the closure of a graph. We could have made the state $0$ into an accept state and connected state $3$ to state $0$ with an $\epsilon$ transition.

In [5]:
fsa = gtn.Graph()

# Make the start state also an accept state
start = fsa.add_node(start=True, accept=True)
fsa.add_node()
fsa.add_node()
accept = fsa.add_node(accept=True)
fsa.add_arc(src_node=0, dst_node=1, label=0)
fsa.add_arc(src_node=1, dst_node=2, label=1)
fsa.add_arc(src_node=2, dst_node=3, label=0)

# Connect the accept states to the start state with an ϵ transition
fsa.add_arc(src_node=accept, dst_node=start, label=gtn.epsilon)

gtn.draw(fsa, "figures/nb/fsa_closure_2.svg", isymbols=symbols)

This alternate construction would result in the graph below.

![fsa closure 2](figures/nb/fsa_closure_2.svg)

This works for the graph above and requires fewer states and arcs. In the general case, this construction turns every start state into an accept state instead of adding a new start state. Can you give an example where this doesn't work? In other words, give an example where the graph from this modified construction is not the closure of the original graph.

In [6]:
fsa = gtn.Graph()
fsa.add_node(start=True)
fsa.add_node(accept=True)
fsa.add_arc(src_node=0, dst_node=0, label=0)
fsa.add_arc(src_node=0, dst_node=1, label=1)

gtn.draw(fsa, "figures/nb/fsa_pre_closure_wrong.svg", isymbols=symbols)

![fsa pre closure wrong](figures/nb/fsa_pre_closure_wrong.svg)

The language of the above graph is $a^*b$, that is $\{b, ab, aab, \ldots\}$, and the closure is $(a^*b)^*$.

In [7]:
# The modified (and incorrect) construction for the closure
fsa = gtn.Graph()
fsa.add_node(start=True, accept=True)
fsa.add_node(accept=True)
fsa.add_arc(src_node=0, dst_node=0, label=0)
fsa.add_arc(src_node=0, dst_node=1, label=1)
fsa.add_arc(src_node=1, dst_node=0, label=gtn.epsilon)

gtn.draw(fsa, "figures/nb/fsa_closure_wrong.svg", isymbols=symbols)

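If we follow the modified construction for the closure, as in the graph below, then the language would incorrectly include strings such as $a$ and $aa$. The start state is now an accept state, so the self-loop on $a$ can be taken any number of times before stopping.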

![fsa closure wrong](figures/nb/fsa_closure_wrong.svg)

In [8]:
# The correct construction for the closure
fsa = gtn.Graph()
fsa.add_node()
fsa.add_node(accept=True)
fsa.add_node(start=True, accept=True)
fsa.add_arc(src_node=0, dst_node=0, label=0)
fsa.add_arc(src_node=0, dst_node=1, label=1)
fsa.add_arc(src_node=2, dst_node=0, label=gtn.epsilon)
fsa.add_arc(src_node=1, dst_node=2, label=gtn.epsilon)

gtn.draw(fsa, "figures/nb/fsa_closure_right.svg", isymbols=symbols)

The graph following the correct construction of the closure is shown below.

![fsa closure right](figures/nb/fsa_closure_right.svg)
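To check the difference between the two constructions without reading the figures, we can simulate both graphs on a few strings. The simulator below is a minimal pure-Python sketch and is not part of `gtn`; the names `eps_closure` and `accepts`, and the use of `None` as the $\epsilon$ label, are assumptions made for illustration.

```python
EPS = None  # label used here for epsilon transitions (assumption for this sketch)

def eps_closure(states, arcs):
    """All states reachable from `states` using only epsilon arcs."""
    stack, seen = list(states), set(states)
    while stack:
        node = stack.pop()
        for src, dst, label in arcs:
            if src == node and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(string, starts, accept_states, arcs):
    """Return True if the graph accepts `string`."""
    current = eps_closure(starts, arcs)
    for symbol in string:
        moved = {dst for src, dst, label in arcs
                 if src in current and label == symbol}
        current = eps_closure(moved, arcs)
    return bool(current & accept_states)

# Modified (incorrect) construction: state 0 is both start and accept
wrong = dict(starts={0}, accept_states={0, 1},
             arcs=[(0, 0, "a"), (0, 1, "b"), (1, 0, EPS)])

# Correct construction: state 2 is the new start and accept state
right = dict(starts={2}, accept_states={1, 2},
             arcs=[(0, 0, "a"), (0, 1, "b"), (2, 0, EPS), (1, 2, EPS)])

for s in ["", "ab", "abab", "a", "aa"]:
    print(repr(s), accepts(s, **wrong), accepts(s, **right))
```

Both graphs accept the empty string, `ab`, and `abab`, but only the modified construction accepts `a` and `aa`.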

---

## Union

The union takes as input two or more graphs and produces a new graph. The language of the resultant graph is the union of the languages of the input graphs. More formally, let $A_1, \ldots, A_n$ be $n$ graphs. The language of the union graph is given by $\{ x \mid x \in \mathcal{L}(A_i) \textrm{ for some } i = 1, \ldots, n \}$.

In [9]:
# A graph which recognizes "aba*"
g1 = gtn.Graph()
g1.add_node(start=True)
g1.add_node()
g1.add_node(accept=True)
g1.add_arc(src_node=0, dst_node=1, label=0)
g1.add_arc(src_node=1, dst_node=2, label=1)
g1.add_arc(src_node=2, dst_node=2, label=0)

# A graph which recognizes "ba"
g2 = gtn.Graph()
g2.add_node(start=True)
g2.add_node()
g2.add_node(accept=True)
g2.add_arc(src_node=0, dst_node=1, label=1)
g2.add_arc(src_node=1, dst_node=2, label=0)

# A graph which recognizes "ac"
g3 = gtn.Graph()
g3.add_node(start=True)
g3.add_node()
g3.add_node(accept=True)
g3.add_arc(src_node=0, dst_node=1, label=0)
g3.add_arc(src_node=1, dst_node=2, label=2)

gtn.draw(g1, "figures/nb/union_1.svg", isymbols=symbols)
gtn.draw(g2, "figures/nb/union_2.svg", isymbols=symbols)
gtn.draw(g3, "figures/nb/union_3.svg", isymbols=symbols)

Since we let a graph have multiple start states and multiple accept states, the union is especially easy to construct. A state in the union graph is a start state if it was a start state in its original graph. A state in the union graph is an accept state if it was an accept state in its original graph.

Consider the three graphs below with languages $\{ab, aba, abaa, \ldots\}$, $\{ba\}$, and $\{ac\}$ respectively.

![union 1](figures/nb/union_1.svg)
![union 2](figures/nb/union_2.svg)
![union 3](figures/nb/union_3.svg)

In [10]:
fsa = gtn.union([g1, g2, g3])
gtn.draw(fsa, "figures/nb/union.svg", isymbols=symbols)

Notice in the union graph below the only visual distinction from the individual graphs is that the states are numbered consecutively from $0$ to $8$ indicating a single graph with nine states instead of three individual graphs. The language of the union graph below is $\{ba\} \cup \{ac\} \cup \{ab, aba, abaa, \ldots\}$.

![fsa union](figures/nb/union.svg)
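The union construction amounts to placing the graphs side by side and renumbering the nodes. The following pure-Python sketch shows one way to do this; the dictionary representation and the function name `union` are assumptions for illustration, and the node ordering produced by `gtn.union` may differ.

```python
def union(graphs):
    """Combine graphs side by side, renumbering nodes consecutively.

    Each graph is a dict with keys "num_nodes", "starts", "accepts" and
    "arcs", where arcs are (src, dst, label) tuples.
    """
    result = {"num_nodes": 0, "starts": set(), "accepts": set(), "arcs": []}
    for g in graphs:
        offset = result["num_nodes"]  # shift this graph's node ids
        result["starts"] |= {s + offset for s in g["starts"]}
        result["accepts"] |= {a + offset for a in g["accepts"]}
        result["arcs"] += [(u + offset, v + offset, l) for u, v, l in g["arcs"]]
        result["num_nodes"] += g["num_nodes"]
    return result

g1 = {"num_nodes": 3, "starts": {0}, "accepts": {2},
      "arcs": [(0, 1, "a"), (1, 2, "b"), (2, 2, "a")]}  # ab, aba, abaa, ...
g2 = {"num_nodes": 3, "starts": {0}, "accepts": {2},
      "arcs": [(0, 1, "b"), (1, 2, "a")]}  # ba
g3 = {"num_nodes": 3, "starts": {0}, "accepts": {2},
      "arcs": [(0, 1, "a"), (1, 2, "c")]}  # ac

u = union([g1, g2, g3])
print(u["num_nodes"], sorted(u["starts"]), sorted(u["accepts"]))
```

No arcs are added or removed, so the union has nine nodes, three start states, three accept states, and the same seven arcs as the inputs combined.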

## Concatenation

Like union, concatenation produces a new graph given two or more graphs as input. The language of the concatenated graph is the set of strings which can be formed by concatenating one string from each of the input graphs, in order. Concatenation is not commutative; the order of the input graphs matters. More formally, for input graphs $A_1, \ldots, A_n$, the language of the concatenated graph is given by $\{x_1 \ldots x_n \mid x_1 \in \mathcal{L}(A_1), \ldots, x_n \in \mathcal{L}(A_n)\}$.

In [11]:
# The graph which recognizes "ba"
g1 = gtn.Graph()
g1.add_node(start=True)
g1.add_node()
g1.add_node(accept=True)
g1.add_arc(src_node=0, dst_node=1, label=1)
g1.add_arc(src_node=1, dst_node=2, label=0)

# The graph which recognizes "ac" and "bc"
g2 = gtn.Graph()
g2.add_node(start=True)
g2.add_node()
g2.add_node()
g2.add_node(accept=True)
g2.add_arc(src_node=0, dst_node=1, label=0)
g2.add_arc(src_node=1, dst_node=3, label=2)
g2.add_arc(src_node=0, dst_node=2, label=1)
g2.add_arc(src_node=2, dst_node=3, label=2)

gtn.draw(g1, "figures/nb/concat_1.svg", isymbols=symbols)
gtn.draw(g2, "figures/nb/concat_2.svg", isymbols=symbols)

The concatenated graph can be constructed from the original input graphs by wiring accept states to start states. Assume we are concatenating $A_1, \ldots, A_n$. The start states of the concatenated graph are the start states of the first graph, $A_1$. The accept states of the concatenated graph are the accept states of the last graph, $A_n$. For any two consecutive graphs $A_i$ and $A_{i+1}$, we connect each accept state of $A_i$ to each start state of $A_{i+1}$ with an $\epsilon$ transition.

![fsa concat 1](figures/nb/concat_1.svg)
![fsa concat 2](figures/nb/concat_2.svg)

As an example, consider the two graphs above.

In [12]:
fsa = gtn.concat([g1, g2])
gtn.draw(fsa, "figures/nb/concat.svg", isymbols=symbols)

The concatenated graph is below and has a language $\{baac, babc\}$.

![fsa concat](figures/nb/concat.svg)
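The wiring can also be checked in plain Python: each accept state of one graph is connected to each start state of the next with an $\epsilon$ arc. This is a minimal sketch rather than the `gtn` implementation. The dictionary representation, the helper names `concat` and `language`, and modelling $\epsilon$ as the empty string are assumptions made for illustration.

```python
EPS = ""  # epsilon is modelled as the empty string in this sketch (assumption)

def concat(graphs):
    """Chain graphs: accept states of each graph feed the start states of the next."""
    num_nodes, arcs = 0, []
    starts, prev_accepts = set(), None
    for g in graphs:
        offset = num_nodes
        g_starts = {s + offset for s in g["starts"]}
        g_accepts = {a + offset for a in g["accepts"]}
        arcs += [(u + offset, v + offset, l) for u, v, l in g["arcs"]]
        if prev_accepts is None:
            starts = g_starts  # start states come from the first graph
        else:
            # One epsilon arc per (accept, start) pair of consecutive graphs
            arcs += [(a, s, EPS)
                     for a in sorted(prev_accepts) for s in sorted(g_starts)]
        prev_accepts = g_accepts
        num_nodes += g["num_nodes"]
    return {"num_nodes": num_nodes, "starts": starts,
            "accepts": prev_accepts or set(), "arcs": arcs}

def language(g):
    """Enumerate all accepted strings; only valid for acyclic graphs."""
    found = set()
    def walk(node, prefix):
        if node in g["accepts"]:
            found.add(prefix)
        for u, v, l in g["arcs"]:
            if u == node:
                walk(v, prefix + l)
    for s in g["starts"]:
        walk(s, "")
    return found

g1 = {"num_nodes": 3, "starts": {0}, "accepts": {2},
      "arcs": [(0, 1, "b"), (1, 2, "a")]}  # ba
g2 = {"num_nodes": 4, "starts": {0}, "accepts": {3},
      "arcs": [(0, 1, "a"), (1, 3, "c"), (0, 2, "b"), (2, 3, "c")]}  # ac, bc

print(sorted(language(concat([g1, g2]))))
```

Enumerating the paths of the result gives exactly `baac` and `babc`. Swapping the order of the inputs gives `acba` and `bcba` instead, which shows that the order matters.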

---

### Example

What is the identity graph for the concatenation function? The identity in a binary operation is the value which when used in the operation leaves the other input unchanged. In multiplication this would be $1$ since $c * 1 = c$ for any real value $c$.

What is the equivalent of the annihilator graph in the concatenation function? The annihilator in a binary operation is the value such that the operation with the annihilator always returns the annihilator. For multiplication $0$ is the annihilator since $0*c = 0$ for any real value $c$.

The graph which accepts the empty string is the identity. The graph which does not accept any strings is the annihilator. See the figures below for an example of these two graphs.

In [13]:
# The graph which accepts the empty string
fsa = gtn.Graph()
fsa.add_node(start=True, accept=True)
gtn.draw(fsa, "figures/nb/concat_identity.svg", isymbols=symbols)

In [14]:
# The graph which does not accept any strings
fsa = gtn.Graph()
fsa.add_node(start=True, accept=False)
gtn.draw(fsa, "figures/nb/concat_annihilator.svg", isymbols=symbols)

The identity graph is a single node which is both a start and accept state. The language of the identity graph contains only the empty string.

![concat identity](figures/nb/concat_identity.svg)

The annihilator graph is a single non-accepting state. The language of the annihilator graph is the empty set.

![concat annihilator](figures/nb/concat_annihilator.svg)

Note the subtle distinction between the language that contains the empty string and the language that is the empty set. The former can be written as $\{\epsilon\}$ whereas the latter would be $\{\}$ (also commonly denoted by $\varnothing$).
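The distinction is easy to see when languages are treated as Python sets of strings. The sketch below works on languages directly rather than on graphs; the function name `concat_languages` and the example language are assumptions made for illustration.

```python
def concat_languages(first, second):
    """Language of the concatenation: every x in first followed by every y in second."""
    return {x + y for x in first for y in second}

language = {"ba", "ac"}   # an arbitrary example language
identity = {""}           # contains only the empty string
annihilator = set()       # contains no strings at all

# The identity leaves the other language unchanged on either side
print(concat_languages(identity, language) == language)
print(concat_languages(language, identity) == language)

# The annihilator always produces the empty language
print(concat_languages(annihilator, language) == annihilator)
print(concat_languages(language, annihilator) == annihilator)
```

All four comparisons print `True`. With the empty set there is no string to pair with, so the result has no strings, whereas the empty string pairs with every string and changes none of them.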

---

### Example

Construct the concatenation of the two graphs below.

In [15]:
g1 = gtn.Graph()
g1.add_node(start=True)
g1.add_node()
g1.add_node(accept=True)
g1.add_node(accept=True)
g1.add_arc(src_node=0, dst_node=1, label=1)
g1.add_arc(src_node=1, dst_node=2, label=0)
g1.add_arc(src_node=1, dst_node=3, label=2)

g2 = gtn.Graph()
g2.add_node(start=True)
g2.add_node(start=True)
g2.add_node(accept=True)
g2.add_arc(src_node=0, dst_node=2, label=0)
g2.add_arc(src_node=1, dst_node=2, label=2)

gtn.draw(g1, "figures/nb/concat_example_1.svg", isymbols=symbols)
gtn.draw(g2, "figures/nb/concat_example_2.svg", isymbols=symbols)

The two input graphs are below.

![concat ex 1](figures/nb/concat_example_1.svg)
![concat ex 2](figures/nb/concat_example_2.svg)


In [16]:
gtn.draw(gtn.concat([g1, g2]), "figures/nb/concat_example.svg", isymbols=symbols)

The concatenated graph is shown below.

![concat ex](figures/nb/concat_example.svg)

---

### Example

Suppose we have a list of graphs to concatenate, $A_1, \ldots, A_n$, where the $i$-th graph has $s_i$ start states and $a_i$ accept states. How many new arcs will the concatenated graph require?

For each consecutive pair of graphs $A_i$ and $A_{i+1}$, we need to add $a_i * s_{i+1}$ connecting arcs in the concatenated graph. So the total number of additional arcs is:

$$\sum_{i=1}^{n-1} a_i * s_{i+1}.$$
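The count can be computed directly from the number of start and accept states of each graph. The helper name `num_connecting_arcs` below is an assumption for illustration and is not a `gtn` function.

```python
def num_connecting_arcs(num_starts, num_accepts):
    """Number of epsilon arcs added when concatenating graphs in order.

    num_starts[i] and num_accepts[i] are the counts s_i and a_i for graph i.
    """
    # Pair the accept count of each graph with the start count of the next
    return sum(a * s for a, s in zip(num_accepts[:-1], num_starts[1:]))

# Two graphs: the first has 1 start and 2 accept states,
# the second has 2 start states and 1 accept state.
print(num_connecting_arcs([1, 2], [2, 1]))  # 2 * 2 = 4

# Three graphs with (s, a) = (1, 2), (3, 1), (2, 4)
print(num_connecting_arcs([1, 3, 2], [2, 1, 4]))  # 2 * 3 + 1 * 2 = 8
```

The first call uses the start and accept counts of the two graphs from the previous example, which need four connecting arcs.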

---

## Summary

We've seen three basic operations so far:

- **closure**: The closed graph accepts any concatenation of zero or more strings accepted by the input graph.
- **union**: The union graph accepts any string from any of the input graphs.
- **concatenation**: The concatenated graph accepts any string which can be formed by concatenating strings (respecting order) from the input graphs.

---

### Example

Assume you are given the following individual graphs which recognize $a$, $b$, and $c$ respectively.

In [17]:
fsa_a = gtn.Graph()
fsa_a.add_node(start=True)
fsa_a.add_node(accept=True)
fsa_a.add_arc(src_node=0, dst_node=1, label=0)

fsa_b = gtn.Graph()
fsa_b.add_node(start=True)
fsa_b.add_node(accept=True)
fsa_b.add_arc(src_node=0, dst_node=1, label=1)

fsa_c = gtn.Graph()
fsa_c.add_node(start=True)
fsa_c.add_node(accept=True)
fsa_c.add_arc(src_node=0, dst_node=1, label=2)

gtn.draw(fsa_a, "figures/nb/fsa_a.svg", isymbols=symbols)
gtn.draw(fsa_b, "figures/nb/fsa_b.svg", isymbols=symbols)
gtn.draw(fsa_c, "figures/nb/fsa_c.svg", isymbols=symbols)

<div class="figure">
  <div class="img">
    <img src="figures/nb/fsa_a.svg"/>
  </div>
  <div class="img">
    <img src="figures/nb/fsa_b.svg"/> 
  </div>
  <div class="img">
    <img src="figures/nb/fsa_c.svg"/>
  </div>
  <div class="caption" markdown="span">
     The three individual automata with languages $\{a\}$, $\{b\}$, and $\{c\}$ respectively.
  </div>
</div>

Using only the closure, union, and concatenation operations, construct the graph which recognizes any number of repeats of the strings $aa$, $bb$, and $cc$. For example $aabb$ and $bbaacc$ are in the language but $b$ and $ccaab$ are not.

In [18]:
aa = gtn.concat([fsa_a, fsa_a])
bb = gtn.concat([fsa_b, fsa_b])
cc = gtn.concat([fsa_c, fsa_c])

fsa_repeats = gtn.closure(gtn.union([aa, bb, cc]))

gtn.draw(fsa_repeats, "figures/nb/fsa_repeats.svg", isymbols=symbols)

First concatenate the individual graphs with themselves to get graphs which recognize $aa$, $bb$, and $cc$. Then take the union of the three concatenated graphs followed by the closure. The resulting graph is shown below.

<img src="figures/nb/fsa_repeats.svg"/>
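We can sanity check the recipe at the level of languages, using Python sets of strings in place of graphs. Since the closure is infinite, the sketch below only enumerates strings up to a maximum length. The helper names `concat` and `closure` and the `max_len` bound are assumptions for illustration and are unrelated to the `gtn` functions of the same name.

```python
def concat(first, second):
    """Every string of `first` followed by every string of `second`."""
    return {x + y for x in first for y in second}

def closure(language, max_len):
    """Strings in the closure of `language` with at most max_len symbols."""
    result, frontier = {""}, {""}
    while frontier:
        # Extend the newest strings by one more string from the language
        frontier = {w for w in concat(frontier, language)
                    if len(w) <= max_len} - result
        result |= frontier
    return result

a, b, c = {"a"}, {"b"}, {"c"}

# Same recipe as above: concatenate, take the union, then the closure
repeats = closure(concat(a, a) | concat(b, b) | concat(c, c), max_len=6)

for s in ["", "aabb", "bbaacc", "b", "ccaab"]:
    print(repr(s), s in repeats)
```

The strings `aabb` and `bbaacc` are found in the bounded closure, while `b` and `ccaab` are not.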

---