---

# 3. More on Regular Languages
**[Emil Sekerinski](http://www.cas.mcmaster.ca/~emil/), McMaster University, January 2024**

---

Note: The following line imports all Python declarations from Chapter 2, *Regular Languages*:

In [None]:
%run "../02 Regular Languages/Regular Languages.ipynb"

### Complement and Intersection

The language of regular expressions is extended by two constructs:

- `~E`, where `E` is a regular expression (*complement*)
- `E₁ & E₂`, where `E₁`, `E₂` are regular expressions (*intersection*)

<div style="float:right;background-color:lightgrey;border-left:2em solid white">

**Example.** `a*&~(aaa)` describes all  
sequences over `a` except `aaa`:

 `L(a*&~(aaa))`  
`=  (⋃ n ≥ 0 • Lⁿ(a)) ∩ L(~(aaa))`  
`=  (⋃ n ≥ 0 • aⁿ) ∩ ∁L(aaa)`  
`=  (⋃ n ≥ 0 • aⁿ) ∩ ∁(L(a) L(a) L(a))`  
`=  (⋃ n ≥ 0 • aⁿ) ∩ ∁({a} {a} {a})`  
`=  (⋃ n ≥ 0 • aⁿ) ∩ ∁{aaa}`  
`=  (⋃ n ≥ 0 • aⁿ) ∩ (T* – {aaa})`  
`=  (⋃ n ≥ 0 • aⁿ) – {aaa}`  
`=  (⋃ n ≥ 0 | n ≠ 3 • aⁿ)`  
</div>

The languages described by `~E` and `E₁ & E₂` are:

| E | L(E)          |
|:-------------------|:------------------|
| `~E`               | `∁L(E)`            |
| `E₁ & E₂`          | `L(E₁) ∩ L(E₂)`   |

where `∁A` is the complement of set `A`. For a language over `T`, the universe is `T*`, so `∁L = T* – L`.

For example, the sequences over `{a, b}` with an even number of `a`'s and an even number of `b`'s are described by:
```
b*(ab*ab*)* & a*(ba*ba*)*
```
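
This can be spot-checked with Python's `re` module, which has no `&` or `~` operator. The sketch below emulates `&` by requiring a full match of both operands and compares the result with a direct count of `a`'s and `b`'s for all sequences over `{a, b}` up to length `8`; the bound is an arbitrary choice, so this is a test rather than a proof:

```python
import re
from itertools import product

EVEN_A = re.compile(r'b*(ab*ab*)*')  # even number of a's, any number of b's
EVEN_B = re.compile(r'a*(ba*ba*)*')  # even number of b's, any number of a's

def in_intersection(w: str) -> bool:
    # w ∈ L(E₁ & E₂) iff w ∈ L(E₁) and w ∈ L(E₂)
    return EVEN_A.fullmatch(w) is not None and EVEN_B.fullmatch(w) is not None

# compare with the intended language for all sequences of length ≤ 8
for n in range(9):
    for letters in product('ab', repeat=n):
        w = ''.join(letters)
        expected = w.count('a') % 2 == 0 and w.count('b') % 2 == 0
        assert in_intersection(w) == expected, w
```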

**Question.** What is the language described by `(aa)* & (aaa)*`? Can it be expressed more simply?

*Answer.*  
The language of `(aa)*` is all sequences over `a` of even length. The language of `(aaa)*` is all sequences over `a` whose length is a multiple of `3`. Their intersection is all sequences over `a` whose length is a multiple of both `2` and `3`, i.e. a multiple of `6`; this can be expressed more simply as `(aaaaaa)*`.

In regular expressions, unary `~` binds tightest, and `&` binds more strongly than `|` but more weakly than concatenation. That is, `~ab&c` is understood as `((~a)b)&c`.
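
To see how these precedences group an expression, here is a small recursive-descent parser that returns a fully parenthesized string. It is a sketch, not part of this chapter's code: the function name `parenthesize` is hypothetical, symbols are assumed to be single letters, and, reading "binds tightest" literally, `~` is assumed to bind tighter than the postfix `*` as well, so `~a*` is read as `(~a)*`:

```python
def parenthesize(s: str) -> str:
    """Fully parenthesize an extended regular expression over single-letter symbols.
    Precedence, loosest to tightest: |, &, concatenation, postfix *, prefix ~ (assumed)."""
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def eat(c):
        nonlocal pos
        if peek() != c:
            raise ValueError(f'expected {c!r} at position {pos}')
        pos += 1

    def choice():        # E | E
        e = intersection()
        while peek() == '|':
            eat('|')
            e = '(' + e + '|' + intersection() + ')'
        return e

    def intersection():  # E & E
        e = conc()
        while peek() == '&':
            eat('&')
            e = '(' + e + '&' + conc() + ')'
        return e

    def conc():          # E E: juxtaposition until an operator or ')'
        e = postfix()
        while peek() is not None and peek() not in '|&)':
            e = '(' + e + postfix() + ')'
        return e

    def postfix():       # E*
        e = prefix()
        while peek() == '*':
            eat('*')
            e = '(' + e + '*)'
        return e

    def prefix():        # ~E, (E), or a symbol
        c = peek()
        if c == '~':
            eat('~')
            return '(~' + prefix() + ')'
        if c == '(':
            eat('(')
            e = choice()
            eat(')')
            return e
        if c is None or not c.isalpha():
            raise ValueError(f'unexpected {c!r} at position {pos}')
        eat(c)
        return c

    e = choice()
    if pos != len(s):
        raise ValueError(f'unexpected {peek()!r} at position {pos}')
    return e

print(parenthesize('~ab&c'))  # (((~a)b)&c)
```

The printed result reproduces the grouping `((~a)b)&c`, with one extra pair of outer parentheses.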

Given regular expressions `E`, `F`, `G`, the basic rules of complement and intersection are:

- *Double Complement:* `~~E = E`
- *Commutativity:* `E & F = F & E`
- *Associativity:* `(E & F) & G = E & (F & G)`
- *Idempotence:* `E & E = E`
- *De Morgan:* `~(E & F) = ~E | ~F` and `~(E | F) = ~E & ~F`
- *Distributivity:* `E | (F & G) = (E | F) & (E | G)` and `E & (F | G) = (E & F) | (E & G)`
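- *Distributivity (inclusion only):* `L(E (F & G)) ⊆ L(E F & E G)` and `L((E & F) G) ⊆ L(E G & F G)`; equality does not hold in general, e.g. for `E = ε | a`, `F = a`, `G = aa` we have `L(E (F & G)) = ∅` but `aa ∈ L(E F & E G)`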

Since languages are sets, the above rules follow from those of sets. For example, De Morgan's rule for regular expressions follows from De Morgan's rule for sets:

```
   L(~(E & F))
= ∁L(E & F)
= ∁(L(E) ∩ L(F))
= ∁L(E) ∪ ∁L(F)
= L(~E) ∪ L(~F)
= L(~E | ~F)
```
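
Because these rules are set identities, they can also be spot-checked on concrete languages. The sketch below is a test, not a proof: it represents languages as Python sets of strings and, as an assumption, replaces the infinite universe `T*` by the finite set `U` of all sequences over `{a, b}` of length at most `4`. Complement relative to `U` still obeys the set laws, so De Morgan's rule and double complement must hold for any sample languages inside `U`. Python's set operators `&` and `|` conveniently mirror the notation of regular expressions:

```python
from itertools import product

T, N = 'ab', 4
# finite stand-in for T*: all sequences over T of length ≤ N
U = {''.join(p) for n in range(N + 1) for p in product(T, repeat=n)}

def comp(L: set) -> set:
    # complement relative to the bounded universe U
    return U - L

samples = [set(), {''}, {'a', 'ab'}, {w for w in U if len(w) % 2 == 0}, U]
for E in samples:
    assert comp(comp(E)) == E                         # ~~E = E
    for F in samples:
        assert comp(E & F) == comp(E) | comp(F)       # ~(E & F) = ~E | ~F
        assert comp(E | F) == comp(E) & comp(F)       # ~(E | F) = ~E & ~F
```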

The abstract syntax of regular expressions is extended accordingly:

    expr ::= 'ε' | sym | '~' expr | expr '|' expr | expr '&' expr | expr expr | expr '*'

Constructors for representing complement and intersection in Python are introduced:

In [None]:
class Complement:
    def __init__(self, e): self.e = e
    def __repr__(self): return '~(' + str(self.e) + ')'

class Intersection:
    def __init__(self, e1, e2): self.e1, self.e2 = e1, e2
    def __repr__(self): return '(' + str(self.e1) + '&' + str(self.e2) + ')'
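
The table of languages above can also be read as a membership test. The following sketch decides `w ∈ L(E)` directly by recursion on the expression, without building an automaton. To be self-contained it defines stand-ins for `Choice`, `Conc`, and `Star` with the fields `e1`, `e2`, `e` that `REToFSA` below accesses; these are assumptions, not the Chapter 2 classes, so the block is best run in a fresh session where it cannot shadow them. As in `REToFSA`, `''` stands for `ε` and a one-character string for a symbol. The function name `member` is hypothetical, and complement is taken relative to all strings, so here `~(aaa)` also contains `b`:

```python
# Stand-ins (assumptions, not the Chapter 2 classes) with the field names REToFSA uses
class Choice:
    def __init__(self, e1, e2): self.e1, self.e2 = e1, e2

class Conc:
    def __init__(self, e1, e2): self.e1, self.e2 = e1, e2

class Star:
    def __init__(self, e): self.e = e

class Complement:
    def __init__(self, e): self.e = e

class Intersection:
    def __init__(self, e1, e2): self.e1, self.e2 = e1, e2

def member(e, w: str) -> bool:
    """Decide w ∈ L(e) by recursion on the structure of e."""
    if e == '':                          # L(ε) = {ε}
        return w == ''
    if isinstance(e, str):               # L(a) = {a}
        return w == e
    if isinstance(e, Complement):        # L(~E) = ∁L(E), relative to all strings
        return not member(e.e, w)
    if isinstance(e, Intersection):      # L(E₁ & E₂) = L(E₁) ∩ L(E₂)
        return member(e.e1, w) and member(e.e2, w)
    if isinstance(e, Choice):            # L(E₁ | E₂) = L(E₁) ∪ L(E₂)
        return member(e.e1, w) or member(e.e2, w)
    if isinstance(e, Conc):              # try every split w = uv
        return any(member(e.e1, w[:i]) and member(e.e2, w[i:])
                   for i in range(len(w) + 1))
    if isinstance(e, Star):              # ε, or a non-empty piece in L(E) followed by more
        return w == '' or any(member(e.e, w[:i]) and member(e, w[i:])
                              for i in range(1, len(w) + 1))
    raise TypeError('not a regular expression')

E9 = Intersection(Star('a'), Complement(Conc(Conc('a', 'a'), 'a')))  # a*&~(aaa)
assert member(E9, '') and member(E9, 'aaaa')
assert not member(E9, 'aaa') and not member(E9, 'b')
```

This direct approach is simple but may take exponential time because of the nested splitting, which is one reason for translating expressions into automata instead.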

### Extended Regular Expression to Finite State Automaton

<img style="width:16em;float:right;border-left:2em solid white" src="./img/dfacomplement.svg"></img>
Consider a deterministic automaton `A = (T, Q, R, q₀, F)` whose transition relation is total. An automaton `A'` accepting the complement of `L(A)` is obtained by simply swapping final and non-final states, i.e. `A' = (T, Q, R, q₀, Q – F)`.
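
A minimal sketch of this construction, assuming a DFA is given as a hypothetical `DFA` named tuple whose transition function `delta` is a dictionary from `(state, symbol)` to state; this is not the chapter's `FiniteStateAutomaton` class, and the helper names `dfa_accepts` and `dfa_complement` are made up for illustration. The automaton for `aaa` over `{a}` is made total with a trap state `'t'`. The last assertion shows why totality matters: swapping the final states of the partial automaton still rejects `aaaa`, because a sequence that gets stuck is rejected by both automata:

```python
from typing import NamedTuple

class DFA(NamedTuple):
    T: frozenset   # alphabet
    Q: frozenset   # states
    delta: dict    # transition function (state, symbol) -> state
    q0: object     # initial state
    F: frozenset   # final states

def dfa_accepts(A: DFA, w: str) -> bool:
    q = A.q0
    for a in w:
        if (q, a) not in A.delta:  # missing transition: reject
            return False
        q = A.delta[(q, a)]
    return q in A.F

def dfa_complement(A: DFA) -> DFA:
    # swap final and non-final states; correct only if A.delta is total
    return DFA(A.T, A.Q, A.delta, A.q0, A.Q - A.F)

# DFA for {aaa}: 0 -a-> 1 -a-> 2 -a-> 3, made total with trap state 't'
partial = {(0, 'a'): 1, (1, 'a'): 2, (2, 'a'): 3}
total = {**partial, (3, 'a'): 't', ('t', 'a'): 't'}
A = DFA(frozenset('a'), frozenset({0, 1, 2, 3, 't'}), total, 0, frozenset({3}))
C = dfa_complement(A)
assert dfa_accepts(A, 'aaa') and not dfa_accepts(C, 'aaa')
assert dfa_accepts(C, '') and dfa_accepts(C, 'aaaa')

# without the trap state, the swapped automaton wrongly rejects 'aaaa'
P = DFA(frozenset('a'), frozenset({0, 1, 2, 3}), partial, 0, frozenset({3}))
assert not dfa_accepts(dfa_complement(P), 'aaaa')
```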

From De Morgan and double complement, we can conclude that:

```
E₁ & E₂ = ~(~E₁ | ~E₂)
```

Thus, an automaton accepting `E₁ & E₂` can be obtained by constructing one for `~(~E₁ | ~E₂)` instead. The Python implementation extends the previous one with cases for complement and intersection:

In [None]:
def REToFSA(re) -> FiniteStateAutomaton:
    global QC
    if re == '': q = QC; QC += 1; return FiniteStateAutomaton(set(), {q}, set(), q, {q})
    elif type(re) == str:
        q = QC; QC += 1; r = QC; QC += 1
        return FiniteStateAutomaton({re}, {q, r}, {(q, re, r)}, q, {r})
    elif type(re) == Complement:
        t = QC; QC += 1  # t is a uniquely named trap state
        A = totalFSA(deterministicFSA(REToFSA(re.e)), t)
        return FiniteStateAutomaton(A.T, A.Q, A.R, A.q0, A.Q - A.F)
    elif type(re) == Choice:
        A1, A2 = REToFSA(re.e1), REToFSA(re.e2)
        R2 = {(A1.q0 if q == A2.q0 else q, a, r) for (q, a, r) in A2.R} # A2.q0 renamed to A1.q0 in A2.R
        F2 = {A1.q0 if q == A2.q0 else q for q in A2.F} # A2.q0 renamed to A1.q0 in A2.F
        return FiniteStateAutomaton(A1.T | A2.T, A1.Q | A2.Q, A1.R | R2, A1.q0, A1.F | F2)
    elif type(re) == Intersection:  # E1 & E2 = ~(~E1 | ~E2) by De Morgan and double complement
        return REToFSA(Complement(Choice(Complement(re.e1), Complement(re.e2))))
    elif type(re) == Conc:
        A1, A2 = REToFSA(re.e1), REToFSA(re.e2)
        R = A1.R | {(f, a, r) for (q, a, r) in A2.R if q == A2.q0 for f in A1.F} | \
            {(q, a, r) for (q, a, r) in A2.R if q != A2.q0}
        F = (A2.F - {A2.q0}) | (A1.F if A2.q0 in A2.F else set())
        return FiniteStateAutomaton(A1.T | A2.T, A1.Q | A2.Q, R, A1.q0, F)
    elif type(re) == Star:
        A = REToFSA(re.e)
        R = A.R | {(f, a, r) for (q, a, r) in A.R if q == A.q0 for f in A.F}
        return FiniteStateAutomaton(A.T, A.Q, R, A.q0, {A.q0} | A.F)
    else: raise Exception('not a regular expression')

def convertRegExToFSA(re):
    global QC; QC = 0
    return REToFSA(re)
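
The `Intersection` case above reuses the complement and choice cases, at the price of several determinizations. A commonly used alternative, which this chapter does not use, is the *product construction*: run deterministic automata for `E₁` and `E₂` in lockstep and accept when both accept. A minimal sketch, assuming DFAs given as hypothetical `(delta, q0, F)` triples with a dictionary as transition function; the names `product_dfa` and `run_dfa` are made up for illustration:

```python
def product_dfa(A1, A2, T):
    """Product construction: a DFA accepting L(A1) ∩ L(A2).
    A DFA is a triple (delta, q0, F) with delta a dict (state, symbol) -> state;
    only pairs reachable from the initial pair are constructed."""
    (d1, s1, F1), (d2, s2, F2) = A1, A2
    start = (s1, s2)
    delta, seen, todo = {}, {start}, [start]
    while todo:
        p, q = todo.pop()
        for a in T:
            if (p, a) in d1 and (q, a) in d2:  # a missing transition in either means reject
                r = (d1[(p, a)], d2[(q, a)])
                delta[((p, q), a)] = r
                if r not in seen:
                    seen.add(r)
                    todo.append(r)
    # a pair is final iff both components are final
    F = {(p, q) for (p, q) in seen if p in F1 and q in F2}
    return delta, start, F

def run_dfa(A, w: str) -> bool:
    delta, q, F = A
    for a in w:
        if (q, a) not in delta:
            return False
        q = delta[(q, a)]
    return q in F

# DFAs over {a} counting the length modulo 2 and modulo 3: (aa)* and (aaa)*
M2 = ({(i, 'a'): (i + 1) % 2 for i in range(2)}, 0, {0})
M3 = ({(i, 'a'): (i + 1) % 3 for i in range(3)}, 0, {0})
M6 = product_dfa(M2, M3, 'a')
assert [n for n in range(20) if run_dfa(M6, 'a' * n)] == [0, 6, 12, 18]
```

Here the product has six reachable pairs, one for each remainder of the length modulo `6`, in line with `(aa)*&(aaa)* = (aaaaaa)*`.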

Continuing with `a*&~(aaa)`, we see that `aaa` is not accepted, but other sequences over `a` are:

In [None]:
E9 = Intersection(Star('a'), Complement(Conc(Conc('a', 'a'), 'a'))) # a*&~(aaa)
A9 = deterministicFSA(convertRegExToFSA(E9))

In [None]:
assert accepts(A9, '')
assert accepts(A9, 'a')
assert not accepts(A9, 'b')
assert not accepts(A9, 'aaa')
assert accepts(A9, 'aaaa')

Here is the generated automaton after minimization and renaming:

In [None]:
# renameFSA(minimizeFSA(A9))  # uncomment to display (long output)

Here are tests suggesting that `(aa)*&(aaa)*` includes only sequences over `a` whose length is a multiple of `6`:

In [None]:
E10 = Intersection(Star(Conc('a', 'a')), Star(Conc(Conc('a', 'a'), 'a'))) # (aa)*&(aaa)*
A10 = deterministicFSA(convertRegExToFSA(E10))

In [None]:
assert accepts(A10, '')
assert not accepts(A10, 'aa')
assert not accepts(A10, 'aaa')
assert accepts(A10, 'aaaaaa')
assert not accepts(A10, 'aaaaaaa')

Using the procedure `equivalentFSA`, we can check that indeed `(aa)*&(aaa)* = (aaaaaa)*`:

In [None]:
E11 = Star(Conc(Conc(Conc(Conc(Conc('a', 'a'), 'a'), 'a'), 'a'), 'a'))  # (aaaaaa)*
assert equivalentFSA(A10, convertRegExToFSA(E11))

### Historic Notes and Further Reading

### Bibliography

<div class="cite2c-biblio"></div>
