---
**Chapter 2: Introduction to Computer Theory by Daniel I. A. Cohen**

In [None]:
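from itertools import product

def kleene_star(S, max_length=7):
    """Print the words of S* of each length up to max_length.

    For each length, prints the length, the number of distinct words, and a
    dict mapping each word to its number of factorizations over S.
    Assumes S does not contain the empty string.
    """
    result = [{} for _ in range(max_length + 1)]
    # Every element of S has at least one letter, so a word of length at most
    # max_length has at most max_length factors.
    for repeat in range(max_length + 1):
        for combo in product(S, repeat=repeat):
            word = "".join(combo)
            if len(word) <= max_length:
                result[len(word)][word] = result[len(word)].get(word, 0) + 1
    for length, words in enumerate(result):
        print(length, len(words), words)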

---
**Problem 1:** Consider the language $S^{*}$, where $S =$ {*a b*}.
1. How many words does the language have of length 2?
2. of length 3?
3. of length *n*?

**Answer 1:**
1. $4$
2. $8$
3. $2^{n}$

In [None]:
# Observe the powers of two sequence: 1, 2, 4, 8, 16, 32, 64, ...
S = {"a", "b"}
kleene_star(S)

---
**Problem 2:** Consider the language $S^{*}$, where $S =$ {*aa b*}.
1. How many words does the language have of length 4?
2. of length 5?
3. of length 6?
4. What can be said in general?

**Answer 2:**
1. $5$
2. $8$
3. $13$
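4. A word of length $n$ ends in either *b* or *aa*, so the number of words of length $n$ is the sum of the counts for lengths $n-1$ and $n-2$. This gives the Fibonacci sequence. *See below.*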

In [None]:
# Observe the Fibonacci sequence: 1, 1, 2, 3, 5, 8, 13, ...
S = {"aa", "b"}
kleene_star(S)

---
**Problem 3:** Consider the language $S^{*}$, where $S =$ {*ab ba*}.
1. Write out all the words in $S^{*}$ that have seven or fewer letters.
2. Can any word in this language contain the substrings *aaa* or *bbb*?
3. What is the smallest word *not* in this language?

**Answer 3:**

1. *See below.*
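2. No. Every word is a concatenation of the two-letter blocks *ab* and *ba*, each containing one *a* and one *b*. Three consecutive equal letters would have to cover an entire block, which is impossible, so a word has at most two consecutive *a*'s or *b*'s.
3. Either *a* or *b*. Both have length 1, and every word in $S^{*}$ has even length.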

In [None]:
# Answer 3.1
S = {"ab", "ba"}
kleene_star(S)

---
**Problem 4:** Consider the language $S^{*}$, where $S =$ {*a ab ba*}.
1. Is the string (*abbba*) a word in this language?
2. Write out all the words in this language with six or fewer letters.
3. What is another way in which to describe the words in this language? Be careful, this is not simply the language of all words without bbb.

**Answer 4:**

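1. No. Neither *b* nor *bb* is in $S$, and every *b* in an element of $S$ sits next to an *a* in that same element. The middle *b* of *bbb* therefore cannot belong to any factor, so a word in $S^{*}$ has at most two consecutive *b*'s.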
2. *See below.*
3. All the strings of *a*'s and *b*'s in which every *b* can be paired with its own adjacent *a*, with no *a* paired with two different *b*'s.

In [None]:
# Answer 4.2
S = {"a", "ab", "ba"}
kleene_star(S)

---
**Problem 5:** Consider the language $S^{*}$, where $S =$ {*aa aba baa*}.
1. Show that the words *aabaa*, *baaabaaa*, and *baaaaababaaaa* are all in this language.
2. Can any word in this language be interpreted as a string of elements from $S$ in two different ways?
3. Can any word in this language have an odd total number of *a*'s?

**Answer 5:**
1. *aabaa* = *aa baa* \
   *baaabaaa* = *baa aba aa* \
   *baaaaababaaaa* = *baa aa aba baa aa*
2. No. No word in $S$ is a prefix of another word in $S$, so reading a word of $S^{*}$ from left to right forces the choice of each factor in turn. The factorization is therefore unique.
3. No. Each element has two *a*'s and any composite string will have only multiples.
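
The claims in Answer 5 can be spot-checked by brute force. The sketch below is only a check over short strings, not a proof; the helper name `factorization_count` is made up for this illustration. It counts, with a small dynamic program, how many ways a string splits into elements of $S$.

```python
from itertools import product

def factorization_count(word, S):
    """Number of ways to write word as a concatenation of elements of S."""
    ways = [1] + [0] * len(word)  # ways[i] counts factorizations of word[:i]
    for i in range(1, len(word) + 1):
        for s in S:
            if len(s) <= i and word[i - len(s):i] == s:
                ways[i] += ways[i - len(s)]
    return ways[len(word)]

S = {"aa", "aba", "baa"}

# Part 1: each of the three words has exactly one factorization.
for w in ["aabaa", "baaabaaa", "baaaaababaaaa"]:
    print(w, factorization_count(w, S))

# Parts 2 and 3, checked for every string over {a, b} of length <= 10.
max_ways = 0
odd_a_words = []
for n in range(11):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        ways = factorization_count(w, S)
        max_ways = max(max_ways, ways)
        if ways > 0 and w.count("a") % 2 == 1:
            odd_a_words.append(w)

print(max_ways)      # largest number of factorizations seen
print(odd_a_words)   # words of S* with an odd number of a's
```

For these lengths the largest count is 1 and the list of odd-*a* words is empty, which agrees with parts 2 and 3.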

---
**Problem 6:** Consider the language $S^{*}$, where $S =$ {*xx xxx*}.
In how many ways can $x^{19}$ be written as the product of words in $S$?
This means: How many different factorizations are there of $x^{19}$ into *xx* and *xxx*?

**Answer 6:**

$\text{Let } a = \#(\text{copies of } xx), \quad b = \#(\text{copies of } xxx)$.

The length condition is $2a + 3b = 19, \quad a,b \in \mathbb{Z}_{\ge 0}$.

For each feasible pair $(a,b)$, the number of distinct orderings is $\binom{a+b}{a}$.

Solving $2a + 3b = 19$:

$$
\begin{aligned}
b &= 1 \quad\Rightarrow\quad a = 8, &\binom{9}{8} &= 9, \\
b &= 3 \quad\Rightarrow\quad a = 5, &\binom{8}{5} &= 56, \\
b &= 5 \quad\Rightarrow\quad a = 2, &\binom{7}{2} &= 21.
\end{aligned}
$$

Summing over all solutions: $9 + 56 + 21 = 86$.

$$
\boxed{\text{There are 86 distinct factorizations.}}
$$
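
As a cross-check of the count above, here is a minimal sketch that computes the same number in two ways: by the recurrence "a factorization of $x^{n}$ ends in *xx* or *xxx*", and by summing the binomial coefficients. The function name `count_factorizations` is chosen only for this illustration.

```python
from math import comb

def count_factorizations(n, parts=(2, 3)):
    """Number of ordered ways to write n as a sum of the given part sizes."""
    ways = [0] * (n + 1)
    ways[0] = 1  # the empty product
    for length in range(1, n + 1):
        # The last factor has size p, leaving a factorization of length - p.
        ways[length] = sum(ways[length - p] for p in parts if p <= length)
    return ways[n]

by_recurrence = count_factorizations(19)

# Sum C(a + b, a) over all solutions of 2a + 3b = 19.
by_binomials = 0
for b in range(19 // 3 + 1):
    remainder = 19 - 3 * b
    if remainder % 2 == 0:
        a = remainder // 2
        by_binomials += comb(a + b, a)

print(by_recurrence, by_binomials)  # 86 86
```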

---
**Problem 7:** Consider the language PALINDROME over the alphabet {*a b*}.

$\textbf{Let}$ $\text{PALINDROME} = \{\, w \in \{a,b\}^* \mid w = w^R \,\}$, where $w^R$ denotes the reverse of $w$.

* (i) Prove that if $x$ is in PALINDROME, then so is $x^{n}$ for any *n*.
* (ii) Prove that if $y^{3}$ is in PALINDROME, then so is $y$.
* (iii) Prove that if $z^{n}$ is in PALINDROME for some $n$ (greater than 0), then $z$ itself is also.
* (iv) Prove that PALINDROME has as many words of length 4 as it does of length 3.
* (v) Prove that PALINDROME has as many words of length $2n$ as it has of length $2n - 1$. How many words is that?

---
**Answer 7(i):**

$\textbf{Claim.}$ If $x \in \text{PALINDROME}$, then $x^n \in \text{PALINDROME}$ for any integer $n \ge 0$.

$\textbf{Proof.}$

Recall that for any strings *u,v*, we have $(uv)^R = v^R u^R.$

Since $x \in \text{PALINDROME}$, we have $x = x^R$. Thus, $(x^n)^R = (x^R)^n = x^n$.

---
**Answer 7(ii):**

$\textbf{Claim.}$ If $y^3 \in \text{PALINDROME}$, then $y \in \text{PALINDROME}$.

$\textbf{Proof.}$

Recall that for any strings *u,v*, we have $(uv)^R = v^R u^R.$

Since $y^3 \in \text{PALINDROME}$, we have $yyy = (yyy)^R = y^Ry^Ry^R$. The words $y$ and $y^R$ have the same length, so comparing the first $|y|$ letters of both sides gives $y = y^R$.

---
**Answer 7(iii):**

$\textbf{Claim.}$ If $z^n \in \text{PALINDROME}$ for some integer $n > 0$, then $z \in \text{PALINDROME}$.

$\textbf{Proof.}$

Recall that for any strings *u,v*, we have $(uv)^R = v^R u^R.$

Since $z^n \in \text{PALINDROME}$, we have $z^n = (z^n)^R = (z^R)^n$. The words $z$ and $z^R$ have the same length and $n \ge 1$, so comparing the first $|z|$ letters of both sides gives $z = z^R$.

---
**Answer 7(iv):**

For alphabet {*a b*}, a palindrome of length $n$ is determined by its first $\lceil n/2 \rceil$ letters, each with 2 choices, so there are $2^{\lceil n/2 \rceil}$ palindromes of length $n$. For $n=3$ we get $\lceil 3/2 \rceil = 2$ and for $n=4$ we get $\lceil 4/2 \rceil = 2$, so both lengths have $2^{2} = 4$ palindromes.


---
**Answer 7(v):**

For alphabet {*a b*}, a palindrome of length $m$ is determined by its first $\lceil m/2 \rceil$ letters, each with 2 choices, so there are $2^{\lceil m/2 \rceil}$ palindromes of length $m$. For $m = 2n$ we get $\lceil 2n/2 \rceil = n$, and for $m = 2n-1$ we get $\lceil (2n-1)/2 \rceil = \lceil n - 1/2 \rceil = n$. Both lengths therefore have the same number of palindromes, namely $2^{n}$.
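
The counting argument can be checked by enumeration for small lengths. This is a brute-force sketch (the function name `count_palindromes` is chosen for this illustration), so it only confirms the formula for the lengths it tries.

```python
from itertools import product

def count_palindromes(length, alphabet="ab"):
    """Count the strings of the given length that equal their own reverse."""
    return sum(
        1
        for letters in product(alphabet, repeat=length)
        if letters == letters[::-1]
    )

# Compare lengths 2n - 1 and 2n with the predicted count 2**n.
for n in range(1, 6):
    odd_count = count_palindromes(2 * n - 1)
    even_count = count_palindromes(2 * n)
    print(n, odd_count, even_count, 2 ** n)
```

Each printed row shows the same value three times, for example `2 4 4 4` for lengths 3 and 4.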

---
**Problem 8:** Show that if the concatenation of two words (neither $Λ$) in PALINDROME is also a word in PALINDROME, then both words are powers of some other word; that is, if $x$ and $y$ and $xy$ are all in PALINDROME, then there is a word $z$ such that $x = z^{p}$  and $y = z^{q}$ for some integers $p$ and $q$ (maybe $p$ or $q = 1$).

**Answer 8:** Since $x$, $y$, and $xy$ are all in PALINDROME:

* $x = x^R$
* $y = y^R$
* $xy = (xy)^R = y^Rx^R = yx$
  
so $x$ and $y$ commute.

We prove by induction on $|x|+|y|$ that if nonempty words $x,y$ commute, then there exists $z$ with $x=z^m$ and $y=z^n$ for some $m,n\ge 1$.

If $|x|=|y|$, then $xy=yx$ implies $x=y$ (compare the first $|x|$ letters of both sides), and taking $z=x$ gives the claim. Otherwise, without loss of generality $|x|>|y|$. Since $xy=yx$, the word $y$ is a prefix of $x$; write $x=yt$, where $t$ is nonempty because $|x|>|y|$. Substituting into $xy=yx$ yields

$(yt)y = y(yt) \;\;\Rightarrow\;\; yty = yyt \;\;\Rightarrow\;\; ty=yt$,

where we used left-cancellation in the free monoid. Thus $t$ and $y$ are nonempty words that commute, and $|t|+|y|<|x|+|y|$. By the induction hypothesis, there exists a word $z$ and integers $p,q\ge 1$ such that $t=z^p$ and $y=z^q$. Hence

$x=yt = z^q z^p = z^{p+q} \quad\text{and}\quad y=z^q$,

so taking $m=p+q$ and $n=q$ completes the proof.

Thus, from $x,y,xy\in$ PALINDROME with $x,y\neq\varepsilon$, we conclude that $x$ and $y$ are powers of a common word.
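
The statement can also be tested exhaustively on short words. The sketch below is a sanity check, not a substitute for the proof. It relies on the standard fact that two nonempty words are powers of a common word exactly when they have the same primitive root (the shortest word of which they are a power). The helper names are made up for this illustration.

```python
from itertools import product

def primitive_root(w):
    """Shortest z such that the nonempty word w is a power of z."""
    n = len(w)
    for d in range(1, n + 1):
        if n % d == 0 and w[:d] * (n // d) == w:
            return w[:d]

def palindromes(max_len, alphabet="ab"):
    """All nonempty palindromes of length at most max_len."""
    found = []
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if w == w[::-1]:
                found.append(w)
    return found

pals = palindromes(6)

# Pairs of palindromes whose concatenation is also a palindrome.
good_pairs = [(x, y) for x in pals for y in pals if (x + y) == (x + y)[::-1]]

# Any pair with different primitive roots would contradict the statement.
counterexamples = [
    (x, y) for x, y in good_pairs if primitive_root(x) != primitive_root(y)
]
print(len(good_pairs), counterexamples)
```

The list of counterexamples comes out empty for palindromes of length up to 6.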

---
**Problem 9:**
* (i) Let $S$ = {*ab bb*} and let $T$ = {*ab bb bbbb*}. Show that $S^{*} = T^{*}$.
* (ii) Let $S$ = {*ab bb*} and let $T$ = {*ab bb bbb*}. Show that $S^{*} \neq T^{*}$, but that $S^{*} \subset T^{*}$.
* (iii) What principle does this illustrate?

**Answer 9(i):** Proof that $S^{*} = T^{*}$.

Let $S =$ {*ab bb*}, $T =$ {*ab bb bbbb*}.

By definition of the Kleene star:

$$
A^{0} = \{\varepsilon\}, \qquad
A^{n} = \{x_{1}x_{2}\cdots x_{n} \mid x_{i} \in A \ \text{for all } i\}, \qquad
A^{*} = \bigcup_{n \ge 0} A^{n}.
$$

1. $S^{*} \subseteq T^{*}$

Let $w \in S^{*}$. Then $w \in S^{n}$ for some $n \ge 0$, so $w = s_{1}s_{2}\cdots s_{n} \quad\text{with each } s_{i} \in S$.

Since $S \subseteq T$, each $s_{i} \in T$, hence $w \in T^{n} \subseteq T^{*}$.

2. $T^{*} \subseteq S^{*}$

Let $w \in T^{*}$. Then $w \in T^{k}$ for some $k \ge 0$, so $w = x_{1}x_{2}\cdots x_{k} \quad\text{with each } x_{i} \in T = \{ab,\, bb,\, bbbb\}$.

We can rewrite each $x_{i}$ as a sequence of elements from $S$:
- If $x_{i} = ab$, replace it by $(ab)$.
- If $x_{i} = bb$, replace it by $(bb)$.
- If $x_{i} = bbbb$, note that $bbbb = (bb)(bb)$, so replace it by $(bb)(bb)$.

Concatenating all these replacements yields $w = y_{1}y_{2}\cdots y_{m}\quad\text{with each } y_{j} \in S$.

Thus $w \in S^{m} \subseteq S^{*}$.

3. Since $S^{*} \subseteq T^{*}$ and $T^{*} \subseteq S^{*}$, we have $S^{*} = T^{*}$.
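
Both parts of Problem 9 can be explored numerically by listing the words of each language up to a fixed length. This is a bounded check under that length limit, not a proof; the helper `star_words` and the names `T1`, `T2` are introduced only for this sketch.

```python
def star_words(S, max_len):
    """Words of S* with length <= max_len (the empty string stands for Λ)."""
    words = {""}
    frontier = {""}
    while frontier:
        # Extend the newest words by one more factor, keeping only new ones.
        frontier = {
            w + s for w in frontier for s in S if len(w + s) <= max_len
        } - words
        words = words | frontier
    return words

S = {"ab", "bb"}
T1 = {"ab", "bb", "bbbb"}   # part (i)
T2 = {"ab", "bb", "bbb"}    # part (ii)
N = 10

print(star_words(S, N) == star_words(T1, N))   # same words up to length N
print(star_words(S, N) < star_words(T2, N))    # proper subset up to length N
print("bbb" in star_words(T2, N), "bbb" in star_words(S, N))
```

The output is `True`, `True`, then `True False`. The word *bbb* has odd length, while every word of $S^{*}$ has even length, which is why it separates $S^{*}$ from the second $T^{*}$.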

---
**Problem 10:** How does the situation in Problem 9 change if we replace the operator $^{*}$ with the operator $^{+}$ as defined in this chapter? Note the language $S^{+}$ means the same as $S^{*}$, but does not allow the "concatenation of no words" of $S$.

**Answer 10:**
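Nothing changes in the equalities and inclusions. None of the sets contains $Λ$, so each $^{+}$ language is the corresponding $^{*}$ language with $Λ$ removed.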

---
**Problem 11:** Prove that for all sets $S$,
* (i) $(S^{+})^{*} = (S^{*})^{*}$
* (ii) $(S^{+})^{+} = S^{+}$
* (iii) Is $(S^{*})^{+} = (S^{+})^{*}$ for all sets $S$?

**Answer 11:**

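* (i) Since $S \subseteq S^{+} \subseteq S^{*}$, we have $S^{*} \subseteq (S^{+})^{*} \subseteq (S^{*})^{*}$. By Theorem 1, $(S^{*})^{*} = S^{*}$, so all three sets are equal; in particular $(S^{+})^{*} = (S^{*})^{*}$.
* (ii) Every word in $(S^{+})^{+}$ is a concatenation of one or more words of $S^{+}$, each of which is a concatenation of one or more words of $S$. The result is again a concatenation of one or more words of $S$, so $(S^{+})^{+} \subseteq S^{+}$. Conversely, any set is contained in its positive closure, so $S^{+} \subseteq (S^{+})^{+}$. Therefore, $(S^{+})^{+} = S^{+}$.
* (iii) Yes. $S^{*}$ contains $Λ$, so $(S^{*})^{+} = (S^{*})^{*} = S^{*}$, and $(S^{+})^{*} = S^{*}$ by part (i).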

---
**Problem 12:** Let $S$ = {*a bb bab abaab*}.
1. Is *abbabaabab* in $S^{*}$?
2. Is *abaabbabbaabb*?
3. Does any word in $S^{*}$ have an odd total number of *b*'s?

**Answer 12:**

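1. No. The string has five *b*'s, and every word in $S^{*}$ has an even number of *b*'s (see part 3).
2. No. The string has seven *b*'s.
3. No. Each element of $S$ contains zero or two *b*'s, so any concatenation of elements has an even number of *b*'s.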
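
The two membership questions can be confirmed mechanically. The following is a minimal sketch of a membership test for $S^{*}$ that records which prefixes of the string can be split into elements of $S$; the function name `in_star` is made up for this illustration.

```python
def in_star(word, S):
    """Return True if word is a concatenation of zero or more elements of S."""
    reachable = [True] + [False] * len(word)  # reachable[i]: word[:i] is in S*
    for i in range(1, len(word) + 1):
        reachable[i] = any(
            len(s) <= i and reachable[i - len(s)] and word[i - len(s):i] == s
            for s in S
        )
    return reachable[-1]

S = {"a", "bb", "bab", "abaab"}
for w in ["abbabaabab", "abaabbabbaabb"]:
    # Print the word, its number of b's, and whether it is in S*.
    print(w, w.count("b"), in_star(w, S))
```

Both strings have an odd number of *b*'s (5 and 7), and the test reports `False` for each.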

---
**Problem 13:** Suppose that for some language $L$ we can always concatenate two words in $L$ and get another word in $L$, if and only if the words are not the same. That is, for any words $w_{1}$ and $w_{2}$ in $L$ where $w_{1} \neq w_{2}$, the word $w_{1}w_{2}$ is in $L$ but the word $w_{1}w_{1}$ is not in $L$. Prove that this cannot happen.

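**Answer 13:** The condition says that $L$ is closed under concatenation of distinct words but contains no square of one of its own words. Assume $L$ contains at least two distinct words (otherwise the first condition holds vacuously). First, $Λ \notin L$: since $Λ = ΛΛ$, having $Λ \in L$ would put the square $ΛΛ$ in $L$. Now take $w_1 \neq w_2$ in $L$ and let $w_3 = w_1w_2 \in L$. Since $w_1 \neq Λ$, $w_3$ is longer than $w_2$, so $w_3 \neq w_2$ and $w_4 = w_2w_3 \in L$. Since $w_4$ is longer than $w_1$, $w_4 \neq w_1$ and $w_5 = w_1w_4 \in L$. However, $w_5 = w_1w_2w_1w_2 = w_3w_3$ is the square of a word in $L$, so $w_5 \notin L$, a contradiction.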


---
**Problem 14:** Let us define $(S^{**})^{*} = S^{***}$.

1. Is this set bigger than $S^{*}$?
2. Is it bigger than $S$?

**Answer 14:**

1. No. $(S^{**})^{*} = (S^{*})^{*} = S^{*}$ by Theorem 1.
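2. Yes, whenever $S \neq S^{*}$. Since $S^{***} = S^{*} \supseteq S$, it is bigger than $S$ unless $S$ already equals its own closure. For example, if $S =$ {*a*}, then $S^{***}$ contains *aa* but $S$ does not.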

---
**Problem 15:** Let $w$ be a string of letters and let the language $T$ be defined by adding $w$ to the language $S$. Suppose further that $T^{*} = S^{*}$.
* (i) Is it necessarily true that $w \in S$?
* (ii) Is it necessarily true that $w \in S^{*}$?

**Answer 15:**
* (i) No. For example, if $S =$ {*a*} and $w =$ *aa*, then $T^{*} = S^{*}$ but $w \notin S$.
* (ii) Yes. $T = S \cup \{w\}$, so $w \in T \subseteq T^{*}$, and $T^{*} = S^{*}$ gives $w \in S^{*}$.

---
**Problem 16:** Give an example of a set $S$ such that the language $S^{*}$ has more six-letter words than eight-letter words. Does there exist an $S^{*}$ such that it has more six-letter words than twelve-letter words?

**Answer 16:**
Let $S =$ {*aaa*}. Then $S^{*}$ has one six-letter word and no eight-letter words. However, no $S^{*}$ can contain more six-letter words than twelve-letter words, because for every six-letter word $w$ in $S^{*}$ the twelve-letter word $ww$ is also in $S^{*}$, and different words $w$ give different words $ww$.

---
**Problem 17:**
* (i) Consider the language $S^{*}$, where $S$ = {*aa ab ba bb*}. Give another description of this language.
* (ii) Give an example of a set $S$ such that $S^{*}$ *only* contains all possible strings of *a*'s and *b*'s that have length divisible by 3.
* (iii) Let S be all strings of *a*'s and *b*'s with odd length. What is $S^{*}$?

**Answer 17:**
- (i) All words over $Σ =$ {*a b*} of even length.
- (ii) $S =$ {*aaa aab aba abb baa bab bba bbb*}
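- (iii) All strings of *a*'s and *b*'s, including $Λ$. Odd-length strings are in $S$ itself, a nonempty even-length string is a single letter followed by an odd-length string, and $Λ$ is in every $S^{*}$.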
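
The three descriptions can be compared against brute-force enumeration for short words. This sketch only examines lengths up to `N`, and in part (iii) the infinite set of odd-length strings is truncated at that length, which does not affect words of length at most `N`. The helper names are introduced for this illustration.

```python
from itertools import product

def all_strings(max_len, alphabet="ab"):
    """Every string over the alphabet with length <= max_len, including ''."""
    return {
        "".join(p)
        for n in range(max_len + 1)
        for p in product(alphabet, repeat=n)
    }

def star_words(S, max_len):
    """Words of S* with length <= max_len (the empty string stands for Λ)."""
    words = {""}
    frontier = {""}
    while frontier:
        frontier = {
            w + s for w in frontier for s in S if len(w + s) <= max_len
        } - words
        words = words | frontier
    return words

N = 6
universe = all_strings(N)

# (i) S = all two-letter words; compare S* with the even-length strings.
pairs = {w for w in universe if len(w) == 2}
check_i = star_words(pairs, N) == {w for w in universe if len(w) % 2 == 0}

# (ii) S = all three-letter words; compare S* with lengths divisible by 3.
triples = {w for w in universe if len(w) == 3}
check_ii = star_words(triples, N) == {w for w in universe if len(w) % 3 == 0}

# (iii) S = odd-length strings; compare S* with every string, Λ included.
odd = {w for w in universe if len(w) % 2 == 1}
check_iii = star_words(odd, N) == universe

print(check_i, check_ii, check_iii)  # True True True
```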

---
**Problem 18:**
* (i) If $S =$ {*a b*} and $T^{*} = S^{*}$, prove that $T$ must contain $S$.
* (ii) Find another pair of sets $S$ and $T$ such that if $T^{*} = S^{*}$, then $S \subset T$.

**Answer 18:**
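- (i) $S^{*}$ is the set of all strings of *a*'s and *b*'s, so $T^{*}$ is too. The one-letter word *a* is in $T^{*}$, and a one-letter word can only be a concatenation of words of $T$ if it is itself a word of $T$ (factors equal to $Λ$ contribute nothing). Hence *a* $\in T$, and likewise *b* $\in T$, so $T$ contains $S$.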
- (ii) $S =$ {*a bb*}, $T =$ {*a aa bb*}.

---
**Problem 19:** One student suggested the following algorithm to test a string of *a*'s and *b*'s to see if it is a word in $S^{*}$, where $S =$ {*aa ba aba abaab*}.
* Step 1, cross off the longest set of characters from the front of the string that is a word in $S$.
* Step 2, repeat step 1 until it is no longer possible. If what remains is the string $Λ$, the original string was a word in $S^{*}$. If what remains is not $Λ$ (this means that some letters are left, but we cannot find a word in $S$ at the beginning), the original string was not a word in $S^{*}$. Find a string that disproves this algorithm.

**Answer 19:**
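The word *abaaba* disproves the algorithm. Step 1 crosses off the longest prefix that is a word in $S$, which is *abaab*, leaving *a*. No word of $S$ is a prefix of *a*, so the algorithm rejects the string. But *abaaba* = *aba aba* is in $S^{*}$.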
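
To see the failure concretely, the sketch below implements the student's longest-prefix procedure next to an exhaustive membership test and runs both on *abaaba*. Both function names are made up for this illustration, and `in_star` is one possible correct test, not the only one.

```python
def greedy_accepts(word, S):
    """The student's algorithm: keep stripping the longest prefix found in S."""
    while word:
        prefixes = [s for s in S if word.startswith(s)]
        if not prefixes:
            return False  # letters remain but no element of S fits
        word = word[len(max(prefixes, key=len)):]
    return True

def in_star(word, S):
    """Correct test: try every way of splitting word into elements of S."""
    reachable = [True] + [False] * len(word)  # reachable[i]: word[:i] is in S*
    for i in range(1, len(word) + 1):
        reachable[i] = any(
            len(s) <= i and reachable[i - len(s)] and word[i - len(s):i] == s
            for s in S
        )
    return reachable[-1]

S = {"aa", "ba", "aba", "abaab"}
print(greedy_accepts("abaaba", S), in_star("abaaba", S))  # False True
```

The greedy procedure commits to *abaab* and cannot back up, while the exhaustive test finds the split *aba aba*.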

---
**Problem 20:** A language $L_1$ is smaller than another language $L_2$ if $L_1 \subset L_2$ and $L_1 \neq L_2$. Let $T$ be any language closed under concatenation; that is, if $t_1 \in T$ and $t_2 \in T$, then $t_1t_2$ is also an element of $T$. Show that if $T$ contains $S$ but $T \neq S^{*}$, then $S^{*}$ is smaller than $T$. We can summarize this by saying that $S^{*}$ is the smallest closed language containing $S$.

**Answer 20:**
Since $T$ is closed and $S \subset T$, any two factors from $S$ concatenated together form a word in $T$. Likewise, concatenating factors from $S$ any number of times produces a word in $T$. That is, every word in $S^{*}$ is also in $T$ (we take $Λ \in T$, since otherwise $Λ$ would be a word of $S^{*}$ missing from $T$). However, we are given that $T \neq S^{*}$, so $T$ contains some words not in $S^{*}$. We conclude that $S^{*}$ is a proper subset of $T$; in other words, $S^{*}$ is smaller than $T$.