Math symbols to copy and paste:
```
· × ∑
≤ ≥ ≠ ≡ ≢
¬ ∧ ∨ ∀ ∃ 
⇐ ⇒ →
∩ ∪ ⊂ ⊆ ∈ ∉ ∅ ∁ ε
₀ ₁ ₂ ₃ ₄ ₅ ₆ ₇ ₈ ₉ ᵢ ⱼ ₘ ⁰ ¹ ² ³ ⁴ ⁵ ⁶ ⁷ ⁸ ⁹ ⁱ ⁿ 
«»
```

#### Occurrences in Regular Expressions

Consider the following grammar for regular expressions with choice (`|`), concatenation, repetition (`*`), and option (`?`).

    expression  →  term { '|' term }
    term  →  factor { factor }
    factor → atom [ '*' | '?' ]
    atom  →  plainchar | escapedchar | '(' expression ')'
    plainchar  →  ' ' | ... | '~'
    escapedchar  → '\\' ( '(' | ... | '|' )
    
In `aba|b`, there are 4 *occurrences* of symbols. Add attribute rules that compute the total number of occurrences of symbols! [2 points]

    expression(e)  →  term(e) { '|' term(f) « e := e + f »}
    term(e)  →  factor(e) { factor(f) « e := e + f »}
    factor(e) → atom(e) [ '*' | '?' ]
    atom(e)  →  plainchar(e) | escapedchar(e) | '(' expression(e) ')'
    plainchar(e)  →  ' ' « e := 1 » | ... | '~' « e := 1 »
    escapedchar(e)  → '\\' ( '(' « e := 1 » | ... | '|' « e := 1 » )

Here is a parser for the above grammar:

PlainChars = ' !"#$%&\',-./0123456789:;<=>@ABCDEFGHIJKLMNO' + \
             'PQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{}~'
EscapedChars = '()*+?\\|'
FirstFactor = PlainChars + '\\('

def nxt():
    global pos, sym
    if pos < len(src): sym, pos = src[pos], pos+1
    else: sym = chr(0) # end of input symbol

def expression():
    term()
    while sym == '|': nxt(); term()

def term():
    factor()
    while sym in FirstFactor: factor()

def factor():
    atom()
    if sym in '*?': nxt()

def atom():
    if sym in PlainChars: nxt()
    elif sym == '\\':
        nxt()
        if sym in EscapedChars: nxt()
        else: raise Exception("invalid escaped character at " + str(pos))
    elif sym == '(':
        nxt(); expression()
        if sym == ')': nxt()
        else: raise Exception("')' expected at " + str(pos))
    else: raise Exception("invalid character at " + str(pos))

def occurrences(s: str):
    global src, pos
    src, pos = s, 0; nxt(); expression()
    if sym != chr(0): raise Exception("unexpected character at " + str(pos))

Modify the parser according to your attribute grammar such that `occurrences` returns the number of occurrences! [2 points]

def expression():
    n = term()
    while sym == '|': nxt(); n += term()
    return n

def term():
    n = factor()
    while sym in FirstFactor: n += factor()
    return n

def factor():
    n = atom()
    if sym in '*?': nxt()
    return n

def atom():
    n = 0
    if sym in PlainChars: nxt(); n += 1
    elif sym == '\\':
        nxt()
        if sym in EscapedChars: nxt(); n += 1
        else: raise Exception("invalid escaped character at " + str(pos))
    elif sym == '(':
        nxt(); n += expression()
        if sym == ')': nxt()
        else: raise Exception("')' expected at " + str(pos))
    else: raise Exception("invalid character at " + str(pos))
    return n

def occurrences(s: str):
    global src, pos
    src, pos = s, 0; nxt(); n = expression()
    if sym != chr(0): raise Exception("unexpected character at " + str(pos))
    return n

Here are some test cases [2 points]:

assert occurrences('aba') == 3
assert occurrences('ab|ac') == 4
assert occurrences('(ab)?aa*') == 4
assert occurrences('\\(a*\\)') == 3  # backslashes doubled: '\(' is an invalid Python escape sequence
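
As a cross-check for the expected values in these test cases, the count can also be obtained by a plain scan that skips the operators and treats a backslash together with the character after it as one symbol. This is only a sketch: `count_symbols` is a hypothetical helper that is not part of the assignment, and unlike the parser it does not validate the expression.

```python
def count_symbols(s: str) -> int:
    """Count symbol occurrences by scanning; assumes s is a valid expression."""
    operators = '|*?()'
    n, i = 0, 0
    while i < len(s):
        if s[i] == '\\':
            # a backslash and the following character form one escaped symbol
            n += 1
            i += 2
        else:
            if s[i] not in operators:
                n += 1
            i += 1
    return n

assert count_symbols('aba|b') == 4
assert count_symbols('(ab)?aa*') == 4
assert count_symbols('\\(a*\\)') == 3
```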

Consider regular expressions with choice (`|`), concatenation, repetition (`*`), option (`?`). Given a regular expression with `n` occurrences of symbols, how many states does an equivalent finite state automaton have at most when constructed as in the course notes? Give a brief explanation! [3 points]

Given a regular expression with `n` occurrences of symbols, an equivalent finite state automaton has at most `n + 1` states. This follows by induction over the structure of the expression, where `E` has `n` and `F` has `m` occurrences of symbols:

 - a single symbol has one occurrence and its automaton has two states, which is 1 + 1
 - case `E|F`: it has at most (n + 1) + (m + 1) - 1 = n + m + 1 states; the -1 is because the initial states of `E` and `F` are merged
 - case `EF`: it has at most (n + 1) + (m + 1) - 1 = n + m + 1 states; the -1 is because the initial state of `F` is replaced by the final states of `E`
 - case `E?`: `E?` = `E|ε`, and ε only makes the initial state of `E` final; no state is added, so `E?` has the same number of states as `E`, at most n + 1
 - case `E*`: the initial state of `E` becomes final and transitions are added that loop back from the final states of `E`; no state is added, so `E*` has the same number of states as `E`, at most n + 1
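
The case analysis above can be checked mechanically on small expressions. The sketch below assumes a hypothetical tuple encoding of expressions (`('sym', c)`, `('alt', E, F)`, `('cat', E, F)`, `('opt', E)`, `('star', E)`) that is not taken from the course notes, and it only replays the state counts stated in the cases rather than building an automaton.

```python
def occ(e):
    """Number of symbol occurrences in expression e."""
    if e[0] == 'sym':
        return 1
    if e[0] in ('alt', 'cat'):
        return occ(e[1]) + occ(e[2])
    return occ(e[1])  # 'opt' and 'star'

def states(e):
    """Upper bound on the number of states, following the cases above."""
    if e[0] == 'sym':
        return 2                              # initial and final state
    if e[0] in ('alt', 'cat'):
        return states(e[1]) + states(e[2]) - 1  # one state is shared
    return states(e[1])                       # 'opt' and 'star' add no state

a, b = ('sym', 'a'), ('sym', 'b')
aba_or_b = ('alt', ('cat', ('cat', a, b), a), b)   # aba|b
assert occ(aba_or_b) == 4
assert states(aba_or_b) == occ(aba_or_b) + 1
```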

Given a finite state automaton with `n` states, how many states does an equivalent deterministic finite state automaton have at most, when constructed as in the course notes? Give a brief explanation! [2 points]

An equivalent deterministic finite state automaton has at most `2ⁿ` states: each state of the deterministic automaton is a set of states of the original automaton, and a set of `n` states has `2ⁿ` subsets.
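
To illustrate how the number of sets of states can grow, here is a minimal sketch of a subset construction that only collects the reachable sets. The function names, the dictionary encoding of transitions, and the example automaton (strings over `a`, `b` whose `k`-th symbol from the end is `a`) are assumptions made for this illustration; the construction in the course notes may differ in details such as the treatment of the empty set.

```python
def determinize(delta, start, alphabet):
    """Return the reachable sets of states; delta maps (state, symbol) to a set of states."""
    initial = frozenset([start])
    seen, todo = {initial}, [initial]
    while todo:
        current = todo.pop()
        for a in alphabet:
            target = frozenset(r for q in current for r in delta.get((q, a), ()))
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return seen

def kth_from_end(k):
    """Nondeterministic automaton with k + 1 states (0..k); state k would be final."""
    delta = {(0, 'a'): {0, 1}, (0, 'b'): {0}}
    for i in range(1, k):
        delta[(i, 'a')] = {i + 1}
        delta[(i, 'b')] = {i + 1}
    return delta

k = 3
dfa_states = determinize(kth_from_end(k), 0, 'ab')
assert len(dfa_states) == 2 ** k          # exponential in the number of states
assert len(dfa_states) <= 2 ** (k + 1)    # within the bound for n = k + 1 states
```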

Given a finite state automaton with `n` states, how many states does an equivalent total finite state automaton have at most, when constructed as in the course notes? Give a brief explanation! [2 points]

An equivalent total finite state automaton has at most `n + 1` states, as making the automaton total only adds a single trap state.
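
A minimal sketch of this step, assuming a deterministic automaton whose transitions are given as a dictionary from `(state, symbol)` to a state. The function `make_total` and the state name `'trap'` are hypothetical and chosen for the example only.

```python
def make_total(states, alphabet, delta):
    """Send every missing transition to one trap state; returns (states, delta)."""
    missing = [(q, a) for q in states for a in alphabet if (q, a) not in delta]
    if not missing:
        return set(states), dict(delta)   # already total, no state added
    trap = 'trap'                         # assumed not to clash with existing states
    total = dict(delta)
    for key in missing:
        total[key] = trap
    for a in alphabet:
        total[(trap, a)] = trap           # the trap state loops on every symbol
    return set(states) | {trap}, total

new_states, new_delta = make_total([0, 1], 'ab', {(0, 'a'): 1})
assert len(new_states) == 2 + 1           # n + 1 states
assert len(new_delta) == 3 * 2            # every state has a transition on every symbol
```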

Given a deterministic and total finite state automaton with `n` states, how many states does an equivalent minimal finite state automaton have at most, when constructed as in the course notes? Give a brief explanation! [2 points]

An equivalent minimal finite state automaton has at most `n` states: minimization merges equivalent states, and in the worst case every state is in its own equivalence class, so the number of equivalence classes equals the number of states.
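
Both situations can be seen with a small partition refinement, sketched here under the assumption of a total deterministic automaton with transitions stored as a dictionary. This is a generic refinement in the style of Moore's algorithm, not necessarily the procedure from the course notes, and `count_classes` is a hypothetical name.

```python
def count_classes(states, alphabet, delta, finals):
    """Number of equivalence classes of a total deterministic automaton."""
    block = {q: int(q in finals) for q in states}   # start: final vs. non-final
    count = len(set(block.values()))
    while True:
        ids, refined = {}, {}
        for q in states:
            # states stay together only if they agree on their own block
            # and on the blocks of all their successors
            signature = (block[q],) + tuple(block[delta[(q, a)]] for a in alphabet)
            refined[q] = ids.setdefault(signature, len(ids))
        if len(ids) == count:
            return count                            # partition is stable
        block, count = refined, len(ids)

# counting a's modulo 3: all three states are distinguishable (worst case)
mod3 = {(q, 'a'): (q + 1) % 3 for q in range(3)}
assert count_classes([0, 1, 2], 'a', mod3, {0}) == 3

# two final states that behave identically collapse into one class
same = {(0, 'a'): 1, (1, 'a'): 0}
assert count_classes([0, 1], 'a', same, {0, 1}) == 1
```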

Summarizing the previous answers, given a regular expression with `n` occurrences of symbols, how many states does a deterministic, total, and minimal finite state automaton have at most? [1 point]

 - `n` occurrences of symbols result in at most `n + 1` states.
 - making the FSA deterministic results in at most `2^(n + 1)` states.
 - making the FSA total results in at most `2^(n + 1) + 1` states.
 - making the FSA minimal leaves the number of states the same in the worst case, so the final automaton has at most `2^(n + 1) + 1` states.
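
For concreteness, the chain of bounds can be evaluated for small `n`. The function `max_states` is a hypothetical helper that just composes the bounds from the answers above.

```python
def max_states(n: int) -> int:
    """Upper bound on states after each step, for n symbol occurrences."""
    fsa = n + 1            # regular expression to finite state automaton
    dfa = 2 ** fsa         # making it deterministic
    total = dfa + 1        # making it total adds one trap state
    return total           # minimization cannot add states

assert [max_states(n) for n in range(1, 5)] == [5, 9, 17, 33]
```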