# Lab 1: Finite-State Automata and Lexical Analysis

## Finite State Automata

Write a Python function that accepts exactly the strings accepted by the DFA shown below.


![ab*(c|d)](http://www.cs.bu.edu/fac/snyder/cs320/Homeworks/Lab01.Diagram.jpg "ab*(c|d)")


In [11]:
def legal_example_dfa(string, verbose=True):
    state = "start"
    if verbose: print(state, end='')

    for letter in string:

        if state == "start" and letter == 'a':
            state = "read_b"
        elif state == "read_b" and letter == 'b':
            state = "read_b"  # self-loop: stay in read_b on each 'b'
        elif state == "read_b" and (letter == 'c' or letter == 'd'):
            state = "accept"
        else:
            # Any other input, including any letter read after reaching
            # "accept", leads to the dead state "reject".
            state = "reject"

        if verbose: print(', ' + letter + '->' + str(state), end='')
    if verbose: print()
    return state == "accept"

Test the function to make sure it works!

In [12]:
assert (    legal_example_dfa('abbd'))
assert (    legal_example_dfa('ac'))
assert (    legal_example_dfa('abbbbbbd'))
assert (not legal_example_dfa('abbdd'))
assert (not legal_example_dfa('acd'))
assert (not legal_example_dfa('adcbbbb'))
assert (not legal_example_dfa('a'))

start, a->read_b, b->read_b, b->read_b, d->accept
start, a->read_b, c->accept
start, a->read_b, b->read_b, b->read_b, b->read_b, b->read_b, b->read_b, b->read_b, d->accept
start, a->read_b, b->read_b, b->read_b, d->accept, d->reject
start, a->read_b, c->accept, d->reject
start, a->read_b, d->accept, c->reject, b->reject, b->reject, b->reject, b->reject
start, a->read_b
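As a cross-check, the same language can be tested with a regular expression. This is only a sketch: it assumes the diagram corresponds to `ab*(c|d)`, as the image title suggests, and the name `legal_example_regex` is made up for this illustration.

```python
import re

# Assumption: the DFA in the diagram recognizes ab*(c|d), per the image title.
PATTERN = re.compile(r"ab*(c|d)")

def legal_example_regex(string):
    """Return True if the whole string matches ab*(c|d)."""
    # fullmatch requires the entire string to match, not just a prefix.
    return PATTERN.fullmatch(string) is not None

# Same test strings as the hand-written DFA above.
for s in ['abbd', 'ac', 'abbbbbbd']:
    assert legal_example_regex(s)
for s in ['abbdd', 'acd', 'adcbbbb', 'a', '']:
    assert not legal_example_regex(s)
```

If the two functions ever disagree on a string, either the regular expression or one of the transitions is wrong.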


We can also encode the DFA as a transition table, numbering the states 1 (start), 2 (read_b), 3 (accept), and 4 (reject).

### DFAs as transition tables

In [4]:
transition_table = {1: {'a': 2, 'b': 4, 'c': 4, 'd': 4},
                    2: {'a': 4, 'b': 2, 'c': 3, 'd': 3},
                    3: {'a': 4, 'b': 4, 'c': 4, 'd': 4},
                    4: {'a': 4, 'b': 4, 'c': 4, 'd': 4}}

What is the alphabet of this language?

In [5]:
alphabet = {'a', 'b', 'c', 'd'}

Every state must handle every character in the alphabet.

In [6]:
for state in transition_table:
    assert (set(transition_table[state].keys()) == alphabet)

What are the states in `transition_table`?

In [7]:
states = {1, 2, 3, 4}

Every state must be in the table, and characters can only map to states.

In [8]:
assert (set(transition_table.keys()) == states)
for state in states:
    assert (set(transition_table[state].keys()) == alphabet)
    assert (set(transition_table[state].values()).issubset(states))

Similarly, how can we encode the DFA that defines a well-formatted dollar amount? (Warning: you may need to copy and paste a bit.)

In [9]:
dollar_table = {
    "start":   {'$': "dollars", '.': "reject", '0': "reject",  '1': "reject",  '2': "reject",  '3': "reject",  '4': "reject",  '5': "reject",  '6': "reject",  '7': "reject",  '8': "reject",  '9': "reject"},
    "dollars": {'$': "reject",  '.': "cent1",  '0': "dollars", '1': "dollars", '2': "dollars", '3': "dollars", '4': "dollars", '5': "dollars", '6': "dollars", '7': "dollars", '8': "dollars", '9': "dollars"},
    "cent1":   {'$': "reject",  '.': "reject", '0': "cent2",   '1': "cent2",   '2': "cent2",   '3': "cent2",   '4': "cent2",   '5': "cent2",   '6': "cent2",   '7': "cent2",   '8': "cent2",   '9': "cent2"},
    "cent2":   {'$': "reject",  '.': "reject", '0': "accept",  '1': "accept",  '2': "accept",  '3': "accept",  '4': "accept",  '5': "accept",  '6': "accept",  '7': "accept",  '8': "accept",  '9': "accept"},
    "accept":  {'$': "reject",  '.': "reject", '0': "reject",  '1': "reject",  '2': "reject",  '3': "reject",  '4': "reject",  '5': "reject",  '6': "reject",  '7': "reject",  '8': "reject",  '9': "reject"},
    "reject":  {'$': "reject",  '.': "reject", '0': "reject",  '1': "reject",  '2': "reject",  '3': "reject",  '4': "reject",  '5': "reject",  '6': "reject",  '7': "reject",  '8': "reject",  '9': "reject"}}

What are the states? What is the alphabet?

In [10]:
states = {"start", "dollars", "cent1", "cent2", "accept", "reject"}
alphabet = {'$', '.', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'}
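The copying and pasting can be avoided by generating each row. Every state in the dollar table sends all ten digits to the same target, so a row is fully described by three targets. The sketch below uses the hypothetical names `row` and `built_dollar_table`; it is one possible construction, not the one the lab requires.

```python
DIGITS = "0123456789"

def row(dollar_sign, dot, digit):
    """Build one table row: targets for '$', '.', and every digit."""
    transitions = {'$': dollar_sign, '.': dot}
    transitions.update({d: digit for d in DIGITS})
    return transitions

built_dollar_table = {
    "start":   row("dollars", "reject", "reject"),
    "dollars": row("reject",  "cent1",  "dollars"),
    "cent1":   row("reject",  "reject", "cent2"),
    "cent2":   row("reject",  "reject", "accept"),
    "accept":  row("reject",  "reject", "reject"),
    "reject":  row("reject",  "reject", "reject"),
}

# Each row covers '$', '.', and the ten digits.
assert all(len(r) == 12 for r in built_dollar_table.values())
```

In the notebook, `built_dollar_table == dollar_table` should evaluate to `True` if both tables encode the same transitions.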

In [11]:
assert (set(dollar_table.keys()) == states)
for state in states:
    assert (set(dollar_table[state].keys()) == alphabet)
    assert (set(dollar_table[state].values()).issubset(states))
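The same three assertions have now been run on two tables, so they could be gathered into one helper. The following is a minimal sketch; `check_table` and `toy_table` are names invented here for illustration.

```python
def check_table(table, states, alphabet):
    """Raise AssertionError unless table is a total transition function."""
    # Every state needs a row, and there must be no extra rows.
    assert set(table.keys()) == set(states), "table rows must match the state set"
    for state, row in table.items():
        # Every state must handle every symbol of the alphabet.
        assert set(row.keys()) == set(alphabet), f"state {state!r} must handle every symbol"
        # Symbols may only lead to known states.
        assert set(row.values()) <= set(states), f"state {state!r} maps to an unknown state"

# A tiny two-state table over the alphabet {'x'} that passes the checks.
toy_table = {0: {'x': 1}, 1: {'x': 1}}
check_table(toy_table, {0, 1}, {'x'})
```

In the notebook, `check_table(dollar_table, states, alphabet)` would replace the cell above.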

### Parsing DFAs

Write a general function that takes a transition table and determines whether a string is accepted by the DFA it encodes.

In [12]:
def accepts(string, table, start_state, accept_states, verbose=True):
    state = start_state
    if verbose: print(state, end='')
    for letter in string:
        state = table[state][letter]
        if verbose: print(', ' + letter + '->' + str(state), end='')
        # To short-circuit on failure, a dead state could be checked here
        # (error_state would need to be passed in as an extra parameter):
        #     if state == error_state:
        #         if verbose: print()
        #         return False
    if verbose: print()
    return (state in accept_states)
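As written, `accepts` raises a `KeyError` when the string contains a character outside the table's alphabet, because `table[state][letter]` has no entry for it. One way to handle this is to treat an unknown symbol as an immediate rejection. The variant below is a sketch under that assumption; the name `accepts_total`, the optional `dead_state` parameter, and `small_table` are hypothetical and not part of the lab.

```python
def accepts_total(string, table, start_state, accept_states, dead_state=None):
    """Like accepts, but rejects unknown symbols and can stop early."""
    state = start_state
    for letter in string:
        row = table[state]
        if letter not in row:
            # Symbol outside the alphabet: reject instead of raising KeyError.
            return False
        state = row[letter]
        if dead_state is not None and state == dead_state:
            # No accept state is reachable from the dead state, so stop here.
            return False
    return state in accept_states

# The ab*(c|d) table from earlier, repeated so this cell runs on its own.
small_table = {1: {'a': 2, 'b': 4, 'c': 4, 'd': 4},
               2: {'a': 4, 'b': 2, 'c': 3, 'd': 3},
               3: {'a': 4, 'b': 4, 'c': 4, 'd': 4},
               4: {'a': 4, 'b': 4, 'c': 4, 'd': 4}}

assert accepts_total('abbd', small_table, 1, {3}, dead_state=4)
assert not accepts_total('abz', small_table, 1, {3}, dead_state=4)
```

Passing `dead_state` is only safe when that state truly has no path back to an accept state, as is the case for state 4 here.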

It should accept and reject the strings we expect.

In [13]:
assert (    accepts('abbd',     transition_table, 1, {3}))
assert (    accepts('ac',       transition_table, 1, {3}))
assert (    accepts('abbbbbbd', transition_table, 1, {3}))
assert (not accepts('abbdd',    transition_table, 1, {3}))
assert (not accepts('acd',      transition_table, 1, {3}))
assert (not accepts('adcbbbb',  transition_table, 1, {3}))
assert (not accepts('a',        transition_table, 1, {3}))
assert (not accepts('',         transition_table, 1, {3}))

1, a->2, b->2, b->2, d->3
1, a->2, c->3
1, a->2, b->2, b->2, b->2, b->2, b->2, b->2, d->3
1, a->2, b->2, b->2, d->3, d->4
1, a->2, c->3, d->4
1, a->2, d->3, c->4, b->4, b->4, b->4, b->4
1, a->2
1


Create a function that accepts or rejects formatted dollar amounts, using your `accepts` function and your transition table.

In [16]:
def is_dollar(string):
    return accepts(string, dollar_table, "start", {"accept"})

It should also work as we expect.

In [15]:
assert (    is_dollar("$123.45"))
assert (    is_dollar("$.45"))
assert (    is_dollar("$000011111.45"))
assert (    is_dollar("$1234567890.45"))
assert (not is_dollar(""))
assert (not is_dollar("000011111.45"))
assert (not is_dollar("$00001111145"))
assert (not is_dollar("$000011111.4"))
assert (not is_dollar("$000011111.456"))

start, $->dollars, 1->dollars, 2->dollars, 3->dollars, .->cent1, 4->cent2, 5->accept
start, $->dollars, .->cent1, 4->cent2, 5->accept
start, $->dollars, 0->dollars, 0->dollars, 0->dollars, 0->dollars, 1->dollars, 1->dollars, 1->dollars, 1->dollars, 1->dollars, .->cent1, 4->cent2, 5->accept
start, $->dollars, 1->dollars, 2->dollars, 3->dollars, 4->dollars, 5->dollars, 6->dollars, 7->dollars, 8->dollars, 9->dollars, 0->dollars, .->cent1, 4->cent2, 5->accept
start
start, 0->reject, 0->reject, 0->reject, 0->reject, 1->reject, 1->reject, 1->reject, 1->reject, 1->reject, .->reject, 4->reject, 5->reject
start, $->dollars, 0->dollars, 0->dollars, 0->dollars, 0->dollars, 1->dollars, 1->dollars, 1->dollars, 1->dollars, 1->dollars, 4->dollars, 5->dollars
start, $->dollars, 0->dollars, 0->dollars, 0->dollars, 0->dollars, 1->dollars, 1->dollars, 1->dollars, 1->dollars, 1->dollars, .->cent1, 4->cent2
start, $->dollars, 0->dollars, 0->dollars, 0->dollars, 0->dollars, 1->dollars, 1->dollars, 1->dollar