Project 1 requires you to create a set of FSAs, one for each legal token type allowed in the _Datalog_ database language. Here are the descriptions of the first few token types from the project description.

| Token Type    | Description           |  
| :--           | :--                   |  
| COMMA         | The ',' character     |
| PERIOD        | The '.' character     |
| Q_MARK        | The '?' character     |
| LEFT_PAREN    | The '(' character     |
| RIGHT_PAREN   | The ')' character     |
| COLON         | The ':' character     |
| COLON_DASH    | The string ":-"       |

This means that you will have a collection of FSAs that you will need to manage. 
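
Part of why the collection needs managing is that some lexemes overlap. Below is a minimal sketch that lists every pair of tokens where one token's lexeme is a prefix of another's. It assumes a hypothetical `TOKEN_LEXEMES` dictionary built from the table above; that dictionary is not part of the project code.

```python
# Hypothetical mapping built from the token table above
TOKEN_LEXEMES: dict[str, str] = {
    "COMMA": ",",
    "PERIOD": ".",
    "Q_MARK": "?",
    "LEFT_PAREN": "(",
    "RIGHT_PAREN": ")",
    "COLON": ":",
    "COLON_DASH": ":-",
}

# Pairs (shorter, longer) where the shorter lexeme is a prefix of the longer one
overlaps: list[tuple[str, str]] = [
    (shorter, longer)
    for shorter, shorter_lexeme in TOKEN_LEXEMES.items()
    for longer, longer_lexeme in TOKEN_LEXEMES.items()
    if shorter != longer and longer_lexeme.startswith(shorter_lexeme)
]
print(overlaps)  # [('COLON', 'COLON_DASH')]
```

Only COLON and COLON_DASH collide. This is one reason an input such as ":-" can make both the colon and colon-dash FSAs succeed.
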

***

One way to implement project 1 is to run each FSA on the same input string and see what each FSA returns, as illustrated on the left side of the figure below.

![image](Managing_project1_FSAs.drawio.svg)

Each FSA outputs accept or reject, represented by T or F. These outputs can be collected into a data structure such as a list. That list is the input to a completely new finite state machine (FSM), one that decides which token to output based on the behavior of all the FSAs.

Recall from the textbook and from the discussion in class that an FSM behaves differently from an FSA. An FSA only outputs T or F, depending on whether it accepts. An FSM, by contrast, produces an output each time it takes a transition. So each arrow on the FSM (right half of the figure above) has two components:

- an input, which is the list of true/false values in square brackets;
- an output, which is the token the lexer should output.

The input and output are separated by a comma, which is how the textbook distinguishes input from output in the FSMs it draws.

The way this FSM operates is that it reads the list of true/false values and outputs the correct token. The top loop in the FSM says that the FSA for the left parenthesis succeeded and all others failed [TFFF], so the output should be the LEFT_PAREN token. The bottom right loop says that both the colon and colon-dash FSAs succeeded [FFTT], so the output should be the COLON_DASH token.
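
One way to make the manager FSM concrete is to write its transitions as a table. This is a minimal illustration, not the project's required design. It makes three assumptions:

- The four FSAs are replaced by hypothetical prefix tests (`RECOGNIZERS`).
- Each `input, output` arrow label becomes one entry in a hypothetical `TRANSITIONS` dictionary that maps `(state, input)` to `(next_state, output)`.
- As the loops in the figure suggest, the manager has a single state, here called `S0`.

```python
from typing import Callable

# Hypothetical stand-ins for the four FSAs, in the figure's order. Like the
# FSA classes in this notebook, each accepts any input that BEGINS with its lexeme.
RECOGNIZERS: list[tuple[str, Callable[[str], bool]]] = [
    ("LeftParenFSA", lambda s: s.startswith("(")),
    ("RightParenFSA", lambda s: s.startswith(")")),
    ("ColonFSA", lambda s: s.startswith(":")),
    ("ColonDashFSA", lambda s: s.startswith(":-")),
]

# One-state Mealy machine: (state, input) -> (next_state, output).
# Each entry corresponds to one "input, output" arrow label in the figure.
TRANSITIONS: dict[tuple[str, tuple[bool, ...]], tuple[str, str]] = {
    ("S0", (True, False, False, False)): ("S0", "LEFT_PAREN"),
    ("S0", (False, True, False, False)): ("S0", "RIGHT_PAREN"),
    ("S0", (False, False, True, False)): ("S0", "COLON"),
    ("S0", (False, False, True, True)): ("S0", "COLON_DASH"),
}


def step(state: str, input_string: str) -> tuple[str, str]:
    """Take one transition: build the FSA output vector, then look up the arrow."""
    fsa_outputs: tuple[bool, ...] = tuple(accepts(input_string) for _, accepts in RECOGNIZERS)
    # Vectors with no matching arrow stay in the same state and output UNDEFINED
    return TRANSITIONS.get((state, fsa_outputs), (state, "UNDEFINED"))


for text in ["(", ")", ":", ":-", "x"]:
    print(text, "->", step("S0", text)[1])
```

Inputs whose output vector has no arrow in the table fall through to `UNDEFINED`. For example, `x` makes all four FSAs reject.
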

***
We can now start to think about how to program the collection of FSAs and the manager FSM. First, define the base class from which all FSAs inherit. The code below is copied straight from the previous Jupyter notebook tutorial.

In [22]:
from typing import Callable as function
class FSA:
    """ FSA Base or Super class"""
    def __init__(self, name: str) -> None:
        """Class constructor"""
        ###############################################################
        # Define the five elements of the FSA
        ###############################################################
        # Set of states: Each state S0, S1, S2, Serr will be represented by its own method
        # Set of inputs: I is the set of characters that can appear in the input string
        self.start_state: function = self.s0  # We'll always have the start state named S0
        self.accept_states: set[function] = set()  # No accept states defined
        
        ###############################################################
        # Define four variables that are used within the FSA, some
        # to make the FSA run and some that help us understand how 
        # the FSA works
        ###############################################################
        self.input_string: str = ""  # Default empty input
        self.fsa_name: str = name
        self.num_chars_read: int = 0

    #########################################################
    # Define each state as a function                       #
    # Within each function, define the transition function. #
    # Each transition function reads the current input and  #
    # chooses the next state based on the input             #
    #########################################################
    def s0(self) -> function:
        """Every FSA must have a start state, and we'll always name 
        it S0. The method for the start state must be defined in the
        derived class since it's not defined here."""
        raise NotImplementedError()  # This line causes an error to 
                                    # occur if the child classes don't implement this method

    ############################
    # Public Manager Functions
    ############################
    def run(self, input_string: str) -> bool:
        ###############################################################
        # This function will be called to make the FSA execute
        # It records the input string,
        # sets the current state to the start state
        # and then calls a state method that returns the next state
        # Each state method accesses the input_string
        ###############################################################
        # Remember input_string
        self.input_string = input_string
        # Set current state to start state
        current_state: function = self.start_state
        # Call current state, which starts the FSA
        while self.num_chars_read < len(self.input_string):
            current_state = current_state()
        # Check whether the FSA ended in an accept state
        outcome: bool = False
        if current_state in self.accept_states:
            outcome = True  # Accept if the FSA ended in an accept state
        return outcome

    def reset(self) -> None:
        self.num_chars_read = 0
        self.history = []

    ############################
    # Public Getters and Setters
    ############################
    def get_name(self) -> str:
        return self.fsa_name

    def set_name(self, fsa_name: str) -> None:
        self.fsa_name = fsa_name

    ############################
    # Private Helper functions
    ############################
    def __get_current_input(self) -> str:
        current_input: str = self.input_string[self.num_chars_read]
        self.num_chars_read += 1
        return current_input

***
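Two details of the base class are easy to miss:

- `self.start_state = self.s0` is looked up on the actual instance, so it picks up the subclass's `s0`.
- A subclass that never defines `s0` hits the `NotImplementedError`.

A stripped-down sketch shows both. The classes `Base`, `WithStart`, and `Forgetful` are hypothetical and are not the notebook's `FSA` classes.

```python
class Base:
    def __init__(self) -> None:
        # self.s0 is looked up on the actual instance, so a subclass's s0 wins
        self.start_state = self.s0

    def s0(self) -> str:
        raise NotImplementedError()


class WithStart(Base):
    def s0(self) -> str:
        return "subclass s0 ran"


class Forgetful(Base):
    pass  # never overrides s0


print(WithStart().start_state())  # subclass s0 ran
try:
    Forgetful().start_state()
except NotImplementedError:
    print("Forgetful never defined s0")
```
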
We now define an FSA class for each of the FSAs in the figure above. The Colon and ColonDash FSAs are copied from the previous tutorial.

In [21]:
class ColonDashFSA(FSA):
    def __init__(self) -> None:
        super().__init__("ColonDashFSA")  # You must invoke the __init__ of the parent class
        self.accept_states.add(self.s2)  # accept_states is defined in the parent class, so it can be used here

    def s0(self) -> function:
        current_input: str = self._FSA__get_current_input()
        next_state: function
        if current_input == ':':
            next_state = self.s1
        else:
            next_state = self.s_err
        return next_state

    def s1(self) -> function:
        current_input: str = self._FSA__get_current_input()
        next_state: function
        if current_input == '-':
            next_state = self.s2
        else:
            next_state = self.s_err
        return next_state

    def s2(self) -> function:
        self._FSA__get_current_input()  # Consume one character; s2 loops on any input
        return self.s2

    def s_err(self) -> function:
        self._FSA__get_current_input()  # Consume one character; s_err loops on any input
        return self.s_err

***
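The subclasses call `self._FSA__get_current_input()` instead of `self.__get_current_input()` because of Python's name mangling. Inside a class body, an attribute name with two leading underscores (and at most one trailing underscore) is rewritten to `_ClassName__name`. A minimal sketch with hypothetical `Base` and `Child` classes:

```python
class Base:
    def __helper(self) -> str:  # two leading underscores: stored as _Base__helper
        return "helper called"


class Child(Base):
    def call_mangled(self) -> str:
        return self._Base__helper()  # spell out the mangled name explicitly

    def call_plain(self) -> str:
        return self.__helper()  # rewritten to self._Child__helper, which does not exist


child = Child()
print(child.call_mangled())  # helper called
try:
    child.call_plain()
except AttributeError as err:
    print("AttributeError:", err)
```

A common alternative is a single leading underscore, as in a hypothetical `_get_current_input`. It marks a method as internal by convention only and is not mangled, so subclasses can call it directly.
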

In [20]:
class ColonFSA(FSA):
    def __init__(self) -> None:
        super().__init__("ColonFSA")  # You must invoke the __init__ of the parent class
        self.accept_states.add(self.s1)  # accept_states is defined in the parent class, so it can be used here

    def s0(self) -> function:
        current_input: str = self._FSA__get_current_input()
        next_state: function
        if current_input == ':':
            next_state = self.s1
        else:
            next_state = self.s_err
        return next_state

    def s1(self) -> function:
        self._FSA__get_current_input()  # Consume one character; s1 loops on any input
        return self.s1

    def s_err(self) -> function:
        self._FSA__get_current_input()  # Consume one character; s_err loops on any input
        return self.s_err

***
The FSAs for the left and right parenthesis tokens follow the pattern of the colon FSA, but they look for the left or right parenthesis instead of the colon.
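
The colon and parenthesis FSAs differ only in the character they look for. So do the COMMA, PERIOD, and Q_MARK FSAs from the token table. One option is therefore a single parameterized class. `SingleCharFSA` below is hypothetical and differs from the notebook's classes in two ways:

- It is standalone rather than a subclass of `FSA`, so it runs on its own.
- It resets `num_chars_read` inside `run`, unlike the base class above.

```python
from typing import Callable


class SingleCharFSA:
    """Hypothetical generalization of ColonFSA, LeftParenFSA, and RightParenFSA.

    Like those classes, it accepts any string whose FIRST character is
    `target`, because the accept state s1 loops on every later input.
    """

    def __init__(self, name: str, target: str) -> None:
        self.fsa_name: str = name
        self.target: str = target
        self.start_state: Callable = self.s0
        self.accept_states: set[Callable] = {self.s1}
        self.input_string: str = ""
        self.num_chars_read: int = 0

    def _get_current_input(self) -> str:
        current_input: str = self.input_string[self.num_chars_read]
        self.num_chars_read += 1
        return current_input

    def s0(self) -> Callable:
        # Only the target character leads to the accept state
        return self.s1 if self._get_current_input() == self.target else self.s_err

    def s1(self) -> Callable:
        self._get_current_input()  # consume one character and stay in s1
        return self.s1

    def s_err(self) -> Callable:
        self._get_current_input()  # consume one character and stay in s_err
        return self.s_err

    def run(self, input_string: str) -> bool:
        self.input_string = input_string
        self.num_chars_read = 0  # reset here so repeated runs are independent
        current_state: Callable = self.start_state
        while self.num_chars_read < len(self.input_string):
            current_state = current_state()
        return current_state in self.accept_states


left_paren_fsa = SingleCharFSA("LeftParenFSA", "(")
right_paren_fsa = SingleCharFSA("RightParenFSA", ")")
print(left_paren_fsa.run("()"))   # True
print(right_paren_fsa.run("()"))  # False
```

The trade-off is that the per-token classes mirror the state diagrams one-to-one, while a parameterized class avoids copying the same three states several times.
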

In [19]:
class LeftParenFSA(FSA):
    def __init__(self) -> None:
        super().__init__("LeftParenFSA")  # You must invoke the __init__ of the parent class
        self.accept_states.add(self.s1)  # accept_states is defined in the parent class, so it can be used here

    def s0(self) -> function:
        current_input: str = self._FSA__get_current_input()
        next_state: function
        if current_input == '(':
            next_state = self.s1
        else:
            next_state = self.s_err
        return next_state

    def s1(self) -> function:
        self._FSA__get_current_input()  # Consume one character; s1 loops on any input
        return self.s1

    def s_err(self) -> function:
        self._FSA__get_current_input()  # Consume one character; s_err loops on any input
        return self.s_err


***

In [18]:
class RightParenFSA(FSA):
    def __init__(self) -> None:
        super().__init__("RightParenFSA")  # You must invoke the __init__ of the parent class
        self.accept_states.add(self.s1)  # accept_states is defined in the parent class, so it can be used here

    def s0(self) -> function:
        current_input: str = self._FSA__get_current_input()
        next_state: function
        if current_input == ')':
            next_state = self.s1
        else:
            next_state = self.s_err
        return next_state

    def s1(self) -> function:
        self._FSA__get_current_input()  # Consume one character; s1 loops on any input
        return self.s1

    def s_err(self) -> function:
        self._FSA__get_current_input()  # Consume one character; s_err loops on any input
        return self.s_err

I'm going to do a quick sanity check here. It's a simple set of tests to see if my code works like I expect.

In [17]:
my_colon_dash_fsa: ColonDashFSA = ColonDashFSA()
my_colon_fsa: ColonFSA = ColonFSA()
my_right_paren_fsa: RightParenFSA = RightParenFSA()
my_left_paren_fsa: LeftParenFSA = LeftParenFSA()
input_string: str = "()"
for fsa in [my_colon_dash_fsa, my_colon_fsa, my_right_paren_fsa, my_left_paren_fsa]:
    verdict: str = "accepted" if fsa.run(input_string) else "did not accept"
    print(f"The {fsa.get_name()} FSA {verdict} the input string '{input_string}'")


The ColonDashFSA FSA did not accept the input string '()'
The ColonFSA FSA did not accept the input string '()'
The RightParenFSA FSA did not accept the input string '()'
The LeftParenFSA FSA accepted the input string '()'


*** 
We've implemented each FSA on the left-hand side of the figure above in its own class using inheritance. Now we need to figure out how to run each FSA on the same input and collect the outputs. Let's use a _dictionary_ data structure to represent the collection of FSAs. Begin by creating instances of each FSA type.


In [16]:
right_paren_fsa: RightParenFSA = RightParenFSA()   # Naming follows PEP 8: classes use CapWords
left_paren_fsa: LeftParenFSA = LeftParenFSA()     # and instances use lowercase_with_underscores
colon_fsa: ColonFSA = ColonFSA()
colon_dash_fsa: ColonDashFSA = ColonDashFSA()

The dictionary will be indexed (keyed) by the FSA instances, and the dictionary values will be the return status of each FSA when run on the input. ChatGPT gives a good response to the prompt _how do i create a dictionary with only keys in python_, but something kinda bugs me about ChatGPT: it uses data generated by people who have posted information on the internet, but it doesn't give them credit. So, I put the same prompt into Google and am using one of the responses from the GeeksforGeeks tutorial https://www.geeksforgeeks.org/python-initialize-a-dictionary-with-only-keys-from-a-list/ .
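
One caveat about `dict.fromkeys`: every key shares the same value object. That is harmless for an immutable value like `False` but surprising for a mutable one like a list. The sketch below uses hypothetical string keys rather than the FSA objects, and compares `fromkeys` with a dict comprehension.

```python
keys: list[str] = ["LeftParenFSA", "RightParenFSA"]

# Every key shares the SAME list object
shared: dict[str, list[int]] = dict.fromkeys(keys, [])
shared["LeftParenFSA"].append(1)
print(shared)    # {'LeftParenFSA': [1], 'RightParenFSA': [1]}

# A dict comprehension creates a fresh list for each key
separate: dict[str, list[int]] = {key: [] for key in keys}
separate["LeftParenFSA"].append(1)
print(separate)  # {'LeftParenFSA': [1], 'RightParenFSA': []}

# For immutable values like False, the two forms behave the same
print(dict.fromkeys(keys, False) == {key: False for key in keys})  # True
```
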

In [23]:
fsa_keys: list[FSA] = [right_paren_fsa, left_paren_fsa, colon_fsa, colon_dash_fsa]  # The keys are FSA objects, not functions
fsa_dict: dict[FSA, bool] = dict.fromkeys(fsa_keys, False)  # Initialize the outputs from each FSA to False
print(fsa_dict)

{<__main__.RightParenFSA object at 0x1052652d0>: False, <__main__.LeftParenFSA object at 0x105280b90>: False, <__main__.ColonFSA object at 0x105282990>: False, <__main__.ColonDashFSA object at 0x105283710>: False}


I'm taking advantage of something that I like in Python, specifically that methods and class instances are treated as _first-class objects_. Quoting from ChatGPT in response to the prompt _why can we call methods first order elements in python_:

    In Python, methods are first-class objects, which means that they can be treated as any other object, such as variables, data types, and functions. This means that we can pass methods as arguments to other functions, return methods from functions, and store methods in data structures like lists and dictionaries.

That means that we can use the FSA __objects__ themselves as keys to the dictionary. Strictly speaking, what makes this work is that instances of a user-defined class are _hashable_ by default, with equality and hashing based on object identity. As a reminder, the word __object__ in this context means an instance of a class, that is, the "thing" we create when we do something like:

    right_paren_fsa = RightParenFSA()

"RightParenFSA()" is how we create an object from a class, and "right_paren_fsa" is the name of the object we create.

The output of "print(fsa_dict)" should look something like this (notice the word "object"):

    {<__main__.RightParenFSA object at 0x???????>: False, ...}

The keys in the dictionary look like

    <__main__.RightParenFSA object at 0x???????>

which is just Python's way of saying "You created an instance of RightParenFSA. I call that instance an object, and the object resides in computer memory at location 0x???????."
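
A small check of the identity-based default uses a hypothetical stand-in class `StandInFSA`. Like the FSA classes here, it defines neither `__eq__` nor `__hash__`.

```python
class StandInFSA:
    """Hypothetical stand-in: defines neither __eq__ nor __hash__."""


first: StandInFSA = StandInFSA()
second: StandInFSA = StandInFSA()
results: dict[StandInFSA, bool] = {first: True, second: False}

print(len(results))     # 2: two instances are two distinct keys
print(first == second)  # False: default equality compares identity
print(results[first])   # True: lookup finds the exact object used as the key
```

For the lexer, this means a lookup must use the same FSA object that was stored, not a freshly created instance of the same class.
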

***

We can iterate through the key-value pairs in the dictionary and print each one. To figure out how, I used the prompt

    print all key value pairs in python using list comprehensions
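
The prompt mentions list comprehensions, while the cell below uses a plain `for` loop. For comparison, here is a comprehension version. It uses hypothetical results keyed by FSA name, where the notebook keys by FSA objects. It builds the lines first and prints once, since a comprehension used only for its printing side effects is usually considered poor style.

```python
# Hypothetical results keyed by FSA name
fsa_results: dict[str, bool] = {
    "RightParenFSA": False,
    "LeftParenFSA": False,
    "ColonFSA": False,
    "ColonDashFSA": False,
}

# Build one formatted line per key-value pair, then print them together
lines: list[str] = [f"value for FSA {name} is {accepted}" for name, accepted in fsa_results.items()]
print("\n".join(lines))
```
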


In [24]:
for key,value in fsa_dict.items():
    print("value for FSA", key, " is ",value)

value for FSA <__main__.RightParenFSA object at 0x1052652d0>  is  False
value for FSA <__main__.LeftParenFSA object at 0x105280b90>  is  False
value for FSA <__main__.ColonFSA object at 0x105282990>  is  False
value for FSA <__main__.ColonDashFSA object at 0x105283710>  is  False


***

Let's choose a couple of input strings, run each one through every FSA in our FSA dictionary, and print out the results. We'll be smart and print out the name of each FSA (using `get_name()`) rather than the key object itself.

In [25]:
for input_string in [")", ":-"]:
    for fsa in fsa_dict.keys():  # Lowercase name so the loop doesn't shadow the FSA class
        fsa.reset()  # Better make sure I reset things before I try this
        fsa_dict[fsa] = fsa.run(input_string)
    print("********\nInput string is", input_string)
    for fsa, accepted in fsa_dict.items():
        print("value for FSA", fsa.get_name(), "is", accepted)
    print("\n")


********
Input string is )
value for FSA RightParenFSA is True
value for FSA LeftParenFSA is False
value for FSA ColonFSA is False
value for FSA ColonDashFSA is False


********
Input string is :-
value for FSA RightParenFSA is False
value for FSA LeftParenFSA is False
value for FSA ColonFSA is True
value for FSA ColonDashFSA is True




***

We've built the pieces for assembling the FSAs into a collection that an FSM can manage, but we haven't built the FSM yet. For convenience, I've collected all the FSAs we've built into a single file, `fsa_classes_definitions.py`, so the FSM manager begins by importing these FSAs.

Let's look at the figure we created to describe the relationships between the token-specific FSAs and the managing FSM.

![image](Managing_project1_FSAs.drawio.svg)



Notice how the order in which the token-specific FSAs are evaluated determines the logic we use in our management FSM. We want our code to match this. The figure says to run the FSAs in the following order:

 1. left parenthesis
 2. right parenthesis
 3. colon
 4. colon dash

We can enforce this order when we initialize the manager FSM by listing the FSAs in this order when we build the dictionary's keys

    self.fsa_keys = [self.left_paren_fsa, self.right_paren_fsa, self.colon_fsa, self.colon_dash_fsa]

and then initializing our dictionary using these keys

    self.fsa_dict = dict.fromkeys(self.fsa_keys, False)

Python dictionaries preserve insertion order (guaranteed since Python 3.7), so the FSA outputs come back in this same order.
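
The manager's logic depends on that ordering, including after each FSA's value has been overwritten. A quick check uses hypothetical string keys in place of the FSA objects:

```python
# Hypothetical string keys standing in for the FSA objects
fsa_keys: list[str] = ["LeftParenFSA", "RightParenFSA", "ColonFSA", "ColonDashFSA"]
fsa_dict: dict[str, bool] = dict.fromkeys(fsa_keys, False)

# Assigning to an existing key updates its value but does not move the key
fsa_dict["ColonDashFSA"] = True
fsa_dict["ColonFSA"] = True

print(list(fsa_dict))           # ['LeftParenFSA', 'RightParenFSA', 'ColonFSA', 'ColonDashFSA']
print(list(fsa_dict.values()))  # [False, False, True, True]
```
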


***

We'll then run each of the FSAs (see the `lex` method):

    # Run each FSA on the input and collect their outputs
    for fsa in self.fsa_dict.keys():
        self.fsa_dict[fsa] = fsa.run(input_string)

and the output token will be selected using the logic shown in the figure above:

    if output_list == [True, False, False, False]: output_token = "LEFT_PAREN"
    elif output_list == [False, True, False, False]: output_token = "RIGHT_PAREN"
    elif output_list == [False, False, True, False]: output_token = "COLON"
    elif output_list == [False, False, True, True]: output_token = "COLON_DASH"
        

    


__TODO: make this match our style guide for project 1__

In [30]:
from fsa_classes_definitions import *  # The FSA classes defined earlier in this notebook

class LexerFSM:
    def __init__(self) -> None:
        ##########################
        # Create each needed FSA #
        ##########################
        # Naming follows PEP 8: classes use CapWords and instances use lowercase_with_underscores
        self.right_paren_fsa: RightParenFSA = RightParenFSA()
        self.left_paren_fsa: LeftParenFSA = LeftParenFSA()
        self.colon_fsa: ColonFSA = ColonFSA()
        self.colon_dash_fsa: ColonDashFSA = ColonDashFSA()
        #####################################
        # Create the FSA manager dictionary #
        #####################################
        # The order of these keys must match the order the manager FSM expects
        self.fsa_keys: list[FSA] = [self.left_paren_fsa, self.right_paren_fsa, self.colon_fsa, self.colon_dash_fsa]
        self.fsa_dict: dict[FSA, bool] = dict.fromkeys(self.fsa_keys, False)  # Initialize the outputs from each FSA to False

    ################
    # Lexer method #
    ################
    def lex(self, input_string: str) -> str:
        # Run each FSA on the input and collect their outputs
        for fsa in self.fsa_dict.keys():
            fsa.reset()  # Clear num_chars_read left over from any previous run
            self.fsa_dict[fsa] = fsa.run(input_string)
        # Run the FSM that decides what to do with the outputs of the FSAs
        return self._manager_fsm()

    ###################
    # Private Methods #
    ###################
    def _manager_fsm(self) -> str:
        # A finite state machine implemented as a sequence of if statements
        output_token: str = "UNDEFINED"
        # Dictionaries preserve insertion order, so this list follows self.fsa_keys
        output_list: list[bool] = list(self.fsa_dict.values())
        if output_list == [True, False, False, False]: output_token = "LEFT_PAREN"
        elif output_list == [False, True, False, False]: output_token = "RIGHT_PAREN"
        elif output_list == [False, False, True, False]: output_token = "COLON"
        elif output_list == [False, False, True, True]: output_token = "COLON_DASH"
        return output_token

    ###################
    # Utility Methods #
    ###################
    def reset(self) -> None:
        for fsa in self.fsa_dict.keys():
            fsa.reset()
    

***

WARNING: The code above is not efficient and is easy to mess up. It relies on the dictionary's insertion order matching the order of the patterns in the manager FSM. I wrote it to make it easy to see where the FSM and the FSAs belong and how they fit together, not to make it efficient.

Let's test it on some inputs.

In [31]:
my_lexer: LexerFSM = LexerFSM()
input_string: str = ":"
my_lexer.reset()
print("On input",input_string, "The lexer output ",my_lexer.lex(input_string))

input_string = ":-"
my_lexer.reset()
print("On input",input_string, "The lexer output ",my_lexer.lex(input_string))

input_string = "("
my_lexer.reset()
print("On input",input_string, "The lexer output ",my_lexer.lex(input_string))

input_string = ")"
my_lexer.reset()
print("On input",input_string, "The lexer output ",my_lexer.lex(input_string))

input_string = "(:-:)"
my_lexer.reset()
print("On input",input_string, "The lexer output ",my_lexer.lex(input_string))



On input : The lexer output  COLON
On input :- The lexer output  COLON_DASH
On input ( The lexer output  LEFT_PAREN
On input ) The lexer output  RIGHT_PAREN
On input (:-:) The lexer output  LEFT_PAREN
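
The WARNING above points out that the manager depends on dictionary order. One order-independent alternative keys the manager's table by *which* FSAs accepted, using a `frozenset` of names, rather than by their positions in a list. This is a minimal sketch under stated assumptions:

- `RECOGNIZERS`, `TOKEN_FOR`, and `lex_first_token` are hypothetical names.
- The FSA objects are replaced by prefix-test stand-ins that mimic their accept behavior.

```python
from typing import Callable

# Hypothetical stand-ins for the FSA objects. Each accepts any input that
# BEGINS with its lexeme, like the FSA classes above. The order here is
# deliberately different from LexerFSM's list, to show it no longer matters.
RECOGNIZERS: dict[str, Callable[[str], bool]] = {
    "ColonDashFSA": lambda s: s.startswith(":-"),
    "ColonFSA": lambda s: s.startswith(":"),
    "RightParenFSA": lambda s: s.startswith(")"),
    "LeftParenFSA": lambda s: s.startswith("("),
}

# Keyed by WHICH FSAs accepted, not by their positions in a list
TOKEN_FOR: dict[frozenset[str], str] = {
    frozenset({"LeftParenFSA"}): "LEFT_PAREN",
    frozenset({"RightParenFSA"}): "RIGHT_PAREN",
    frozenset({"ColonFSA"}): "COLON",
    frozenset({"ColonFSA", "ColonDashFSA"}): "COLON_DASH",
}


def lex_first_token(input_string: str) -> str:
    """Return the token for the start of input_string, or UNDEFINED."""
    accepted: frozenset[str] = frozenset(
        name for name, accepts in RECOGNIZERS.items() if accepts(input_string)
    )
    return TOKEN_FOR.get(accepted, "UNDEFINED")


for text in [":", ":-", "(", ")", "(:-:)"]:
    print("On input", text, "the sketch outputs", lex_first_token(text))
```

On the five inputs in the test cell above, this gives the same tokens as `LexerFSM`: COLON, COLON_DASH, LEFT_PAREN, RIGHT_PAREN, LEFT_PAREN. Because the table uses sets, adding another FSA cannot shift the positions of the existing ones.
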
