#### Recursive Descent Parsing
CS 236 <br>
Fall 2023

Michael A. Goodrich <br>
Brigham Young University <br>
February 2023
***

Consider the following LL(1) grammar:
* $E \rightarrow N \ | \ OEE$
* $O \rightarrow +\ |\ *$ 
* $N \rightarrow 0\ |\ 1\ |\ 2\ |\ 3$

The starting nonterminal is $E$.

The FIRST sets of the two right-hand sides of $E$ are
* $FIRST(OEE) = FIRST(O) = \{+,*\}$
* $FIRST(N) = \{0,1,2,3\}$
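The FIRST sets above can also be checked mechanically. The sketch below is an illustration, not part of the course code: it stores the grammar in a hypothetical `GRAMMAR` dictionary, computes FIRST sets by repeating until nothing changes, and checks the LL(1) condition that each nonterminal's alternatives have disjoint FIRST sets. It assumes no production derives lambda, which holds for this grammar, so only the leftmost symbol of each right-hand side matters.

```python
# Hypothetical encoding of the grammar above: each nonterminal maps to a list of
# alternatives, and each alternative is a list of symbols.
GRAMMAR = {
    "E": [["N"], ["O", "E", "E"]],
    "O": [["+"], ["*"]],
    "N": [["0"], ["1"], ["2"], ["3"]],
}


def first_of(alternative, first, grammar):
    """FIRST of one right-hand side. With no lambda productions only the
    leftmost symbol matters, and a terminal is its own FIRST set."""
    head = alternative[0]
    return set(first[head]) if head in grammar else {head}


def first_sets(grammar):
    """Compute FIRST(A) for every nonterminal A, repeating until nothing changes."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of(alt, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def is_ll1(grammar):
    """A lambda-free grammar is LL(1) when the alternatives of each
    nonterminal have pairwise disjoint FIRST sets."""
    first = first_sets(grammar)
    for alternatives in grammar.values():
        seen = set()
        for alt in alternatives:
            f = first_of(alt, first, grammar)
            if seen & f:
                return False
            seen |= f
    return True


FIRST = first_sets(GRAMMAR)
print(sorted(first_of(["O", "E", "E"], FIRST, GRAMMAR)))  # ['*', '+']
print(sorted(first_of(["N"], FIRST, GRAMMAR)))            # ['0', '1', '2', '3']
print(is_ll1(GRAMMAR))                                    # True
```

Adding an alternative such as $E \rightarrow NN$ would make `is_ll1` return `False`, because both $N$ and $NN$ start with a digit and the parser could no longer choose with one character of lookahead.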

***

Let's construct a class for the recursive descent parser (RDP). There will be one method for each nonterminal and a helper method that tests whether the current input character matches what the grammar expects. We'll call this testing method `__current_input_matches_target`.

***

Notes about the project: 

Note 1: For project 2, this code will look a little different, because you will compare against a list of Token objects (created in project 1) instead of a string. In the code below, each production method compares the current input character with an expected character; in project 2 you will instead compare token types, such as "COMMA" or "PERIOD".

Note 2: For project 2, you do not have to create a `self.FIRST` entry for every nonterminal. You only need entries for the nonterminals that need them (that is, where there is a choice between productions and none of the options is lambda).

Note 3: For project 2, you will not be required to keep track of or print a trace. The trace here is for illustration only.
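To make Note 1 concrete, here is a minimal sketch of the same grammar parsed from a token list. It assumes a hypothetical `Token` class with `type` and `value` fields and made-up token types `ADD`, `MULTIPLY`, and `NUMBER`; your project 1 class and type names may differ. The only real change from the string version is that each check compares a token's type rather than a character.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical stand-in for a project 1 token: a type name plus its text."""
    type: str
    value: str


class TokenRDP:
    """Parser for E -> N | OEE that compares token types instead of characters."""

    FIRST_O = {"ADD", "MULTIPLY"}  # hypothetical token types for '+' and '*'
    FIRST_N = {"NUMBER"}           # hypothetical token type for the digits 0-3

    def __init__(self, tokens):
        self.tokens = tokens
        self.index = 0  # number of tokens consumed so far

    def parse(self):
        self.E()
        if self.index != len(self.tokens):
            raise ValueError(f"Unparsed tokens remain: {self.tokens[self.index:]}")

    def current_type(self):
        if self.index == len(self.tokens):
            raise ValueError("Expected another token but none are left")
        return self.tokens[self.index].type

    def match(self, expected_types):
        """Consume the current token if its type is one of expected_types."""
        actual = self.current_type()
        if actual not in expected_types:
            raise ValueError(f"Expected one of {sorted(expected_types)}, got {actual}")
        self.index += 1

    def E(self):
        actual = self.current_type()
        if actual in self.FIRST_N:      # E -> N
            self.N()
        elif actual in self.FIRST_O:    # E -> O E E
            self.O()
            self.E()
            self.E()
        else:
            raise ValueError(f"Token type {actual} cannot start an E")

    def O(self):
        self.match(self.FIRST_O)

    def N(self):
        self.match(self.FIRST_N)


# Tokens for the input "+12", as a hypothetical project 1 scanner might produce them.
tokens = [Token("ADD", "+"), Token("NUMBER", "1"), Token("NUMBER", "2")]
TokenRDP(tokens).parse()  # returns without raising
```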

```python
class RDP:  # RDP stands for recursive descent parser
    def __init__(self):
        ###################################
        # Tuple defining an LL(1) grammar #
        # • set of nonterminals           #
        # • set of terminals              #
        # • starting nonterminal          #
        # • set of productions            #
        ###################################
        self.nonterminals = {'E', 'O', 'N'}              # set of nonterminals; each nonterminal has its own method
        self.starting_nonterminal = self.E               # starting nonterminal
        self.terminals = {'+', '*', '0', '1', '2', '3'}  # set of terminals
        # Productions                                    # defined within the nonterminal methods

        ##########################################
        # Define FIRST sets for each nonterminal #
        ##########################################
        self.FIRST = dict()
        self.FIRST['O'] = {'+', '*'}
        self.FIRST['N'] = {'0', '1', '2', '3'}

        ###########################################
        # Variables for managing the input string #
        ###########################################
        self.input = None
        self.num_chars_read = 0

        ###########################################
        # Variables for printing the trace        #
        ###########################################
        self.tree_depth = 0

    def ParseInput(self, input):
        """ Call this method from main.
            It stores the input, calls the starting nonterminal,
            and checks whether every input character was consumed.
        """
        print("Parsing string", input)
        self.input = input
        self.starting_nonterminal()  # run the RDP by calling the starting nonterminal
        if self.num_chars_read == len(self.input):
            print("Successfully parsed string")
        else:
            raise ValueError("End of parse and these characters have't been read: " + str(self.input[self.num_chars_read:]))

    ##############################################################
    # Each nonterminal gets its own method.                      #
    # The method knows which productions have the nonterminal on #
    # the left hand side of the production. The correct right    #
    # hand side of the production is chosen by looking at the    #
    # current input and the FIRST set of the right hand side     #
    ##############################################################
    def E(self):
        # Production E --> N | OEE
        self.__printEntryMessage("E")
        current_input = self.__getCurrentInput()
        print(self.__getTabString(), "Trying to read input", current_input)
        if current_input in self.FIRST['N']:
            self.N()
        elif current_input in self.FIRST['O']:
            self.O()
            self.E()
            self.E()
        else:  # error
            raise ValueError("Current input is " + str(current_input) + ", which cannot be produced by 'E'")
        self.__printExitMessage("E")

    def N(self):
        # Production N --> 0 | 1 | 2 | 3
        self.__printEntryMessage("N")
        print(self.__getTabString(), "Trying to read input", self.__getCurrentInput())
        if self.__current_input_matches_target('0') or \
                self.__current_input_matches_target('1') or \
                self.__current_input_matches_target('2') or \
                self.__current_input_matches_target('3'):
            self.__advanceInput()  # move to the next input character
        else:
            raise ValueError("Current input is " + str(self.__getCurrentInput()) + ", which cannot be produced by 'N'")
        self.__printExitMessage("N")

    def O(self):
        # Production O --> + | *
        self.__printEntryMessage("O")
        print(self.__getTabString(), "Trying to read input", self.__getCurrentInput())
        if self.__current_input_matches_target('+') or \
                self.__current_input_matches_target('*'):
            self.__advanceInput()  # move to the next input character
        else:
            raise ValueError("Current input is " + str(self.__getCurrentInput()) + ", which cannot be produced by 'O'")
        self.__printExitMessage("O")

    ############################################################################
    # Helper methods for managing the input                                    #
    # One looks at the current input character                                 #
    # Another advances to the next input character                             #
    # A third checks whether the current input character matches a target      #
    # The double-underscore prefix triggers Python name mangling, which        #
    # keeps these helpers effectively private to this class                    #
    # https://www.geeksforgeeks.org/private-methods-in-python/                 #
    ############################################################################
    def __getCurrentInput(self):
        if self.num_chars_read == len(self.input):
            raise ValueError("Expected to read another input character but no inputs left to read")
        return self.input[self.num_chars_read]

    def __advanceInput(self):
        if self.num_chars_read == len(self.input):
            raise ValueError("Expected to advance to next input character but no inputs left to read")
        self.num_chars_read += 1

    def __current_input_matches_target(self, target_input):
        return self.__getCurrentInput() == target_input

    ########################
    # Other public methods #
    ########################
    def Reset(self):
        """Prepare the parser for a new input string."""
        self.num_chars_read = 0
        self.input = ""
        self.tree_depth = 0  # a failed parse can exit before the trace depth returns to 0

    ###############################
    # Parse tree printing methods #
    ###############################
    def __printEntryMessage(self, method_name):
        print(self.__getTabString(), "In", method_name, "method.")
        self.tree_depth += 1

    def __printExitMessage(self, method_name):
        self.tree_depth -= 1
        print(self.__getTabString(), "Returning from", method_name, ".")

    def __getTabString(self):
        # One tab per level of recursion
        return "\t" * self.tree_depth
```
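The class above only reports success or failure. A common variation, sketched here under our own assumptions (the names `TreeRDP`, `consume`, and `show` are hypothetical and not part of the course code), drops the trace and has each nonterminal method return the parse-tree node it recognized. A node is a `(label, children)` tuple, and leaves are the matched characters.

```python
class TreeRDP:
    """Sketch: a trace-free parser for E -> N | OEE that returns a parse tree."""

    FIRST_O = {"+", "*"}
    FIRST_N = {"0", "1", "2", "3"}

    def __init__(self, text):
        self.text = text
        self.pos = 0  # number of characters read so far

    def parse(self):
        tree = self.E()
        if self.pos != len(self.text):
            raise ValueError("Unread characters: " + self.text[self.pos:])
        return tree

    def current(self):
        if self.pos == len(self.text):
            raise ValueError("Expected another input character but none are left")
        return self.text[self.pos]

    def consume(self, allowed, nonterminal):
        """Advance past the current character if it is in allowed; return it."""
        ch = self.current()
        if ch not in allowed:
            raise ValueError(f"Current input is {ch}, which cannot be produced by '{nonterminal}'")
        self.pos += 1
        return ch

    def E(self):
        ch = self.current()
        if ch in self.FIRST_N:  # E -> N
            return ("E", [self.N()])
        if ch in self.FIRST_O:  # E -> O E E
            return ("E", [self.O(), self.E(), self.E()])
        raise ValueError(f"Current input is {ch}, which cannot be produced by 'E'")

    def O(self):
        return ("O", [self.consume(self.FIRST_O, "O")])

    def N(self):
        return ("N", [self.consume(self.FIRST_N, "N")])


def show(node, depth=0):
    """Print one node per line, indented with one tab per tree level."""
    if isinstance(node, str):
        print("\t" * depth + node)
        return
    label, children = node
    print("\t" * depth + label)
    for child in children:
        show(child, depth + 1)


show(TreeRDP("+12").parse())
```

Python evaluates list elements left to right, so `[self.O(), self.E(), self.E()]` consumes input in the same order as the three consecutive calls in `RDP.E`.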

```python
my_RDP = RDP()
try:
    my_RDP.ParseInput('+12')
except ValueError as inst:
    message = inst.args
    print(message)
```

```
Parsing string +12
 In E method.
	 Trying to read input +
	 In O method.
		 Trying to read input +
	 Returning from O .
	 In E method.
		 Trying to read input 1
		 In N method.
			 Trying to read input 1
		 Returning from N .
	 Returning from E .
	 In E method.
		 Trying to read input 2
		 In N method.
			 Trying to read input 2
		 Returning from N .
	 Returning from E .
 Returning from E .
Successfully parsed string
```


```python
my_RDP.Reset()
try:
    my_RDP.ParseInput('+123')
except ValueError as inst:
    message = inst.args
    print(message)
```

```
Parsing string +123
 In E method.
	 Trying to read input +
	 In O method.
		 Trying to read input +
	 Returning from O .
	 In E method.
		 Trying to read input 1
		 In N method.
			 Trying to read input 1
		 Returning from N .
	 Returning from E .
	 In E method.
		 Trying to read input 2
		 In N method.
			 Trying to read input 2
		 Returning from N .
	 Returning from E .
 Returning from E .
("End of parse and these characters have't been read: 3",)
```


```python
my_RDP.Reset()
try:
    my_RDP.ParseInput('+1')
except ValueError as inst:
    message = inst.args
    print(message)
```

```
Parsing string +1
 In E method.
	 Trying to read input +
	 In O method.
		 Trying to read input +
	 Returning from O .
	 In E method.
		 Trying to read input 1
		 In N method.
			 Trying to read input 1
		 Returning from N .
	 Returning from E .
	 In E method.
('Expected to read another input character but no inputs left to read',)
```


```python
my_RDP.Reset()
try:
    my_RDP.ParseInput('+1A')
except ValueError as inst:
    message = inst.args
    print(message)
```

```
Parsing string +1A
 In E method.
	 Trying to read input +
	 In O method.
		 Trying to read input +
	 Returning from O .
	 In E method.
		 Trying to read input 1
		 In N method.
			 Trying to read input 1
		 Returning from N .
	 Returning from E .
	 In E method.
		 Trying to read input A
("Current input is A, which cannot be produced by 'E'",)
```


```python
my_RDP.Reset()
try:
    my_RDP.ParseInput('-31')
except ValueError as inst:
    message = inst.args
    print(message)
```