#### Recursive Descent Parsing with FOLLOW Sets
CS 236 <br>
Fall 2023

Michael A. Goodrich <br>
Brigham Young University <br>
February 2023
***

Consider the following LL(1) Grammar which uses tail recursion
* $E \rightarrow nDI \ | \ OEE$
* $I \rightarrow DI\ |\ \lambda$ 
* $O \rightarrow +\ |\ *$ 
* $D \rightarrow 0\ |\ 1\ |\ 2\ |\ 3$

The starting non-terminal is $E$.

The terminals are $\{0,1,2,3,+,-,n\}$, where the $'n'$ indicates the start of a multi-digit number.

The first sets are
* $FIRST(nDI) = \{n\}$
* $FIRST(OEE) = FIRST(O) = \{+,*\}$
* $FIRST(I) = \{0,1,2,3\}$
* $FIRST(D) = \{0,1,2,3\}$

***


The follow set for the production that has tail recursion is
 $FOLLOW(I) = \{\#,n,+,*\}$

 where $\#$ indicates the end of the input string. 

 You can derive the FOLLOW set from the following three parse trees:

 ![Parse trees used to derive FOLLOW(I)](ParseTreesForFollowSet.png)

 The terminal to the right of $I$ in the parse tree goes into the FOLLOW set. The parse tree on the left has not terminals that follow, so we use the \# character to indicate that the end of input has been reached. Thus, \# means end of input and belongs in the follow set of $I$.

***

Let's construct a class for the recursive descent parser (RDP). There will be a method for each nonterminal and a method to test whether the current input character what the grammar expects. We'll call this testing method ___match_.

***

In [28]:
class RDP: # RDP stands for recursive descent parser
    def __init__(self):
        ###################################
        # Tuple defining an LL(1) grammar #
        # • set of nonterminals           #
        # • set of terminals              #
        # • starting nonterminal          #
        # • set of productions            #
        ###################################
        self.nonterminals = {'E','O','I', 'D'}          # set of nonterminals. Each nonterminal will have its own method
        self.starting_nonterminal = self.E              # Starting nonterminal
        self.terminals = {'+','*','0','1','2','3','#'}  # set of terminals
        # Productions                                   # Defined within the nonterminal methods
        
        ##########################################
        # Define FIRST sets for each nonterminal #
        ##########################################
        self.FIRST = dict()
        self.FIRST['O'] = {'+','-'}
        self.FIRST['I'] = {'0','1','2','3'}
        self.FIRST['D'] = {'0','1','2','3'}

        ##########################################
        # Define FOLLOW sets for nonterminal I   #
        ##########################################
        self.FOLLOW = dict()
        self.FOLLOW['I'] = {'#','n','+','*'}

        ###########################################
        # Variables for Managing the input string #
        ###########################################
        self.input = None
        self.num_chars_read = 0

        ###########################################
        # Variables for printing the trace        #
        ###########################################
        self.tree_depth = 0

    def ParseInput(self,input):
        """ Call this method from main.
            It gets the input, calls the starting non-terminal,
            and does the accounting to see if the parse was successful
        """
        print("Parsing string",input, "***")
        self.input = input
        self.starting_nonterminal() # Run the RDP by calling the starting non terminal
        if self.__getCurrentInput() == '#':
            print("Successfully parsed string")
        else:
            raise ValueError("End of parse and these characters have't been read: " + str(self.input[self.num_chars_read:]))
    
    ##############################################################
    # Each nonterminal gets its own method.                      #
    # The method knows which productions have the nonterminal on #
    # the left hand side of the production. The correct right    #
    # hand side of the production is chosen by looking at the    #
    # current input and the FIRST set of the right hand side     #
    ##############################################################
    def E(self):
        # Production E--> nDI | OEE
        self.__printEntryMessage("E")
        current_input = self.__getCurrentInput()
        self.__printMessageAboutCurrentString("trying to parse input character",current_input)
        if self.__match(current_input,'n'):
            print(self.__getTabString(),"Terminal 'n' matched by input character", self.__getCurrentInput())
            self.__advanceInput()
            self.D()
            self.I()
        elif current_input in self.FIRST['O']:
            self.O()
            self.E()
            self.E()
        else: # error
            raise ValueError("Current input character is " + str(current_input) + ", which cannot be produced by 'E'")
        self.__printExitMessage("E")
    def D(self):
        # Production D --> 0 | 1 | 2 | 3
        self.__printEntryMessage("D")
        current_input = self.__getCurrentInput()
        self.__printMessageAboutCurrentString("trying to parse input character", current_input)
        if self.__match(current_input,'0') or \
            self.__match(current_input,'1') or \
            self.__match(current_input,'2') or \
            self.__match(current_input,'3'):
            print(self.__getTabString(),"successfully parsed input character", current_input)
            self.__advanceInput() # move to the next current input character
            pass
        else:
            raise ValueError("Current input character is " + str(current_input) + ", which cannot be produced by 'D'")
        self.__printExitMessage("D")
    def O(self):
        # Production O --> + | * 
        self.__printEntryMessage("O")
        current_input = self.__getCurrentInput()
        self.__printMessageAboutCurrentString("trying to parse input character", current_input)
        if self.__match(current_input,'+') or \
            self.__match(current_input,'*'):
            print(self.__getTabString(),"successfully parsed input character", current_input)
            self.__advanceInput() # move to the next current input character
            pass
        else:
            raise ValueError("Current input character is " + str(current_input) + ", which cannot be produced by 'O'")
        self.__printExitMessage("O")
    def I(self):
        # Production I --> DI | lambda 
        self.__printEntryMessage("I")
        current_input = self.__getCurrentInput()
        self.__printMessageAboutCurrentString("trying to parse input character", current_input)
        if current_input in self.FIRST['D']: 
            self.D()
            self.I()  # If the input was in the first set then you must execute the prouction I --> DI
            pass
        elif current_input in self.FOLLOW['I']:
            self.__printMessageAboutCurrentString("Exiting 'I' tail recursion because input is in FOLLOW set",current_input)
            pass
        else:
            raise ValueError("Current input character is " + str(current_input) + ", which cannot be produced by 'I'")
        self.__printExitMessage("I")

    ############################################################################
    # Helper methods for managing the input                                    #
    # One looks at the current input                                           #
    # Another reads the input and advances to the nexts input                  #
    # A third looks to see if the current input character matches a target     #
    # Convention in python is to prefix private methods by a double underscore #
    # https://www.geeksforgeeks.org/private-methods-in-python/                 #
    ############################################################################
    def __getCurrentInput(self):
        if self.num_chars_read > len(self.input):
            raise ValueError("Expected to read another input character but no inputs left to read")
        elif self.num_chars_read == len(self.input):
            self.__printMessageAboutCurrentString("Reading the end of string symbol in getCurrentInput","")
            return '#' # return end of string
        else:
            return self.input[self.num_chars_read]
    def __advanceInput(self):
        if self.num_chars_read > len(self.input):
            raise ValueError("Expected to advance to next input character but reached end of input")
        self.num_chars_read += 1
    def __match(self,current_input, target_input):
        return current_input == target_input

    ########################
    # Other public methods #
    ########################
    def Reset(self):
        self.num_chars_read = 0
        self.input = ""

    ###############################
    # Parse tree printing methods #
    ###############################
    def __printEntryMessage(self,method_name):
        print(self.__getTabString(),"In", method_name,"method.")
        self.tree_depth += 1
    def __printExitMessage(self,method_name):
        self.tree_depth -= 1
        print(self.__getTabString(),"Returning from", method_name,".")
    def __printMessageAboutCurrentString(self,message, current_input):
        print(self.__getTabString(),message, current_input)
    def __getTabString(self):
        tab_string = ""
        for d in range(self.tree_depth):
            tab_string+="\t"
        return tab_string

In [29]:
my_RDP = RDP()
try:
    my_RDP.ParseInput('n12')
    # Test case to see if E ==> nDI ==> n1I ==> n1DI ==> n12I ==> n12 lambda
    # sucessfully uses the follow set
except ValueError as inst:
    message = inst.args
    print(message)


Parsing string n12 ***
 In E method.
	 trying to parse input character n
	 Terminal 'n' matched by input character n
	 In D method.
		 trying to parse input character 1
		 successfully parsed input character 1
	 Returning from D .
	 In I method.
		 trying to parse input character 2
		 In D method.
			 trying to parse input character 2
			 successfully parsed input character 2
		 Returning from D .
		 In I method.
			 Reading the end of string symbol in getCurrentInput 
			 trying to parse input character #
			 Exiting 'I' tail recursion because input is in FOLLOW set #
		 Returning from I .
	 Returning from I .
 Returning from E .
 Reading the end of string symbol in getCurrentInput 
Successfully parsed string


In [30]:
my_RDP.Reset()
try:
    my_RDP.ParseInput('+n12n31')
except ValueError as inst:
    message = inst.args
    print(message)


Parsing string +n12n31 ***
 In E method.
	 trying to parse input character +
	 In O method.
		 trying to parse input character +
		 successfully parsed input character +
	 Returning from O .
	 In E method.
		 trying to parse input character n
		 Terminal 'n' matched by input character n
		 In D method.
			 trying to parse input character 1
			 successfully parsed input character 1
		 Returning from D .
		 In I method.
			 trying to parse input character 2
			 In D method.
				 trying to parse input character 2
				 successfully parsed input character 2
			 Returning from D .
			 In I method.
				 trying to parse input character n
				 Exiting 'I' tail recursion because input is in FOLLOW set n
			 Returning from I .
		 Returning from I .
	 Returning from E .
	 In E method.
		 trying to parse input character n
		 Terminal 'n' matched by input character n
		 In D method.
			 trying to parse input character 3
			 successfully parsed input character 3
		 Returning from D .
		 In I method.


In [31]:
my_RDP.Reset()
try:
    my_RDP.ParseInput('+123')
except ValueError as inst:
    message = inst.args
    print(message)

Parsing string +123 ***
 In E method.
	 trying to parse input character +
	 In O method.
		 trying to parse input character +
		 successfully parsed input character +
	 Returning from O .
	 In E method.
		 trying to parse input character 1
("Current input character is 1, which cannot be produced by 'E'",)


In [32]:
my_RDP.Reset()
try:
    my_RDP.ParseInput('+1')
except ValueError as inst:
    message = inst.args
    print(message)

Parsing string +1 ***
		 In E method.
			 trying to parse input character +
			 In O method.
				 trying to parse input character +
				 successfully parsed input character +
			 Returning from O .
			 In E method.
				 trying to parse input character 1
("Current input character is 1, which cannot be produced by 'E'",)


In [33]:
my_RDP.Reset()
try:
    my_RDP.ParseInput('+n11n')
except ValueError as inst:
    message = inst.args
    print(message)

Parsing string +n11n ***
				 In E method.
					 trying to parse input character +
					 In O method.
						 trying to parse input character +
						 successfully parsed input character +
					 Returning from O .
					 In E method.
						 trying to parse input character n
						 Terminal 'n' matched by input character n
						 In D method.
							 trying to parse input character 1
							 successfully parsed input character 1
						 Returning from D .
						 In I method.
							 trying to parse input character 1
							 In D method.
								 trying to parse input character 1
								 successfully parsed input character 1
							 Returning from D .
							 In I method.
								 trying to parse input character n
								 Exiting 'I' tail recursion because input is in FOLLOW set n
							 Returning from I .
						 Returning from I .
					 Returning from E .
					 In E method.
						 trying to parse input character n
						 Terminal 'n' matched by input character n
						 In D method.
				

In [34]:
my_RDP.Reset()
try:
    my_RDP.ParseInput('-31')
except ValueError as inst:
    message = inst.args
    print(message)

Parsing string -31 ***
							 In E method.
								 trying to parse input character -
								 In O method.
									 trying to parse input character -
("Current input character is -, which cannot be produced by 'O'",)
