#### Recursive Descent Parsing with FOLLOW Sets
CS 236 <br>
Fall 2023

Michael A. Goodrich <br>
Brigham Young University <br>
February 2023
***

Consider the following LL(1) Grammar which uses tail recursion
* $E \rightarrow nDI \ | \ OEE$
* $I \rightarrow DI\ |\ \lambda$ 
* $O \rightarrow +\ |\ *$ 
* $D \rightarrow 0\ |\ 1\ |\ 2\ |\ 3$

The starting non-terminal is $E$.

The terminals are $\{0,1,2,3,+,-,n\}$, where the $'n'$ indicates the start of a multi-digit number.

The first sets are
* $FIRST(nDI) = \{n\}$
* $FIRST(OEE) = FIRST(O) = \{+,*\}$
* $FIRST(I) = \{0,1,2,3\}$
* $FIRST(D) = \{0,1,2,3\}$

***


The follow set for the production that has tail recursion is
 $FOLLOW(I) = \{\#,n,+,*\}$

 where $\#$ indicates the end of the input string. 

 You can derive the FOLLOW set from the following three parse trees:

 ![Parse trees used to derive FOLLOW(I)](ParseTreesForFollowSet.png)

 The terminal to the right of $I$ in the parse tree goes into the FOLLOW set. The parse tree on the left has not terminals that follow, so we use the \# character to indicate that the end of input has been reached. Thus, \# means end of input and belongs in the follow set of $I$.

***

Let's construct a class for the recursive descent parser (RDP). There will be a function for each nonterminal and a function to test whether the current input character what the grammar expects. We'll call this testing function ___match_.

***

In [1]:
from typing import Callable
class RDP:  # RDP stands for recursive descent parser
    def __init__(self) -> None:
        ###################################
        # Tuple defining an LL(1) grammar #
        # • set of nonterminals           #
        # • set of terminals              #
        # • starting nonterminal          #
        # • set of productions            #
        ###################################
        self.nonterminals: set[str] = {'E', 'O', 'I', 'D'}          # set of nonterminals. Each nonterminal will have its own function
        self.starting_nonterminal: Callable[[], None] = self.e              # Starting nonterminal
        self.terminals: set[str] = {'+', '*', '0', '1', '2', '3', '#'}  # set of terminals
        # Productions                                   # Defined within the nonterminal functions

        ##########################################
        # Define FIRST sets for each nonterminal #
        ##########################################
        self.first: dict[str, set[str]] = dict()
        self.first['O'] = {'+', '-'}
        self.first['I'] = {'0', '1', '2', '3'}
        self.first['D'] = {'0', '1', '2', '3'}

        ##########################################
        # Define FOLLOW sets for nonterminal I   #
        ##########################################
        self.follow: dict[str, set[str]] = dict()
        self.follow['I'] = {'#', 'n', '+', '*'}

        ###########################################
        # Variables for Managing the input string #
        ###########################################
        self.input: str = None
        self.num_chars_read: int = 0

        ###########################################
        # Variables for printing the trace        #
        ###########################################
        self.tree_depth: int = 0

    def parse_input(self, input: str) -> None:
        """ Call this function from main.
            It gets the input, calls the starting non-terminal,
            and does the accounting to see if the parse was successful
        """
        print("Parsing string", input, "***")
        self.input = input
        self.starting_nonterminal()  # Run the RDP by calling the starting nonterminal
        if self.__get_current_input() == '#':
            print("Successfully parsed string")
        else:
            raise ValueError("End of parse and these characters haven't been read: " + str(self.input[self.num_chars_read:]))

    ##############################################################
    # Each nonterminal gets its own function.                      #
    # The function knows which productions have the nonterminal on #
    # the left hand side of the production. The correct right    #
    # hand side of the production is chosen by looking at the    #
    # current input and the FIRST set of the right hand side     #
    ##############################################################
    def e(self) -> None:
        # Production E--> nDI | OEE
        self.__print_entry_message("E")
        current_input: str = self.__get_current_input()
        self.__print_message_about_current_string("trying to parse input character", current_input)
        if self.__match(current_input, 'n'):
            print(self.__get_tab_string(), "Terminal 'n' matched by input character", self.__get_current_input())
            self.__advance_input()
            self.d()
            self.i()
        elif current_input in self.first['O']:
            self.o()
            self.e()
            self.e()
        else:  # error
            raise ValueError("Current input character is " + str(current_input) + ", which cannot be produced by 'E'")
        self.__print_exit_message("E")

    def d(self) -> None:
        # Production D --> 0 | 1 | 2 | 3
        self.__print_entry_message("D")
        current_input: str = self.__get_current_input()
        self.__print_message_about_current_string("trying to parse input character", current_input)
        if self.__match(current_input, '0') or \
                self.__match(current_input, '1') or \
                self.__match(current_input, '2') or \
                self.__match(current_input, '3'):
            print(self.__get_tab_string(), "successfully parsed input character", current_input)
            self.__advance_input()  # move to the next current input character
            pass
        else:
            raise ValueError("Current input character is " + str(current_input) + ", which cannot be produced by 'D'")
        self.__print_exit_message("D")

    def o(self) -> None:
        # Production O --> + | *
        self.__print_entry_message("O")
        current_input: str = self.__get_current_input()
        self.__print_message_about_current_string("trying to parse input character", current_input)
        if self.__match(current_input, '+') or \
                self.__match(current_input, '*'):
            print(self.__get_tab_string(), "successfully parsed input character", current_input)
            self.__advance_input()  # move to the next current input character
            pass
        else:
            raise ValueError("Current input character is " + str(current_input) + ", which cannot be produced by 'O'")
        self.__print_exit_message("O")

    def i(self) -> None:
        # Production I --> DI | lambda
        self.__print_entry_message("I")
        current_input: str = self.__get_current_input()
        self.__print_message_about_current_string("trying to parse input character", current_input)
        if current_input in self.first['D']:
            self.d()
            self.i()  # If the input was in the first set then you must execute the production I --> DI
            pass
        elif current_input in self.follow['I']:
            self.__print_message_about_current_string("Exiting 'I' tail recursion because input is in FOLLOW set",
                                                      current_input)
            pass
        else:
            raise ValueError("Current input character is " + str(current_input) + ", which cannot be produced by 'I'")
        self.__print_exit_message("I")

    ############################################################################
    # Helper functions for managing the input                                    #
    # One looks at the current input                                           #
    # Another reads the input and advances to the next input                  #
    # A third looks to see if the current input character matches a target     #
    # Convention in python is to prefix private functions by a double underscore #
    # https://www.geeksforgeeks.org/private-functions-in-python/                 #
    ############################################################################
    def __get_current_input(self) -> str:
        if self.num_chars_read > len(self.input):
            raise ValueError("Expected to read another input character but no inputs left to read")
        elif self.num_chars_read == len(self.input):
            self.__print_message_about_current_string("Reading the end of string symbol in getCurrentInput", "")
            return '#'  # return end of string
        else:
            return self.input[self.num_chars_read]

    def __advance_input(self) -> None:
        if self.num_chars_read > len(self.input):
            raise ValueError("Expected to advance to the next input character but reached the end of input")
        self.num_chars_read += 1

    def __match(self, current_input: str, target_input: str) -> bool:
        return current_input == target_input

    ########################
    # Other public functions #
    ########################
    def reset(self) -> None:
        self.num_chars_read = 0
        self.input = ""

    ###############################
    # Parse tree printing functions #
    ###############################
    def __print_entry_message(self, function_name: str) -> None:
        print(self.__get_tab_string(), "In", function_name, "function.")
        self.tree_depth += 1

    def __print_exit_message(self, function_name: str) -> None:
        self.tree_depth -= 1
        print(self.__get_tab_string(), "Returning from", function_name, ".")

    def __print_message_about_current_string(self, message: str, current_input: str) -> None:
        print(self.__get_tab_string(), message, current_input)

    def __get_tab_string(self) -> str:
        tab_string: str = ""
        for d in range(self.tree_depth):
            tab_string += "\t"
        return tab_string


In [2]:
my_rdp: RDP = RDP()
try:
    my_rdp.parse_input('+12')
except ValueError as inst:
    message: tuple[str] = inst.args
    print(message)


Parsing string +12 ***
 In E function.
	 trying to parse input character +
	 In O function.
		 trying to parse input character +
		 successfully parsed input character +
	 Returning from O .
	 In E function.
		 trying to parse input character 1
("Current input character is 1, which cannot be produced by 'E'",)


In [3]:
my_rdp.reset()
try:
    my_rdp.parse_input('+n12n31')
except ValueError as inst:
    message: tuple[str] = inst.args
    print(message)


Parsing string +n12n31 ***
		 In E function.
			 trying to parse input character +
			 In O function.
				 trying to parse input character +
				 successfully parsed input character +
			 Returning from O .
			 In E function.
				 trying to parse input character n
				 Terminal 'n' matched by input character n
				 In D function.
					 trying to parse input character 1
					 successfully parsed input character 1
				 Returning from D .
				 In I function.
					 trying to parse input character 2
					 In D function.
						 trying to parse input character 2
						 successfully parsed input character 2
					 Returning from D .
					 In I function.
						 trying to parse input character n
						 Exiting 'I' tail recursion because input is in FOLLOW set n
					 Returning from I .
				 Returning from I .
			 Returning from E .
			 In E function.
				 trying to parse input character n
				 Terminal 'n' matched by input character n
				 In D function.
					 trying to parse input character 3
		

In [4]:
my_rdp.reset()
try:
    my_rdp.parse_input('+123')
except ValueError as inst:
    message: tuple[str] = inst.args
    print(message)

Parsing string +123 ***
		 In E function.
			 trying to parse input character +
			 In O function.
				 trying to parse input character +
				 successfully parsed input character +
			 Returning from O .
			 In E function.
				 trying to parse input character 1
("Current input character is 1, which cannot be produced by 'E'",)


In [5]:
my_rdp.reset()
try:
    my_rdp.parse_input('+1')
except ValueError as inst:
    message: tuple[str] = inst.args
    print(message)


Parsing string +1 ***
				 In E function.
					 trying to parse input character +
					 In O function.
						 trying to parse input character +
						 successfully parsed input character +
					 Returning from O .
					 In E function.
						 trying to parse input character 1
("Current input character is 1, which cannot be produced by 'E'",)


In [6]:
my_rdp.reset()
try:
    my_rdp.parse_input('+n11n')
except ValueError as inst:
    message: tuple[str] = inst.args
    print(message)

Parsing string +n11n ***
						 In E function.
							 trying to parse input character +
							 In O function.
								 trying to parse input character +
								 successfully parsed input character +
							 Returning from O .
							 In E function.
								 trying to parse input character n
								 Terminal 'n' matched by input character n
								 In D function.
									 trying to parse input character 1
									 successfully parsed input character 1
								 Returning from D .
								 In I function.
									 trying to parse input character 1
									 In D function.
										 trying to parse input character 1
										 successfully parsed input character 1
									 Returning from D .
									 In I function.
										 trying to parse input character n
										 Exiting 'I' tail recursion because input is in FOLLOW set n
									 Returning from I .
								 Returning from I .
							 Returning from E .
							 In E function.
								 trying to parse input character n
			

In [7]:
my_rdp.reset()
try:
    my_rdp.parse_input('-31')
except ValueError as inst:
    message: tuple[str] = inst.args
    print(message)


Parsing string -31 ***
									 In E function.
										 trying to parse input character -
										 In O function.
											 trying to parse input character -
("Current input character is -, which cannot be produced by 'O'",)
