

```
{'toks': set(token), 'vars': dict(var: definition), 'hvar': var}  # hvar: start symbol
token : (class, value)
class : int
value : str
var : str                 # non-terminal name
definition : list(rule)
rule : list(var | token)  # right side of the rule
```
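This representation is easy to get wrong by hand, for example with a misspelled non-terminal name or a token missing from `toks`. As a sketch, a hypothetical helper `check_grammar` (not part of the original notebook) could check that every right-side symbol is either a declared token or a key of `vars`, that `hvar` has productions, and that no rule is empty. It assumes exactly the dictionary layout described above.

```python
def check_grammar(grammar):
    """Return a list of problems found in a grammar dict (empty if none).

    Hypothetical helper: assumes the {'toks', 'vars', 'hvar'} layout above.
    """
    tokens = grammar['toks']
    productions = grammar['vars']
    problems = []
    if grammar['hvar'] not in productions:
        problems.append(f"start symbol {grammar['hvar']!r} has no productions")
    for var, definition in productions.items():
        for rule in definition:
            if not rule:
                # FIRST below reads rule[0], so an empty rule would break it
                problems.append(f"{var!r} has an empty rule")
            for sym in rule:
                # every symbol must be a known token or a defined non-terminal
                if sym not in tokens and sym not in productions:
                    problems.append(f"{var!r} uses undefined symbol {sym!r}")
    return problems


grammar = {
    'toks': {(0, 'a'), (1, 'b')},
    'vars': {'S': [[(0, 'a'), 'T']], 'T': [[(1, 'b')], ['U']]},
    'hvar': 'S',
}
print(check_grammar(grammar))  # ["'T' uses undefined symbol 'U'"]
```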


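FIRST(X) is the set of terminal symbols that can begin a string derived from X. For a terminal t, FIRST(t) = {t}; for a non-terminal, it is the union of FIRST of the first symbol of each of its rules, assuming (as this notebook does) that no rule is empty.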

In [8]:
def FIRST(tokens, productions, symbol):
  result = set()
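  # assumes no empty rules and no left recursion: a rule such as X -> X ...
  # would make this function call itself forever
  for rule in productions[symbol]:
    # take the first symbol of each rule: add it to the set if it is a
    # terminal, otherwise call FIRST on it recursively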
    if rule[0] in tokens:
      result.add(rule[0])
    else:
      result |= FIRST(tokens, productions, rule[0])
  return result

Testing FIRST on the non-terminal A of the grammar below:

In [11]:
GRAMMAR = {
    'toks': {(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')},
    'vars': {
        'A': [['B'], ['C']],
        'C': [[(0, 'a'), 'B'], [(3, 'd'), (2, 'c')]],
        'B': [['D']],
        'D': [[(1, 'b')], [(3, 'd')]],
    },
    'hvar': 'A',
}

print(FIRST(GRAMMAR['toks'], GRAMMAR['vars'], GRAMMAR['hvar']))

{(3, 'd'), (1, 'b'), (0, 'a')}
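The recursive FIRST never terminates on a left-recursive rule such as `E -> E + T`, because computing FIRST(E) calls FIRST(E) again. A common alternative, sketched here under the same no-empty-rule assumption, computes FIRST for every non-terminal at once and repeats passes over the grammar until no set grows (a fixpoint). The function name `first_sets` is hypothetical.

```python
def first_sets(tokens, productions):
    """FIRST for every non-terminal, computed by fixpoint iteration.

    Assumes, like FIRST above, that no rule is empty (no epsilon rules),
    so only the first symbol of each rule matters.
    """
    first = {var: set() for var in productions}
    changed = True
    while changed:  # stop once a full pass adds nothing
        changed = False
        for var, definition in productions.items():
            for rule in definition:
                head = rule[0]
                # a terminal contributes itself; a non-terminal contributes
                # whatever has been found for it so far
                new = {head} if head in tokens else first[head]
                if not new <= first[var]:
                    first[var] |= new
                    changed = True
    return first


# left-recursive example that would overflow the stack in the recursive FIRST
toks = {(0, '+'), (1, 'n')}
vars_ = {'E': [['E', (0, '+'), 'T'], ['T']], 'T': [[(1, 'n')]]}
print(first_sets(toks, vars_))  # {'E': {(1, 'n')}, 'T': {(1, 'n')}}
```

Because every set only grows and is bounded by the token set, the loop is guaranteed to stop; the price is extra passes over the grammar compared to the recursive version.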


FOLLOW(X) is the set of terminals that can appear immediately after X in some derivation. To calculate FOLLOW(X), find every production with X on its right side and look at the symbol that follows X there: a terminal is added as is, and for a non-terminal its FIRST set is added. If nothing follows X in some production A -> pX, then everything in FOLLOW(A) also belongs to FOLLOW(X).

In [21]:
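def FOLLOW(tokens, productions, symbol, _visiting=None):
  # _visiting holds the non-terminals whose FOLLOW is already being computed
  # further up the call stack; skipping them stops infinite recursion on
  # cycles such as A -> xB, B -> yA
  _visiting = (_visiting or set()) | {symbol}

  # collect every production that has 'symbol' on its right side
  useful_productions = []
  for nt, production in productions.items():
    for rule in production:
      if symbol in rule:
        useful_productions.append((nt, rule))

  result = set()
  for lhs, rhs in useful_productions:
    # 'symbol' may occur several times in one rule, so check every position
    # (rhs.index() would only find the first one)
    for idx, s in enumerate(rhs):
      if s != symbol:
        continue
      if idx + 1 < len(rhs):
        next_symbol = rhs[idx + 1]
        if next_symbol in tokens:
          result.add(next_symbol)
        else:
          result |= FIRST(tokens, productions, next_symbol)
      elif lhs not in _visiting:
        # nothing follows 'symbol', so FOLLOW(lhs) is part of FOLLOW(symbol)
        result |= FOLLOW(tokens, productions, lhs, _visiting)
  return result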

To test FOLLOW, I will create another grammar:

In [22]:
GRAMMAR = {
    'toks': {(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'f'), (5, 'g'), (6, 'h'), (7, 'e')},
    'vars': {
        'A': [[(0, 'a') ,'B', 'D', (6, 'h')]],
        'B': [[(2, 'c'), 'C']], 
        'C': [[(1, 'b'), 'C']],
        'D': [['E', 'F']],
        'E': [[(5, 'g')], [(7, 'e')]],
        'F': [[(4, 'f')], [(7, 'e')]],
    },
    'hvar': 'A',
}
print('FOLLOW(A): ', FOLLOW(GRAMMAR['toks'], GRAMMAR['vars'], 'A'))
print('FOLLOW(B): ', FOLLOW(GRAMMAR['toks'], GRAMMAR['vars'], 'B'))
print('FOLLOW(C): ', FOLLOW(GRAMMAR['toks'], GRAMMAR['vars'], 'C'))
print('FOLLOW(D): ', FOLLOW(GRAMMAR['toks'], GRAMMAR['vars'], 'D'))
print('FOLLOW(E): ', FOLLOW(GRAMMAR['toks'], GRAMMAR['vars'], 'E'))
print('FOLLOW(F): ', FOLLOW(GRAMMAR['toks'], GRAMMAR['vars'], 'F'))

FOLLOW(A):  set()
FOLLOW(B):  {(5, 'g'), (7, 'e')}
FOLLOW(C):  {(5, 'g'), (7, 'e')}
FOLLOW(D):  {(6, 'h')}
FOLLOW(E):  {(7, 'e'), (4, 'f')}
FOLLOW(F):  {(6, 'h')}
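The FOLLOW function leaves out one rule of the textbook definition: the start symbol's FOLLOW set contains an end-of-input marker, usually written `$`, which is why FOLLOW(A) prints as an empty set. The sketch below computes FOLLOW for every non-terminal at once by fixpoint iteration and seeds the start symbol with such a marker. The token `END = (-1, '$')` and the names `first_of` and `follow_sets` are assumptions, not part of the original grammar format; like FIRST, it assumes no empty rules and no left recursion.

```python
# Hypothetical end-of-input token; the class number -1 is an assumption
# chosen so it cannot clash with the token classes used in the grammars.
END = (-1, '$')


def first_of(tokens, productions, symbol):
    """FIRST of a single symbol, same idea as FIRST above.

    Assumes no empty rules and no left recursion.
    """
    if symbol in tokens:
        return {symbol}
    result = set()
    for rule in productions[symbol]:
        result |= first_of(tokens, productions, rule[0])
    return result


def follow_sets(tokens, productions, start):
    """FOLLOW for every non-terminal, computed by fixpoint iteration."""
    follow = {var: set() for var in productions}
    follow[start].add(END)  # the input may end right after the start symbol
    changed = True
    while changed:  # repeat until no FOLLOW set grows
        changed = False
        for lhs, definition in productions.items():
            for rule in definition:
                for idx, sym in enumerate(rule):
                    if sym in tokens:
                        continue  # only non-terminals get FOLLOW sets here
                    if idx + 1 < len(rule):
                        # FIRST of the next symbol can follow sym
                        new = first_of(tokens, productions, rule[idx + 1])
                    else:
                        # sym ends the rule: FOLLOW(lhs) is part of FOLLOW(sym)
                        new = follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow
```

On the second grammar above this reproduces the printed FOLLOW sets for B through F and gives FOLLOW(A) = {END}. Because it iterates to a fixpoint instead of recursing through FOLLOW, cycles such as S -> xT, T -> yS need no special guard.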
