<a href="https://colab.research.google.com/github/AYena07/DSL/blob/main/lab2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

First, here is an implementation of a function that removes useless ("outsider") non-terminal symbols, that is, non-terminals from which no sequence consisting only of terminals can be derived. The algorithm iteratively grows a set of useful non-terminals. On each iteration we look for non-terminals that have a rule whose right-hand side consists only of terminals and non-terminals already known to be useful. We stop when a full pass over all non-terminals finds no new useful ones.

Our context-free grammar is represented by the following structure:


```
{'toks': set(token), 'vars': dict(var: definition), 'hvar': var}
token : (class, value)    # for now, just a terminal symbol
class : int
value : str
var : str                 # name of a non-terminal
definition : list(rule)
rule : list(var | token)  # right-hand side of a rule
```
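To make the layout concrete, here is a minimal sketch of a tiny grammar in this format. The helper `check_grammar` and the sample `TINY_GRAMMAR` are hypothetical names introduced only for this illustration. The helper reports right-hand-side symbols that are neither declared tokens nor defined non-terminals. Class labels are written as strings here purely for readability.

```python
def check_grammar(grammar):
    """Return a list of problems found in a grammar dict (empty if none)."""
    tokens = grammar['toks']
    variables = grammar['vars']
    problems = []
    if grammar['hvar'] not in variables:
        problems.append(f"start symbol {grammar['hvar']!r} is not defined")
    for var, definition in variables.items():
        for rule in definition:
            for symbol in rule:
                # a symbol must be a declared token or a defined non-terminal
                if symbol not in tokens and symbol not in variables:
                    problems.append(f"{var}: unknown symbol {symbol!r}")
    return problems


TINY_GRAMMAR = {
    'toks': {('class1', 'a'), ('class1', 'b')},
    'vars': {
        'S': [['A', ('class1', 'b')], []],   # S -> A b | (empty)
        'A': [[('class1', 'a')], ['X']],      # A -> a | X, where X is undefined
    },
    'hvar': 'S',
}

print(check_grammar(TINY_GRAMMAR))
```

Each rule is a plain list, so an empty list `[]` stands for an empty right-hand side, and tokens are told apart from non-terminal names by being tuples rather than strings.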



In [12]:
def remove_all_useless_non_terminals(grammar):

  useful_non_terminals = set()   # empty at the beginning
  tokens = grammar['toks']
  variables = grammar['vars']
  start_non_terminal = grammar['hvar']
  found_new_useful_non_terminal = True # set to True just to enter the loop

  while found_new_useful_non_terminal:
    found_new_useful_non_terminal = False
    for non_terminal, definition in variables.items():
      if non_terminal not in useful_non_terminals:
        for rule in definition:
          # every symbol of the rule is a terminal or an already known useful non-terminal
          if all(symbol in tokens or symbol in useful_non_terminals for symbol in rule):
            found_new_useful_non_terminal = True
            useful_non_terminals.add(non_terminal)
            break   # the current symbol is already known to be useful, no need to check its other rules

  # new set of non-terminals: keep only the useful ones
  new_variables = {
      non_terminal: definition
      for non_terminal, definition in variables.items()
      if non_terminal in useful_non_terminals
  }

  # drop every rule that mentions a useless non-terminal
  for non_terminal, old_definition in new_variables.items():
    new_variables[non_terminal] = [
        rule for rule in old_definition
        if all(symbol in tokens or symbol in useful_non_terminals for symbol in rule)
    ]

  # the start non-terminal is dropped if it turned out to be useless
  if start_non_terminal not in useful_non_terminals:
    start_non_terminal = None

  return {'toks': tokens, 'vars': new_variables, 'hvar': start_non_terminal}
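
To see how the set of useful non-terminals grows from pass to pass, the following sketch records a snapshot after every full pass over the rules. The function `generating_passes` and the sample `TRACE_GRAMMAR` are hypothetical, written only for this illustration. The snapshots depend on the order in which the dict lists its non-terminals.

```python
def generating_passes(grammar):
    """Return the sorted set of useful non-terminals after each full pass."""
    tokens, variables = grammar['toks'], grammar['vars']
    useful = set()
    history = []
    changed = True
    while changed:
        changed = False
        for non_terminal, definition in variables.items():
            if non_terminal in useful:
                continue
            # useful if some rule contains only tokens and useful non-terminals
            if any(all(s in tokens or s in useful for s in rule) for rule in definition):
                useful.add(non_terminal)
                changed = True
        history.append(sorted(useful))
    return history


TRACE_GRAMMAR = {
    'toks': {('t', 'x')},
    'vars': {
        'S': [['A', 'B']],        # S -> A B
        'A': [['B']],             # A -> B
        'B': [[('t', 'x')]],      # B -> x
        'D': [['D']],             # D -> D, never reaches terminals
    },
    'hvar': 'S',
}

for number, snapshot in enumerate(generating_passes(TRACE_GRAMMAR), start=1):
    print(number, snapshot)
```

With this ordering, `B` is found first, then `A`, then `S`. A final pass that adds nothing ends the loop, and `D` never enters the set because its only rule refers back to itself.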

Next is an implementation of a function that finds all expiring non-terminals, that is, non-terminals that can derive the empty sequence. The algorithm is very similar to the previous one. We iteratively grow the set of expiring non-terminals until a full pass over all rules no longer adds anything to it. A non-terminal is added to the set if it is not already there and it has a rule whose right-hand side consists only of expiring non-terminals. An empty right-hand side satisfies this condition trivially, which is how the first expiring non-terminals are found.

In [5]:
def find_all_expiring_non_terminals(grammar):

  expiring_non_terminals = set()   # empty at the beginning
  variables = grammar['vars']
  found_new_expiring_non_terminal = True # set to True just to enter the loop

  while found_new_expiring_non_terminal:
    found_new_expiring_non_terminal = False
    for non_terminal, definition in variables.items():
      if non_terminal not in expiring_non_terminals:
        for rule in definition:
          # every symbol of the rule is expiring (trivially true for an empty rule)
          if all(symbol in expiring_non_terminals for symbol in rule):
            found_new_expiring_non_terminal = True
            expiring_non_terminals.add(non_terminal)
            break   # the current symbol is already known to be expiring, no need to check its other rules

  return expiring_non_terminals
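
The role of the empty rule and of tokens is easiest to see on a handful of rules. The sketch below is a compact, self-contained variant of the same idea. The names `expiring_sketch` and `RULES` are hypothetical, and it takes only the `vars` dict, since tokens can never be expiring.

```python
def expiring_sketch(variables):
    """Compute the non-terminals that can derive the empty sequence."""
    expiring = set()
    changed = True
    while changed:
        changed = False
        for non_terminal, definition in variables.items():
            # all(...) over an empty rule is True, which seeds the set
            if non_terminal not in expiring and any(
                all(symbol in expiring for symbol in rule) for rule in definition
            ):
                expiring.add(non_terminal)
                changed = True
    return expiring


RULES = {
    'A': [[]],                       # A -> (empty)
    'B': [['A', 'A']],               # B -> A A
    'C': [['B', ('class1', 'a1')]],  # C -> B a1, the token blocks it
    'D': [['C'], ['B']],             # D -> C | B
}

print(sorted(expiring_sketch(RULES)))
```

Here `A` is expiring because of its empty rule, `B` because both symbols of its rule are expiring, and `D` through its second alternative. `C` is not, since its only rule contains a token.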

Let's look at an example:

In [22]:
import json  # for pretty printing

GRAMMAR = {
    'toks': set( [
        ('class1', 'a1'), 
        ('class1', 'a2'), 
        ('class1', 'a3'), 
        ('class2', 'b1'), 
        ('class2', 'b2'), 
        ('class3', 'c1')
    ] ),
    'vars': {
        'S' : [['N', ('class3', 'c1')], 
               ['N', 'M'],
               ['B', ('class1', 'a3'), 'C'],
               ['F', ('class2', 'b1'), ('class1', 'a2')]],
        'A' : [[]],
        'B' : [['A']],
        'C' : [['B']],
        'N' : [['M']],
        'M' : [['N']],
        'F' : [[('class1', 'a3'), ('class2', 'b1'), ('class2', 'b2'), ('class3', 'c1')]]
    },
    'hvar': 'S'
}


NEW_GRAMMAR = remove_all_useless_non_terminals(GRAMMAR)
toks = NEW_GRAMMAR['toks']
del NEW_GRAMMAR['toks'] # set is not serializable
print(json.dumps(NEW_GRAMMAR, sort_keys=True, indent=4))
NEW_GRAMMAR['toks'] = toks
EXPIRING_NON_TERMINALS = find_all_expiring_non_terminals(NEW_GRAMMAR)
print(list(EXPIRING_NON_TERMINALS))

{
    "hvar": "S",
    "vars": {
        "A": [
            []
        ],
        "B": [
            [
                "A"
            ]
        ],
        "C": [
            [
                "B"
            ]
        ],
        "F": [
            [
                [
                    "class1",
                    "a3"
                ],
                [
                    "class2",
                    "b1"
                ],
                [
                    "class2",
                    "b2"
                ],
                [
                    "class3",
                    "c1"
                ]
            ]
        ],
        "S": [
            [
                "B",
                [
                    "class1",
                    "a3"
                ],
                "C"
            ],
            [
                "F",
                [
                    "class2",
                    "b1"
                ],
                [
                    "class1",
     