# Grammar Generation

The grammar is built from the levels in `levels/cleaned_maps/`, which were themselves generated from Super Mario Bros. levels.

In [1]:
MAP_DIRECTORY = 'levels/cleaned_maps'

We are going to generate 5 grammars, one for each look-back size from 1 to 5. Grammar 1 predicts the next column from the previous column alone. Grammar 2 looks back two columns, so its rules might look like:

```
ab -> c
bc -> a,b,d
```

I'm going to diverge from this, though, and store weights as well. In the case of `ab -> c`, `c` is the only continuation, so its weight is 1. When a context has several continuations, each weight is based on the number of times that continuation appeared in the maps. Say that after `bc`, `a` showed up 10 times, `b` 2 times, and `d` 4 times. The total is 16, so the weights are `10/16`, `2/16`, and `4/16` respectively.
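
As a quick check of the arithmetic above, here is a minimal sketch that turns the example counts into weights. The `counts` dictionary is made up to match the example in the text and is not read from the maps.

```python
# Hypothetical counts observed after the two-column context "bc",
# copied from the example in the text.
counts = {'a': 10, 'b': 2, 'd': 4}

total = sum(counts.values())  # 16
weights = {symbol: count / total for symbol, count in counts.items()}

print(weights)  # {'a': 0.625, 'b': 0.125, 'd': 0.25}
```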

In [2]:
MAX_SIZE = 5

### Getting Grammar Pieces

In [3]:
import json

def read_json(file_path):
    # json.load parses the whole file; joining the lines with '\n.' would
    # insert stray dots and break any multi-line JSON file.
    with open(file_path) as f:
        return json.load(f)

In [4]:
grammar = read_json('grammar/grammar_reversed.json')

### Creating Grammar

In [5]:
from tqdm.notebook import tqdm as tqdm_notebook  # tqdm.tqdm_notebook is deprecated
import queue
import os

In [6]:
CLEANED_MAPS = 'levels/cleaned_maps/'

In [7]:
grammars = [{} for i in range(MAX_SIZE)]

In [8]:
def convert_to_level_matrix(lvl_matrix):
    # Replace each raw line in place with its grammar id, stored as a string.
    for i in range(len(lvl_matrix)):
        lvl_matrix[i] = str(grammar[lvl_matrix[i].strip()])

In [9]:
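def convert_to_grammar(lvl_matrix, lvl_grammar, size):
    # Sliding window over the previous `size` columns, padded with '0' at the start.
    q = queue.Queue()
    for i in range(size):
        q.put('0')
    
    # Note: the first `size` columns are skipped and never enter the window.
    for col in lvl_matrix[size:]:
        # The context key is the window contents joined by commas, e.g. '3,7'.
        grammar_value = ','.join(list(q.queue))
        
        if grammar_value not in lvl_grammar:
            lvl_grammar[grammar_value] = {}
        
        if col not in lvl_grammar[grammar_value]:
            lvl_grammar[grammar_value][col] = 0
            
        # Count how often `col` follows this context.
        lvl_grammar[grammar_value][col] += 1
        
        # Slide the window: drop the oldest column and append the current one.
        q.get()
        q.put(col)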
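
To see what the counting loop produces, here is a small self-contained sketch that mirrors it on a toy level. It is not the notebook's function: `count_transitions` and `toy_level` are hypothetical names, and a `collections.deque` with `maxlen` stands in for `queue.Queue`. The padding with `'0'` and the skipping of the first `size` columns are kept as in the cell above.

```python
from collections import deque

def count_transitions(columns, size):
    """Count which column follows each window of `size` previous columns."""
    window = deque(['0'] * size, maxlen=size)  # '0' is the padding symbol
    counts = {}
    for col in columns[size:]:  # the first `size` columns are skipped
        key = ','.join(window)
        counts.setdefault(key, {})
        counts[key][col] = counts[key].get(col, 0) + 1
        window.append(col)  # maxlen drops the oldest entry automatically
    return counts

# Hypothetical level: each entry is a column id, already converted to a string.
toy_level = ['1', '1', '2', '1', '2', '3']

print(count_transitions(toy_level, 1))
# {'0': {'1': 1}, '1': {'2': 2}, '2': {'1': 1, '3': 1}}

print(count_transitions(toy_level, 2))
# {'0,0': {'2': 1}, '0,2': {'1': 1}, '2,1': {'2': 1}, '1,2': {'3': 1}}
```

Note how the size-2 run starts from the padded context `0,0` and only reaches a fully observed context (`2,1`) on its third step.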

In [10]:
files = os.listdir(CLEANED_MAPS)

for file in tqdm_notebook(files):
    with open(os.path.join(CLEANED_MAPS, file), 'r') as f:
        matrix = f.readlines()
    convert_to_level_matrix(matrix)
    
    # Grammar index `size` looks back `size + 1` columns.
    for size in range(MAX_SIZE):
        convert_to_grammar(matrix, grammars[size], size+1)

### Converting Grammar to Percentages

In [11]:
for i in tqdm_notebook(range(len(grammars))):
    size_g = grammars[i]
    for g in size_g:
        # Divide each count by the context's total so the weights sum to 1.
        total = sum(size_g[g].values())

        for key in size_g[g]:
            size_g[g][key] /= float(total)

### Saving Grammar

In [12]:
for i in range(len(grammars)):
    # Grammar i looks back i + 1 columns, so file names start at 1_grammar.json.
    with open('grammar/%s_grammar.json' % (str(i + 1)), 'w') as f:
        json.dump(grammars[i], f)
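
The notebook does not show how the saved grammars are consumed. As one possible use, the sketch below samples a next column from a weighted grammar with `random.choices`. The `grammar_2` dictionary and the `sample_next` helper are hypothetical; only the dictionary shape (context string mapped to `{column: weight}`) follows the files written above.

```python
import random

# Hypothetical size-2 grammar in the same shape as the saved JSON files.
grammar_2 = {'1,2': {'3': 0.75, '4': 0.25}}

def sample_next(grammar, context, rng=random):
    """Pick a next column id for `context` (a list of ids), or None if unseen."""
    options = grammar.get(','.join(context))
    if options is None:
        return None
    symbols = list(options)
    weights = [options[s] for s in symbols]
    return rng.choices(symbols, weights=weights, k=1)[0]

rng = random.Random(0)  # seeded generator so repeated runs give the same draw
print(sample_next(grammar_2, ['1', '2'], rng))
```

Returning `None` for an unseen context leaves the fallback decision to the caller, for example retrying with a grammar that looks back fewer columns.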