In [1]:
import inspect

# Code Demo/Walkthrough

Contents:
1. Cleaned version
2. UX improvements
3. External files
4. Nonneural tweaks
5. The results

### Cleaned Code (nonneural_myteam.py)

We rewrote the original nonneural.py code from scratch, mostly to see how well we understood it, and in the process decided that a lot of what they had written was not very nice. This notebook does not try to explain every minor tweak, but a few of them feel significant enough to mention:

#### Levenshtein memoizer

Their version:

In [None]:
# DON'T RUN ME

def memolrec(func):
    """Memoizer for Levenshtein."""
    cache = {}
    
    def wrap(sp, tp, sr, tr, cost):
        if (sr,tr) not in cache:
            res = func(sp, tp, sr, tr, cost) # why call it using the past values...
            
            cache[(sr,tr)] = (res[0][len(sp):], res[1][len(tp):], res[4] - cost)
            # ...only to immediately remove them from the output?
            
        return sp + cache[(sr,tr)][0], tp + cache[(sr,tr)][1], '', '', cost + cache[(sr,tr)][2]
    
    return wrap

Our version:

In [None]:
# DON'T RUN ME EITHER
from functools import wraps  # required by the @wraps decorator below

def memolrec(func):
    """Wrapper function/memoizer for recursive levenshtein implementation. Returns 'decorated' version
    of levenshtein."""

    cache = {}

    @wraps(func)
    def wrap(spast, tpast, srem, trem, cost):
        
        if (srem, trem) not in cache:
            res = func('', '', srem, trem, 0) # this function call does not need to know the previous values
            
            cache[(srem, trem)] = (res[0], res[1], res[4]) # now this line can directly store the output
        
        aln_srem, aln_trem, rem_cost = cache[(srem, trem)]
        return spast+aln_srem, tpast+aln_trem, '', '', cost + rem_cost
    
    return wrap # return decorated function

It might seem trivial, but we noticed that the original code had a really odd tendency to insert a bunch of gap characters before the final 'r' when aligning a lemma and a form, even if the inflected form had no r's in the suffix (like 'abjure___r__' to 'abjurassions'). Our change had no significant impact on the ultimate performance of the code, but curiously, when run with none of our other changes, the new memoizer guesses correctly on exactly one extra word (an inflected form of ouïr).
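
To make the memoizer concrete, here is a minimal, self-contained sketch of how it can wrap a recursive aligner. The `lrec` function below is our assumption of what such a recursive Levenshtein looks like (unit costs by default, `_` as the gap character); it is a stand-in for illustration, not a copy of the original code.

```python
from functools import wraps


def memolrec(func):
    """Memoize a recursive aligner on the remaining substrings only."""
    cache = {}

    @wraps(func)
    def wrap(spast, tpast, srem, trem, cost):
        if (srem, trem) not in cache:
            # Solve the remainder in isolation: empty past, zero cost.
            res = func('', '', srem, trem, 0)
            cache[(srem, trem)] = (res[0], res[1], res[4])
        aln_srem, aln_trem, rem_cost = cache[(srem, trem)]
        # Re-attach the caller's past alignment and accumulated cost.
        return spast + aln_srem, tpast + aln_trem, '', '', cost + rem_cost

    return wrap


def levenshtein(s, t, inscost=1, delcost=1, substcost=1):
    """Hypothetical stand-in: return (aligned_s, aligned_t, cost)."""

    @memolrec
    def lrec(spast, tpast, srem, trem, cost):
        if not srem:  # only insertions remain
            return spast + '_' * len(trem), tpast + trem, '', '', cost + inscost * len(trem)
        if not trem:  # only deletions remain
            return spast + srem, tpast + '_' * len(srem), '', '', cost + delcost * len(srem)
        addcost = 0 if srem[0] == trem[0] else substcost
        return min(
            lrec(spast + srem[0], tpast + trem[0], srem[1:], trem[1:], cost + addcost),
            lrec(spast + '_', tpast + trem[0], srem, trem[1:], cost + inscost),
            lrec(spast + srem[0], tpast + '_', srem[1:], trem, cost + delcost),
            key=lambda r: r[4],
        )

    aligned_s, aligned_t, _, _, cost = lrec('', '', s, t, 0)
    return aligned_s, aligned_t, cost


aligned_s, aligned_t, cost = levenshtein('kitten', 'sitting')
print(aligned_s, aligned_t, cost)
```

Because the cache is keyed only on `(srem, trem)`, each pair of remaining substrings is solved once, no matter which past alignment led to it.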

#### Prefix and suffix redundancies

This was part of the assignment, but I was personally a fan of this change because it made so much of the code look a lot nicer. We established two global constants (to prevent hard-to-track-down typos from hard-coding the values everywhere)
```python
...
# global constants
PRE = 0
SUF = 1
...
```
and then everything that involved storing prefix/suffix rules in similar ways or applying almost identical algorithms to the same thing twice could be done a bit more elegantly with lists/tuples/for-loops, e.g.
```python
rules[PRE].add((inpre, outpre))
```
or
```python
for fix in (PRE, SUF):
    
    if msd not in allrules[fix]:
        ...
```
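
A small sketch of this pattern in isolation may help. The layout below (a pair of dictionaries keyed by an MSD string, each holding a set of `(input, output)` rules), the helper name `add_rule`, and the example rules are all assumptions for illustration, not necessarily how `allrules` is structured in our code.

```python
# Global constants used as indices instead of hard-coded 0/1
PRE = 0
SUF = 1

# Hypothetical layout: allrules[PRE] and allrules[SUF] each map msd -> set of rules
allrules = ({}, {})


def add_rule(fix, msd, inp, out):
    """Store one (input, output) rule under the given affix type and MSD."""
    allrules[fix].setdefault(msd, set()).add((inp, out))


# Made-up example rules, purely for demonstration
add_rule(PRE, 'V;PST', '', 'ge')
add_rule(SUF, 'V;PST', 'en', 't')

# One loop handles prefixes and suffixes alike
for fix in (PRE, SUF):
    for msd, rules in allrules[fix].items():
        print(fix, msd, sorted(rules))
```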

#### Command-line argument parsing

We just love being able to define our own options/arguments and then access them later as object attributes! It is super neat. That is what the `argparse` module gives us, and it also happens to be the recommended library for retrieving command-line arguments in Python 3. The `getopt` module is designed for those who are accustomed to C-style option retrieval and too stubborn to change their ways. We, however, are young, and we believe in progress. The accepted practices of C programming won't keep us down!
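
As a minimal sketch of that attribute access: the parser below mirrors two of the options discussed in this section, and `build_parser` is a hypothetical helper name used only for this illustration. Passing an explicit list to `parse_args` lets the example run inside a notebook without touching `sys.argv`.

```python
import argparse


def build_parser():
    """Hypothetical helper: build a small parser with two example options."""
    parser = argparse.ArgumentParser(prog='demo')
    parser.add_argument('-t', '--test', dest='eval', action='store_const',
                        const='.tst', default='.dev',
                        help='evaluate on test instead of dev data.')
    parser.add_argument('-p', '--path', dest='path', default='data',
                        help='path to the data directory.')
    return parser


# Options given on the "command line" show up as attributes named by dest
parsed = build_parser().parse_args(['-t', '--path', 'mydata'])
print(parsed.eval, parsed.path)

# With no options, the declared defaults are used
defaults = build_parser().parse_args([])
print(defaults.eval, defaults.path)
```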

That's how this:

In [2]:
# DO NOT RUN ME PLEASE!

def main(argv):
    options, remainder = getopt.gnu_getopt(argv[1:], 'ohp:', ['output','help','path='])
    TEST, OUTPUT, HELP, path = False,False, False, '../data/' # I hate this kind of variable assignment, tbch
    for opt, arg in options:
        if opt in ('-o', '--output'):
            OUTPUT = True
        if opt in ('-t', '--test'):
            TEST = True
        if opt in ('-h', '--help'):
            HELP = True
        if opt in ('-p', '--path'):
            path = arg

    if HELP:
        # there were a bunch of print statements here that you had to keep editing by hand
        # every time you wanted to add new options
        pass
    
    # REST OF MAIN FUNCTION WAS HERE

if __name__ == "__main__":
    main(sys.argv)

became this:

In [None]:
# I KNOW I'M COOL, BUT DON'T RUN ME EITHER
import argparse  # required by the parser set up below

def main(parsed):
    # wow, i'm so neat and uncluttered
    out = parsed.out
    evl_ext = parsed.eval # and then access it here! (see below)
    path = parsed.path
    
    # no help print-statements? no problem! the argparse library takes care of it for you!
    
    # REST OF MAIN FUNCTION WAS HERE

if __name__ == '__main__':
    parser = argparse.ArgumentParser(prog='NonNeuralMyTeam',
                                     description='Our cleaned and very slightly edited version'
                                        + ' of the original nonneural.py code.',
                                     epilog='Generating this help message was done with the argparse library'
                                        + ' instead of manually with a bunch of print-statements.')
    
    parser.add_argument('-o', '--output',
                        dest='out',
                        action='store_true',
                        help='generate output files with guesses. files are written to the same place as'
                            + ' the path argument (or default value if no path was specified) under <lang>.out')
    parser.add_argument('-t', '--test',
                        dest='eval', # look, see? you can tell it what the variable is called (see above)
                        action='store_const',
                        const='.tst',
                        default='.dev',
                        help='evaluate models on test instead of dev data.')
    parser.add_argument('-p', '--path',
                        dest='path',
                        default='data',
                        help='path to the directory containing data files. defaults to \'data\'.')