# Day 3: Markov Groening

In this notebook, inspired by (and reusing entire chunks of code from) the post/iPython notebook "The unreasonable effectiveness of Character-level Language Models" (by [Yoav Goldberg](http://www.cs.bgu.ac.il/~yoavg/uni/)), I train an n-order (here, order 7) character-level Markov model on synopses of a number of "couch gags" from the opening credits of _The Simpsons_, obtained [here](http://www.simpsoncrazy.com/lists/couch). More comprehensive lists are available elsewhere (e.g. Wikia and other fan sites). The algorithm appears to be similar to the one described [on Wikipedia, here](https://en.wikipedia.org/wiki/Dissociated_press).
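Stated compactly, this is what `train_char_lm` estimates: for a model of order $n$, the probability of the next character $c$ given a history $h$ (the preceding $n$ characters, with `~` padding at the start of the text) is its relative frequency in the training text:

$$
P(c \mid h) = \frac{\mathrm{count}(h, c)}{\sum_{c'} \mathrm{count}(h, c')}
$$

where $\mathrm{count}(h, c)$ is the number of times $h$ is immediately followed by $c$. Generation then repeatedly draws the next character from $P(\cdot \mid h)$, with $h$ updated to the last $n$ characters produced.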

This code is in dire need of refactoring. The "model" is a natural candidate for an object: it would hold the `defaultdict` of counts that defines it, plus methods to train it, set its parameters, and request output from it. Perhaps for a later notebook; after all, it works. Other extensions might include flagging quirky outputs as such, and rejecting outputs that are too short or patently ungrammatical.
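A minimal sketch of that refactoring, assuming Python 3.6+ (for `random.choices`). `CharMarkovModel`, `train`, `next_char`, and `generate` are hypothetical names, not part of the original notebook; the counting and padding follow `train_char_lm`, but the model keeps its own order and training text instead of relying on a global `data`.

```python
import random
from collections import Counter, defaultdict


class CharMarkovModel:
    """Hypothetical object wrapper for the order-n character model.

    Keeps the count table, the order, and the training text together, so
    callers no longer pass `order` around or rely on a global `data`.
    """

    PAD = "~"  # same start-of-text padding character as train_char_lm

    def __init__(self, order=7):
        self.order = order
        self.counts = defaultdict(Counter)  # history -> Counter of next chars
        self.text = ""

    def train(self, text):
        """Count which characters follow each `order`-length history."""
        self.text = text
        padded = self.PAD * self.order + text
        for i in range(len(padded) - self.order):
            history = padded[i:i + self.order]
            self.counts[history][padded[i + self.order]] += 1
        return self  # allows CharMarkovModel(...).train(text)

    def next_char(self, history, rng=random):
        """Sample a next character, or return None for an unseen history."""
        counter = self.counts.get(history[-self.order:])  # .get avoids inserting empty keys
        if not counter:
            return None
        chars = list(counter)
        weights = [counter[c] for c in chars]  # raw counts work as relative weights
        return rng.choices(chars, weights=weights)[0]

    def generate(self, nletters=1000, seed=None, rng=random):
        """Generate up to `nletters` characters.

        If given, `seed` indexes into the training text; the `order`
        characters starting there become the first history. It should be
        less than len(text) - order.
        """
        if seed is None:
            history = self.PAD * self.order
        else:
            history = self.text[seed:seed + self.order]
        out = []
        for _ in range(nletters):
            c = self.next_char(history, rng)
            if c is None:  # dead end: this history never occurred in training
                break
            out.append(c)
            history = (history + c)[-self.order:]
        return "".join(out)


if __name__ == "__main__":
    toy = "Homer sits on the couch.\nBart sits on the couch.\n"
    model = CharMarkovModel(order=3).train(toy)
    print(model.generate(100, rng=random.Random(0)))
```

Using raw counts as weights skips the separate normalization step, since `random.choices` accepts unnormalized weights; stopping on an unseen history replaces the `KeyError` the dictionary-based version would raise.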

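I added one small tweak to the borrowed implementation: `generate_text` takes an optional `seed`, an index into the training text, and uses the seven characters starting there (one `ORDER`-length window) as its first `history`, so generation doesn't always begin from the start of the data. The seed should be less than `len(data) - 7`: that keeps the window full-length and guarantees that at least one character follows it, so it exists as a history in the model.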
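A small sketch of why that bound holds, using hypothetical helpers `valid_seed_range`, `histories`, and `pick_seed` (none are in the original notebook). `histories` rebuilds the key set the same way `train_char_lm` does, and the upper bound subtracts 1 because `random.randint` includes both endpoints.

```python
import random


def valid_seed_range(data, order):
    """Inclusive (lo, hi) range of seeds whose window is a known history.

    data[s:s+order] is a key in the trained model only if a character
    follows it, i.e. s + order <= len(data) - 1, i.e. s < len(data) - order.
    """
    return 0, len(data) - order - 1


def histories(data, order):
    """Set of histories a model trained like train_char_lm would contain."""
    padded = "~" * order + data
    return {padded[i:i + order] for i in range(len(padded) - order)}


def pick_seed(data, order, rng=random):
    """Draw a seed uniformly from the valid range."""
    lo, hi = valid_seed_range(data, order)
    return rng.randint(lo, hi)  # randint is inclusive at both ends


if __name__ == "__main__":
    text = "the couch gag"
    known = histories(text, 7)
    lo, hi = valid_seed_range(text, 7)
    print(all(text[s:s + 7] in known for s in range(lo, hi + 1)))
    print(text[hi + 1:hi + 8] in known)  # one past the bound: not a history here
```

In this toy string the window just past the bound (`"uch gag"`) only occurs at the very end of the text, so nothing follows it and the model has no entry for it.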

## Sample output (some favorites)

- Each family are rotates 180 degrees Matrix-style, and a statue of a Buddha resembling Agnes Skinner flies in last and living room floor is chosen.

- The family rush in and against the edge of the couch made of gingerbread; Homer stands in the floor with black and white, everybody swings in on vines, however, doesn't make it very far and shoes but it turns the TV.

- In black belts, everybody sits, then Bart puts a whoopee cushion on the right and bandages.


In [1]:
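```python
from collections import defaultdict, Counter
import random

def train_char_lm(fname, order=4):
    """Train an order-`order` character-level Markov model on the text in `fname`.

    Returns a dict mapping each history (a string of `order` characters)
    to a list of (next_char, probability) pairs.
    """
    with open(fname) as f:
        data = f.read()
    lm = defaultdict(Counter)

    # Pad the start so the first real characters also have a full-length history.
    pad = "~" * order
    data = pad + data

    # Count how often each character follows each history.
    for i in range(len(data) - order):
        history, char = data[i:i+order], data[i+order]
        lm[history][char] += 1

    def normalize(counter):
        # Turn raw counts into probabilities that sum to 1.
        s = float(sum(counter.values()))
        return [(c, cnt / s) for c, cnt in counter.items()]

    return {hist: normalize(chars) for hist, chars in lm.items()}

def generate_letter(lm, history, order):
    """Sample the next character given the last `order` characters of `history`.

    Raises KeyError if that history never occurred in the training text.
    """
    history = history[-order:]
    dist = lm[history]
    x = random.random()
    # Walk the cumulative distribution until the random draw is used up.
    for c, v in dist:
        x = x - v
        if x <= 0:
            return c
    # Floating-point rounding can leave x slightly above 0; fall back to the last char.
    return dist[-1][0]

def generate_text(lm, order, nletters=1000, seed=None):
    """Generate `nletters` characters from `lm`.

    If `seed` is given, the `order` characters of the global `data`
    (read in a later cell) starting at index `seed` form the first history.
    """
    if seed is None:
        history = "~" * order
    else:
        history = data[seed:seed+order]

    out = []
    for i in range(nletters):
        c = generate_letter(lm, history, order)
        history = history[-order:] + c
        out.append(c)

    return "".join(out)
```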
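`generate_letter` samples by subtracting each probability from a uniform random draw until the remainder goes non-positive, which amounts to walking the cumulative distribution. On Python 3.6+, `random.choices` draws from the same distribution in a single call. A sketch of a drop-in alternative (`generate_letter_choices` is a hypothetical name, not from the original notebook), assuming `lm` has the list-of-pairs shape that `train_char_lm` returns:

```python
import random


def generate_letter_choices(lm, history, order, rng=random):
    """Alternative to generate_letter built on random.choices (Python 3.6+).

    `lm[history]` is a list of (char, probability) pairs; random.choices
    uses the probabilities as weights and returns one character.
    """
    dist = lm[history[-order:]]  # still raises KeyError for unseen histories
    chars = [c for c, _ in dist]
    probs = [p for _, p in dist]
    return rng.choices(chars, weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_lm = {"ab": [("c", 1.0)], "xy": [("p", 0.5), ("q", 0.5)]}
    print(generate_letter_choices(toy_lm, "zzab", 2))
    print(generate_letter_choices(toy_lm, "xy", 2, random.Random(0)))
```

Passing an explicit `random.Random` instance as `rng` also makes runs reproducible without touching the global random state.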


In [2]:
```python
ORDER = 7
lm = train_char_lm("gags.txt", order=ORDER)
```

In [3]:
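```python
with open("gags.txt") as f:
    data = f.read()
# randint includes both endpoints, so subtract 1 to keep seed < len(data) - ORDER.
seed = random.randint(0, len(data) - ORDER - 1)
# Double the newlines so each generated gag prints as its own paragraph.
print(generate_text(lm, ORDER, seed=seed).replace('\n', '\n' * 2))
```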

e Simpsons onto the couch is a paper shredder; everybody swings in on water, hits an ice-hockey rink; the family transforms into view, with the bill. It turns out to be the sailboat painting above the couch.

Everybody water skis to the couch tracks into a sofa and sits on the wall and the headline: COUCH GAG THRILLS NATION and a giant baby picks the couch.

A game hunter sits on the couch dressed in karate gear with black and white, everybody stops as the couch takes a bite of the family is pinned over their head up from behind the family enter... and eventually emerging from a girder at a construction site.

In place of the room and sit in mid-air.

Everyone slips on banana peels, but it manages to suck her pacifier drops it off and he gives it back, battle the Flanders who is shown as a 7.

A gardener trims a hedge into the couch and it falls over backwards a shotgun and sits on an ancient-looking fish's tentacle. The top halves all land on the couch transformer toy vehicles enterin
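The sample above starts mid-word and ends mid-word, because generation begins at a random seed and stops after exactly `nletters` characters. One way to act on the "reject outputs that are too short" idea is a post-processing step like the hypothetical `clean_gags` below. It assumes one gag per line in `gags.txt` (the paragraph breaks in the printed output suggest this, but it is an assumption), and the word-count threshold is arbitrary.

```python
def clean_gags(text, min_words=6, drop_first=True):
    """Split generated text into whole gags and drop fragments.

    Assumes one gag per line. The last line is always discarded because
    generation stops mid-gag; the first is discarded too when generation
    started from a seed (drop_first=True), since it usually begins
    mid-sentence. Remaining lines with fewer than `min_words` words go too.
    """
    lines = text.split("\n")
    start = 1 if drop_first else 0
    whole = lines[start:-1]
    return [g.strip() for g in whole if len(g.split()) >= min_words]


if __name__ == "__main__":
    sample = ("e Simpsons onto the couch.\n"
              "Everybody water skis to the couch and sits down.\n"
              "In place of the room.\n"
              "A gardener trims a hedge enterin")
    for gag in clean_gags(sample):
        print(gag)
```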