# Character-level generator code

Credit for about 90% of the code goes to [Yoav Goldberg](http://www.cs.biu.ac.il/~yogo).



### Training Code
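Here is the code for training the model. `fname` is the path of the file to read characters from, and `order` is the number of preceding characters (the history) used to predict the next one. We pad the data with `order` leading `~` characters so that the model also learns how text starts. The function returns the model, which maps each history to a list of (character, probability) pairs, together with the padded data.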

In [48]:
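from collections import Counter, defaultdict

def train_char_lm(fname, order=4):
    # Read the whole corpus; the with-block closes the file afterwards.
    with open(fname) as f:
        data = f.read()
    lm = defaultdict(Counter)
    # Pad with `order` leading "~" so the model also learns how text starts.
    pad = "~" * order
    data = pad + data
    # Count how often each character follows each history of length `order`.
    for i in range(len(data) - order):
        history, char = data[i:i+order], data[i+order]
        lm[history][char] += 1
    def normalize(counter):
        # Turn raw counts into (char, probability) pairs.
        s = float(sum(counter.values()))
        return [(c, cnt/s) for c, cnt in counter.items()]
    outlm = {hist: normalize(chars) for hist, chars in lm.items()}
    return outlm, data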
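
To make the shape of the trained model concrete, here is a minimal sketch that applies the same counting to an in-memory string instead of a file. The helper `train_on_string` and the toy corpus are assumptions made up for illustration, and the snippet is written for Python 3.

```python
from collections import Counter, defaultdict

def train_on_string(text, order=2):
    """Hypothetical helper: same counting as train_char_lm, but on a string."""
    data = "~" * order + text
    counts = defaultdict(Counter)
    for i in range(len(data) - order):
        history, char = data[i:i + order], data[i + order]
        counts[history][char] += 1
    # Normalize each history's counts into (char, probability) pairs.
    return {
        hist: [(c, n / sum(ctr.values())) for c, n in ctr.items()]
        for hist, ctr in counts.items()
    }

lm = train_on_string("abab\nabc\n", order=2)  # made-up toy corpus
print(lm["~~"])  # distribution over the first character
print(lm["ab"])  # distribution over what follows "ab"
```

In this toy corpus every line starts with `a`, so the padded history `~~` has a single continuation, while `ab` is followed by three different characters with equal probability.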

### Generating from the model
Generating is also very simple. To generate a letter, we take the history, look at its last `order` characters, and sample a random letter from the corresponding distribution.

In [49]:
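from random import random

def generate_letter(lm, history, order):
    # Only the last `order` characters determine the next letter.
    history = history[-order:]
    dist = lm[history]
    # Subtract probabilities from x until it drops to 0 or below.
    x = random()
    for c, v in dist:
        x = x - v
        if x <= 0: return c
    # Floating-point rounding can leave x slightly above 0;
    # fall back to the last letter instead of returning None.
    return c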
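
The loop in `generate_letter` is a weighted draw over a list of (char, probability) pairs. As a sanity check, the sketch below draws many samples from a made-up distribution and compares the empirical frequencies with the stated probabilities. It also shows `random.choices` from the Python 3 standard library, which performs the same kind of weighted draw. The names `sample` and `dist`, the seed, and the numbers are assumptions for illustration only.

```python
import random
from collections import Counter

def sample(dist, rng):
    """Weighted draw from a list of (char, probability) pairs."""
    x = rng.random()
    for c, p in dist:
        x -= p
        if x <= 0:
            return c
    return c  # guard against floating-point rounding

dist = [("a", 0.5), ("b", 0.3), ("c", 0.2)]  # made-up distribution
rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000
freq = Counter(sample(dist, rng) for _ in range(n))
for c, p in dist:
    # Stated probability next to the observed frequency.
    print(c, p, round(freq[c] / n, 3))

# The same kind of weighted draw with the standard library (Python 3.6+).
chars, weights = zip(*dist)
print(rng.choices(chars, weights=weights, k=5))
```

The observed frequencies should land close to 0.5, 0.3 and 0.2, which is what one would expect if the loop samples from the distribution correctly.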

To generate a passage of `nletters` characters, we seed the history with the `~` padding and run letter generation in a loop, updating the history at each step.

In [50]:
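def generate_text(lm, order, nletters=2000):
    # Start from the same "~" padding that was used during training.
    history = "~" * order
    out = []
    for i in range(nletters):
        c = generate_letter(lm, history, order)
        # generate_letter trims the history to its last `order` characters.
        history = history[-order:] + c
        out.append(c)
    return "".join(out)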
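
One edge case worth knowing about: the final `order` characters of the training data are never recorded as a history unless they also occur earlier in the data, so generation can in principle reach a history the model has never seen, and the lookup then raises a `KeyError`. The sketch below is one possible guard, not the original author's approach. `generate_text_safe` and the hand-written `toy_lm` are hypothetical names, and the code assumes Python 3.6+ for `random.choices`.

```python
import random

def generate_text_safe(lm, order, nletters=50, rng=random):
    """Hypothetical variant of generate_text that stops at unseen histories."""
    history = "~" * order
    out = []
    for _ in range(nletters):
        dist = lm.get(history)
        if dist is None:  # dead end: this history never had a successor
            break
        chars, probs = zip(*dist)
        c = rng.choices(chars, weights=probs, k=1)[0]
        out.append(c)
        history = (history + c)[-order:]
    return "".join(out)

# Toy order-2 model written by hand for the data "~~abc":
# the history "bc" has no successor, so generation must stop there.
toy_lm = {
    "~~": [("a", 1.0)],
    "~a": [("b", 1.0)],
    "ab": [("c", 1.0)],
}
print(generate_text_safe(toy_lm, order=2))  # prints "abc"
```

Stopping early is only one option; resetting the history to the `~` padding would be another reasonable choice.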

For ease of use as an amusing generator, we wrap everything in a single function that trains the language model, generates some text, and strips out any generated phrases that already exist in the dataset.

In [51]:
def train_and_generate(fname, order=4):
    lm, data = train_char_lm(fname, order)
    random_buzzwords = generate_text(lm, order).splitlines()
    # Drop the leading "~" padding so the first corpus line can match too.
    known = set(data[order:].splitlines())
    # Keep only generated lines that do not already appear in the corpus.
    new_buzzwords = [b for b in random_buzzwords if b not in known]
    return "\n".join(new_buzzwords)
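
Because generation stops after a fixed number of characters, the last line of a sample is usually cut off mid-word, and the same line can be generated more than once. The sketch below shows one optional way to post-process a sample. The function `filter_new` and the example strings are assumptions for illustration and are not part of the original code.

```python
def filter_new(generated, corpus_lines):
    """Hypothetical post-filter: keep unseen, non-empty, unique lines."""
    lines = generated.split("\n")
    # If the text did not end on a newline, the last piece was cut off.
    if lines and not generated.endswith("\n"):
        lines = lines[:-1]
    known = set(corpus_lines)
    seen = set()
    result = []
    for line in lines:
        if line and line not in known and line not in seen:
            seen.add(line)
            result.append(line)
    return result

corpus = ["big data", "growth hacking"]  # made-up corpus lines
text = "big data\nbig datafication\nbig datafication\ngrowth ha"  # made-up sample
print(filter_new(text, corpus))  # prints ['big datafication']
```

Here `big data` is dropped because it is in the corpus, the repeated `big datafication` is kept once, and the truncated `growth ha` is discarded.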

### Actual examples 

Whether we use order 3 or order 4 makes a noticeable difference to how random, and how interesting, the output is. With order 4:

In [58]:
print(train_and_generate("buzzwords.txt", order=4))

accountable table
bring
leverable tail
down
drill downsizing
core communicationary
evangel invested common core capital divide
disruptive interprise
entiated in
knowledge
web designagement
flipped class
best of our dna
passional software
software
software
missionate
content
big datafication
tractive
deliverse fulfilment
traction society
deep divide
emergies
organic growth hack
hacking your own dogfood
society
bleeding the airbnb for x
the end of our own dogfood
empowermedia
sponsive web
design patten
virtualize
society
bleeding tail
low hange agent
building forward
exit strategic competency
creating to the facebook forward
multiple innovation
pick
clearning
society
seamline
student vision socialization
share optimization virtual rights
herding your dna
passion y
going page
rockstartup
search engagement
come-to-jesus model canvas
content management vision virtual real-time
responsive interprise services
responsive innovationship management
reverage
rockstartup
synerging
ange
search enga

Using only the three previous characters leads to far less intelligible, but potentially still buzzwordy, words.

In [62]:
print(train_and_generate("buzzwords.txt", order=3))

accountableed common x
thouservices
sea changelisticess
synerate
free offshove learly-start reak
brogrammerging
omnication the box
survices
nanotechnologies
different
ant
collability
hyperlocalaborationar
bric
homent manal system
conternetworkflow
busine
tal software
we age
low
grand offlix formal
infor x
thouse
tal
down down down down define
ent horinking fruity
sistice
bransmedildonicats
hight lear goal
servival sign pagent manage
webscaffolksonomy
net
telligency
sis
park figure cloud
cloud clution
core capi
business
parsity
service
cross-to-business praction conternet of outsideative who web
del capital remasting forward
multiple development managemer
student learly-stage
ent
star
stakeholisting
ble
mon y
globallparalytics
apital rights moduction software who we are optive
gamificale
earning
holistems design
revoluting
common viable
hypermenterprinking
co-ope
quick the kool-aid
ent
socience
breal-time
reture devoluttenture
mill divide
peelhough
click
shorientent
sustor
sweation
soci
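
One rough way to quantify the difference between the two runs above is to count, for each history, how many distinct characters can follow it: the fewer continuations a history has, the less freedom the generator has at that step. The sketch below computes the average over all histories for a made-up two-line corpus. `avg_continuations` is a hypothetical helper, not a standard metric, and the code assumes Python 3.

```python
from collections import defaultdict

def avg_continuations(text, order):
    """Hypothetical metric: mean number of distinct next characters per history."""
    data = "~" * order + text
    successors = defaultdict(set)
    for i in range(len(data) - order):
        successors[data[i:i + order]].add(data[i + order])
    return sum(len(s) for s in successors.values()) / len(successors)

corpus = "big data\nbig deal\n"  # made-up two-line corpus
for order in (1, 3):
    print(order, round(avg_continuations(corpus, order), 3))
```

On this toy corpus the order-1 average is higher than the order-3 average, which matches the intuition that shorter histories leave more room for surprising, and less intelligible, output.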