
Random fun with statistical language models.

Currently here: a Markov random sonnet generator. There's sample
output at
(The program does somewhat better now than what's shown off there.)
To generate it:
$ python sonnet  # or limerick or other verse form it knows about
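At its core, a Markov generator of this kind takes a random walk over bigram counts. It starts at the start-of-sentence symbol and repeatedly picks a next word with probability proportional to how often that word follows the current one. The sketch below shows only that walk. It ignores meter and rhyme, which a verse generator also has to enforce, and the names `build_chain` and `generate` are hypothetical, not taken from this repository:

```python
import random
from collections import defaultdict


def build_chain(bigram_counts):
    """Map each word to a list of (next_word, count) pairs."""
    chain = defaultdict(list)
    for (w1, w2), count in bigram_counts.items():
        chain[w1].append((w2, count))
    return chain


def generate(chain, max_words=12, rng=random):
    """Random walk from '<S>', choosing each next word with probability
    proportional to its bigram count. Stops at a word with no successors."""
    words = []
    current = "<S>"
    for _ in range(max_words):
        successors = chain.get(current)  # .get() does not add keys to the defaultdict
        if not successors:
            break
        nexts, weights = zip(*successors)
        current = rng.choices(nexts, weights=weights, k=1)[0]
        words.append(current)
    return words
```

Passing a seeded `random.Random(...)` as `rng` makes a run reproducible.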

Currently missing: the data it works from. You need two files:

* 2gm-common6: from
  (lines like "word1 word2\tcount" for common bigrams)
  (word1 can be "<S>" for start of sentence)
* cmudict.0.7a: from
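Both files are line-oriented text, so loading them takes only a few lines. The sketch below assumes the bigram format described above and the usual CMUdict layout: a word, whitespace, then its phonemes, with `;;;` comment lines and alternate pronunciations written as `WORD(1)`. In CMUdict, vowel phonemes end in a stress digit (0, 1 or 2), so the digits give one stress mark per syllable, which is what meter checking needs. The functions are hypothetical and take any iterable of lines (for example an open file), leaving the file encoding to the caller:

```python
from collections import defaultdict


def parse_bigrams(lines):
    """Parse 'word1 word2\\tcount' lines into {(word1, word2): count}."""
    counts = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        pair, count = line.split("\t")
        w1, w2 = pair.split()
        counts[(w1, w2)] = int(count)
    return counts


def parse_cmudict(lines):
    """Parse lines like 'HELLO  HH AH0 L OW1' into
    {word: [phoneme_list, ...]}, one entry per pronunciation."""
    prons = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(";;;"):
            continue
        word, phones = line.split(None, 1)
        # Alternate pronunciations are marked like 'HELLO(1)'.
        if word.endswith(")") and "(" in word:
            word = word[: word.index("(")]
        prons[word.lower()].append(phones.split())
    return prons


def stress_pattern(phones):
    """One stress digit per vowel phoneme, i.e. per syllable."""
    return "".join(p[-1] for p in phones if p[-1].isdigit())
```

For example, `stress_pattern(["HH", "AH0", "L", "OW1"])` is `"01"`: two syllables, stress on the second. An iambic pentameter check could compare a line's concatenated pattern against `"0101010101"`, probably with some slack for one-syllable words, whose dictionary stress is unreliable in verse.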

I'd like to add that I don't normally publish code in such a crap state.

Some other hacks thrown in here:

* generates multiword anagrams

* helps to sort anagrams by quality (using n-gram statistics and brute force)

* breaks down Project Gutenberg's King James Bible into raw material for other hacks here

* generates random Web 2.0 company names, along with a plausibility rating for each.

* reverses disemvoweling

* tries to invent mnemonics like pi's "How I wish I could enumerate pi easily..."

* finds pairs of words that blend nicely, like book + hookup --> bookup

* generates chapter 'summaries' for a book, like

* is a super-crude sentence segmenter

* writes HTML that highlights words with increasing intensity the more unlikely they are according to a language model

* described above
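Multiword anagram generation, from the list above, can be done as a brute-force search over letter multisets. Keep subtracting the letters of dictionary words that still fit, and succeed when nothing is left. The sketch below is one way to do that, not necessarily the approach this repository takes. `anagrams` and its `max_words` limit are hypothetical names, and words are picked in non-decreasing list order so each combination appears only once:

```python
from collections import Counter


def letters(s):
    """Multiset of the lowercase letters in s, ignoring spaces and punctuation."""
    return Counter(c for c in s.lower() if c.isalpha())


def fits(word_counts, remaining):
    """True if every letter of the word is still available in `remaining`."""
    return all(remaining[c] >= n for c, n in word_counts.items())


def anagrams(phrase, words, max_words=3):
    """Yield tuples of words that use exactly the letters of `phrase`."""
    target = letters(phrase)
    candidates = [(w, letters(w)) for w in words]
    candidates = [(w, lc) for w, lc in candidates if lc and fits(lc, target)]

    def search(remaining, start, chosen):
        if not remaining:  # every letter used up
            yield tuple(chosen)
            return
        if len(chosen) == max_words:
            return
        for i in range(start, len(candidates)):
            w, lc = candidates[i]
            if fits(lc, remaining):
                chosen.append(w)
                # Counter subtraction drops letters whose count reaches zero.
                yield from search(remaining - lc, i, chosen)
                chosen.pop()

    yield from search(target, 0, [])
```

With a realistic word list this produces a flood of candidates, which is why ranking them by quality (for instance by how often their words co-occur in the bigram data) matters as much as generating them.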
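Reversing disemvoweling, from the list above, is a decoding problem. Each vowel-less token could be any word with the same consonant skeleton, and a language model picks the likeliest sequence. The sketch below indexes a vocabulary by skeleton and runs a Viterbi search scored with raw bigram counts plus a small smoothing constant. That is a crude score rather than a normalized probability, and all names here are hypothetical. It treats `y` as a consonant and cannot restore words that were all vowels (like "a"), since those vanish entirely when disemvoweled:

```python
import math
from collections import defaultdict

VOWELS = set("aeiou")


def skeleton(word):
    """Drop the vowels: 'cat' -> 'ct'."""
    return "".join(c for c in word.lower() if c not in VOWELS)


def build_index(vocab):
    """Map each consonant skeleton to the words that share it."""
    index = defaultdict(list)
    for w in vocab:
        index[skeleton(w)].append(w)
    return index


def revowel(tokens, index, bigrams, smoothing=0.1):
    """Viterbi search: pick one candidate per token, maximizing the sum of
    log(bigram count + smoothing) along the sentence."""
    best = {"<S>": (0.0, [])}  # word -> (score, best path ending in that word)
    for tok in tokens:
        candidates = index.get(tok.lower(), [tok])  # unknown tokens kept as-is
        new_best = {}
        for w in candidates:
            score, path = max(
                (s + math.log(bigrams.get((prev, w), 0) + smoothing), p)
                for prev, (s, p) in best.items()
            )
            new_best[w] = (score, path + [w])
        best = new_best
    return max(best.values())[1]
```

For example, with counts favoring "the cat sat", the tokens `["th", "ct", "st"]` decode to `["the", "cat", "sat"]` even though "cut" and "set" share the same skeletons.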
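For the word-blend item above (book + hookup --> bookup), one simple rule reproduces the example. Find an ending of the first word that also occurs inside the second, then continue with whatever follows that occurrence. In "hookup", "ook" is followed by "up", giving "bookup". This rule is an assumption for illustration, and the hypothetical `blends` function below does not judge whether a blend is any good. Deciding which blends are "nice" would need extra scoring, for example checking that the overlap is long or that the result reads like a plausible word:

```python
def blends(w1, w2, min_overlap=2):
    """Return sorted blends where a suffix of w1 (at least min_overlap letters)
    occurs inside w2; each blend is w1 followed by the rest of w2 after
    that occurrence."""
    results = set()
    for k in range(min_overlap, len(w1) + 1):
        suffix = w1[-k:]
        start = w2.find(suffix)
        while start != -1:
            rest = w2[start + k:]
            blend = w1 + rest
            if rest and blend not in (w1, w2):
                results.add(blend)
            start = w2.find(suffix, start + 1)  # try later occurrences too
    return sorted(results)
```

For instance, `blends("book", "hookup")` returns `["bookup"]`: both the 2-letter overlap "ok" and the 3-letter overlap "ook" lead to the same blend.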
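The HTML highlighter in the list above needs a per-word measure of unlikeliness. Surprisal, -log2 P(word | previous word), is the natural choice. The sketch below computes it from bigram counts with add-alpha smoothing and maps it to the opacity of a background color. The smoothing scheme, the color and the function names are assumptions for illustration, and `vocab_size` should be the number of distinct words that can follow any context:

```python
import html
import math
from collections import Counter


def surprisals(words, bigrams, vocab_size, alpha=1.0):
    """Surprisal in bits of each word given the previous one (starting from
    '<S>'), under add-alpha smoothed bigram probabilities."""
    context_totals = Counter()
    for (w1, _), c in bigrams.items():
        context_totals[w1] += c
    result = []
    prev = "<S>"
    for w in words:
        p = (bigrams.get((prev, w), 0) + alpha) / (context_totals[prev] + alpha * vocab_size)
        result.append(-math.log2(p))
        prev = w
    return result


def highlight(words, bits):
    """Wrap each word in a <span> whose yellow background is more opaque the
    more surprising the word is, relative to the most surprising word."""
    top = max(bits, default=0.0) or 1.0
    spans = []
    for w, b in zip(words, bits):
        opacity = b / top
        spans.append(
            f'<span style="background: rgba(255, 200, 0, {opacity:.2f})">{html.escape(w)}</span>'
        )
    return " ".join(spans)
```

Scaling by the most surprising word keeps every page's full color range in use. A fixed scale, say a set number of bits per full opacity, would instead make intensities comparable across texts.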
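The pi mnemonics in the list above work by letter counts: "How I wish I could" has word lengths 3, 1, 4, 1, 5, the opening digits of pi. A minimal greedy sketch picks, for each digit, the word of that length that most often follows the previous word in the bigram data. Greedy choice is a simplifying assumption, since a real search would look ahead, and reading a 0 digit as a 10-letter word is a common convention assumed here. `mnemonic` is a hypothetical name:

```python
from collections import defaultdict


def mnemonic(digits, bigrams, vocab):
    """Greedily choose one word per digit (0 read as 10 letters), preferring
    the word that most often follows the previous one. Returns None if some
    length has no word in the vocabulary."""
    by_length = defaultdict(list)
    for w in vocab:
        by_length[len(w)].append(w)
    prev = "<S>"
    out = []
    for d in digits:
        options = by_length.get(10 if d == 0 else d)
        if not options:
            return None
        # max() keeps the first option on ties.
        prev = max(options, key=lambda w: bigrams.get((prev, w), 0))
        out.append(prev)
    return out
```

Because each choice ignores what comes next, a greedy run can paint itself into a corner with an awkward word. Swapping in a beam search over the same bigram scores is the obvious refinement.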

See also for verse-making
rewritten in JavaScript.