A simple javascript library for Markov text generation based on word-level trigraph models.

JSMarkov

A simple javascript library for Markov text generation based on word-level trigraph models.

Click here for a live example.

Markov Text Generation

A word-level trigraph model can be constructed from an arbitrary text corpus.

We begin by splitting the text into all its three-word subsequences. We typically consider each punctuation symbol to be a word.

For example, Moby-Dick begins

Call me Ishmael. Some years ago--never mind how long precisely--having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little ....

and contains the word trigraphs

  1. Call me Ishmael
  2. me Ishmael .
  3. Ishmael . Some
  4. . Some years
  5. Some years ago

and so on.
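The splitting step above can be sketched as follows. This is only an illustration, not JSMarkov's own tokenizer: the function names `tokenize` and `trigraphs` and the regular expression are assumptions made for this example.

```javascript
// Split text into word and punctuation tokens. The regular expression is an
// assumption for illustration: runs of letters, digits and apostrophes form
// a word, and any other non-space character is a token of its own.
function tokenize(text) {
  return text.match(/[A-Za-z0-9']+|[^\sA-Za-z0-9']/g) || [];
}

// Return every three-token window, in order of appearance.
function trigraphs(tokens) {
  const result = [];
  for (let i = 0; i + 2 < tokens.length; i++) {
    result.push([tokens[i], tokens[i + 1], tokens[i + 2]]);
  }
  return result;
}

const tokens = tokenize("Call me Ishmael. Some years ago");
const grams = trigraphs(tokens);
// Logs the five trigraphs listed above, one string per trigraph.
console.log(grams.map((g) => g.join(" ")));
```

A text of n tokens yields n - 2 trigraphs, so the seven tokens here give the five entries in the list.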

What can we do with trigraphs? Well, we can count how many times each one occurs. Then, given any two words that occur consecutively in the text, we can tally how often each word appears in the third position across all trigraphs that begin with that pair.

In a small example like this one, we see that the only thing that follows "Call me" is "Ishmael". In a larger body of text, common word pairs such as "of the" or "for the" will have several different observed followers, each with its own count.
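One possible way to hold these counts is a map from each word pair to a map of follower counts. The sketch below assumes that layout; `countFollowers` and the key separator are hypothetical choices for this example and may differ from what JSMarkov stores internally.

```javascript
// Count, for every pair of consecutive tokens, how often each token follows
// it. The nested-Map layout and the "\u0000" key separator are illustrative
// choices, not taken from the library.
function countFollowers(tokens) {
  const counts = new Map();
  for (let i = 0; i + 2 < tokens.length; i++) {
    const key = tokens[i] + "\u0000" + tokens[i + 1];
    if (!counts.has(key)) counts.set(key, new Map());
    const followers = counts.get(key);
    followers.set(tokens[i + 2], (followers.get(tokens[i + 2]) || 0) + 1);
  }
  return counts;
}

const sample = "of the sea and of the ship and of the sea".split(" ");
const followerCounts = countFollowers(sample);
const afterOfThe = followerCounts.get("of\u0000the");
// "of the" is followed by "sea" twice and by "ship" once.
console.log([...afterOfThe]);
```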

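Using the counts, we can then write software that, given the last two words, picks the next word from among the observed followers with probability proportional to how often each one followed that pair in the corpus we started with.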
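A frequency-weighted choice can be sketched like this. The name `pickWeighted` and the injectable `random` parameter are assumptions for illustration; the library may select followers differently.

```javascript
// Pick one key from a Map of word -> count, with probability proportional
// to its count. `random` is injectable so a choice can be made repeatable.
function pickWeighted(followers, random = Math.random) {
  let total = 0;
  for (const count of followers.values()) total += count;
  let threshold = random() * total;
  for (const [word, count] of followers) {
    threshold -= count;
    if (threshold < 0) return word;
  }
  // Only reachable through floating-point rounding (or an empty Map, which
  // yields undefined); fall back to the last word.
  return [...followers.keys()].pop();
}

// Hypothetical counts: "whale" was seen three times, "ship" once.
const followers = new Map([["whale", 3], ["ship", 1]]);
console.log(pickWeighted(followers));
```

With these counts, "whale" should come up about three times out of four over many calls.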

This gives a way to generate new text that follows the general sense of the original texts in shorter passages, but tends to drift into non-sequiturs in longer ones.
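Putting the pieces together, a generation loop might look like the sketch below. It is a simplified stand-in rather than the library's implementation: `buildModel`, `generate` and the `maxWords` parameter are hypothetical names, and followers are stored as a plain array with repeats so that a uniform pick is already frequency-weighted.

```javascript
// Map each "first second" pair to the list of words seen after it.
function buildModel(tokens) {
  const model = new Map();
  for (let i = 0; i + 2 < tokens.length; i++) {
    const key = tokens[i] + " " + tokens[i + 1];
    if (!model.has(key)) model.set(key, []);
    // Storing every occurrence makes a uniform pick frequency-weighted.
    model.get(key).push(tokens[i + 2]);
  }
  return model;
}

// Extend a two-word seed one word at a time until maxWords is reached
// or the current pair has no observed follower.
function generate(model, first, second, maxWords, random = Math.random) {
  const out = [first, second];
  while (out.length < maxWords) {
    const key = out[out.length - 2] + " " + out[out.length - 1];
    const followers = model.get(key);
    if (!followers) break; // dead end: nothing ever followed this pair
    out.push(followers[Math.floor(random() * followers.length)]);
  }
  return out.join(" ");
}

const words = "the cat sat on the mat and the cat ran".split(" ");
const model = buildModel(words);
console.log(generate(model, "the", "cat", 8));
```

Because each step looks only at the previous two words, the output is locally plausible but has no memory of anything earlier, which is why longer passages wander.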

When we combine several text corpora into one trigraph model, the output still reads reasonably well at the word level, but longer paragraphs lapse into hybrid sentences that are occasionally entertaining.

API

Build a word-level trigraph model for an input text (multiple input texts may be concatenated to build a combined model):

model = markovModel(inputText)

Generate random text based on a model:

text = markovText(model, targetLen)

Generate random text based on the union of multiple models, tagging each word with the model that was used to generate it:

taggedText = markovTaggedText([model, ...], targetLen)
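To illustrate the idea of tagging, here is one way a follower could be drawn from several models while recording which model supplied it. This is a guess at the general approach, not a description of how `markovTaggedText` works: `buildModel`, `pickTagged` and the 1-based `tag` field are all assumptions made for this sketch.

```javascript
// Map each "first second" pair to the list of words seen after it.
function buildModel(tokens) {
  const model = new Map();
  for (let i = 0; i + 2 < tokens.length; i++) {
    const key = tokens[i] + " " + tokens[i + 1];
    if (!model.has(key)) model.set(key, []);
    model.get(key).push(tokens[i + 2]);
  }
  return model;
}

// Pool the followers of a pair across all models, remembering which model
// each candidate came from, then pick one uniformly from the pool.
// Returns null when no model has seen the pair.
function pickTagged(models, pairKey, random = Math.random) {
  const pool = [];
  models.forEach((model, index) => {
    for (const word of model.get(pairKey) || []) {
      pool.push({ word: word, tag: index + 1 }); // 1-based tag (assumed)
    }
  });
  if (pool.length === 0) return null;
  return pool[Math.floor(random() * pool.length)];
}

const modelA = buildModel("in the sea of dreams".split(" "));
const modelB = buildModel("in the city of light".split(" "));
const models = [modelA, modelB];
console.log(pickTagged(models, "in the"));
```

Both models have seen "in the", so either "sea" (tag 1) or "city" (tag 2) can be chosen, which is what allows the generated text to switch between sources mid-sentence.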

Example Code

Generate random text based on the contents of element 'input':

<script type="text/javascript" src="js/markov.js"></script>

<textarea id="input">...</textarea>

<div id="output"></div>

<button onClick="document.getElementById('output').innerHTML = markovText(markovModel(document.getElementById('input').value), 1024);">Generate Text</button>

Generate random text based on the combined contents of elements 'input1' and 'input2':

<script type="text/javascript" src="js/markov.js"></script>

<textarea id="input1">...</textarea>
<textarea id="input2">...</textarea>

<div id="output"></div>

<button onClick="document.getElementById('output').innerHTML = markovText(markovModel(document.getElementById('input1').value+' '+document.getElementById('input2').value), 1024);">Generate Text</button>

Generate random text based on the union of models for 'input1' and 'input2', coloring words generated by the 'input1' model red and words generated by the 'input2' model blue:

<script type="text/javascript" src="js/markov.js"></script>
<style type="text/css">
span.tag1 {color:red}
span.tag2 {color:blue}
</style>

<textarea id="input1">...</textarea>
<textarea id="input2">...</textarea>

<div id="output"></div>

<button onClick="document.getElementById('output').innerHTML = markovTaggedText([markovModel(document.getElementById('input1').value), markovModel(document.getElementById('input2').value)], 1024);">Generate Text</button>

License

JSMarkov is licensed under the MIT License.