✒️ Markov process-based procedural name and word generator


Markov Namegen is a Markov chain-based procedural name generator written in Haxe. Run it in your browser.

Demonstrates the markov-namegen haxelib. Read the docs here.

Features

  • Hundreds of editable training data presets.
  • Configurable corpus, order and prior model parameter settings.
  • Filter results by length, start, end, content and regex match.
  • Sort by Damerau-Levenshtein distance to list results by similarity.
  • Save and share custom data, settings and results with one click.

Usage

Try the demo to generate your own words. For example:

Training Dataset: English Towns
Order: 5
Prior: 0.01
Words To Generate: 100
Max Processing Time: 500ms
Length: 8-12
Starts with: b
Ends with:
Include: ham
Exclude:
Similarity To: birmingham
Matches Regex:

A list of results will be displayed. Here are the first 10 results from this run:

Barkingham
Basingham
Birkenham
Bebingham
Bollingham
Bridlingham
Billenham
Berwickham
Botteringham
Bradnincham

Hit one of the sharing buttons to share results and settings via a generated URL. Large training data sets produce URLs that are too long for some browsers and servers. If that happens, reduce the amount of training data.

Screenshots

Here is the demo in action:

[Three screenshots of the demo]

How It Works

The markov-namegen haxelib uses Markov chains to procedurally generate original words.

Using a set of words as training data, the library calculates the conditional probability of a letter coming up after a sequence of letters chosen so far. It looks back up to "n" characters, where "n" is the order of the model.
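As a rough illustration of the paragraph above, here is a minimal Python sketch of an order-n character model. It is not the library's Haxe code: the names `train` and `probability`, and the use of "#" as a padding character that marks the start and end of a word, are assumptions made for this example.

```python
from collections import Counter, defaultdict

def train(words, order):
    """Count which character follows each context of `order` characters."""
    counts = defaultdict(Counter)
    for word in words:
        # Pad with "#" so the model also learns how words start and end.
        padded = "#" * order + word.lower() + "#"
        for i in range(len(padded) - order):
            context = padded[i:i + order]
            counts[context][padded[i + order]] += 1
    return counts

def probability(counts, context, letter):
    """Conditional probability P(letter | context) from the raw counts."""
    followers = counts.get(context)
    if not followers:
        return 0.0  # this context never appeared in the training data
    return followers[letter] / sum(followers.values())

model = train(["barking", "basing", "birmingham"], order=2)
# In this tiny data set "ba" is followed by "r" once and by "s" once.
print(probability(model, "ba", "r"))
```

A higher order makes the output resemble the training data more closely, because each choice is conditioned on a longer context.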

The generator can use models of several orders, where an order-n model remembers the previous n characters. Starting with the highest-order model (the one with the longest memory), it tries to pick a new character, falling back to lower-order models when a higher-order model cannot supply one. This approach is known as Katz's back-off model.
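The fallback idea can be sketched in Python as follows. This is a simplified back-off that moves to a shorter context only when the longer one was never seen. It leaves out the probability discounting of full Katz back-off, and it may differ from what the library does. The function names and the "#" padding character are assumptions for this example.

```python
import random
from collections import Counter, defaultdict

def train(words, order):
    """Count followers for every context of `order` characters."""
    counts = defaultdict(Counter)
    for word in words:
        padded = "#" * order + word + "#"
        for i in range(len(padded) - order):
            counts[padded[i:i + order]][padded[i + order]] += 1
    return counts

def next_letter(models, history, rng):
    """Try the highest-order model first, then back off to lower orders."""
    for order, counts in models:  # sorted from highest to lowest order
        followers = counts.get(history[-order:])
        if followers:
            letters = list(followers)
            weights = list(followers.values())
            return rng.choices(letters, weights=weights)[0]
    return "#"  # no model knows this context, so end the word

def generate(models, rng, max_length=20):
    """Build one word letter by letter until "#" or max_length is reached."""
    max_order = models[0][0]
    word = "#" * max_order
    while len(word) - max_order < max_length:
        letter = next_letter(models, word, rng)
        if letter == "#":
            break
        word += letter
    return word[max_order:]

words = ["barking", "basing", "birmingham", "bollington"]
models = [(n, train(words, n)) for n in range(3, 0, -1)]  # orders 3, 2, 1
print(generate(models, random.Random(1)))
```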

A Dirichlet prior is used to add a constant probability that any letter may be picked as the next letter. This acts as an additive smoothing factor and adds a bit more "randomness" to the generated output.
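The effect of the prior can be shown with a small Python sketch of additive smoothing. The function name `smoothed_probabilities`, the four-character alphabet and the counts are invented for illustration. Only the value 0.01 comes from this README's example settings.

```python
from collections import Counter

def smoothed_probabilities(followers, alphabet, prior):
    """Add `prior` to every letter's count, then normalise to probabilities."""
    total = sum(followers.values()) + prior * len(alphabet)
    return {letter: (followers[letter] + prior) / total for letter in alphabet}

alphabet = "abc#"
followers = Counter({"a": 3, "b": 1})  # letters observed after some context
probs = smoothed_probabilities(followers, alphabet, prior=0.01)

# "c" was never observed after this context but still gets a small chance.
for letter in alphabet:
    print(letter, round(probs[letter], 4))
```

With a prior of 0 only letters seen in the training data can be picked. Larger priors flatten the distribution and make the output more random.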

The generator produces many candidate words, which are then filtered and sorted according to several tweakable criteria such as length, start and end characters, and similarity to a target word.
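The filter-and-sort step might look like the Python sketch below. The distance function implements the optimal string alignment variant of Damerau-Levenshtein distance, and the library's own implementation may differ. The function names and the two extra candidates ("Oakham", "Bristol") are assumptions added to show words being rejected.

```python
import re

def osa_distance(a, b):
    """Edit distance counting insert, delete, substitute and adjacent swap."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]

def filter_words(words, min_len, max_len, starts="", ends="",
                 include="", exclude="", pattern=None):
    """Keep only the words that satisfy every filter that is set."""
    kept = []
    for word in words:
        w = word.lower()
        if not (min_len <= len(w) <= max_len):
            continue
        if not w.startswith(starts) or not w.endswith(ends):
            continue
        if include not in w or (exclude and exclude in w):
            continue
        if pattern and not re.search(pattern, w):
            continue
        kept.append(word)
    return kept

candidates = ["Barkingham", "Oakham", "Birkenham", "Bristol", "Botteringham"]
kept = filter_words(candidates, 8, 12, starts="b", include="ham")
results = sorted(kept, key=lambda w: osa_distance(w.lower(), "birmingham"))
print(results)
```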

Library Install

Get the Markov Namegen library from GitHub or through haxelib.

Include it in your .hxml:

-lib markov-namegen

Or add it to your Project.xml:

<haxelib name="markov-namegen" />

Notes

  • Many of the concepts used for the generator were suggested in this article by Jeffrey Lund.
  • If you have any questions or suggestions then get in touch or open an issue.
  • Remember to read the documentation.

License

The website and demo code are licensed under CC BY-NC. The haxelib library itself is MIT licensed. The noUiSlider settings sliders are WTFPL. Most of the training data was compiled from sites like Wikipedia and census data sources.