Skip to content

Akin01/markov-rs

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

markovify-rs

Crates.io Documentation License: MIT

A Rust implementation of a Markov chain text generator, inspired by markovify.

markovify-rs is a simple, extensible Markov chain generator. Its primary use is for building Markov models of large corpora of text and generating random sentences from that.

Features

  • 🚀 Fast - Native Rust performance for text generation
  • 📦 Simple API - Easy to use with sensible defaults
  • 🔧 Extensible - Override key methods for custom behavior
  • 💾 JSON Serialization - Save and load models for later use
  • 🎯 Configurable - Adjustable state size, overlap detection, and more
  • 🔗 Model Combination - Combine multiple models with weights

Installation

Add this to your Cargo.toml:

[dependencies]
markovify-rs = "0.1.2"

Basic Usage

use markovify_rs::Text;

// Get raw text as string
let text = r#"
    Sherlock Holmes was a consulting detective. He solved crimes in London.
    His friend Dr. Watson helped him. They lived on Baker Street.
    Holmes was very clever and observant.
"#;

// Build the model
let text_model = Text::new(text, 2, true, true, None).unwrap();

// Print five randomly-generated sentences
for _ in 0..5 {
    if let Some(sentence) = text_model.make_sentence(None, None, None, None, None, None, None) {
        println!("{}", sentence);
    }
}

// Print three randomly-generated sentences of no more than 100 characters
for _ in 0..3 {
    if let Some(sentence) = text_model.make_short_sentence(100, None, None, None, None, None, None, None, None) {
        println!("Short: {}", sentence);
    }
}

Advanced Usage

Specifying the Model's State Size

State size is the number of words the probability of a next word depends on.

// Default state size is 2
let model = Text::new(text, 2, true, true, None).unwrap();

// Use a state size of 3
let model = Text::new(text, 3, true, true, None).unwrap();

Combining Models

Combine two or more Markov chains with optional weights:

use markovify_rs::{Text, utils::combine_texts};

let model_a = Text::new(text_a, 2, true, true, None).unwrap();
let model_b = Text::new(text_b, 2, true, true, None).unwrap();

// Combine with equal weights
let combined = combine_texts(vec![&model_a, &model_b], None).unwrap();

// Combine with custom weights (50% more weight on model_a)
let combined = combine_texts(vec![&model_a, &model_b], Some(vec![1.5, 1.0])).unwrap();

Compiling a Model

Compile a model for improved text generation speed:

let text_model = Text::new(text, 2, true, true, None).unwrap();
let compiled_model = text_model.compile();

// Or compile in place
let mut text_model = Text::new(text, 2, true, true, None).unwrap();
text_model.compile_inplace();

Working with Newline-Delimited Text

For text where sentences are separated by newlines instead of punctuation:

use markovify_rs::NewlineText;

let text = r#"
Line one here
Line two there
Line three everywhere
"#;

let model = NewlineText::new(text, 2, true, true, None).unwrap();

Exporting and Importing Models

Save and load models using JSON:

// Generate and save
let text_model = Text::new(corpus, 3, true, true, None).unwrap();
let model_json = text_model.to_json().unwrap();

// Save to file (optional)
std::fs::write("model.json", model_json).unwrap();

// Load from JSON
let model_json = std::fs::read_to_string("model.json").unwrap();
let reconstituted_model = Text::from_json(&model_json).unwrap();

// Generate a sentence
if let Some(sentence) = reconstituted_model.make_short_sentence(280, None, None, None, None, None, None, None, None) {
    println!("{}", sentence);
}

Custom Sentence Rejection

Override the default rejection pattern:

// Use a custom regex to reject sentences containing specific patterns
let model = Text::new(
    text,
    2,
    true,
    true,
    Some(r"badword|anotherbad"),  // Custom rejection pattern
).unwrap();

// Or disable well-formed checking entirely
let model = Text::new(text, 2, true, false, None).unwrap();

Sentence Generation Options

let model = Text::new(text, 2, true, true, None).unwrap();

// Generate with custom parameters
let sentence = model.make_sentence(
    None,                       // init_state: optional starting words
    Some(100),                  // tries: maximum attempts
    Some(0.5),                  // max_overlap_ratio: overlap threshold
    Some(10),                   // max_overlap_total: max overlap words
    Some(true),                 // test_output: whether to test for overlap
    Some(20),                   // max_words: maximum word count
    Some(5),                    // min_words: minimum word count
);

// Generate sentence starting with specific words
let sentence = model.make_sentence_with_start(
    "Sherlock Holmes",  // beginning phrase
    true,               // strict: only sentences starting with phrase
    Some(50),           // tries
    None, None, None, None, None,
).unwrap();

API Reference

Text

The main text model struct.

  • new(input_text, state_size, retain_original, well_formed, reject_reg) - Create a new model
  • make_sentence(...) - Generate a random sentence
  • make_short_sentence(max_chars, ...) - Generate a sentence with character limit
  • make_sentence_with_start(beginning, ...) - Generate sentence starting with specific words
  • compile() - Compile for faster generation
  • to_json() / from_json() - Serialize/deserialize

Chain

The underlying Markov chain (non-text-specific).

  • new(corpus, state_size) - Create a chain from corpus
  • walk(init_state) - Generate a sequence
  • compile() - Compile for faster generation
  • to_json() / from_json() - Serialize/deserialize

NewlineText

Text model that splits on newlines.

Same API as Text, but uses newline-based sentence splitting.

Performance

Rust provides significant performance improvements over the Python implementation:

Operation Python (markovify) Rust (markovify-rs) Speedup
Model Creation 50-100 ms 5-15 ms 5-10x
Sentence Generation 1-5 ms 0.01-0.1 ms 50-100x
Compiled Generation 0.5-2 ms 0.01-0.05 ms 20-50x
Model Compilation 10-30 ms 1-5 ms 5-10x
JSON Serialize 5-15 ms 1-3 ms 3-5x

Running Benchmarks

# Run Rust benchmarks
cargo bench

# Run Python benchmarks
python3 benchmarks/python_benchmark.py

# Run both and compare
cd benchmarks
./run_benchmarks.sh --iterations 10

See benchmarks/BENCHMARKS.md for detailed documentation.

Notes

  • Markovify works best with large, well-punctuated texts
  • By default, make_sentence tries 10 times to generate a valid sentence
  • The default overlap check rejects sentences that overlap by 15 words or 70% of the sentence length
  • Setting retain_original = false reduces memory usage for large corpora

License

MIT License - see LICENSE for details.

Acknowledgments

This is a Rust port of the excellent markovify Python library by Jeremy Singer-Vine.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors