Restore correct letter casings in arbitrary text using a statistical model
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.

Build Status Latest Version docs is a simple statistical truecaser written in Rust.

Truecasing is restoration of original letter cases in text: for example, turning all-uppercase, or all-lowercase text into one that has proper sentence casing (capital first letter, capitalized names etc).

This crate attempts to solve this problem by gathering statistics from a set of training sentences, then using those statistics to restore correct casings in broken sentences. It comes with a command-line utility that makes training the statistical model easy.

Quick usage example

use truecase::{Model, ModelTrainer};

// build a statistical model from sample sentences
let mut trainer = ModelTrainer::new();
trainer.add_sentence("There are very few writers as good as Shakespeare");
trainer.add_sentence("You and I will have to disagree about this");
trainer.add_sentence("She never came back from USSR");
let model = trainer.into_model();

// use gathered statistics to restore case in caseless text
let truecased_text = model.truecase("i don't think shakespeare was born in ussr");
assert_eq!(truecased_text, "I don't think Shakespeare was born in USSR");

See documentation for more details.

License is licensed under the terms of the MIT License or the Apache License 2.0, at your choosing.