Skip to content

Restore correct letter casings in arbitrary text using a statistical model

License

Apache-2.0, MIT licenses found

Licenses found

Apache-2.0
LICENSE-APACHE
MIT
LICENSE-MIT
Notifications You must be signed in to change notification settings

despawnerer/truecase

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

75 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

truecase.rs

Latest Version docs

truecase.rs is a simple statistical truecaser written in Rust.

Truecasing is restoration of original letter cases in text: for example, turning all-uppercase, or all-lowercase text into one that has proper sentence casing (capital first letter, capitalized names etc).

This crate attempts to solve this problem by gathering statistics from a set of training sentences, then using those statistics to restore correct casings in broken sentences. It comes with a command-line utility that makes training the statistical model easy.

Quick usage example

use truecase::{Model, ModelTrainer};

// build a statistical model from sample sentences
let mut trainer = ModelTrainer::new();
trainer.add_sentence("There are very few writers as good as Shakespeare");
trainer.add_sentence("You and I will have to disagree about this");
trainer.add_sentence("She never came back from USSR");
let model = trainer.into_model();

// use gathered statistics to restore case in caseless text
let truecased_text = model.truecase("i don't think shakespeare was born in ussr");
assert_eq!(truecased_text, "I don't think Shakespeare was born in USSR");

See documentation for more details.

License

truecase.rs is licensed under the terms of the MIT License or the Apache License 2.0, at your choosing.

About

Restore correct letter casings in arbitrary text using a statistical model

Topics

Resources

License

Apache-2.0, MIT licenses found

Licenses found

Apache-2.0
LICENSE-APACHE
MIT
LICENSE-MIT

Stars

Watchers

Forks

Packages

No packages published

Languages