- README: https://github.com/david-mccullars/ruby-splitta
- Documentation: http://www.rubydoc.info/github/david-mccullars/ruby-splitta
- Bug Reports: https://github.com/david-mccullars/ruby-splitta/issues
Splitta includes proper tokenization and models for very high-accuracy sentence boundary detection (English only for now). The models are trained on Wall Street Journal news combined with the Brown Corpus, which is intended to be widely representative of written English. Error rates on test news data are near 0.25%.
gem install splitta
- Ruby 2.5.1 or higher
require 'splitta'
Splitta.sentences("Some text goes here.")
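To see why a trained model helps, here is a rough sketch comparing a naive punctuation-based splitter with the `Splitta.sentences` call above. The `naive_sentences` helper is hypothetical and written only for this comparison; it is not part of the gem and not how Splitta works. The shape of the value returned by `Splitta.sentences` is not described here, so the sketch just prints it.

```ruby
# Hypothetical baseline (NOT part of splitta): break after '.', '!' or '?'
# whenever it is followed by whitespace.
def naive_sentences(text)
  text.split(/(?<=[.!?])\s+/)
end

text = "The U.S. economy grew last year. Analysts were surprised."

# The naive rule also breaks after the abbreviation "U.S.",
# so it yields three pieces for a two-sentence input.
naive = naive_sentences(text)
naive.each { |piece| puts "naive: #{piece}" }

# With the gem installed, pass the same text to Splitta instead.
# Assumption: the result is printable with puts; its exact type is not shown here.
begin
  require 'splitta'
  puts Splitta.sentences(text)
rescue LoadError
  warn "splitta gem not installed; skipping the Splitta call"
end
```

Abbreviations such as "U.S." are exactly the kind of case a fixed punctuation rule gets wrong, which is the problem the trained models are meant to address.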
MIT. See the LICENSE file.
Dan Gillick, “Sentence Boundary Detection and the Problem with the U.S.” at NAACL 2009, http://dgillick.com/resource/sbd_naacl_2009.pdf