Ruby implementation of the Splitta library for splitting text into sentences

david-mccullars/ruby-splitta

Ruby Splitta

Description

Splitta includes proper tokenization and trained models for high-accuracy sentence boundary detection (English only for now). The models are trained on Wall Street Journal news text combined with the Brown Corpus, which is intended to be widely representative of written English. Error rates on test news data are near 0.25%.

Installation

gem install splitta

Requirements

  • Ruby 2.5.1 or higher

Usage

require 'splitta'

Splitta.sentences("Some text goes here.")
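
The call above can be wrapped in a small helper when the sentences need further processing. The following is a minimal sketch, not part of the gem: `numbered_sentences` and its `splitter:` keyword are hypothetical names introduced here, and the sketch assumes `Splitta.sentences` returns an enumerable of sentence strings, which this README does not state explicitly.

```ruby
# Hypothetical helper (not provided by the splitta gem).
# Assumption: the splitter returns an Enumerable of sentence strings.
# The default splitter loads the gem lazily and delegates to
# Splitta.sentences, the only call documented in this README.
DEFAULT_SPLITTER = lambda do |text|
  require 'splitta'
  Splitta.sentences(text)
end

# Returns an array of strings such as "1: First sentence."
def numbered_sentences(text, splitter: DEFAULT_SPLITTER)
  splitter.call(text).each_with_index.map do |sentence, index|
    "#{index + 1}: #{sentence.to_s.strip}"
  end
end
```

With the gem installed, `puts numbered_sentences("Dr. Smith arrived at 5 p.m. He left soon after.")` would print one numbered line per detected sentence. Passing a different callable as `splitter:` makes the helper easy to exercise without the trained models.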

License

MIT. See the LICENSE file.

References

Dan Gillick, “Sentence Boundary Detection and the Problem with the U.S.” at NAACL 2009, http://dgillick.com/resource/sbd_naacl_2009.pdf
