BlingFire

BlingFire - high speed text tokenization - for Ruby


Installation

Add this line to your application’s Gemfile:

gem 'blingfire'

Getting Started

Create a model

model = BlingFire::Model.new

Tokenize words

model.text_to_words(text)

Tokenize sentences

model.text_to_sentences(text)
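
The two calls above can be combined, for example to get the words of each sentence separately. This is a minimal sketch, not part of the gem: the helper names `tokenize_by_sentence` and `word_counts` are hypothetical, and it assumes that `text_to_sentences` and `text_to_words` each return an array of strings, which this README does not state explicitly.

```ruby
# Hypothetical helpers, not part of the blingfire gem.
# `model` is any object that responds to text_to_sentences and
# text_to_words, for example BlingFire::Model.new.
# Assumption: both methods return an array of strings.

# Returns one array of words per sentence.
def tokenize_by_sentence(model, text)
  model.text_to_sentences(text).map do |sentence|
    model.text_to_words(sentence)
  end
end

# Returns a hash mapping each word to the number of times it occurs.
def word_counts(model, text)
  model.text_to_words(text).each_with_object(Hash.new(0)) do |word, counts|
    counts[word] += 1
  end
end

# Usage (requires the gem to be installed):
#   require "blingfire"
#   model = BlingFire::Model.new
#   tokenize_by_sentence(model, "This is great. So is this.")
```

Passing the model in as an argument keeps the helpers independent of which model is loaded.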

Pre-trained Models

BlingFire comes with a default model that follows the tokenization logic of NLTK, with a few changes. You can also download other models.

Load a model

model = BlingFire.load_model("bert_base_tok.bin")

Convert text to ids

model.text_to_ids(text)
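
Ids are often needed for several texts at once, as rows of equal length. The sketch below shows one way to do that. It is not part of the gem: `encode_batch` and its `pad_id` parameter are hypothetical, it assumes `text_to_ids` returns an array of integers, and the right padding id depends on the model, so `0` is only a placeholder.

```ruby
# Hypothetical helper, not part of the blingfire gem.
# `model` is any object that responds to text_to_ids, for example the
# result of BlingFire.load_model("bert_base_tok.bin").
# Assumption: text_to_ids returns an array of integers.
# Assumption: pad_id 0 is a placeholder; check the model's vocabulary.
def encode_batch(model, texts, pad_id: 0)
  rows = texts.map { |text| model.text_to_ids(text) }
  width = rows.map(&:length).max || 0

  # Right-pad every row to the length of the longest one.
  rows.map { |ids| ids + Array.new(width - ids.length, pad_id) }
end

# Usage (requires the gem and a downloaded model file):
#   require "blingfire"
#   model = BlingFire.load_model("bert_base_tok.bin")
#   encode_batch(model, ["first text", "a somewhat longer second text"])
```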

History

View the changelog in CHANGELOG.md

Contributing

Everyone is encouraged to help improve this project.

To get started with development:

git clone https://github.com/ankane/blingfire.git
cd blingfire
bundle install
bundle exec rake vendor:all
bundle exec rake test