Bling Fire Ruby

Bling Fire - high-speed text tokenization - for Ruby

Installation

Add this line to your application’s Gemfile:

gem "blingfire"

Getting Started

Create a model

model = BlingFire::Model.new

Tokenize words

model.text_to_words(text)

Tokenize sentences

model.text_to_sentences(text)
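
The two calls above can be combined to tokenize a document sentence by sentence. This is a minimal sketch, assuming `text_to_sentences` and `text_to_words` each return an array of strings; `words_by_sentence` is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical helper (not part of the gem): split text into sentences,
# then split each sentence into words.
# Assumes both model methods return arrays of strings.
def words_by_sentence(model, text)
  model.text_to_sentences(text).map do |sentence|
    model.text_to_words(sentence)
  end
end

# Usage, with the gem installed:
#   require "blingfire"
#   model = BlingFire::Model.new
#   words_by_sentence(model, "Hello world. How are you?")
```

The helper only relies on the two methods shown above, so it should work with any object that responds to them.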

Get offsets for words

words, start_offsets, end_offsets = model.text_to_words_with_offsets(text)

Get offsets for sentences

sentences, start_offsets, end_offsets = model.text_to_sentences_with_offsets(text)
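
Both `_with_offsets` calls return three values. As a sketch, assuming the three arrays are parallel (one start and one end offset per token), they can be zipped into one record per token. `with_offsets` is a hypothetical helper. The text above does not say whether offsets count bytes or characters, or whether the end offset is inclusive, so check that before slicing the original string.

```ruby
# Hypothetical helper (not part of the gem): combine the three parallel
# arrays into one hash per token.
def with_offsets(tokens, start_offsets, end_offsets)
  tokens.zip(start_offsets, end_offsets).map do |token, start_offset, end_offset|
    { token: token, start: start_offset, end: end_offset }
  end
end

# Usage, with the gem installed:
#   words, start_offsets, end_offsets = model.text_to_words_with_offsets(text)
#   with_offsets(words, start_offsets, end_offsets)
```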

Pre-trained Models

Bling Fire comes with a default model that follows the tokenization logic of NLTK with a few changes. You can also download other models:

BERT Base, BERT Base Cased, BERT Chinese, BERT Multilingual Cased
GPT-2
Laser 100k, Laser 250k, Laser 500k
RoBERTa
Syllab
URI 100k, URI 250k, URI 500k
XLM-RoBERTa
XLNet, XLNet No Norm
WBD

Load a model

model = BlingFire.load_model("bert_base_tok.bin")

Convert text to ids

model.text_to_ids(text)

Get offsets for ids

ids, start_offsets, end_offsets = model.text_to_ids_with_offsets(text)

Disable prefix space

model = BlingFire.load_model("roberta.bin", prefix: false)

Ids to Text

Load a model

model = BlingFire.load_model("bert_base_tok.i2w")

Convert ids to text

model.ids_to_text(ids)
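
Putting the two model types together, text can be encoded to ids and decoded back. This is a minimal sketch, assuming a tokenization model (`.bin`) and its matching ids-to-text model (`.i2w`) are both available locally; `round_trip` is a hypothetical helper, and the decoded text may differ from the input depending on how the model normalizes text.

```ruby
# Hypothetical helper (not part of the gem): encode text to ids with one
# model and decode those ids with another.
# Returns both the ids and the decoded text.
def round_trip(tokenizer, detokenizer, text)
  ids = tokenizer.text_to_ids(text)
  [ids, detokenizer.ids_to_text(ids)]
end

# Usage, with the gem and model files available:
#   tokenizer   = BlingFire.load_model("bert_base_tok.bin")
#   detokenizer = BlingFire.load_model("bert_base_tok.i2w")
#   ids, decoded = round_trip(tokenizer, detokenizer, "hello world")
```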

History

View the changelog in CHANGELOG.md

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

Report bugs
Fix bugs and submit pull requests
Write, clarify, or fix documentation
Suggest or add new features

To get started with development:

git clone https://github.com/ankane/blingfire-ruby.git
cd blingfire-ruby
bundle install
bundle exec rake vendor:all download:models
bundle exec rake test
