This repository has been archived by the owner on Apr 6, 2023. It is now read-only.

Fast state-of-the-art tokenizers for Ruby

airopshq/tokenizers-ruby

 
 

Tokenizers Ruby

🙂 Fast state-of-the-art tokenizers for Ruby

Installation

Add this line to your application’s Gemfile:

gem "tokenizers"

Getting Started

Load a pretrained tokenizer

tokenizer = Tokenizers.from_pretrained("bert-base-cased")

Encode

encoded = tokenizer.encode("I can feel the magic, can you?")
encoded.ids
encoded.tokens
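
To inspect an encoding, it can help to see each token next to its id. The following is a minimal sketch, assuming `ids` and `tokens` return parallel arrays of equal length; `token_id_pairs` is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical helper (not part of the gem): pairs each token with its id.
# Assumes `encoded` responds to `tokens` and `ids` with parallel arrays.
def token_id_pairs(encoded)
  encoded.tokens.zip(encoded.ids)
end

# Usage, assuming `encoded` comes from tokenizer.encode as above:
#   token_id_pairs(encoded).each { |token, id| puts "#{id}\t#{token}" }
```

The helper only relies on the `ids` and `tokens` readers shown above, so it should work with any encoding object that exposes them.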

Decode

tokenizer.decode(encoded.ids)
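
Encoding and decoding can be combined into a quick round trip. This is a sketch under stated assumptions: `round_trip` is a hypothetical helper, not part of the gem, and it uses only the `encode`, `ids`, `tokens`, and `decode` calls shown above. The decoded string may not match the input exactly, since a tokenizer can normalize spacing or add special tokens.

```ruby
# Hypothetical helper (not part of the gem): encodes text, decodes the
# resulting ids, and returns everything so the pieces can be compared.
# Works with any object that responds to `encode` and `decode` as shown above.
def round_trip(tokenizer, text)
  encoded = tokenizer.encode(text)
  decoded = tokenizer.decode(encoded.ids)
  { ids: encoded.ids, tokens: encoded.tokens, decoded: decoded }
end

# Usage, assuming the gem is installed and the model can be downloaded:
#   require "tokenizers"
#   tokenizer = Tokenizers.from_pretrained("bert-base-cased")
#   result = round_trip(tokenizer, "I can feel the magic, can you?")
#   puts result[:decoded]
```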

Load a tokenizer from files

tokenizer = Tokenizers::CharBPETokenizer.new("vocab.json", "merges.txt")
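
Before loading from files, it may be worth checking that both files exist, so a missing path produces a clear message. A minimal sketch; `missing_files` is a hypothetical helper, not part of the gem, and uses only Ruby's standard `File.file?`.

```ruby
# Hypothetical helper (not part of the gem): returns the given paths
# that do not point to an existing regular file.
def missing_files(*paths)
  paths.reject { |path| File.file?(path) }
end

# Usage, assuming vocab.json and merges.txt are in the working directory:
#   missing = missing_files("vocab.json", "merges.txt")
#   raise ArgumentError, "Missing tokenizer files: #{missing.join(', ')}" unless missing.empty?
#   tokenizer = Tokenizers::CharBPETokenizer.new("vocab.json", "merges.txt")
```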

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

To get started with development:

git clone https://github.com/ankane/tokenizers-ruby.git
cd tokenizers-ruby
bundle install
bundle exec rake compile
bundle exec rake download:files
bundle exec rake test
