Textualization/RophertaTokenizer

BPE Tokenizer for Ropherta (subclass of GPT3Tokenizer)

This is just a wrapper around GPT3Tokenizer using the HuggingFace RoBERTa vocab and merge files.

See the GPT3Tokenizer documentation for example usage (or the generated test case under tests/).
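
For readers unfamiliar with how a BPE tokenizer uses a merge file, the following is a minimal sketch of the generic merge loop. It is not this package's implementation or API: the function name `bpeMerge`, the toy merge table, and the input symbols are all hypothetical and chosen only for illustration. It assumes the merge ranks are keyed by the two symbols joined with a space, as they appear on each line of a merges file.

```php
<?php
// Hypothetical illustration of generic BPE merging; not part of this package.
// $symbols: the word split into initial symbols.
// $ranks:   map from "left right" pair to its priority (lower merges first).
function bpeMerge(array $symbols, array $ranks): array
{
    while (count($symbols) > 1) {
        $best = null;
        $bestRank = PHP_INT_MAX;
        // Find the adjacent pair with the lowest rank.
        for ($i = 0; $i < count($symbols) - 1; $i++) {
            $pair = $symbols[$i] . ' ' . $symbols[$i + 1];
            if (isset($ranks[$pair]) && $ranks[$pair] < $bestRank) {
                $bestRank = $ranks[$pair];
                $best = $i;
            }
        }
        if ($best === null) {
            break; // No known pair is left to merge.
        }
        // Replace the two symbols with their concatenation.
        array_splice($symbols, $best, 2, [$symbols[$best] . $symbols[$best + 1]]);
    }
    return $symbols;
}

// Toy merge table (assumed data, not taken from the RoBERTa merges file).
$ranks = ['l o' => 0, 'lo w' => 1, 'e r' => 2];
echo implode(' ', bpeMerge(['l', 'o', 'w', 'e', 'r'], $ranks)), PHP_EOL; // prints "low er"
```

In a real tokenizer each resulting symbol is then looked up in the vocab file to obtain its token id; that step is omitted here.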

XLM Tokenizer

To use the multilingual version, initialize the SentencePiece dependency and download an additional model file:

composer exec -- php -r "require 'vendor/autoload.php'; Textualization\SentencePiece\Vendor::check();"
composer exec -- php -r "require 'vendor/autoload.php'; Textualization\Ropherta\Tokenizer\Vendor::check();"

Sponsors

We thank our sponsor: