Textualization/php-sentencepiece

PHP SentencePiece

This is a minimal wrapper on top of Google SentencePiece to enable executing the XLMRobertaTokenizer encode method.

It needs the SentencePiece dynamic library built with additional C wrapper functions; see the fork at [https://github.com/textualization/sentencepiece/].

A binary for the library can be downloaded by doing:

composer exec -- php -r "require 'vendor/autoload.php'; Textualization\SentencePiece\Vendor::check();"

Depending on your platform and GLIBC version, the downloaded binary may not work. In that case, compile the library yourself and copy it to vendor/textualization/sentencepiece/lib (create the folder if it doesn't exist). See src/Vendor.php for details.
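If you build the library yourself, the copy step could look roughly like the sketch below. The function name `install_sentencepiece_lib` is hypothetical, and the file name and location of your build output are assumptions; only the target folder comes from the instructions above.

```shell
# Hypothetical helper: copy a locally built SentencePiece shared library
# into vendor/textualization/sentencepiece/lib, creating the folder if needed.
install_sentencepiece_lib() {
    src="$1"                  # path to the library you built (name is an assumption)
    project_root="${2:-.}"    # root of the project that contains vendor/
    lib_dir="$project_root/vendor/textualization/sentencepiece/lib"

    # Refuse to continue if the built library is not where we were told.
    if [ ! -f "$src" ]; then
        echo "library not found: $src" >&2
        return 1
    fi

    # Create the target folder if it doesn't exist, then copy the library in.
    mkdir -p "$lib_dir" && cp "$src" "$lib_dir/"
}
```

For example, `install_sentencepiece_lib /path/to/your/built/library` run from the project root. Check src/Vendor.php for the file name the wrapper actually expects.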

Running the tests

To run the tests you'll need to install the library per the instructions above.

To fully test it, download the sentencepiece.bpe.model file and place it in tests/.
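Before running the full test suite, a quick check that the model file is in place can save a confusing failure. This is a minimal sketch; the function name `check_test_model` is hypothetical and not part of the project.

```shell
# Hypothetical pre-flight check: confirm the model file exists in tests/
# and is not empty.
check_test_model() {
    tests_dir="${1:-tests}"   # defaults to the tests/ folder of the project
    model="$tests_dir/sentencepiece.bpe.model"

    if [ -s "$model" ]; then
        echo "found $model"
    else
        echo "missing or empty: $model" >&2
        return 1
    fi
}
```

Run `check_test_model` from the project root; a non-zero exit status means the file still needs to be downloaded.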