Sorrow321/huggingface_tokenizer_cpp

HuggingFace WordPiece Tokenizer in C++

Takes the .json file of a pre-trained tokenizer as input. Only inference mode is available at the moment.
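
For context, WordPiece inference is usually described as a greedy longest-match-first split of each word against the vocabulary. The sketch below illustrates that idea only and is not taken from this repository's tokenizer.cpp: the function name wordpiece_split, the "##" continuation prefix, and the "[UNK]" fallback token are assumptions based on common BERT-style vocabularies. It slices bytes rather than Unicode code points, so it is only meaningful for ASCII input.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not the repository's API): greedy longest-match-first
// WordPiece split of a single, already pre-tokenized word.
// Assumptions: continuation pieces carry the "##" prefix and an unmatched
// word collapses to "[UNK]". Slicing is byte-based, so ASCII only.
std::vector<std::string> wordpiece_split(
    const std::string& word,
    const std::unordered_set<std::string>& vocab,
    const std::string& unk_token = "[UNK]",
    const std::string& prefix = "##") {
  std::vector<std::string> pieces;
  std::size_t start = 0;
  while (start < word.size()) {
    std::size_t end = word.size();
    std::string match;
    // Try the longest remaining substring first, then shrink from the right.
    while (end > start) {
      std::string candidate = word.substr(start, end - start);
      if (start > 0) candidate = prefix + candidate;  // mark a continuation
      if (vocab.count(candidate) > 0) {
        match = candidate;
        break;
      }
      --end;
    }
    // No vocabulary entry covers this position: the whole word is unknown.
    if (match.empty()) return {unk_token};
    pieces.push_back(match);
    start = end;
  }
  return pieces;
}
```

With a toy vocabulary containing "un", "##aff" and "##able", calling wordpiece_split("unaffable", vocab) yields the pieces "un", "##aff", "##able". A word with no matching pieces yields the single unknown token.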

Set the path to your .json file in tokenizer.cpp:

WordPieceTokenizer tokenizer("tokenizer.json");

Build:

Requires the International Components for Unicode (ICU) library:

sudo apt-get install libicu-dev

Compile:

g++ tokenizer.cpp -licuuc -o tokenizer

Run:

./tokenizer
