Use built-in functions to access special tokens and ids #16
Labels: bug, enhancement, feature request
The current implementation for mapping special tokens to their ids caused problems when new words containing "token" appeared: we currently scan the vocab file and collect every entry whose text contains the word "token". However, for (at least non-SentencePiece) tokenizers in Hugging Face transformers, there are already two attributes for this:
- `tokenizer.all_special_tokens`
- `tokenizer.all_special_ids`
Let's test and replace our implementation with these officially supported attributes.
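A minimal sketch of what the replacement could look like, assuming we load the tokenizer via `AutoTokenizer` (the helper name and signature below are hypothetical, not the actual code in `utils.py`):

```python
from transformers import AutoTokenizer


def get_special_tokens_and_ids(model_name: str) -> dict:
    """Map special tokens to their ids via the official tokenizer attributes.

    Hypothetical helper for illustration; the real utils.py may need a
    different signature.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Use the officially supported attributes instead of substring-matching
    # "token" against the vocab file, which breaks for ordinary vocab
    # entries that merely contain the word "token".
    special_tokens = tokenizer.all_special_tokens  # e.g. ['[CLS]', '[SEP]', ...]
    special_ids = tokenizer.all_special_ids        # matching ids, same order
    return dict(zip(special_tokens, special_ids))


if __name__ == "__main__":
    print(get_special_tokens_and_ids("bert-base-uncased"))
```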
See `tftokenizers/tftokenizers/utils.py`, line 8 at commit 14dc752.