Use built-in function to access special tokens and ids #16

MarkusSagen · 2022-03-13T10:29:50Z

The current implementation for mapping special tokens to their ids caused problems when the vocabulary contained words with "token" in them: we currently take every entry in the vocab file whose text contains the word "token". However, (at least non-SentencePiece) tokenizers in Hugging Face transformers already expose two attributes for this:

tokenizer.all_special_tokens
tokenizer.all_special_ids

Let's test these attributes and replace our implementation with the officially supported ones.

tftokenizers/tftokenizers/utils.py

Line 8 in 14dc752

def map_special_tokens_to_ids(
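A minimal sketch of the proposed replacement, written in plain Python so it runs without transformers installed. The toy vocabulary, the naive_special_tokens filter (a simplified stand-in for the substring approach described above, not the repo's actual code) and the special_token_id_map helper are all hypothetical. In practice the two input lists would come from tokenizer.all_special_tokens and tokenizer.all_special_ids, which list the same special tokens in the same order.

```python
from typing import Dict, List, Sequence


def naive_special_tokens(vocab: Dict[str, int]) -> Dict[str, int]:
    """Hypothetical simplification of the substring approach:
    keep every vocab entry whose text contains "token"."""
    return {tok: idx for tok, idx in vocab.items() if "token" in tok}


def special_token_id_map(
    special_tokens: Sequence[str], special_ids: Sequence[int]
) -> Dict[str, int]:
    """Pair each special token with its id (hypothetical helper).

    With Hugging Face transformers, the inputs would be
    tokenizer.all_special_tokens and tokenizer.all_special_ids, e.g.:
        tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
        special_token_id_map(tokenizer.all_special_tokens,
                             tokenizer.all_special_ids)
    """
    if len(special_tokens) != len(special_ids):
        raise ValueError("special_tokens and special_ids must have the same length")
    return dict(zip(special_tokens, special_ids))


# Toy vocabulary (hypothetical): special tokens plus ordinary words that
# happen to contain the substring "token".
vocab: Dict[str, int] = {
    "[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "[MASK]": 4,
    "token": 5, "tokens": 6, "tokenizer": 7, "hello": 8,
}

# Stand-ins for tokenizer.all_special_tokens / tokenizer.all_special_ids.
special_tokens: List[str] = ["[UNK]", "[SEP]", "[PAD]", "[CLS]", "[MASK]"]
special_ids: List[int] = [vocab[t] for t in special_tokens]

# The substring filter picks up ordinary words instead of the special tokens.
print(naive_special_tokens(vocab))  # {'token': 5, 'tokens': 6, 'tokenizer': 7}

# The tokenizer-provided lists map exactly the special tokens.
print(special_token_id_map(special_tokens, special_ids))
# {'[UNK]': 1, '[SEP]': 3, '[PAD]': 0, '[CLS]': 2, '[MASK]': 4}
```

Building the map from the tokenizer's own special-token lists means ordinary vocabulary entries such as "tokens" or "tokenizer" can no longer be mistaken for special tokens.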


MarkusSagen self-assigned this Mar 13, 2022

MarkusSagen added the bug (Something isn't working), enhancement (New feature or request) and feature request (New feature or functionality wanted) labels Mar 13, 2022
