We may implement our own tokenizer rather than using BertTokenizerFast.
Our tokenizer should have the following features:
Disable WordPiece. Convert text to token IDs character by character (e.g. tokenizer.convert_tokens_to_ids(list(input_text))).
Reimplement the clean_up_tokenization method. The default implementation is English-only; ours may remove whitespace and convert half-width punctuation to full-width.
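The two features above could be sketched roughly as follows. This is a standalone illustration, not tied to the Hugging Face tokenizer base classes; the `CharTokenizer` class, its tiny vocabulary, and the `[UNK]` handling are all hypothetical, and the half-width-to-full-width mapping uses the fixed Unicode offset 0xFEE0 between ASCII punctuation and its full-width forms:

```python
import string

class CharTokenizer:
    """Hypothetical character-level tokenizer (no WordPiece)."""

    def __init__(self, vocab, unk_token="[UNK]"):
        self.vocab = vocab
        self.unk_id = vocab[unk_token]

    def convert_tokens_to_ids(self, tokens):
        # No WordPiece: each character is its own token.
        return [self.vocab.get(t, self.unk_id) for t in tokens]

    @staticmethod
    def clean_up_tokenization(text):
        # Remove whitespace and map half-width ASCII punctuation to its
        # full-width counterpart (e.g. "," -> "，", "!" -> "！").
        out = []
        for ch in text:
            if ch.isspace():
                continue
            if ch in string.punctuation:
                ch = chr(ord(ch) + 0xFEE0)
            out.append(ch)
        return "".join(out)

vocab = {"[UNK]": 0, "你": 1, "好": 2, "!": 3}
tok = CharTokenizer(vocab)
print(tok.convert_tokens_to_ids(list("你好!")))        # [1, 2, 3]
print(CharTokenizer.clean_up_tokenization("你 好, 世 界!"))  # 你好，世界！
```

A real implementation would presumably subclass the library's tokenizer base class and plug `clean_up_tokenization` into the decode path, but the per-character ID lookup and the cleanup rules would look much like this.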