
Code breaks when using a different model like RoBERTa or XLM-RoBERTa in fine-tuning #61

Open
Tacacs-1101 opened this issue Nov 18, 2021 · 2 comments

Comments

@Tacacs-1101

The code breaks when using any model other than BERT. I debugged it and found that it is written for the BERT tokenizer only, while other transformer models use different tokenizers. Below is the relevant snippet from helpers.py:

if BERT_TOKENIZER is None:  # gets initialized during the first call to this method
    if bert_pretrained_name_or_path:
        BERT_TOKENIZER = transformers.BertTokenizer.from_pretrained(bert_pretrained_name_or_path)
        BERT_TOKENIZER.do_basic_tokenize = True
        BERT_TOKENIZER.tokenize_chinese_chars = False
    else:
        BERT_TOKENIZER = transformers.BertTokenizer.from_pretrained('bert-base-cased')
        BERT_TOKENIZER.do_basic_tokenize = True
        BERT_TOKENIZER.tokenize_chinese_chars = False
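
To reproduce the break, here is a minimal sketch (assuming the public 'roberta-base' checkpoint): RoBERTa ships a BPE vocab.json/merges.txt pair rather than the vocab.txt file that BertTokenizer expects, so forcing BertTokenizer onto a non-BERT checkpoint fails at load time:

import transformers

# Works: bert-base-cased provides the vocab.txt file BertTokenizer expects.
tok = transformers.BertTokenizer.from_pretrained('bert-base-cased')

# Breaks: roberta-base has no vocab.txt (it uses BPE vocab.json + merges.txt),
# so BertTokenizer cannot load it and raises an error here.
tok = transformers.BertTokenizer.from_pretrained('roberta-base')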
@UthpalaIsiru

I'm hitting the same issue. Has anyone resolved this?

@RK-BAKU

RK-BAKU commented Apr 20, 2023

I faced the same issue. Here is a quick fix. In helpers.py, locate this line:

BERT_TOKENIZER = transformers.BertTokenizer.from_pretrained(bert_pretrained_name_or_path)

Replace BertTokenizer with XLMRobertaTokenizer:

BERT_TOKENIZER = transformers.XLMRobertaTokenizer.from_pretrained(bert_pretrained_name_or_path)
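
That hard-codes a single tokenizer class, though. A more general sketch (untested against this repo, and assuming the checkpoint name is passed through unchanged) is to use transformers.AutoTokenizer, which resolves the correct tokenizer class from the checkpoint itself; the do_basic_tokenize and tokenize_chinese_chars options are BERT-specific, so guard before setting them:

import transformers

if BERT_TOKENIZER is None:  # gets initialized during the first call to this method
    name = bert_pretrained_name_or_path or 'bert-base-cased'
    # AutoTokenizer picks BertTokenizer, XLMRobertaTokenizer, etc. based on
    # the checkpoint's config, so no per-model branching is needed here.
    BERT_TOKENIZER = transformers.AutoTokenizer.from_pretrained(name)
    # These two options only exist on BERT-style tokenizers.
    if hasattr(BERT_TOKENIZER, 'do_basic_tokenize'):
        BERT_TOKENIZER.do_basic_tokenize = True
        BERT_TOKENIZER.tokenize_chinese_chars = False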
