
DocBERT #59

Closed
Zhou3983 opened this issue May 18, 2020 · 3 comments

Comments

@Zhou3983

I tried to run DocBERT with

python -m models.bert --dataset Reuters --model bert-base-uncased --max-seq-length 256 --batch-size 16 --lr 2e-5 --epochs 30

but got the following error:

r66xu@hedwig:~/RX/hedwig$ python -m models.bert --dataset Reuters --model bert-base-uncased --max-seq-length 256 --batch-size 16 --lr 2e-5 --epochs 30
Device: CUDA
Number of GPUs: 1
FP16: False
Traceback (most recent call last):
  File "/jet/var/python/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/jet/var/python/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/r66xu/RX/hedwig/models/bert/__main__.py", line 75, in <module>
    tokenizer = BertTokenizer.from_pretrained(pretrained_vocab_path)
  File "/jet/var/python/lib/python3.6/site-packages/transformers/tokenization_utils.py", line 282, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/jet/var/python/lib/python3.6/site-packages/transformers/tokenization_utils.py", line 346, in _from_pretrained
    list(cls.vocab_files_names.values())))
OSError: Model name '../hedwig-data/models/bert_pretrained/bert-base-uncased-vocab.txt' was not found in tokenizers model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased). We assumed '../hedwig-data/models/bert_pretrained/bert-base-uncased-vocab.txt' was a path or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.

To fix it, I downloaded the pre-trained model for BERT and moved it into hedwig-data, and then it was able to run. Is this the correct way to fix it?
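For anyone else hitting this: models/bert/__main__.py passes a local vocab file path to BertTokenizer.from_pretrained, so nothing is downloaded automatically. A minimal one-off sketch of the workaround, assuming transformers 2.x and the directory layout shown in the traceback above (the exact file names hedwig expects for the model weights may differ):

# One-off script: download the BERT files and place them where hedwig looks for them.
import os
from transformers import BertTokenizer, BertModel

target_dir = "hedwig-data/models/bert_pretrained"  # adjust to your checkout
os.makedirs(target_dir, exist_ok=True)

# Resolve by model name so transformers downloads from the Hugging Face hub.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# save_vocabulary writes vocab.txt; rename it to the file name in the traceback.
vocab_file = tokenizer.save_vocabulary(target_dir)[0]
os.rename(vocab_file, os.path.join(target_dir, "bert-base-uncased-vocab.txt"))

# Save the weights alongside it (check __main__.py for the exact names it loads).
model.save_pretrained(target_dir)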

@mrkarezina

I encountered this as well. Would it be possible to add a command-line argument that downloads the Hugging Face model weights for the specified BERT model instead of looking for them locally?

I tried implementing this in #58 to make it easier to work in a Colab notebook.
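Roughly the behaviour I have in mind (the --pretrained-from-hub flag is hypothetical, and hedwig's real argument handling lives elsewhere), as a sketch:

import argparse
from transformers import BertTokenizer

parser = argparse.ArgumentParser()
parser.add_argument("--model", default="bert-base-uncased")
parser.add_argument("--pretrained-from-hub", action="store_true",
                    help="download weights from the Hugging Face hub instead of hedwig-data")
args, _ = parser.parse_known_args()

if args.pretrained_from_hub:
    # Let transformers resolve the model name, download, and cache the files.
    tokenizer = BertTokenizer.from_pretrained(args.model)
else:
    # Current behaviour: look for the vocab file shipped in hedwig-data.
    vocab_path = f"../hedwig-data/models/bert_pretrained/{args.model}-vocab.txt"
    tokenizer = BertTokenizer.from_pretrained(vocab_path)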

@achyudh
Member

achyudh commented May 21, 2020

@richard3983 yup, that's the right way to fix it. The simplest way to solve this issue would be to add the pre-trained model weights to https://git.uwaterloo.ca/jimmylin/hedwig-data

@achyudh
Member

achyudh commented May 29, 2020

Just an update: I've added the weights to the data repo (https://git.uwaterloo.ca/jimmylin/hedwig-data), so it should work out of the box now without needing the fix suggested by @mrkarezina.
