
triviaqa results not reproducible #134

Open
songwang41 opened this issue Nov 10, 2020 · 4 comments

Comments


songwang41 commented Nov 10, 2020

python3 -m scripts.triviaqa  --train_dataset $QA_PATH/processed/squad-wikipedia-train-4096.json      \
--dev_dataset $QA_PATH/processed/squad-wikipedia-dev-4096.json      \
--gpus 0  --num_workers 4 \
--max_seq_len 4096 --doc_stride -1  \
--save_prefix triviaqa-longformer-large  \
--model_path models/longformer-large-4096  \
--test

during evaluation, I only got this score
{'exact_match': 0.025021894157387713, 'f1': 4.5473948151449575, 'common': 7993, 'denominator': 7993, 'pred_len': 7993, 'gold_len': 7993}
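For context on what those numbers mean: the scores above follow the standard SQuAD-style answer scoring, where exact match compares normalized strings and F1 is computed over answer tokens. A minimal sketch (function names and the normalization steps are mine, modeled on the usual SQuAD evaluation recipe, not taken from this repo):

```python
# Sketch of SQuAD-style exact-match and token-level F1 scoring.
import re
import string
from collections import Counter

def normalize(s: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(pred: str, gold: str) -> float:
    return float(normalize(pred) == normalize(gold))

def f1(pred: str, gold: str) -> float:
    p, g = normalize(pred).split(), normalize(gold).split()
    common = Counter(p) & Counter(g)  # per-token overlap count
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)
```

An F1 around 4.5 with near-zero exact match means the predictions share almost no tokens with the gold answers, which is what you'd expect from a model that was never fine-tuned on the task.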

ibeltagy (Collaborator) commented:

Are you sure models/longformer-large-4096 is the TriviaQA fine-tuned checkpoint and not the vanilla Longformer?

antoniogois commented:

I'm running into the same issue; did anyone find out what was wrong?

according to cheatsheet.txt:

    --save_prefix triviaqa-longformer-large  \  # pretrained pytorch-lighting checkpoint
    --model_path path/to/pretrained/longformer-large-4096  \  # loaded but not used

But from @ibeltagy's comment, it seems the checkpoint should go in --model_path. Which one is correct?

--model_path expects a path to a directory containing a config.json file, so I can't point that flag at the downloaded checkpoint. But if I pass the checkpoint to --save_prefix, following the cheatsheet's instructions, I get very low results, similar to @songwanguw.

antoniogois commented:

OK, it seems like --save_prefix is the one that's ignored, not --model_path.
I then tried grabbing the downloaded triviaqa-longformer-large/checkpoints/_ckpt_epoch_4_v2.ckpt, renaming it to pytorch_model.bin, and placing it in the --model_path folder (which overwrites the vanilla model I had there).

However, I still get very low values. Any ideas on what to try?
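One likely reason the rename alone can't work: a pytorch-lightning .ckpt wraps the weights inside a "state_dict" entry and prefixes every key with the LightningModule attribute name (here "model.", matching keys like model.embeddings.position_ids seen in this thread), while pytorch_model.bin is expected to hold the bare model state dict. An illustrative sketch with a made-up tensor, not the actual checkpoint contents:

```python
# Illustrates the key-layout mismatch between a pytorch-lightning
# checkpoint and a bare pytorch_model.bin state dict.
import torch

# Simplified stand-in for what torch.load() returns for a .ckpt file:
lightning_ckpt = {
    "state_dict": {"model.embeddings.word_embeddings.weight": torch.zeros(4, 2)},
    "epoch": 4,
}

# A bare state dict (what pytorch_model.bin should contain) would need the
# "model." prefix stripped from every key:
bare = {
    k[len("model."):]: v
    for k, v in lightning_ckpt["state_dict"].items()
    if k.startswith("model.")
}
```

So loading the renamed file as pytorch_model.bin silently matches none of the expected parameter names, and you end up evaluating randomly initialized weights.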

antoniogois commented:

Solved.
Indeed, those very low results come from a model that wasn't fine-tuned for TriviaQA. To properly load the provided checkpoint, follow cheatsheet.txt with these exceptions:

--save_prefix choose-a-name-for-output-dir
--model_path path/to/pretrained/longformer-large-4096  # path to the folder of the downloaded model pretrained with masked LM; creating your own roberta-large-4096 via "convert_model_to_long.ipynb" will not work here
--resume_ckpt path/to/triviaqa-longformer-large/checkpoints/fixed_ckpt_epoch_4_v2.ckpt  # path to downloaded model finetuned for triviaqa

However, fixed_ckpt_epoch_4_v2.ckpt will fail to load. To fix it, load the file in a Python console with torch.load(), apply these changes, and save it back with torch.save():

checkpoint["state_dict"]["model.embeddings.position_ids"] = torch.arange(4098).to('cuda').unsqueeze(0)
checkpoint["checkpoint_callback_best_model_path"]=""  # some versions of pytorch lightning may not need this
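Putting those two lines together into a complete patch script might look like this (a sketch: the paths are placeholders, the function name is mine, and I load on CPU rather than CUDA since the buffer only needs to exist in the saved file):

```python
# Sketch: patch the downloaded TriviaQA checkpoint so it loads cleanly.
import torch

def fix_triviaqa_checkpoint(checkpoint: dict) -> dict:
    """Add the missing position_ids buffer and best-model-path key in place."""
    # 4096 tokens + 2 extra positions for RoBERTa-style offset embeddings.
    checkpoint.setdefault("state_dict", {})
    checkpoint["state_dict"]["model.embeddings.position_ids"] = (
        torch.arange(4098).unsqueeze(0)
    )
    # Some pytorch-lightning versions look this key up when resuming.
    checkpoint["checkpoint_callback_best_model_path"] = ""
    return checkpoint

# Usage (paths are placeholders):
# ckpt = torch.load("path/to/fixed_ckpt_epoch_4_v2.ckpt", map_location="cpu")
# torch.save(fix_triviaqa_checkpoint(ckpt), "path/to/fixed_ckpt_epoch_4_v2.ckpt")
```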

I'll create a pull request adding these comments to cheatsheet.txt

antoniogois added a commit to antoniogois/longformer that referenced this issue Jan 13, 2021