
The hyper-param tuning used in your paper #20

Open
danyaljj opened this issue Jan 27, 2020 · 0 comments
danyaljj commented Jan 27, 2020

I have tried your code for multiple datasets:

> python multiqa.py train --datasets SQuAD1-1  --cuda_device 0,1
> python multiqa.py train --datasets NewsQA  --cuda_device 0,1
> python multiqa.py train --datasets SearchQA  --cuda_device 0,1

followed by the corresponding evaluations:

> python multiqa.py evaluate --model model --datasets SQuAD1-1 --cuda_device 0  --models_dir  'models/SQuAD1-1/'
> python multiqa.py evaluate --model model --datasets NewsQA --cuda_device 0  --models_dir  'models/NewsQA/'
> python multiqa.py evaluate --model model --datasets SearchQA --cuda_device 0  --models_dir  'models/SearchQA/'
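
For completeness, these were run as-is, with no extra flags. A minimal driver that just repeats the commands above per dataset (not a script from your repo, shown only to document the exact settings used) would be:

```python
import subprocess

# Repeat the exact train/evaluate commands from this issue for each dataset,
# so all three datasets are handled with identical settings. Only the
# command-line flags shown above are used; everything else is left at defaults.
for name in ["SQuAD1-1", "NewsQA", "SearchQA"]:
    subprocess.run(
        ["python", "multiqa.py", "train",
         "--datasets", name, "--cuda_device", "0,1"],
        check=True,
    )
    subprocess.run(
        ["python", "multiqa.py", "evaluate",
         "--model", "model", "--datasets", name,
         "--cuda_device", "0", "--models_dir", f"models/{name}/"],
        check=True,
    )
```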

I am getting relatively low scores (EM | F1):

  • SQuAD 1.1: 77.19 | 85.28
  • NewsQA: 19.51 | 30.51
  • SearchQA: 35.68 | 41.02

which suggests that I am not using the right hyper-parameters. Do you think that explains it?
If so, I would appreciate more clarity on this sentence from your paper: "We emphasize that in all our experiments we use exactly the same training procedure for all datasets, with minimal hyper-parameter tuning." In particular, what does "minimal hyper-parameter tuning" involve in practice?
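
For reference, I understand the EM/F1 numbers above to be the standard SQuAD-style exact-match and token-level F1. A minimal sketch of that computation (assuming plain-string predictions and gold answers, not your repo's actual evaluation code) is:

```python
import collections
import re
import string

def normalize_answer(s):
    """Lowercase, strip punctuation, drop articles, and collapse whitespace,
    in the style of the official SQuAD evaluation script."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction, gold):
    # 1.0 if the normalized strings are identical, else 0.0.
    return float(normalize_answer(prediction) == normalize_answer(gold))

def f1_score(prediction, gold):
    # Token-level F1 over the bag of normalized tokens.
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    common = collections.Counter(pred_tokens) & collections.Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Example: a single prediction scored against one gold answer.
print(exact_match("the Eiffel Tower", "Eiffel Tower"))  # 1.0 after normalization
print(f1_score("in Paris, France", "Paris"))            # 0.5 (partial overlap)
```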

danyaljj changed the title from "Not getting 'good' scores" to "The hyper-param tuning used in your paper" on Jan 30, 2020