
The hyper-param tuning used in your paper #20

Open
danyaljj opened this issue Jan 27, 2020 · 0 comments
danyaljj commented Jan 27, 2020

I have tried your code for multiple datasets:

> python multiqa.py train --datasets SQuAD1-1  --cuda_device 0,1
> python multiqa.py train --datasets NewsQA  --cuda_device 0,1
> python multiqa.py train --datasets SearchQA  --cuda_device 0,1

followed by the corresponding evaluations:

> python multiqa.py evaluate --model model --datasets SQuAD1-1 --cuda_device 0  --models_dir  'models/SQuAD1-1/'
> python multiqa.py evaluate --model model --datasets NewsQA --cuda_device 0  --models_dir  'models/NewsQA/'
> python multiqa.py evaluate --model model --datasets SearchQA --cuda_device 0  --models_dir  'models/SearchQA/'
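
For completeness, these were run as-is, with no extra flags. A minimal driver that just repeats the commands above per dataset (not a script from your repo, shown only to document the exact settings used) would be:

```python
import subprocess

# Repeat the exact train/evaluate commands from this issue for each dataset,
# so all three datasets are handled with identical settings. Only the
# command-line flags shown above are used; everything else is left at defaults.
for name in ["SQuAD1-1", "NewsQA", "SearchQA"]:
    subprocess.run(
        ["python", "multiqa.py", "train",
         "--datasets", name, "--cuda_device", "0,1"],
        check=True,
    )
    subprocess.run(
        ["python", "multiqa.py", "evaluate",
         "--model", "model", "--datasets", name,
         "--cuda_device", "0", "--models_dir", f"models/{name}/"],
        check=True,
    )
```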

I am getting relatively low scores (EM | F1):

  • SQuAD 1.1: 77.19 | 85.28
  • NewsQA: 19.51 | 30.51
  • SearchQA: 35.68 | 41.02

which suggests that I am not using the right hyper-parameters. Do you think that explains it?
If so, I would appreciate more clarity on this sentence from your paper: "We emphasize that in all our experiments we use exactly the same training procedure for all datasets, with minimal hyper-parameter tuning." In particular, what does "minimal hyper-parameter tuning" involve in practice?
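
For reference, I understand the EM/F1 numbers above to be the standard SQuAD-style exact-match and token-level F1. A minimal sketch of that computation (assuming plain-string predictions and gold answers, not your repo's actual evaluation code) is:

```python
import collections
import re
import string

def normalize_answer(s):
    """Lowercase, strip punctuation, drop articles, and collapse whitespace,
    in the style of the official SQuAD evaluation script."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction, gold):
    # 1.0 if the normalized strings are identical, else 0.0.
    return float(normalize_answer(prediction) == normalize_answer(gold))

def f1_score(prediction, gold):
    # Token-level F1 over the bag of normalized tokens.
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    common = collections.Counter(pred_tokens) & collections.Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Example: a single prediction scored against one gold answer.
print(exact_match("the Eiffel Tower", "Eiffel Tower"))  # 1.0 after normalization
print(f1_score("in Paris, France", "Paris"))            # 0.5 (partial overlap)
```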

danyaljj changed the title from "Not getting 'good' scores" to "The hyper-param tuning used in your paper" on Jan 30, 2020