Lower performance with retrained model #2
We do not employ iBLEU to evaluate our model, so I think you may have chosen the wrong evaluation metric.
Thanks for your response - iBLEU is just a weighted difference between BLEU and self-BLEU, which you do report in the paper. I get the following scores on MSCOCO when I train your model from scratch (using the command above), after 10 rounds:

I'm trying to train your model on another dataset (one you don't use in your paper), and its performance is currently much worse than the other comparison systems'. So I wanted to check that I was training correctly, to make a fair comparison - please let me know if I should be doing anything differently.
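For reference, the weighted difference mentioned above can be written down directly. This is a minimal sketch of the iBLEU score as defined by Sun & Zhou (2012), which combines a BLEU score against the references with a self-BLEU score against the input; the `alpha` default is a common choice, not necessarily the one this repository uses:

```python
def ibleu(bleu: float, self_bleu: float, alpha: float = 0.8) -> float:
    """iBLEU = alpha * BLEU - (1 - alpha) * self-BLEU (Sun & Zhou, 2012).

    `bleu` rewards similarity to the reference paraphrases, while
    `self_bleu` penalizes copying the source sentence verbatim.
    """
    return alpha * bleu - (1 - alpha) * self_bleu


# Example: BLEU = 30.0 against references, self-BLEU = 20.0 against the input.
score = ibleu(30.0, 20.0)  # 0.8 * 30.0 - 0.2 * 20.0 = 20.0
```

A model that simply copies its input gets a high BLEU but also a high self-BLEU, so iBLEU drops; this is why both numbers are reported together.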
I cannot look into your problem, as I don't have your dataset. But I notice your batch size is too small, so I suggest you increase it.
Or you can try disabling the diversity coefficient, in line 85 of utils/run.py. It is used to address the lack of diversity in the first word of generated sentences.
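To illustrate what such a coefficient typically does (this is a hypothetical sketch, not the code in utils/run.py): a first-word diversity term penalizes the logits of tokens that have already started many generated outputs, and setting the coefficient to zero disables it.

```python
import math

def penalize_first_word(logits, first_token_counts, coef=0.1):
    """Hypothetical first-word diversity penalty.

    logits:             per-token scores for the FIRST decoding step
    first_token_counts: how often each token has already begun an output
    coef:               diversity coefficient; coef=0.0 disables the penalty
    """
    return [score - coef * math.log1p(count)
            for score, count in zip(logits, first_token_counts)]
```

With `coef=0.0` the logits pass through unchanged, which is what "closing" the coefficient would amount to.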
Thanks - I will try reducing the length limit and increasing the batch size.
I've been able to train to completion using a batch size of 32 - but I now get BLEU and Self-BLEU scores of 0. It looks like training is stable at the start, but validation scores go to 0 about halfway through. Does the training script not use early stopping? How should I pick the number of training epochs? |
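If the training script really has no early stopping, a simple patience-based loop around the existing round structure would avoid running past the point where validation collapses. This is a generic sketch, assuming hypothetical `train_one_round` and `validate_bleu` callables rather than anything in this repository:

```python
def train_with_early_stopping(train_one_round, validate_bleu,
                              max_rounds=10, patience=3):
    """Run training rounds, keeping the round with the best validation BLEU.

    Stops once validation BLEU has failed to improve for `patience`
    consecutive rounds. Returns (best_round, best_bleu) so the
    corresponding checkpoint can be restored afterwards.
    """
    best_bleu, best_round, stale = float("-inf"), -1, 0
    for rnd in range(max_rounds):
        train_one_round(rnd)
        bleu = validate_bleu(rnd)
        if bleu > best_bleu:
            best_bleu, best_round, stale = bleu, rnd, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_round, best_bleu
```

Checkpointing each round and reloading the `best_round` checkpoint would then guard against the mid-training collapse described above.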
When I use a checkpoint that I've trained from scratch instead of the checkpoint downloaded from here, performance is ~2 iBLEU lower. The command used to train the model was:
Are there additional hyperparameters that I need to set?