
Hyper-parameters of DeBERTa for EUR-LEX #20

Closed
cooelf opened this issue Jun 24, 2022 · 2 comments
cooelf commented Jun 24, 2022

Hi, my reproduced results on EUR-LEX are quite far from the reported ones. Could you provide the hyper-parameters of DeBERTa for EUR-LEX? Also, which version of DeBERTa was used: V2 or V3, base or large?

Looking forward to your reply. Thanks!

iliaschalkidis (Collaborator) commented

Hi @cooelf, we used microsoft/deberta-base from HuggingFace, which as far as I can tell is the base configuration of V1. I see that both V2 and V3 use the deberta-v2 model type in HuggingFace.
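
For what it's worth, you can verify the model type directly from each checkpoint's config (an illustrative check, not from the original thread; expected output shown as comments):

```python
from transformers import AutoConfig

for name in [
    "microsoft/deberta-base",      # V1
    "microsoft/deberta-v2-xlarge", # V2
    "microsoft/deberta-v3-base",   # V3
]:
    print(name, "->", AutoConfig.from_pretrained(name).model_type)

# microsoft/deberta-base -> deberta
# microsoft/deberta-v2-xlarge -> deberta-v2
# microsoft/deberta-v3-base -> deberta-v2
```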

We used a learning rate of 3e-5 across all base-sized models, with no warm-up or anything else special. We also used early stopping: up to 20 epochs in total, with a patience of 3 epochs.
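
For reference, here is a minimal sketch of that recipe with the HuggingFace Trainer. This is not our exact script: the batch size, max sequence length, and the metric used for early stopping are assumptions (the paper reports micro/macro-F1, not loss), while the model name, learning rate, epoch budget, patience, and no warm-up are as stated above.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "microsoft/deberta-base"
NUM_LABELS = 100  # EUR-LEX in LexGLUE uses 100 EuroVoc concept labels

dataset = load_dataset("lex_glue", "eurlex")  # EUR-LEX task from LexGLUE
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def preprocess(batch):
    enc = tokenizer(batch["text"], truncation=True, max_length=512)
    # Multi-label classification: BCE loss expects float multi-hot vectors
    enc["labels"] = [
        [1.0 if i in sample else 0.0 for i in range(NUM_LABELS)]
        for sample in batch["labels"]
    ]
    return enc

dataset = dataset.map(preprocess, batched=True, remove_columns=["text"])

model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",
)

args = TrainingArguments(
    output_dir="deberta-base-eurlex",
    learning_rate=3e-5,                 # 3e-5 across all base-sized models
    num_train_epochs=20,                # upper bound; early stopping ends sooner
    warmup_ratio=0.0,                   # no warm-up for base-sized models
    per_device_train_batch_size=8,      # assumption, not stated in this thread
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,        # required by EarlyStoppingCallback
    metric_for_best_model="eval_loss",  # assumption; the paper tracks F1 scores
    greater_is_better=False,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # patience of 3 epochs
)
trainer.train()
```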

We were only able to benchmark the large version of RoBERTa; you may find the results in the Appendix of our paper. In this case, we used a learning rate of 1e-5, a warm-up ratio of 0.06, and a weight decay of 0.1, since we found that larger models are very unstable and "degenerate" with larger learning rates and no warm-up.
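
And a sketch of the large-model deltas (the three values above are as stated for roberta-large; the remaining arguments follow the base sketch, and the early-stopping metric is again an assumption):

```python
from transformers import TrainingArguments

large_args = TrainingArguments(
    output_dir="roberta-large-eurlex",
    learning_rate=1e-5,     # lower LR for large models
    warmup_ratio=0.06,      # warm up over the first 6% of training steps
    weight_decay=0.1,       # regularization against training instability
    num_train_epochs=20,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",  # assumption, as above
    greater_is_better=False,
)
```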

cooelf (Author) commented Jun 24, 2022

Hi @iliaschalkidis, thanks a lot for the quick reply. Yeah, I also found that large models are unstable on this dataset. Maybe it's because I was using microsoft/deberta-v3-large. I will check the Appendix and try the recommended settings :)

Thanks!

cooelf closed this as completed Jun 24, 2022