Hi, my reproduced results for EUR-LEX are quite far from the reported ones. Could you provide the hyper-parameters of DeBERTa for EUR-LEX? And which version of DeBERTa is used, V2/V3, Base/Large?
Looking forward to your reply. Thanks!
Hi @cooelf, we used microsoft/deberta-base from HuggingFace, which, I guess, is the base configuration of V1. Note that both V2 and V3 use the deberta-v2 model type on HuggingFace.
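In case it helps, a minimal sketch of loading this checkpoint for EUR-LEX-style multi-label classification with HuggingFace Transformers; the 100-label count and the `problem_type` here are just illustrative assumptions:

```python
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer

# Assumption: 100 EuroVoc labels, as in LexGLUE's EUR-LEX task.
num_labels = 100

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
config = AutoConfig.from_pretrained(
    "microsoft/deberta-base",
    num_labels=num_labels,
    problem_type="multi_label_classification",  # multi-label head (BCE loss)
)
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-base", config=config
)
```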
We used a learning rate of 3e-5 across all base-sized models, with no warm-up or anything else special. We also used early stopping, training for up to 20 epochs in total with a patience of 3 epochs.
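Concretely, that recipe maps to something like the following TrainingArguments sketch; the output directory is a placeholder, and using eval_loss as the stopping metric is an assumption (any validation metric works with the callback):

```python
from transformers import TrainingArguments, EarlyStoppingCallback

training_args = TrainingArguments(
    output_dir="deberta-base-eurlex",   # placeholder path
    learning_rate=3e-5,                 # flat LR; warmup_ratio defaults to 0.0 (no warm-up)
    num_train_epochs=20,                # upper bound; early stopping usually ends sooner
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,        # required for EarlyStoppingCallback
    metric_for_best_model="eval_loss",  # assumption: swap in your validation metric
    greater_is_better=False,
)

# Pass to the Trainer via callbacks=[EarlyStoppingCallback(early_stopping_patience=3)]
```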
We were only able to benchmark the large version of RoBERTa; you may find the results in the Appendix of our paper. In this case, we used a learning rate of 1e-5, a warm-up ratio of 0.06, and weight decay of 0.1, since we found that larger models are very unstable and "degenerate" with larger learning rates and no warm-up.
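The same sketch with the stabilizing settings for large models (again, the output path is a placeholder):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="roberta-large-eurlex",  # placeholder path
    learning_rate=1e-5,                 # lower LR for large models
    warmup_ratio=0.06,                  # warm-up helps avoid early "degeneration"
    weight_decay=0.1,
    num_train_epochs=20,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",  # assumption, as above
    greater_is_better=False,
)
```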
Hi @iliaschalkidis, thanks a lot for the quick reply. Yeah, I also found that large models are unstable on this dataset. Maybe that is because I was using microsoft/deberta-v3-large. I will check the Appendix and try the recommended settings :)