Hi, my reproduced results for EUR-LEX are quite far from the reported ones. Could you provide the hyper-parameters of DeBERTa for EUR-LEX? And which version of DeBERTa is used, V2/V3, Base/Large?
Looking forward to your reply. Thanks!
Hi @cooelf, we used microsoft/deberta-base from HuggingFace, which, I guess, is the base configuration of V1. Note that both V2 and V3 use the deberta-v2 model type on HuggingFace.
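In case it helps, a minimal sketch of loading this checkpoint for EUR-LEX-style multi-label classification with HuggingFace Transformers; the 100-label count and the `problem_type` here are just illustrative assumptions:

```python
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer

# Assumption: 100 EuroVoc labels, as in LexGLUE's EUR-LEX task.
num_labels = 100

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
config = AutoConfig.from_pretrained(
    "microsoft/deberta-base",
    num_labels=num_labels,
    problem_type="multi_label_classification",  # multi-label head (BCE loss)
)
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-base", config=config
)
```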
We used a learning rate of 3e-5 across all base-sized models, with no warm-up or anything else special. We also used early stopping, training for up to 20 epochs in total with a patience of 3 epochs.
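Concretely, that recipe maps to something like the following TrainingArguments sketch; the output directory is a placeholder, and using eval_loss as the stopping metric is an assumption (any validation metric works with the callback):

```python
from transformers import TrainingArguments, EarlyStoppingCallback

training_args = TrainingArguments(
    output_dir="deberta-base-eurlex",   # placeholder path
    learning_rate=3e-5,                 # flat LR; warmup_ratio defaults to 0.0 (no warm-up)
    num_train_epochs=20,                # upper bound; early stopping usually ends sooner
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,        # required for EarlyStoppingCallback
    metric_for_best_model="eval_loss",  # assumption: swap in your validation metric
    greater_is_better=False,
)

# Pass to the Trainer via callbacks=[EarlyStoppingCallback(early_stopping_patience=3)]
```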
We were only able to benchmark the large version of RoBERTa; you may find the results in the Appendix of our paper. In this case, we used a learning rate of 1e-5, a warm-up ratio of 0.06, and weight decay of 0.1, since we found that larger models are very unstable and "degenerate" with larger learning rates and no warm-up.
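The same sketch with the stabilizing settings for large models (again, the output path is a placeholder):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="roberta-large-eurlex",  # placeholder path
    learning_rate=1e-5,                 # lower LR for large models
    warmup_ratio=0.06,                  # warm-up helps avoid early "degeneration"
    weight_decay=0.1,
    num_train_epochs=20,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",  # assumption, as above
    greater_is_better=False,
)
```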
Hi @iliaschalkidis, thanks a lot for the quick reply. Yeah, I also found that large models are unstable on this dataset. Maybe that is because I was using microsoft/deberta-v3-large. I will check the Appendix and try the recommended settings :)