
Few-shot NLU: learning rate for model parameters vs. embedding parameters #7

Closed
nelson-liu opened this issue Apr 4, 2021 · 2 comments


@nelson-liu

nelson-liu commented Apr 4, 2021

Hi!

Thanks for the interesting paper and for releasing this nice codebase! I had a quick question about the learning rate used for the few-shot NLU experiments. The paper mentions (Section 4.2) that:

> We perform grid search of hyper-parameters and take the best combination on Ddev or Ddev32. Specifically, we take learning rates from 1e-5, 2e-5, 3e-5 and batch sizes from 16, 32

However, it seems that the model is updated with a fixed learning rate of 1e-5 in the code (https://github.com/THUDM/P-tuning/blob/main/PT-Fewshot/pet/wrapper.py#L312), and the learning rate taken from the CLI is only used for the embedding parameters.

Given that the paper and code seem to differ in this regard, I'm not sure whether this is a bug in the code (i.e., the model and the embedding parameters should both use the LR taken from the CLI) or whether the paper omits this detail (i.e., in reality, the LR grid search is only done on the embedding parameters, and 1e-5 is always used for the model). Could you clarify which approach was taken in your experiments?
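For reference, the behavior described above (a fixed backbone learning rate, with the CLI learning rate applied only to the prompt embeddings) is the kind of setup PyTorch expresses with optimizer parameter groups. This is a minimal hypothetical sketch, not the repo's actual `wrapper.py` code; `TinyPromptModel` and `cli_lr` are illustrative stand-ins:

```python
import torch
import torch.nn as nn

class TinyPromptModel(nn.Module):
    """Stand-in for a prompt-tuned LM: a frozen-ish backbone plus prompt embeddings."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 8)               # stand-in for the pretrained LM
        self.prompt_embeddings = nn.Embedding(4, 8)   # stand-in for prompt embeddings

model = TinyPromptModel()
cli_lr = 3e-5  # illustrative value, as if passed on the command line

# Two parameter groups with different learning rates:
optimizer = torch.optim.AdamW([
    {"params": model.backbone.parameters(), "lr": 1e-5},              # fixed backbone LR
    {"params": model.prompt_embeddings.parameters(), "lr": cli_lr},   # LR from the CLI
])

print(optimizer.param_groups[0]["lr"], optimizer.param_groups[1]["lr"])
```

Each group is stepped with its own learning rate by a single `optimizer.step()` call, which matches the pattern of updating the backbone and the prompt embeddings at different rates.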

Thanks again!

@nelson-liu (Author)

Ah, re-reading that passage, am I correct that the grid search is not used in the few-shot setting (and the default hyperparameters from PET are used)?

@zheng-yanan (Contributor)

> Ah, re-reading that passage, am I correct that the grid search is not used in the few-shot setting (and the default hyperparameters from PET are used)?

Hi!

Yes, in the few-shot setting, the hyperparameters from PET are used, and we additionally select the prompt-related hyperparameters. We actually experimented with using both the same and different learning rates for the backbone and the prompt embeddings, and found that using different learning rates yields better performance in the few-shot setting. The grid search mentioned in the paper was used in the fully supervised setting.
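The selection procedure described above can be sketched as: keep the backbone learning rate fixed and pick the prompt-embedding learning rate that scores best on the dev set. This is a hedged illustration only; `evaluate_on_dev` is a hypothetical stand-in for a full training run plus dev evaluation, and the candidate values are illustrative:

```python
import random

def evaluate_on_dev(backbone_lr, embedding_lr):
    # Hypothetical placeholder: in practice, train the model with these
    # learning rates and return the resulting dev-set score.
    random.seed(hash((backbone_lr, embedding_lr)) % (2 ** 32))
    return random.random()

backbone_lr = 1e-5                              # fixed for the backbone
candidate_embedding_lrs = [1e-5, 5e-5, 1e-4]    # illustrative search space

# Select the prompt-embedding LR with the best dev score.
best_embedding_lr = max(
    candidate_embedding_lrs,
    key=lambda lr: evaluate_on_dev(backbone_lr, lr),
)
print(best_embedding_lr)
```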

Thank you.
