
Few-shot experiments: switching the encoder to bert-base-cased performs much worse #12

Closed
Life-0-1 opened this issue Apr 20, 2021 · 2 comments

Comments

@Life-0-1

Hi, thank you very much for open-sourcing the code.
While reproducing the results, I ran into the following two questions and would appreciate your help:

  1. In the few-shot experiments, when I change the encoder from albert-xxlarge-v2 to bert-base-cased and keep everything else unchanged, performance drops a lot (accuracy on WiC and RTE is only around 50%). Is this solely due to the encoder's capacity, or are there other important hyperparameters that need to be tuned?
  2. When reproducing the paper's results with the released code, I get very different numbers on CB, as shown in the screenshot (my results on the left, the paper's on the right). What could be the cause?
    [screenshot of CB results]
@zheng-yanan
Contributor


Hi!

Thanks for your attention.

  1. In the few-shot experiments, both PET and P-tuning use albert-xxlarge-v2 to reach their respective best performance. Generally, performance is closely related to the FLOPs of the pretrained model. Since ALBERT enforces cross-layer parameter sharing, its FLOPs are better spent, and it appears to be more efficient than BERT at any scale (a rough parameter-count illustration follows this list).

  2. Thanks for pointing this out. I'm sorry to find that the CB script was mistaken; I will update it as soon as possible.
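
As a rough, hedged illustration of the parameter-sharing point in item 1 (this snippet is not part of the repository; it only assumes the HuggingFace transformers package is installed):

```python
from transformers import AutoModel

# Illustration only: albert-xxlarge-v2 shares one set of transformer-layer
# weights across all of its layers, so its parameter count stays modest even
# though its hidden size (and hence per-pass compute) is much larger than
# bert-base-cased's.
for name in ["albert-xxlarge-v2", "bert-base-cased"]:
    model = AutoModel.from_pretrained(name)
    print(f"{name}: ~{model.num_parameters() / 1e6:.0f}M parameters")
```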

The results of the current script are also recorded in comments within it, which show that your reported macro-F1 is a little lower than the commented value. In our previous experience, several factors can have a large influence on the final performance:

  1. Please use exactly the same environment versions as given.
  2. Experimental results show that in the few-shot setting, the number of GPUs matters a lot. For example, given batch_size = 16, the following settings lead to quite different results (see the sketch after this list):
    a. 2 per_gpu_batch_size * 8 n_gpu * 1 accumulation_steps
    b. 8 per_gpu_batch_size * 2 n_gpu * 1 accumulation_steps
    c. 4 per_gpu_batch_size * 2 n_gpu * 2 accumulation_steps
  3. Please keep the seed at its default value.
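
A minimal sketch of the batch-size arithmetic behind point 2 (the variable names mirror the hyperparameters mentioned above and are illustrative, not the repository's exact flags):

```python
# All three configurations give the same effective batch size of 16, yet in the
# few-shot setting they can produce noticeably different results because the
# data-parallel split and gradient-accumulation schedule differ.
def effective_batch_size(per_gpu_batch_size: int, n_gpu: int, accumulation_steps: int) -> int:
    return per_gpu_batch_size * n_gpu * accumulation_steps

for per_gpu, n_gpu, accum in [(2, 8, 1), (8, 2, 1), (4, 2, 2)]:
    print((per_gpu, n_gpu, accum), "->", effective_batch_size(per_gpu, n_gpu, accum))
```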

Please feel free to reach out if there are other problems. Thank you.

@Riroaki

Riroaki commented Apr 29, 2021


Hi, may I ask when the CB training script will be updated? My reproduced results also differ from those reported in the paper.
