
Few-shot experiments: switching the encoder to bert-base-cased performs much worse #12

Closed
Life-0-1 opened this issue Apr 20, 2021 · 2 comments

Comments

@Life-0-1

Hi, thank you very much for open-sourcing the code.
While reproducing the results, I ran into the following two questions and would appreciate your help:

  1. In the few-shot experiments, when I change the encoder from albert-xxlarge-v2 to bert-base-cased and keep everything else unchanged, performance drops a lot (accuracy on WiC and RTE is only around 50%). Is this solely due to the encoder's capacity, or are there other important hyperparameters that need to be tuned?
  2. When reproducing the paper's results with the released code, I get very different numbers on CB, as shown in the screenshot (my results on the left, the paper's on the right). What could be the cause?
    [screenshot of CB results]
@zheng-yanan
Contributor


Hi!

Thanks for your attention.

  1. In the few-shot experiments, both PET and P-tuning use albert-xxlarge-v2 to reach their respective best performance. Generally, performance is closely related to the FLOPs of the pretrained model. Since ALBERT enforces cross-layer parameter sharing, its FLOPs are better spent, and it appears to be more efficient than BERT at any scale (a rough parameter-count illustration follows this list).

  2. Thanks for pointing this out. I'm sorry to find that the CB script was mistaken; I will update it as soon as possible.
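
As a rough, hedged illustration of the parameter-sharing point in item 1 (this snippet is not part of the repository; it only assumes the HuggingFace transformers package is installed):

```python
from transformers import AutoModel

# Illustration only: albert-xxlarge-v2 shares one set of transformer-layer
# weights across all of its layers, so its parameter count stays modest even
# though its hidden size (and hence per-pass compute) is much larger than
# bert-base-cased's.
for name in ["albert-xxlarge-v2", "bert-base-cased"]:
    model = AutoModel.from_pretrained(name)
    print(f"{name}: ~{model.num_parameters() / 1e6:.0f}M parameters")
```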

The results of the current script are also recorded in comments within it, which show that your reported macro-F1 is a little lower than the commented value. In our previous experience, several factors can have a large influence on the final performance:

  1. Please use exactly the same environment versions as given.
  2. Experimental results show that in the few-shot setting, the number of GPUs matters a lot. For example, given batch_size = 16, the following settings lead to quite different results (see the sketch after this list):
    a. 2 per_gpu_batch_size * 8 n_gpu * 1 accumulation_steps
    b. 8 per_gpu_batch_size * 2 n_gpu * 1 accumulation_steps
    c. 4 per_gpu_batch_size * 2 n_gpu * 2 accumulation_steps
  3. Please keep the seed at its default value.
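
A minimal sketch of the batch-size arithmetic behind point 2 (the variable names mirror the hyperparameters mentioned above and are illustrative, not the repository's exact flags):

```python
# All three configurations give the same effective batch size of 16, yet in the
# few-shot setting they can produce noticeably different results because the
# data-parallel split and gradient-accumulation schedule differ.
def effective_batch_size(per_gpu_batch_size: int, n_gpu: int, accumulation_steps: int) -> int:
    return per_gpu_batch_size * n_gpu * accumulation_steps

for per_gpu, n_gpu, accum in [(2, 8, 1), (8, 2, 1), (4, 2, 2)]:
    print((per_gpu, n_gpu, accum), "->", effective_batch_size(per_gpu, n_gpu, accum))
```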

Please feel free to reach out if there are other problems. Thank you.

@Riroaki

Riroaki commented Apr 29, 2021


Hi, may I ask when the CB training script will be updated? My reproduced results also differ from those reported in the paper.
