Some questions about the paper #5

bojone · 2021-03-24T10:43:18Z

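Hello, I had the pleasure of reading your paper "GPT Understands, Too", and it is very good work. Two questions came up while reading, and I would appreciate your guidance.

1. How large is the performance gap between optimizing the prompt embeddings directly and generating them with an LSTM, as the paper does? The paper does not seem to compare the two.

2. Could you briefly list the templates used for each SuperGLUE task? I only found (3, sub, 3, obj, 3) and (3, sub, 3, obj) given for LAMA; nothing is listed for the other tasks.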
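For readers unfamiliar with the tuple notation in question 2, here is a minimal sketch of how such a template could expand into a token sequence. It assumes that each integer is a count of trainable pseudo prompt tokens, that `sub` is the subject entity, and that `obj` is the slot where the object is predicted. The names `build_template`, `PROMPT`, and the `[MASK]` slot are hypothetical and are not taken from the authors' code.

```python
from typing import List, Sequence, Union

# Hypothetical placeholder standing in for one trainable pseudo prompt token.
PROMPT = "[P]"


def build_template(
    spec: Sequence[Union[int, str]],
    subject: str,
    object_slot: str = "[MASK]",
) -> List[str]:
    """Expand a spec such as (3, "sub", 3, "obj", 3) into a token list.

    Assumed reading: an int n means n pseudo prompt tokens, "sub" is the
    subject entity, "obj" is the position where the object is predicted.
    """
    tokens: List[str] = []
    for item in spec:
        if isinstance(item, int):
            tokens.extend([PROMPT] * item)
        elif item == "sub":
            tokens.append(subject)
        elif item == "obj":
            tokens.append(object_slot)
        else:
            raise ValueError(f"unknown template item: {item!r}")
    return tokens


# The two LAMA templates quoted above, filled with an example subject.
with_trailing_prompt = build_template((3, "sub", 3, "obj", 3), "Dante")
ending_at_object = build_template((3, "sub", 3, "obj"), "Dante")
print(with_trailing_prompt)
print(ending_at_object)
```

The second template ends at the object slot, which is the shape a left-to-right model would need in order to predict the object as its next token; that interpretation is an assumption as well.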

bojone · 2021-03-24T14:57:14Z

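In addition, I found that if the pre-trained model's weights are kept frozen and only the prompt is optimized, performance has an upper bound for a given pre-trained model: even with the full training set, it cannot match fine-tuning the whole model directly (it cannot even overfit if you want it to). I would expect this to be a very general phenomenon, yet in your SuperGLUE results P-tuning outperforms direct fine-tuning on most tasks. How should this be understood?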

Xiao9905 · 2021-03-25T02:29:43Z

> Hello, I had the pleasure of reading your paper "GPT Understands, Too", and it is very good work. Two questions came up while reading, and I would appreciate your guidance.
>
> 1. How large is the performance gap between optimizing the prompt embeddings directly and generating them with an LSTM, as the paper does? The paper does not seem to compare the two.
>
> 2. Could you briefly list the templates used for each SuperGLUE task? I only found (3, sub, 3, obj, 3) and (3, sub, 3, obj) given for LAMA; nothing is listed for the other tasks.

We will add experimental results comparing direct embedding optimization with the LSTM encoder. The general conclusion is that for small pre-trained models, direct embedding converges more slowly but ends up with performance close to the LSTM's, while for large models direct embedding performs worse than the LSTM.
We will add these results to the appendix later.
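For reference, the two parameterizations being compared can be written as below. This is a sketch under assumed notation that does not come from this thread: $m$ prompt positions, trainable input embeddings $e_i$, and prompt vectors $h_i$ that are fed to the pre-trained model. The MLP on top of a bidirectional LSTM reflects one reading of the paper's prompt encoder and should be checked against the paper itself.

```latex
\begin{align}
\text{Direct embedding:} \quad & h_i = e_i, & i = 1, \dots, m \\
\text{LSTM encoder:} \quad & h_i = \mathrm{MLP}\big(\big[\overrightarrow{h}_i \,;\, \overleftarrow{h}_i\big]\big),
  & \overrightarrow{h}_i = \overrightarrow{\mathrm{LSTM}}(e_{1:i}), \quad
    \overleftarrow{h}_i = \overleftarrow{\mathrm{LSTM}}(e_{i:m})
\end{align}
```

In the first case each $h_i$ is an independent free parameter; in the second, the $h_i$ are coupled through the shared LSTM and MLP weights, which is one plausible source of the difference in convergence speed described above.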

Xiao9905 · 2021-03-25T02:32:45Z

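> In addition, I found that if the pre-trained model's weights are kept frozen and only the prompt is optimized, performance has an upper bound for a given pre-trained model: even with the full training set, it cannot match fine-tuning the whole model directly (it cannot even overfit if you want it to). I would expect this to be a very general phenomenon, yet in your SuperGLUE results P-tuning outperforms direct fine-tuning on most tasks. How should this be understood?

The SuperGLUE experimental settings section states explicitly that the pre-trained model is fine-tuned together with P-tuning. For more details, please refer to #4.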

bojone · 2021-03-25T02:37:58Z

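> > In addition, I found that if the pre-trained model's weights are kept frozen and only the prompt is optimized, performance has an upper bound for a given pre-trained model: even with the full training set, it cannot match fine-tuning the whole model directly (it cannot even overfit if you want it to). I would expect this to be a very general phenomenon, yet in your SuperGLUE results P-tuning outperforms direct fine-tuning on most tasks. How should this be understood?
>
> The SuperGLUE experimental settings section states explicitly that the pre-trained model is fine-tuned together with P-tuning. For more details, please refer to #4.

Thank you, that was my oversight. I assumed SuperGLUE used the same setup as LAMA, and Table 2 for LAMA clearly keeps the language model frozen.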

bojone · 2021-03-25T03:15:50Z

#4

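Hello, just to confirm one last point: for the SuperGLUE tasks, do you first find the prompt with the pre-trained model frozen and then fine-tune the whole model, or are the prompt search and the model fine-tuning done at the same time?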

zheng-yanan · 2021-03-25T06:58:42Z

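> #4
>
> Hello, just to confirm one last point: for the SuperGLUE tasks, do you first find the prompt with the pre-trained model frozen and then fine-tune the whole model, or are the prompt search and the model fine-tuning done at the same time?

In the SuperGLUE experiments, prompt search and model fine-tuning are carried out simultaneously.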
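The difference between the two settings discussed in this thread (language model frozen for LAMA Table 2, joint training for SuperGLUE) comes down to which parameters the optimizer updates. The sketch below illustrates that with plain strings; the parameter names, the `prompt_encoder.` prefix, and the `select_trainable` helper are all hypothetical and do not come from the authors' repository.

```python
from typing import List


def select_trainable(param_names: List[str], freeze_lm: bool) -> List[str]:
    """Return the parameter names an optimizer would update.

    Prompt-encoder parameters are always trained. Language-model
    parameters are trained only when freeze_lm is False.
    """
    return [
        name
        for name in param_names
        if name.startswith("prompt_encoder.") or not freeze_lm
    ]


# Hypothetical parameter names for a model with a prompt encoder attached.
param_names = [
    "prompt_encoder.embedding",
    "prompt_encoder.lstm",
    "lm.embeddings",
    "lm.layer0.attention",
    "lm.layer0.ffn",
]

# Frozen language model: only the prompt is optimized.
frozen_setting = select_trainable(param_names, freeze_lm=True)

# Joint setting: prompt and language model are optimized in the same run.
joint_setting = select_trainable(param_names, freeze_lm=False)

print(frozen_setting)
print(joint_setting)
```

In the joint setting there is a single training run with one set of updated parameters, rather than a two-stage procedure that finds the prompt first and fine-tunes afterwards.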

chestnut111 · 2023-05-24T02:27:12Z

I would like to ask: now that strong tools such as ChatGPT 3.5 and 4 are available, are techniques like P-tuning no longer needed?

Xiao9905 closed this as completed Apr 11, 2021
