Long-text classification with ERNIE: fine-tuning fails and is extremely slow when max_seq_len=512 #187

Closed
Biaocsu opened this issue Jul 2, 2019 · 11 comments

Biaocsu commented Jul 2, 2019

Because these are long documents, I set max_seq_len to 512; due to GPU memory limits, batch_size is 10 and epoch = 3. Fine-tuning with ERNIE ran for 15 hours without even finishing one epoch (roughly half an epoch), whereas pytorch_bert finished fine-tuning in the same amount of time. Any suggestions?
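Part of this slowdown is inherent to the longer sequence length: self-attention cost grows quadratically with it. A back-of-the-envelope sketch in plain Python (the 128-token baseline is an assumed comparison point, not a setting from this thread):

```python
# Self-attention FLOPs scale with seq_len^2, so relative to an assumed
# 128-token baseline, max_seq_len=512 costs ~16x as much attention compute;
# the feed-forward layers scale only linearly (~4x).
base_len, long_len = 128, 512
print((long_len / base_len) ** 2)  # attention cost ratio -> 16.0
print(long_len / base_len)         # feed-forward cost ratio -> 4.0
```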

Biaocsu commented Jul 2, 2019

The dataset is fairly large: train.tsv has about 170,000 examples. All other settings are left at their defaults, and pytorch_bert uses the same settings.

Biaocsu commented Jul 4, 2019

@kuke What's more, I checked the log this morning. I have been fine-tuning for three days, and it is now at epoch: 1, progress: 127260/177258, step: 30400, ave loss: 0.628227, ave acc: 0.500000, speed: 0.636651 steps/s. What really confuses me is that the dev and test results were good at epoch 0 (about 0.95), but at epoch 1 they look like this: [dev evaluation] ave loss: 0.673317, ave acc: 0.492638, data_num: 17727, elapsed time: 705.489035 s; [test evaluation] ave loss: 0.671617, ave acc: 0.500254, data_num: 1969, elapsed time: 78.672848 s. So what's wrong?
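As a sanity check, the cumulative step counter in that log line is consistent with the progress field, and the logged speed implies roughly 7.7 hours per epoch; a quick arithmetic sketch using only the numbers quoted above:

```python
# Cross-check the logged throughput (step counts are cumulative across epochs).
examples, batch_size = 177258, 10
steps_per_epoch = examples // batch_size    # 17725
steps_into_epoch1 = 127260 // batch_size    # 12726
print(steps_per_epoch + steps_into_epoch1)  # 30451, close to the logged step 30400

speed = 0.636651                            # steps/s from the log
print(steps_per_epoch / speed / 3600)       # ~7.7 hours per epoch at this speed
```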

@tianjie491

Try upgrading to paddle 1.5.0. For my DBQA task, training 3 epochs on 1.3.0 took 26 hours; now it takes only 6 hours. My dataset is also fairly large: about 280,000 examples in total with 5x negative sampling.

Biaocsu commented Jul 4, 2019

@tianjie491 I am already on paddle 1.5.0. This is a text classification task. Setting skip_steps to 100 makes it a bit faster now, but fine-tuning has not finished yet. When I fine-tuned a few days ago, the epoch 0 logs were completely normal, but at epoch 1 the logs showed the fine-tuning had gone wrong: accuracy dropped straight to 50%.

Biaocsu commented Jul 4, 2019

@tianjie491 Also, the texts in your DBQA task are quite short. I have previously used ERNIE for a semantic similarity task without any problems.

@tianxin1860 (Collaborator)

> @tianjie491 I am already on paddle 1.5.0. This is a text classification task. Setting skip_steps to 100 makes it a bit faster now, but fine-tuning has not finished yet. When I fine-tuned a few days ago, the epoch 0 logs were completely normal, but at epoch 1 the logs showed the fine-tuning had gone wrong: accuracy dropped straight to 50%.

Check your loss convergence curve to see whether the loss spikes sharply or becomes NaN at some step.
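One way to check is to scan the training log for the reported ave loss values and flag NaNs or sudden jumps. A minimal sketch; the nohup.txt path and the 3x jump threshold are assumptions:

```python
import math
import re

# Pull every "ave loss: X" value out of the training log and flag
# NaNs or sudden spikes. The file name and 3x threshold are assumptions.
losses = []
with open("nohup.txt") as f:
    for line in f:
        m = re.search(r"ave loss: ([0-9.]+|nan)", line)
        if m:
            losses.append(float(m.group(1)))

for i, loss in enumerate(losses):
    if math.isnan(loss):
        print(f"NaN loss at log entry {i}")
    elif i > 0 and loss > 3 * losses[i - 1] > 0:
        print(f"loss spike at entry {i}: {losses[i - 1]:.4f} -> {loss:.4f}")
```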

@tianjie491

Try lowering the learning rate; with Adam, a learning rate that is too large can also cause problems. You could also try increasing the batch size.

Biaocsu commented Jul 4, 2019

I will try lowering the learning rate. The batch_size cannot go any higher: the GPU has only 16 GB of memory, and with max_seq_len=512 I already hit OOM at batch_size=12.

@Biaocsu Biaocsu changed the title from "Fine-tuning ERNIE for text classification takes too long" to "Text classification with ERNIE: fine-tuning fails and is extremely slow when max_seq_len=512" Jul 5, 2019

Biaocsu commented Jul 5, 2019

After changing the parameters and fine-tuning several more times, I found that whenever max_seq_len=512, the accuracy drops straight to 0.5 at epoch 1. The log is below:
nohup.txt

@Biaocsu Biaocsu changed the title from "Text classification with ERNIE: fine-tuning fails and is extremely slow when max_seq_len=512" to "Long-text classification with ERNIE: fine-tuning fails and is extremely slow when max_seq_len=512" Jul 5, 2019
@tianxin1860 (Collaborator)

> After changing the parameters and fine-tuning several more times, I found that whenever max_seq_len=512, the accuracy drops straight to 0.5 at epoch 1. The log is below:
> nohup.txt

Your loss fluctuates frequently during training; check whether your training data contains dirty examples.
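A quick screen for that is a single pass over train.tsv looking for malformed rows. A minimal sketch; it assumes the tab-separated, header-first layout the ERNIE classification reader expects, so adjust the checks to your actual schema:

```python
# Screen train.tsv for dirty rows: wrong column count or empty fields.
# The header-row, tab-separated layout is an assumption; adjust as needed.
with open("train.tsv", encoding="utf-8") as f:
    n_cols = len(f.readline().rstrip("\n").split("\t"))
    for lineno, line in enumerate(f, start=2):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != n_cols:
            print(f"line {lineno}: expected {n_cols} columns, got {len(fields)}")
        elif any(not field.strip() for field in fields):
            print(f"line {lineno}: empty field")
```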

Biaocsu commented Jul 9, 2019

Yes, I found it. Training has now finished; the only drawback is that fine-tuning is too slow.

@Biaocsu Biaocsu closed this as completed Jul 9, 2019