Pretrain code fails to run #642
Comments
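This code was previously on ...
@Meiyim If it keeps hitting `continue` there, is the data prepared incorrectly, or does something need to be changed?
Possibly the sentences in each doc are too short, or max_seqlen is too large. Either way the preprocessing code cannot fill the buffer, so it keeps hitting `continue`.
@Meiyim The doc sentences are up to 315 long, and with max_seqlen set to 128 or 8 the result is still the same.
It also fails if a buffer cannot hold one complete sentence. So you could try setting max_seqlen to 512.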
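To make the constraint above concrete, here is a minimal sketch of whole-sentence packing. It is not the code from `pretrain.py`; the function name `pack_sentences` and the skip rule are assumptions chosen only to illustrate why a sentence longer than max_seqlen never produces a buffer.

```python
def pack_sentences(doc, max_seqlen):
    """Greedily pack whole sentences (lists of token ids) into buffers.

    Hypothetical illustration: each buffer holds at most max_seqlen tokens,
    and a sentence is never split across buffers.
    """
    buffers, buf, size = [], [], 0
    for sent in doc:
        if len(sent) > max_seqlen:
            # A complete sentence cannot fit in any buffer, so it is skipped.
            continue
        if size + len(sent) > max_seqlen:
            # Current buffer is full: emit it and start a new one.
            buffers.append(buf)
            buf, size = [], 0
        buf.append(sent)
        size += len(sent)
    if buf:
        buffers.append(buf)
    return buffers


# One doc with a single 315-token sentence, as described in the thread.
doc = [[1] * 315]
too_small = pack_sentences(doc, 128)  # nothing fits, so no buffers are produced
large_enough = pack_sentences(doc, 512)  # the sentence fits in one buffer
```

Under these assumptions, a 315-token sentence yields no training segment at max_seqlen 128 and exactly one at 512, which matches the suggestion to raise max_seqlen.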
@Meiyim 512 does not work either.
@Meiyim Could you try reproducing this? The data is just the two sentences from the readme, with python 3.6.2, paddlepaddle-gpu 2.0.1, cuda 10.0.
The data format is: a newline separates sentences, and a blank line separates docs. You can reorganize your data to follow this format.
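As a rough illustration of the format just described (one sentence per line, a blank line between docs), the sketch below splits raw text into docs. The helper name `split_docs` and the sample text are made up for this example and are not part of the repo.

```python
def split_docs(text):
    """Split raw text into docs, where each doc is a list of sentences.

    Assumed format: one sentence per line, docs separated by blank lines.
    """
    docs, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if line:
            current.append(line)
        elif current:
            # A blank line closes the current doc.
            docs.append(current)
            current = []
    if current:
        docs.append(current)
    return docs


# Hypothetical sample: two docs with two and one sentences.
sample = (
    "first sentence of doc A\n"
    "second sentence of doc A\n"
    "\n"
    "first sentence of doc B\n"
)
docs = split_docs(sample)
```

Running a check like this on the input file before `make_pretrain_data.py` is one way to confirm that the file really contains multiple sentences per doc.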
@Meiyim With the data laid out like this, the result is still the same.
@Meiyim Could you please take another look?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reopen it. Thank you for your contributions.
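Hello, did you solve this in the end? I also want to redo the pretraining. Following your steps, I get stuck at the same place you did.
I never got a reply either; in the end I ran it with the original repo's code.
Does this repo work? Can the .gz data already processed by develop be used directly?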
The code contains the following 2 errors:
from propeller.data import feature_pb2, example_pb2
The import path is wrong.
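The thread does not state the corrected import path. One way to find where a module actually lives inside an installed package is to walk the package tree; the sketch below does that with the standard library. The helper `find_submodules` is hypothetical, and the demo call uses the stdlib `json` package so that it runs without `propeller` installed.

```python
import importlib
import pkgutil


def find_submodules(package_name, target):
    """Return dotted names of all submodules of package_name named target."""
    pkg = importlib.import_module(package_name)
    hits = []
    # walk_packages recurses into subpackages and yields ModuleInfo tuples.
    for info in pkgutil.walk_packages(pkg.__path__, prefix=package_name + "."):
        if info.name.rsplit(".", 1)[-1] == target:
            hits.append(info.name)
    return hits


# Demo on a stdlib package; finds the "decoder" module inside "json".
found = find_submodules("json", "decoder")
```

With `propeller` installed, calling `find_submodules("propeller", "feature_pb2")` would list the importable locations of that module, if any exist in the installed version.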
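After fixing that, execution hangs at the screen shown in the screenshot, with python 3.6.2 and paddlepaddle-gpu 2.0.1.
The commands were run in the order given in the readme: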
python3 ./demo/pretrain/make_pretrain_data.py ./demo/pretrain/test.txt ./demo/pretrain/output_file.gz --vocab /home/ytkj/root1/lizheng/2021April/MRC/experiment/DuReader-Checklist-BASELINE/finetuned_model/vocab.txt
python3 -m paddle.distributed.launch ./demo/pretrain/pretrain.py --data_dir "/home/ytkj/root1/lizheng/2021April/pretrain/ERNIE-develop/demo/pretrain/output_file.gz" --from_pretrained /home/ytkj/root1/lizheng/2021April/model/model-ernie1.0.1 --save_dir ./demo/pretrain/Output
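The data used is the example given in the readme.
Running view_pretrian_gz.py shows the following, and the output is not empty.
Downgrading paddle to 1.8 then triggers a version error: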
RuntimeError: propeller 0.2 requires paddle 2.0+, got 1.8.5
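The error above shows that propeller 0.2 refuses to run on paddle older than 2.0, so downgrading is not an option. A small pre-flight check like the following sketch can catch this before launching training. The helpers `version_tuple` and `meets_minimum` are hypothetical; only the version numbers come from the error message.

```python
import re


def version_tuple(version):
    """Turn a version string such as '2.0.1' into a tuple of ints."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if installed is at least minimum, comparing numerically."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad the shorter tuple with zeros so that '2.0' equals '2.0.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


downgraded_ok = meets_minimum("1.8.5", "2.0")  # the version from the error
current_ok = meets_minimum("2.0.1", "2.0")  # the version used in this issue
```

In a real environment the first argument could be `paddle.__version__`; that attribute is not imported here so the sketch stays self-contained.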