Cannot find the data for knowledge-enhanced pre-training #29

Open
nuoma opened this issue May 9, 2023 · 2 comments
Comments

nuoma commented May 9, 2023

Hello, I can't find the file: data_path=/wjn/nlp_task_datasets/kg-pre-trained-corpus/total_pretrain_kgicl_gpt. The instructions seem a bit unclear to me. Could you point me in the right direction? Thanks!

Contributor

wjn1996 commented May 9, 2023

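Hello, the paper associated with this data is still under submission, so the data has not been open-sourced yet. The data format is essentially the same as a GPT training corpus.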

Author

nuoma commented May 13, 2023

Do you mean the corpora used in the pre-training stage (wudao, pile): a bunch of txt files, where each line in each file is one sentence?
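
For anyone reading along, here is a minimal sketch of the layout described in the question above: a directory of `.txt` files with one sentence per line. This layout has not been confirmed by the maintainers, and the function name `iter_corpus_lines` and the flat directory structure are assumptions made only for illustration.

```python
from pathlib import Path
from typing import Iterator


def iter_corpus_lines(corpus_dir: str) -> Iterator[str]:
    """Yield non-empty lines from every .txt file in corpus_dir.

    Assumption (not confirmed in this thread): the corpus is a flat
    directory of UTF-8 .txt files with one sentence per line.
    """
    # Sort the files so that iteration order is deterministic.
    for txt_file in sorted(Path(corpus_dir).glob("*.txt")):
        with txt_file.open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines
                    yield line
```

If the real corpus is sharded differently (nested folders, jsonl, and so on), only the `glob` pattern and the per-line parsing would need to change.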
