[Paper] Constructing an instruction-tuning dataset for the open-source domain and implementing a large language model #263
Comments
Proposed title: OSATG-GPT: Instruction-tuning Large Language Models with Open Source Atom Tasks on GitHub
Discussion data can be fetched via the GraphQL API; the returned data is roughly as follows:
Documentation: https://docs.github.com/zh/graphql/guides/using-the-graphql-api-for-discussions
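As a reference, here is a minimal sketch, assuming a personal access token is available in a `GITHUB_TOKEN` environment variable, of fetching discussion data from the GraphQL endpoint covered by the documentation above; the field selection (title, body, answer, category, comments) is illustrative, not the project's final schema:

```python
import os
import requests

GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]  # assumed personal access token with repo read scope

QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    discussions(first: 10) {
      nodes {
        title
        body
        answer { body }            # accepted answer, if one exists
        category { name }
        comments(first: 5) { nodes { body } }
      }
    }
  }
}
"""

def fetch_discussions(owner: str, name: str) -> list[dict]:
    """Fetch one page of discussions for a single repository."""
    resp = requests.post(
        "https://api.github.com/graphql",
        json={"query": QUERY, "variables": {"owner": owner, "name": name}},
        headers={"Authorization": f"Bearer {GITHUB_TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["data"]["repository"]["discussions"]["nodes"]

if __name__ == "__main__":
    for d in fetch_discussions("vercel", "next.js"):
        print(d["title"], "| answered:", d["answer"] is not None)
```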
Description
Since many details about experiments and dataset construction will be discussed later, the research progress on the open-source-domain large language model has been moved to the open-research repository for discussion. Based on the lab's current resources and GitHub's existing features, I roughly divide the work into the following parts:
There are still quite a few tasks, so some may be trimmed. I think we should first confirm that the method is effective, then expand the dataset step by step and add more tasks; this is the safer path.
Meanwhile, @衍童 has recently been looking into LLaMA-Factory, which can support the subsequent fine-tuning and development of the large language model.
At today's group meeting, Prof. Wang pointed out that the questions and answers the model is designed around may differ across repositories. I have indeed considered this: the same question may be asked under different repositories, but because the repositories differ, the answers are not necessarily the same, so this needs to be handled when the tasks are first designed (see the sketch after the list below). I plan to start from repositories when collecting the dataset, beginning with the following popular ones:
https://github.com/vercel/next.js
https://github.com/gatsbyjs/gatsby
https://github.com/nodejs/node
https://github.com/tailwindlabs/tailwindcss
https://github.com/laravel/framework
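As a rough illustration (an assumption, not the final design discussed here), one way to handle repository-dependent answers is to attach the repository to every training sample. The Alpaca-style instruction/input/output fields below follow a data format LLaMA-Factory can consume; the helper name and output file name are hypothetical:

```python
import json

def build_sample(repo: str, question: str, answer: str) -> dict:
    """Wrap one Q&A pair with its repository context."""
    return {
        "instruction": question,
        "input": f"Repository: {repo}",  # repo context disambiguates the answer
        "output": answer,
    }

# Placeholder answers; real outputs would come from the collected discussion data.
samples = [
    build_sample("vercel/next.js", "How do I configure a custom server?", "..."),
    build_sample("nodejs/node", "How do I configure a custom server?", "..."),
]

with open("osatg_instruction_data.json", "w", encoding="utf-8") as f:
    json.dump(samples, f, ensure_ascii=False, indent=2)
```

Keeping the repository in the input field rather than in the instruction keeps identical questions textually identical while still letting the model condition its answer on the repository.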