【PaddlePaddle Hackathon 2】78. Implement the Gradient Cache strategy on top of PaddleNLP semantic indexing to enable very-large-batch semantic indexing model training #1736

Closed
TCChenlong opened this issue Mar 8, 2022 · 5 comments

@TCChenlong
TCChenlong commented Mar 8, 2022

(This ISSUE is a task issue for PaddlePaddle Hackathon Round 2; for more details, see the 【PaddlePaddle Hackathon Round 2】task overview.)

【Task Description】

  • Task title: Implement the Gradient Cache strategy on top of PaddleNLP semantic indexing to enable very-large-batch semantic indexing model training

  • Technical tags: Python, semantic indexing

  • Difficulty: hard

  • Detailed description: The quality of a semantic indexing model depends heavily on batch_size; in general, the larger the batch_size, the better the model. Constrained by GPU memory, however, batch_size usually cannot be made very large on commodity hardware. The Gradient Cache algorithm proposed in this paper (https://arxiv.org/pdf/2101.06983.pdf) effectively scales up batch_size, making large-batch semantic indexing model training feasible even with limited GPU memory (a minimal sketch of the idea follows this list).
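
For orientation, below is a minimal sketch of one Gradient Cache training step for an in-batch-negatives dual encoder, written against the Paddle 2.x dygraph API. The `gradient_cache_step` helper, the single shared `encoder`, the tensor shapes, and the dot-product/cross-entropy loss are illustrative assumptions, not the required PaddleNLP design.

```python
import paddle
import paddle.nn.functional as F

def gradient_cache_step(encoder, optimizer, queries, docs, sub_batch_size):
    """One Gradient Cache training step (hypothetical helper, for illustration)."""
    total = queries.shape[0]

    # Step 1: graph-free forward over sub-batches; cache representations only,
    # so peak activation memory is bounded by sub_batch_size, not the full batch.
    q_reps, d_reps = [], []
    with paddle.no_grad():
        for i in range(0, total, sub_batch_size):
            q_reps.append(encoder(queries[i:i + sub_batch_size]))
            d_reps.append(encoder(docs[i:i + sub_batch_size]))
    q_reps = paddle.concat(q_reps)
    d_reps = paddle.concat(d_reps)

    # Step 2: treat the cached representations as leaf tensors, compute the
    # full-batch in-batch-negatives loss, and cache d(loss)/d(representation).
    q_reps.stop_gradient = False
    d_reps.stop_gradient = False
    scores = paddle.matmul(q_reps, d_reps, transpose_y=True)
    labels = paddle.arange(scores.shape[0])  # the i-th doc is the positive for the i-th query
    loss = F.cross_entropy(scores, labels)
    loss.backward()  # touches only the reps; the encoder is not in this graph
    q_grad, d_grad = q_reps.grad, d_reps.grad

    # Step 3: re-forward each sub-batch WITH the graph and backprop a surrogate
    # whose gradient w.r.t. the fresh representations equals the cached gradient;
    # by the chain rule, the accumulated parameter gradients match a full-batch step.
    for i in range(0, total, sub_batch_size):
        sq = encoder(queries[i:i + sub_batch_size])
        sd = encoder(docs[i:i + sub_batch_size])
        surrogate = (sq * q_grad[i:i + sub_batch_size]).sum() \
                  + (sd * d_grad[i:i + sub_batch_size]).sum()
        surrogate.backward()

    optimizer.step()
    optimizer.clear_grad()
    return float(loss)
```

Memory never holds more than one sub-batch of activations at a time, while the contrastive loss still sees the full batch of negatives, which is exactly the trade-off the paper targets.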

【Deliverables】

  • A task PR to PaddleNLP

  • Accompanying technical documentation (with verification that the model quality meets expectations)

【Technical Requirements】

  • Proficiency in Python

  • Understanding of how deep learning models work

  • Familiarity with basic semantic indexing algorithms (optional)

【References】

  • Gradient Cache paper: https://arxiv.org/pdf/2101.06983.pdf

【Q&A】

  • If you run into any questions about this task during development, feel free to leave a comment under this ISSUE.
  • For questions common to many participants, Q&A sessions will be organized periodically during the event; please watch the official website and QQ group for announcements and join in time.
@Zhiyuan-Fan (Contributor)

Should the Gradient Cache strategy be implemented on the baseline, or integrated into the training process of the HardestNeg and In-batch negatives strategies?

@tianxin1860

Implement it on the baseline and verify that the results meet expectations.

@Zhiyuan-Fan (Contributor)

PR submitted.

@github-actions

This issue is stale because it has been open for 60 days with no activity.

@github-actions github-actions bot added the stale label Jan 11, 2023
@github-actions

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as not planned Jan 26, 2023