Abnormal training loss #5
Comments
Hello. The first issue is caused by a change I made in commit 4ad76d1. As for the second issue, that is because, in fact, every
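As for the log_freq setting, I set it in the new commit 911fc6b, and you can follow that example. That said, looking at the loss over the whole epoch is probably more reliable.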
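To illustrate why the whole-epoch loss can be a steadier signal than the values printed every log_freq steps, here is a minimal sketch. Only log_freq comes from this thread. The function name summarize_losses and the sample loss values are hypothetical and are not taken from the repository.

```python
from statistics import mean


def summarize_losses(step_losses, log_freq):
    """Summarize one epoch of per-step losses (hypothetical helper).

    Returns a pair (logged, epoch_mean):
      logged     -- (step, loss) pairs sampled every `log_freq` steps
      epoch_mean -- average loss over every step in the epoch
    """
    if log_freq <= 0:
        raise ValueError("log_freq must be positive")
    logged = [
        (step, loss)
        for step, loss in enumerate(step_losses, start=1)
        if step % log_freq == 0
    ]
    return logged, mean(step_losses)


# Made-up noisy losses for one short epoch, for illustration only.
losses = [2.0, 1.0, 3.0, 1.0, 2.0, 3.0]
logged, epoch_mean = summarize_losses(losses, log_freq=2)
print(logged)
print(epoch_mean)
```

The sampled values depend on which steps happen to be logged, while the epoch mean uses every step, so it is less sensitive to per-batch noise.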
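OK, thanks a lot. If I switch the LLM to Qwen-14B, do I only need to change the llm_model name? I switched to 14B, and the loss is similar to that of 7B, but the generated output is very different and basically unreadable.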
I have not used the Qwen-14B model. For now, you could try modifying and checking these three places:
Training for the 14B version has now been added, using DeepSpeed pipeline parallelism on two 3090 GPUs: commit
solved
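I followed your steps to reproduce the results and found that the loss decreases very slowly during training. In addition, the final train_loss is an order of magnitude higher than the one you reported.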