Customized loss value #52
In the full-parameter-update setting, I have recently been trying a new loss function: on top of the original next-token-prediction loss I add a regularization term so that the weights of certain layers are pushed toward zero. However, I keep running into strange bugs. Below are the code I added and the error I encountered.

For example, in lomo_trainer.py: …

After making this change, the call

loss.backward(retain_graph=True)

inside grad_norm() in lomo.py raises

RuntimeError: The size of tensor a (0) must match the size of tensor b (4096) at non-singleton dimension 1

My guess is that the weights of the layers I newly added to the loss cannot be found during backward. Could you advise how to fix this bug, or suggest a better implementation? Many thanks!
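The snippet added to lomo_trainer.py is not preserved in this thread. Purely as an illustration of the objective described above, here is a minimal sketch of a next-token-prediction loss plus an L2 penalty on selected layers, written for a plain (non-partitioned) Hugging Face causal LM; model, batch, reg_lambda, and target_keywords are hypothetical names, and this sketch does not by itself resolve the ZeRO-3 issue discussed in the comments.

```python
# Illustrative sketch only -- not the issue author's actual code.
# Assumes a Hugging Face causal LM without ZeRO-3 partitioning;
# `model`, `batch`, `reg_lambda`, and `target_keywords` are hypothetical.
import torch

def regularized_loss(model, batch, reg_lambda=1e-4,
                     target_keywords=("mlp.down_proj",)):
    # Standard next-token-prediction loss from the causal-LM head.
    outputs = model(input_ids=batch["input_ids"],
                    attention_mask=batch["attention_mask"],
                    labels=batch["labels"])
    nll = outputs.loss

    # L2 penalty that pushes the selected layers' weights toward zero.
    penalty = torch.zeros((), device=nll.device)
    for name, param in model.named_parameters():
        if any(key in name for key in target_keywords):
            # Under DeepSpeed ZeRO-3, `param` outside a gather context is a
            # size-0 placeholder on most ranks, which is consistent with the
            # "size of tensor a (0)" error reported above.
            penalty = penalty + param.float().pow(2).sum()

    return nll + reg_lambda * penalty
```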
Comments

Hi, I would suggest not using …

Hi! Could you explain how to do the gather myself? The parameters inside ds_tensor appear to be shuffled/flattened, and I still need to know where a given parameter sits in LLaMA (for example, in which mlp / self_attention layer). Thanks!

You can still use …

Sorry for the late response! I followed the recommended approach and gathered param.ds_tensor successfully. But during backward I still hit the same problem as at the start (the … in grad_norm() in lomo.py). My guess is that DeepSpeed still cannot find these ds_tensors when it runs backward. Is my understanding correct, and is there a way to fix this? Thanks!
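Regarding the question of gathering parameters yourself and locating them inside LLaMA: DeepSpeed provides deepspeed.zero.GatheredParameters, which temporarily all-gathers the full weight of a ZeRO-3-partitioned parameter, and model.named_parameters() still yields the usual Hugging Face names (e.g. model.layers.0.self_attn.q_proj.weight), so the name rather than ds_tensor identifies the layer. Below is a minimal sketch under the assumption that model is the DeepSpeed-wrapped LLaMA model and the target name is hypothetical; whether a weight gathered this way can safely take part in a loss that LOMO later backpropagates through is exactly the open problem in this thread.

```python
# Minimal sketch: locate a parameter by its Hugging Face name and read its
# full value under DeepSpeed ZeRO-3. The target name is hypothetical, and
# `model` is assumed to be the deepspeed.initialize()-wrapped LLaMA model.
import deepspeed

target_name = "model.layers.0.mlp.down_proj.weight"

for name, param in model.named_parameters():
    if name != target_name:
        continue
    # Outside this context, param.data is an empty placeholder on most ranks
    # and param.ds_tensor holds only the local (flattened) partition.
    with deepspeed.zero.GatheredParameters([param], modifier_rank=None):
        full_weight = param.data.clone()  # full (out_features, in_features) matrix
    print(name, tuple(full_weight.shape))
```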