
Customized loss value #52

Open

ZN1010 opened this issue Aug 22, 2023 · 4 comments

Comments


ZN1010 commented Aug 22, 2023

In the full parameter update setting, I've recently been trying a new loss function: on top of the original next-token prediction loss I add a regularization term that encourages the weights of certain layers to be as small as possible. But I keep running into strange bugs. Below are the code I added and the error I get:

For example, in lomo_trainer.py:

# weight of the regularization term and its (initially zero) accumulator
lamda, regularization = 1, torch.tensor(0, requires_grad=True, dtype=torch.float32)
self.model.train()
for name, param in self.model.named_parameters():
    if "self_attn.q_proj" in name:
        # gather the full (ZeRO-partitioned) weight, then accumulate its mean
        with GatheredParameters(param):
            regularization = regularization + torch.mean(param)
...
# add the penalty on top of the original next-token prediction loss
loss = get_loss(outs.logits, batch['labels'], self.training_args.clip_loss_value) + lamda * regularization

However, with this change, loss.backward(retain_graph=True) inside grad_norm() in lomo.py fails with RuntimeError: The size of tensor a (0) must match the size of tensor b (4096) at non-singleton dimension 1. My guess is that during backward the weights of the layers I added cannot be found. How can I fix this bug, or is there a better implementation?

Thank you very much!

Collaborator

KaiLv69 commented Aug 22, 2023

Hi, I'd suggest not using GatheredParameters; instead, use torch.mean(param.ds_tensor) and do the gather yourself.
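
A minimal sketch of what this suggestion might look like, assuming ZeRO-3 is active so each parameter exposes its local shard as param.ds_tensor; each rank only sees its own shard here, and whether gradients reach the weights the way LOMO expects is not verified:

lamda = 1.0
regularization = 0.0  # becomes a GPU tensor after the first accumulation
for name, param in self.model.named_parameters():
    if "self_attn.q_proj" in name:
        # mean over the local ZeRO-3 shard only; no gather / re-partition here
        regularization = regularization + param.ds_tensor.float().mean()
loss = get_loss(outs.logits, batch['labels'], self.training_args.clip_loss_value) + lamda * regularization

The penalty computed this way differs across GPUs, so the per-shard means still have to be combined across ranks (the "gather yourself" part), which the later comments discuss.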

Author

ZN1010 commented Aug 22, 2023

Hi, I'd suggest not using GatheredParameters; instead, use torch.mean(param.ds_tensor) and do the gather yourself.

Hi! Could you explain how to do the gather myself? The parameters inside ds_tensor appear to be rearranged, and I still need to know where a given parameter sits in LLaMA (e.g., which mlp/self_attention layer it belongs to). Thanks!

Collaborator

KaiLv69 commented Aug 25, 2023

You can still use if "self_attn.q_proj" in name: to match the parameter by name. ds_tensor holds the partitioned parameter; its size is the original parameter size divided by the number of GPUs. For gathering, you can simply use torch's gather API.
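
As a rough illustration of that gathering step (gather_full_weight is a hypothetical helper, not part of LOMO; it assumes every rank holds an equally sized shard, which holds because ZeRO-3 pads the last shard, and that torch.distributed is already initialized; note that plain dist.all_gather does not propagate gradients back to the shards):

import torch
import torch.distributed as dist

def gather_full_weight(ds_shard):
    # collect the flattened ZeRO-3 shards from every rank and concatenate them
    # into the full flattened parameter (padding included)
    world_size = dist.get_world_size()
    shards = [torch.empty_like(ds_shard) for _ in range(world_size)]
    dist.all_gather(shards, ds_shard.contiguous())
    return torch.cat(shards)

for name, param in self.model.named_parameters():
    if "self_attn.q_proj" in name:  # the name check still works under ZeRO-3
        full_flat = gather_full_weight(param.ds_tensor)
        regularization = regularization + full_flat.float().mean()

Since the penalty here is only a mean, an alternative that avoids materializing the full weight is to sum each local shard and all_reduce the partial sums across ranks.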

Author

ZN1010 commented Sep 8, 2023

Sorry for the late response! I used the recommended approach and gathered param.ds_tensor. But during backward I still hit the same problem as before: loss.backward(retain_graph=True) inside grad_norm() in lomo.py raises RuntimeError: The size of tensor a (0) must match the size of tensor b (4096) at non-singleton dimension 1.

My guess is still that DeepSpeed cannot find these ds_tensors during backward. Is my understanding correct, and is there any way to fix this? Thanks!
