When I tried to reproduce the RoBERTa experiment, I found that applying LoRA to the model's output layer, as described in the paper, triggers an error in the `input_hook` closure defined by `save_input_hook` in `kfac.py`:
```python
def input_hook(_module: nn.Module, pos_args: tuple[t.Tensor]) -> None:
    if not _hooks_enabled or _input_hooks_disabled:
        return
    # Select the first positional argument given to this layer (the input
    # activation), then the last token in the token sequence [:, -1]. `a`
    # should be a [batch, l_in] tensor.
    a: Float[Tensor, "batch l_in"] = pos_args[0].detach().clone()[:, -1]
    if has_bias:
        a = t.hstack((a, t.ones_like(a[:, :1])))
    assert a.dim() == 2
```
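Presumably the failure is a shape mismatch: RoBERTa's classification head receives a pooled 2-D `[batch, hidden]` activation rather than a 3-D `[batch, seq, hidden]` one, so `pos_args[0][:, -1]` drops a dimension and the subsequent indexing and `assert a.dim() == 2` fail. Below is a minimal sketch of a possible guard, assuming that diagnosis; `_hooks_enabled`, `_input_hooks_disabled`, and `has_bias` are closure variables from the surrounding `save_input_hook`, as in the snippet above. This is only an illustration of the shape issue, not the authors' fix:

```python
import torch as t
from torch import nn, Tensor


def input_hook(_module: nn.Module, pos_args: tuple[Tensor, ...]) -> None:
    if not _hooks_enabled or _input_hooks_disabled:
        return
    a = pos_args[0].detach().clone()
    # A classification head receives a pooled [batch, l_in] activation,
    # so only select the last token when a sequence dimension is present.
    if a.dim() == 3:
        a = a[:, -1]
    if has_bias:
        # Append a constant 1 column to absorb the bias term.
        a = t.hstack((a, t.ones_like(a[:, :1])))
    assert a.dim() == 2
```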
Moreover, the prior-variance hyperparameter used for RoBERTa is not reported. Could you release the code used for the paper's RoBERTa experiments?