examples of roberta #1

Closed
flyleeee opened this issue Apr 22, 2024 · 1 comment

Comments

@flyleeee

Could you provide the code for the paper on the RoBERTa model?

@flyleeee (Author) commented Apr 24, 2024

When I tried to reproduce the RoBERTa experiment, I found that applying LoRA to the model's output layer, as described in the paper, triggers an error in the input_hook function (defined inside save_input_hook) in kfac.py.

def input_hook(_module: nn.Module, pos_args: tuple[t.Tensor]) -> None:
    if not _hooks_enabled or _input_hooks_disabled:
        return
    # Select the first positional argument given to this layer (the input
    # activation), then the last token in the token sequence [:, -1]. `a`
    # should be a [batch, l_in] tensor.
    a: Float[Tensor, "batch l_in"] = pos_args[0].detach().clone()[:, -1]
    if has_bias:
        a = t.hstack((a, t.ones_like(a[:, :1])))
    assert a.dim() == 2
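
For reference, here is a minimal sketch of why the [:, -1] indexing can break at the output layer. It assumes the RoBERTa classification head receives an already pooled [batch, hidden] activation rather than a [batch, seq, hidden] token sequence; the shapes and variable names below are illustrative only, not taken from the repository.

import torch as t

batch, seq, hidden = 4, 16, 768  # hypothetical sizes, for illustration only

# A decoder-style LM layer receives a [batch, seq, hidden] input, so selecting
# the last token with [:, -1] leaves a 2-D [batch, hidden] tensor, as the hook expects.
lm_input = t.randn(batch, seq, hidden)
a = lm_input.detach().clone()[:, -1]
print(a.shape)  # torch.Size([4, 768]) -> a.dim() == 2

# A RoBERTa-style output/classification head instead sees a pooled [batch, hidden]
# input; [:, -1] then drops a dimension, so `assert a.dim() == 2` would fail.
clf_input = t.randn(batch, hidden)
a = clf_input.detach().clone()[:, -1]
print(a.shape)  # torch.Size([4]) -> a.dim() == 1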

Moreover, the prior variance hyperparameter used for RoBERTa is also not reported.
