Hi, thanks for the great work!
Section 7.4 runs an initialization experiment with real words. Does this initialization apply to the prompts in every layer, or only to the prompts in the first layer? And how does it work together with the re-parameterization method, given that the input dimension of the re-parameterization is much smaller?
I also noticed that in your code, instead of adding prompts directly to the input of each layer (as described in your paper), you append vectors to the key and value matrices directly via the `past_key_values` argument. How does the initialization experiment work in this setup? Do you initialize the key/value vectors directly? If so, the dimensions don't seem to match.
Thanks!
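P.S. To make the dimension question concrete, here is a rough sketch of the mismatch as I understand it. All names and sizes (`prefix_len`, `hidden`, `n_layer`, `n_head`) are hypothetical, roughly GPT-2-small-like, and are not taken from the repository; the layout of the key/value tensor is also my assumption.

```python
from math import prod

# Hypothetical sizes for illustration only (not read from the repository).
prefix_len, hidden, n_layer, n_head = 5, 768, 12, 12
head_dim = hidden // n_head

# Real-word initialization yields one embedding vector per prefix position.
word_init_shape = (prefix_len, hidden)

# Assumed layout of what past_key_values must hold:
# one key and one value per layer, split across attention heads.
kv_shape = (n_layer, 2, n_head, prefix_len, head_dim)

have = prod(word_init_shape)  # parameters provided by word embeddings
need = prod(kv_shape)         # parameters the key/value prefix requires

print(have, need, need // have)  # 3840 92160 24
```

Under these assumptions, the word embeddings cover only 1 / (2 * n_layer) of the values the key/value prefix needs, which is why I am unsure how the initialization is mapped onto it.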