Why is there an F.conv1d in the forward() of the MergedLinear? #17
The short answer is: I am not sure 🤷. I have three suspicions, but none of them explains why this particular combination is used. The code in question is in nanoGPTplus/src/model/gpt_language_model/peft/lora.py, lines 332 to 346 (commit fb208fb).
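For readers who can't see the embedded snippet, here is a rough illustrative sketch of the Linear-followed-by-grouped-Conv1d pattern the thread is about. It is my reconstruction, not the exact contents of the referenced lines; the names (`lora_A`, `lora_B`, `n_enabled`) and shapes are assumptions:

```python
import torch
import torch.nn.functional as F

def lora_delta(x, lora_A, lora_B, n_enabled):
    """Illustrative sketch of the Linear + grouped Conv1d combination.

    x:      (batch, seq, in_features)
    lora_A: (r * n_enabled, in_features)      -- one low-rank A block per enabled projection
    lora_B: (out_per_group * n_enabled, r)    -- one B block per enabled projection
    """
    after_A = F.linear(x, lora_A)             # (batch, seq, r * n_enabled)
    after_B = F.conv1d(
        after_A.transpose(-2, -1),            # conv1d expects (batch, channels, seq)
        lora_B.unsqueeze(-1),                 # kernel size 1 -> weight of shape (out, r, 1)
        groups=n_enabled,                     # each enabled projection gets its own group
    ).transpose(-2, -1)                       # back to (batch, seq, out_per_group * n_enabled)
    return after_B
```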
Perhaps it's something that is obvious to any experienced FAANG ML engineer. Like a secret code you need to say when entering some upscale speakeasy club, or something like that 😄. Or this approach worked well on some previous project and someone decided to reuse it here. Or maybe, with the help of almost unlimited computational resources, Microsoft engineers just used grid search and this combo of Linear and Conv1d layers worked the best 🤷♂️. Basically, that's why I hope that in the future researchers will release not only the implementation of a paper but also additional notes on the implementation. Let's keep this issue open so that I can revisit it when I have more info.
You are welcome.
Hello @vgoklani. In a nutshell:
So, having the number of groups equal to the number of enabled LoRA layers, we can process each part of the weight matrix independently in a single pass. That's essentially the same as what I wrote before. The only remaining question was why we have a Linear layer followed right away by a Conv layer (and not, say, Conv + Conv). Well, the answer turned out to be pretty simple 😄
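A quick numerical check of that claim (toy shapes, my own example, not code from the repository): a grouped conv1d with kernel size 1 gives exactly the same result as applying each group's B block to its own chunk of the Linear output.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, seq, r, out_per_group, n_enabled = 2, 5, 4, 8, 3     # toy sizes, purely illustrative

after_A = torch.randn(batch, seq, r * n_enabled)            # pretend output of the Linear step
lora_B = torch.randn(out_per_group * n_enabled, r)          # one B block per enabled LoRA layer

# One pass: grouped conv1d with kernel size 1.
grouped = F.conv1d(
    after_A.transpose(-2, -1),                              # (batch, r * n_enabled, seq)
    lora_B.unsqueeze(-1),                                   # (out_per_group * n_enabled, r, 1)
    groups=n_enabled,
).transpose(-2, -1)                                         # (batch, seq, out_per_group * n_enabled)

# Reference: apply each group's B block to its own chunk of after_A.
reference = torch.cat(
    [F.linear(chunk, block)
     for chunk, block in zip(after_A.chunk(n_enabled, dim=-1), lora_B.chunk(n_enabled, dim=0))],
    dim=-1,
)

print(torch.allclose(grouped, reference, atol=1e-6))        # True
```

So the groups argument is what lets a single conv1d call stand in for several independent matrix multiplications, one per enabled LoRA projection.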
I saw this in the original implementation too:
Why do we need to use an F.conv1d as opposed to just using a plain linear layer? That would avoid all these transposes too.
Thanks for sharing your work!
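For what it's worth, a transpose-free F.linear variant is possible; the sketch below is only my illustration of the idea behind the question, not what this repository or the original LoRA code does. Since F.linear has no groups argument, the per-group B blocks have to be assembled into one block-diagonal weight, which materializes a lot of zeros that the grouped conv1d never has to store:

```python
import torch
import torch.nn.functional as F

def lora_B_as_linear(after_A, lora_B, n_enabled):
    """Apply the per-group B blocks with a single F.linear call (sketch only).

    after_A: (batch, seq, r * n_enabled) -- output of the preceding Linear step
    lora_B:  (out_per_group * n_enabled, r)
    """
    blocks = lora_B.chunk(n_enabled, dim=0)   # n_enabled blocks of shape (out_per_group, r)
    big_W = torch.block_diag(*blocks)         # (out_per_group * n_enabled, r * n_enabled), mostly zeros
    return F.linear(after_A, big_W)           # no transposes needed
```

Whether avoiding that zero-padded weight is the actual reason the original code uses conv1d is exactly the open question of this issue.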