How to quantize LLaMA during fine-tuning? #470

Open
sfarzi opened this issue Nov 13, 2023 · 0 comments

sfarzi commented Nov 13, 2023

I want to fine-tune LLaMA 70B (with LoRA and prefix tuning) on 4 A40 GPUs.
My plan is to use a quantized version of LLaMA during the fine-tuning phase, but I have not found any implementation of this in the source code provided by Lit-llama. As far as I can tell, quantization is only implemented for inference, in the generate method, and not for fine-tuning. So my question is: how can quantized fine-tuning be implemented? Is there a sample?
Thanks in advance.
Saeed
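
For illustration only, here is a minimal sketch of the general idea behind quantized LoRA fine-tuning: the base weight matrix is stored as frozen integers, dequantized on the fly in the forward pass, and only the small low-rank adapter matrices receive gradient updates. This is not Lit-llama code. The names `quantize_absmax`, `dequantize`, `matvec`, and `QuantizedLoRALinear` are hypothetical, the 8-bit per-row absmax scheme is an assumed stand-in for whatever quantizer is actually used, and plain Python lists stand in for GPU tensors.

```python
import random


def quantize_absmax(weights, bits=8):
    """Quantize each row to signed integers with one absmax scale per row."""
    qmax = 2 ** (bits - 1) - 1
    q_rows, scales = [], []
    for row in weights:
        scale = (max(abs(w) for w in row) / qmax) or 1.0  # avoid a zero scale
        q_rows.append([round(w / scale) for w in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows, scales):
    """Recover approximate float weights from integers and per-row scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


def matvec(matrix, vec):
    """Multiply a 2-D list by a 1-D list."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


class QuantizedLoRALinear:
    """y = dequant(W_q) x + (alpha / rank) * B (A x); only A and B are updated."""

    def __init__(self, weights, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        in_dim = len(weights[0])
        self.q_rows, self.scales = quantize_absmax(weights)  # frozen base weights
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in weights]  # zero init: adapter starts as a no-op
        self.scaling = alpha / rank

    def forward(self, x):
        base = matvec(dequantize(self.q_rows, self.scales), x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scaling * d for b, d in zip(base, delta)]

    def sgd_step(self, x, target, lr=0.02):
        """One SGD step on 0.5 * ||y - target||^2; returns the loss before the step."""
        err = [y - t for y, t in zip(self.forward(x), target)]  # dL/dy
        hidden = matvec(self.A, x)
        # Gradient w.r.t. the adapter's hidden activation, using B before it changes.
        grad_hidden = [
            self.scaling * sum(e * row[k] for e, row in zip(err, self.B))
            for k in range(len(self.A))
        ]
        for i, e in enumerate(err):
            for k, h in enumerate(hidden):
                self.B[i][k] -= lr * self.scaling * e * h
        for k, g in enumerate(grad_hidden):
            for j, xj in enumerate(x):
                self.A[k][j] -= lr * g * xj
        return 0.5 * sum(e * e for e in err)


# Toy usage: fit a single input/target pair while the quantized base stays fixed.
layer = QuantizedLoRALinear([[0.5, -1.0, 0.25], [1.5, 0.0, -0.75]])
x, target = [1.0, -2.0, 0.5], [3.0, 0.5]
losses = [layer.sgd_step(x, target) for _ in range(200)]
print(losses[0], losses[-1])
```

In a real setup the same split would presumably apply per linear layer of the model: the quantized base weights are excluded from the optimizer, and only the adapter parameters (or, for prefix tuning, the prefix parameters) are trained.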
