optimize eora for multi-gpu and memory usage #2046
Conversation
This is awesome, appreciate the effort!
Force-pushed from bc46691 to 5762215
I am going to merge this for now. There is a slight regression in EoRA quality. Not sure if this PR is at fault or another commit in the last 24 hours; I will backtrack and fix it in another PR. Once the EoRA regression (slight quality drop when it should be a slight quality uplift) is resolved, EoRA will finally join lower-VRAM + multi-GPU data-parallel quantization. =)
@nbasyl There appears to be no regression, just a shift caused by the changes in this PR. I think the two eval runs below confirm that.

# act_group_aware = True + desc_act = False
--------Eval METHOD.GPTQ Result---------
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|-------------|------:|------|-----:|--------|---|-----:|---|-----:|
|arc_challenge| 1|none | 0|acc |↑ |0.3148|± |0.0136|
| | |none | 0|acc_norm|↑ |0.3370|± |0.0138|
--------Eval METHOD.GPTQ + EoRA Result---------
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|-------------|------:|------|-----:|--------|---|-----:|---|-----:|
|arc_challenge| 1|none | 0|acc |↑ |0.3123|± |0.0135|
| | |none | 0|acc_norm|↑ |0.3481|± |0.0139|

# act_group_aware = False + desc_act = False
--------Eval METHOD.GPTQ Result---------
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|-------------|------:|------|-----:|--------|---|-----:|---|-----:|
|arc_challenge| 1|none | 0|acc |↑ |0.3046|± |0.0134|
| | |none | 0|acc_norm|↑ |0.3404|± |0.0138|
--------Eval METHOD.GPTQ + EoRA Result---------
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|-------------|------:|------|-----:|--------|---|-----:|---|-----:|
|arc_challenge| 1|none | 0|acc |↑ |0.3166|± |0.0136|
| | |none | 0|acc_norm|↑ |0.3447|± |0.0139|

Notice how EoRA improves both acc and acc_norm when act_group_aware = False, while with act_group_aware = True EoRA lifts acc_norm but dips acc slightly.
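For anyone reproducing the two runs above, here is a minimal sketch of the quantization setup. It assumes QuantizeConfig exposes the act_group_aware and desc_act flags discussed in this thread and that the EoRA adapter is attached via a Lora adapter config; the model id, calibration text, rank, and output paths are placeholders, and names may differ from the current GPTQModel API.

```python
# Hypothetical reproduction sketch -- class/flag names follow this thread and
# may differ from the current GPTQModel API; verify before running.
from gptqmodel import GPTQModel, QuantizeConfig
from gptqmodel.adapter.adapter import Lora  # EoRA configured as a low-rank adapter

calibration_dataset = ["gptqmodel is an llm quantization toolkit."] * 256  # placeholder calibration text

# Run A above: GAR on, activation-order quantization off.
cfg_gar = QuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=False,
    act_group_aware=True,
    adapter=Lora(rank=128, path="eora_adapter"),  # where the EoRA weights are written
)

# Run B above: GAR off, desc_act off.
cfg_plain = QuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=False,
    act_group_aware=False,
    adapter=Lora(rank=128, path="eora_adapter"),
)

model = GPTQModel.load("meta-llama/Llama-3.2-1B", cfg_gar)  # swap in cfg_plain for run B
model.quantize(calibration_dataset)
model.save("llama-3.2-1b-gptq-eora")
```

Swapping cfg_gar for cfg_plain is the only difference between the two eval blocks above.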
Hi @Qubitium, thanks for the update! I'll help run the MMLU test over the weekend. I have a quick question though — if the results show that EoRA + act_group_aware still degrades MMLU performance, how do we plan to address that? Do you think it's more of an engineering issue or a methodological one? I'm asking since I'm not very familiar with GAR.
Make sure to run with the latest main. I have not tested the full range of lm-eval tasks beyond arc_challenge. Maybe once we get a full run we will see how act_group_aware and EoRA interact across the other benchmarks.
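For the broader weekend run, a sketch using the lm-evaluation-harness Python API against a saved quantized checkpoint; lm_eval.simple_evaluate, the hf model loader, and the arc_challenge/mmlu task names are standard lm-eval, while the checkpoint path and batch size are placeholders.

```python
# Sketch of the fuller lm-eval sweep mentioned above; the checkpoint path is a placeholder.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=./llama-3.2-1b-gptq-eora,dtype=auto",
    tasks=["arc_challenge", "mmlu"],
    num_fewshot=0,
    batch_size=8,
)

# Print per-task metrics (acc, acc_norm, stderr) like the tables above.
for task, metrics in results["results"].items():
    print(task, metrics)
```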
Hi @Qubitium, apologies for the late response — I was completely swamped last week. I finally have some time to run the experiment, but I'm running into issues installing the latest version of GPTQModel. Do you happen to have a Docker image I could use directly?
@nbasyl I just released 5.0 to PyPI last night with wheels for PyTorch 2.8, 2.9, and 3.0. Can you install directly from PyPI?

pip install -U gptqmodel --no-build-isolation

If you get install errors, can you post them so we can see what's wrong? Unfortunately we don't have a Docker image yet, but we should.
@nbasyl During forward-hook EoRA accumulation, the EoRA code is now synced to the GPTQ code path: accumulation is done per GPU and then merged at the end. A test was added and the atol diff is around ~3e-6, so I think it's good to use.
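To make the per-GPU accumulation plus merge-at-the-end pattern concrete, here is an illustrative sketch (not the actual GPTQModel hook code); it assumes each device accumulates an X^T X style statistic over its own calibration slice, and the final merge should match a single-device run within the ~3e-6 atol mentioned above.

```python
# Illustrative sketch of per-device accumulation followed by a final merge.
# Statistic shapes and names are made up; real EoRA internals differ.
import torch

def accumulate_on_device(activations: torch.Tensor) -> tuple[torch.Tensor, int]:
    """Partial statistic (X^T X) over one device's share of the calibration data."""
    x = activations.reshape(-1, activations.shape[-1]).float()
    return x.T @ x, x.shape[0]

def merge(partials: list[tuple[torch.Tensor, int]]) -> torch.Tensor:
    """Merge per-device partials into what a single-device run would have produced."""
    total = sum(n for _, n in partials)
    merged = torch.zeros_like(partials[0][0])
    for xtx, _ in partials:
        merged += xtx.to(merged.device)
    return merged / total

# Tiny check mirroring the atol-style test mentioned above.
torch.manual_seed(0)
full = torch.randn(64, 16)
single_device = accumulate_on_device(full)[0] / 64
multi_device = merge([accumulate_on_device(chunk) for chunk in full.chunk(4)])
assert torch.allclose(single_device, multi_device, atol=3e-6)
```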