Conversation
@@ -1,13 +1,13 @@
import pytest

from pruna.algorithms.quantization.gptq_model import GPTQQuantizer
While you're at it (sorry, but it needs to be done), can you add a proper post_smash_hook for GPTQ please? Thanks!
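As a rough illustration of the requested hook, here is a minimal sketch. The class name, the hook signature, and the housekeeping steps are all assumptions for illustration, not the actual pruna API:

```python
class GPTQHookSketch:
    """Hypothetical stand-in for pruna's GPTQQuantizer; the real
    class and post_smash_hook signature may differ."""

    def post_smash_hook(self, model):
        # Typical post-smash housekeeping: switch the model to
        # inference mode and tag it so downstream handlers can
        # detect that GPTQ quantization was applied.
        if hasattr(model, "eval"):
            model.eval()
        model.is_gptq_quantized = True  # illustrative tag only
        return model
```

The tag would let an inference handler special-case GPTQ models without re-inspecting their weights.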
    device_map="auto",
    torch_dtype="auto",
)
model = imported_modules["GPTQModel"].load(temp_dir, gptq_config)
Check with the handler and execute at least one evaluation. This will likely require a handler exception :/
Linking #21
llcnt
left a comment
Thank you for the switch! I added only two minor comments, and we are good :)
"use_exllama",
default=True,
meta=dict(desc="Whether to use exllama for quantization."),
),
I think we can remove this hyperparameter, if it is deprecated in gptqmodel ;)
If this is unused, we should still deprecate it properly: the hyperparameter stays here, a warning is emitted in apply, and the argument is no longer used.
I think it still exists
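The deprecation pattern discussed above (keep the argument, warn when it is set, stop forwarding it) can be sketched as follows. The class and method names are hypothetical stand-ins, not the real pruna code; only the warnings-based pattern itself is the point:

```python
import warnings

_UNSET = object()  # sentinel: distinguishes "not passed" from any real value


class GPTQDeprecationSketch:
    """Hypothetical sketch of deprecating an unused hyperparameter."""

    def apply(self, model, use_exllama=_UNSET):
        if use_exllama is not _UNSET:
            # The argument is accepted for backward compatibility but
            # no longer has any effect, so tell the user explicitly.
            warnings.warn(
                "'use_exllama' is deprecated and ignored; the backend "
                "kernel is now selected automatically.",
                DeprecationWarning,
                stacklevel=2,
            )
        # ... quantize the model without passing use_exllama ...
        return model
```

The sentinel keeps the warning quiet for callers who never touch the argument, while anyone still passing it gets a clear DeprecationWarning before the parameter is removed in a later release.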
@@ -44,6 +43,7 @@ class GPTQQuantizer(PrunaQuantizer):
run_on_cuda = True
The gptqmodel repo mentions that it can also run on CPU. However, I did not test it. Do you think we should change to run_on_cpu=True?
I tried to run it on CPU and it failed at inference time. This needs more investigation; it should be possible to make it work, but the way we currently quantize the models might not be the best.
Description
AutoGPTQ is now deprecated. This PR switches our GPTQ algorithm to gptqmodel instead.
Related Issue
AutoGPTQ already caused problems with supporting Python 3.12.
Type of Change
How Has This Been Tested?
The integration test still runs.
Checklist
Additional Notes
I did not investigate the full extent of the new repo; this PR only makes the switch. We might find some nice speedups with further investigation.