Add fix for CPU Inference #385

vivekkhandelwal1 · 2023-10-27T08:29:12Z

Signed-Off By: Vivek Khandelwal vivek@nod-labs.com

vivekkhandelwal1 · 2023-10-27T08:30:31Z

@fxmarty, need your review on this.

younesbelkada

Thanks! I think you don't need to hardcode the dtype - wdyt?

auto_gptq/nn_modules/qlinear/qlinear_cuda.py

auto_gptq/nn_modules/qlinear/qlinear_cuda_old.py

fxmarty · 2023-10-30T08:22:54Z

A safe approach would be to modify https://github.com/PanQiWei/AutoGPTQ/blob/518617b8d682aaa95796f622d788e014ee882869/auto_gptq/modeling/_utils.py#L70 to pass the dtype to QuantLinear init, and to default to fp16 (ugly, but for backward compatibility).

This may not work for transformers integration though - maybe torch.get_default_dtype() would do.

vivekkhandelwal1 · 2023-10-31T07:53:43Z

@fxmarty, can you please review it now? I think we need to make the changes here https://github.com/huggingface/optimum/blob/8e7588b09df2f15c47e9b92f81ec2b05f7ae6957/optimum/gptq/quantizer.py#L242-L246 as well?

Refer: AutoGPTQ/AutoGPTQ#385 Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>

vivekkhandelwal1 · 2023-10-31T08:00:44Z

I have added a PR here: huggingface/optimum#1496

fxmarty

LGTM, can you make sure the tests pass?

CUDA_VISIBLE_DEVICES=0 pytest tests/ -s -vvvvv

auto_gptq/nn_modules/qlinear/qlinear_cuda.py

auto_gptq/nn_modules/qlinear/qlinear_cuda_old.py

Refer: AutoGPTQ/AutoGPTQ#385 Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>

vivekkhandelwal1 · 2023-10-31T11:00:08Z

@fxmarty I have updated the PR with the required changes.

fxmarty

LGTM!

Could you confirm that tests pass?

vivekkhandelwal1 · 2023-10-31T11:22:57Z

CUDA_VISIBLE_DEVICES=0 pytest tests/ -s -vvvvv

The following 2 tests failed:

FAILED tests/test_q4.py::TestsQ4CUDA::test_cuda_old_0 - AssertionError: False is not true
FAILED tests/test_q4.py::TestsQ4CUDA::test_cuda_old_1 - AssertionError: False is not true

Edit:

_________________________________________________________________________________ TestsQ4CUDA.test_cuda_old_0 _________________________________________________________________________________

a = (<tests.test_q4.TestsQ4CUDA testMethod=test_cuda_old_0>,), kw = {}

    @wraps(func)
    def standalone_func(*a, **kw):
>       return func(*(a + p.args), **p.kwargs, **kw)

../shark-vivekkhandelwal1/shark.venv/lib/python3.11/site-packages/parameterized/parameterized.py:620: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/test_q4.py:356: in test_cuda_old
    self.assertTrue(linear.autogptq_cuda_available)
E   AssertionError: False is not true
_________________________________________________________________________________ TestsQ4CUDA.test_cuda_old_1 _________________________________________________________________________________

a = (<tests.test_q4.TestsQ4CUDA testMethod=test_cuda_old_1>,), kw = {}

    @wraps(func)
    def standalone_func(*a, **kw):
>       return func(*(a + p.args), **p.kwargs, **kw)

../shark-vivekkhandelwal1/shark.venv/lib/python3.11/site-packages/parameterized/parameterized.py:620: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/test_q4.py:356: in test_cuda_old
    self.assertTrue(linear.autogptq_cuda_available)
E   AssertionError: False is not true

vivekkhandelwal1 · 2023-10-31T11:24:17Z

The above failure doesn't seem to be occurring because of the changes made in this PR. @fxmarty

fxmarty · 2023-10-31T11:38:35Z

Yes it was fixed by #387. Thank you!

Refer: AutoGPTQ/AutoGPTQ#385 Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>

vivekkhandelwal1 · 2023-10-31T11:51:31Z

Thanks, @fxmarty, for your support in getting these patches in.

chuyqa · 2023-10-31T22:58:50Z

auto_gptq/nn_modules/qlinear/qlinear_cuda.py

-            out = torch.matmul(x.to(weights.dtype), weights)
-        out = out.half().reshape(out_shape)
+            out = torch.matmul(x, weights)
+        out = out.to(dtype=weights.dtype).reshape(out_shape)


This seems broken in master at the moment:

File "/opt/miniconda3/envs/text-gen-gptq/lib/python3.10/site-packages/auto_gptq/nn_modules/qlinear/qlinear_cuda.py", line 272, in forward
out = out.to(dtype=weights.dtype).reshape(out_shape)
UnboundLocalError: local variable 'weights' referenced before assignment

Hi @chuyqa, thanks for pointing out this issue. I have added a fix here: #390.

Fixes: AutoGPTQ#385 (comment) Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>

vivekkhandelwal1 mentioned this pull request Oct 27, 2023

[core/ QLinear] Support CPU inference #376

Closed

younesbelkada mentioned this pull request Oct 28, 2023

Add support for loading GPTQ models on CPU huggingface/transformers#26719

Merged

younesbelkada reviewed Oct 30, 2023

View reviewed changes

vivekkhandelwal1 force-pushed the gptq-fix branch from a71beb1 to f500ff1 Compare October 31, 2023 07:50

vivekkhandelwal1 requested a review from younesbelkada October 31, 2023 07:52

vivekkhandelwal1 added a commit to vivekkhandelwal1/optimum that referenced this pull request Oct 31, 2023

Add support for CPU Inference

aac549e

Refer: AutoGPTQ/AutoGPTQ#385 Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>

vivekkhandelwal1 mentioned this pull request Oct 31, 2023

Add support for CPU Inference huggingface/optimum#1496

Merged

fxmarty approved these changes Oct 31, 2023

View reviewed changes

vivekkhandelwal1 added a commit to vivekkhandelwal1/optimum that referenced this pull request Oct 31, 2023

Add support for CPU Inference

78342d1

Refer: AutoGPTQ/AutoGPTQ#385 Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>

Add support for CPU Inference

21f973e

vivekkhandelwal1 requested a review from fxmarty October 31, 2023 11:00

vivekkhandelwal1 force-pushed the gptq-fix branch from f500ff1 to 21f973e Compare October 31, 2023 11:01

fxmarty approved these changes Oct 31, 2023

View reviewed changes

fxmarty merged commit 878cbb0 into AutoGPTQ:main Oct 31, 2023

fxmarty pushed a commit to huggingface/optimum that referenced this pull request Oct 31, 2023

Add support for CPU Inference (#1496)

9a9a7bf

Refer: AutoGPTQ/AutoGPTQ#385 Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>

vivekkhandelwal1 deleted the gptq-fix branch October 31, 2023 11:51

chuyqa reviewed Oct 31, 2023

View reviewed changes

vivekkhandelwal1 added a commit to vivekkhandelwal1/AutoGPTQ that referenced this pull request Nov 1, 2023

Fix result dtype conversion in QuantLinear.forward()

9858f29

Fixes: AutoGPTQ#385 (comment) Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>

vivekkhandelwal1 mentioned this pull request Nov 1, 2023

Fix result dtype conversion in QuantLinear.forward() #390

Closed

SunMarc mentioned this pull request Nov 1, 2023

fix weight init #392

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add fix for CPU Inference #385

Add fix for CPU Inference #385

vivekkhandelwal1 commented Oct 27, 2023

vivekkhandelwal1 commented Oct 27, 2023

younesbelkada left a comment

fxmarty commented Oct 30, 2023 •

edited

Loading

vivekkhandelwal1 commented Oct 31, 2023

vivekkhandelwal1 commented Oct 31, 2023

fxmarty left a comment

vivekkhandelwal1 commented Oct 31, 2023

fxmarty left a comment

vivekkhandelwal1 commented Oct 31, 2023 •

edited

Loading

vivekkhandelwal1 commented Oct 31, 2023

fxmarty commented Oct 31, 2023

vivekkhandelwal1 commented Oct 31, 2023

chuyqa Oct 31, 2023

vivekkhandelwal1 Nov 1, 2023

Add fix for CPU Inference #385

Add fix for CPU Inference #385

Conversation

vivekkhandelwal1 commented Oct 27, 2023

vivekkhandelwal1 commented Oct 27, 2023

younesbelkada left a comment

Choose a reason for hiding this comment

fxmarty commented Oct 30, 2023 • edited Loading

vivekkhandelwal1 commented Oct 31, 2023

vivekkhandelwal1 commented Oct 31, 2023

fxmarty left a comment

Choose a reason for hiding this comment

vivekkhandelwal1 commented Oct 31, 2023

fxmarty left a comment

Choose a reason for hiding this comment

vivekkhandelwal1 commented Oct 31, 2023 • edited Loading

vivekkhandelwal1 commented Oct 31, 2023

fxmarty commented Oct 31, 2023

vivekkhandelwal1 commented Oct 31, 2023

chuyqa Oct 31, 2023

Choose a reason for hiding this comment

vivekkhandelwal1 Nov 1, 2023

Choose a reason for hiding this comment

fxmarty commented Oct 30, 2023 •

edited

Loading

vivekkhandelwal1 commented Oct 31, 2023 •

edited

Loading