
[Neo][vLLM] Fix quantization failure caused by improperly loaded model. #2360

Merged 1 commit into deepjavalibrary:master on Sep 4, 2024

Conversation

a-ys (Contributor) commented on Sep 4, 2024

Description

Fixes an issue caused by casper-hansen/AutoAWQ#558, which surfaces as the following error:

OptimizationFatalError('Encountered an error during quantization: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_bmm)')

Fixed in Neo by always quantizing with device_map=auto, so the model's weights are placed on devices consistently before quantization (sketched below).
Relates to #2271. That PR fixes this issue by passing device_map=auto in serving.properties; for non-Neo quantization, that setting is still required.

@a-ys a-ys requested review from zachgk, frankfliu and a team as code owners September 4, 2024 21:51
@tosterberg tosterberg merged commit 452ff8e into deepjavalibrary:master Sep 4, 2024
9 checks passed
tosterberg pushed a commit to tosterberg/djl-serving that referenced this pull request Sep 4, 2024