Retry partial.to to fix accelerate invalid argument error for first moe layer for >4 GPU setups #2163

avtc · 2025-11-01T20:34:28Z

It works for me with 8x3090 on the first attempt.

I hit this error on the first layer with experts. It is similar to the one we overcame in an earlier version with a lock. For me the retry works well: since the error is thrown only on the first layer, I don't see much reason to use a lock there.

The original error trace that is fixed by this PR:

Traceback (most recent call last):
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/utils/threadx.py", line 484, in _run
    result = fn(*args, **kwargs)
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/looper/module_looper.py", line 1608, in _process_on_worker
    proc.process(module=nm)
    ~~~~~~~~~~~~^^^^^^^^^^^
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/looper/gptq_processor.py", line 162, in process
    wq, q_scales, q_zeros, q_g_idx, duration, avg_loss, damp_percent, nsamples = g.quantize()
                                                                                 ~~~~~~~~~~^^
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/quantization/gptq.py", line 602, in quantize
    self.finalize_hessian(target_device=target_device)
    ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/quantization/gptq.py", line 520, in finalize_hessian
    self._materialize_global_hessian(target_device=target_device)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/quantization/gptq.py", line 503, in _materialize_global_hessian
    tmp = partial.to(device=result_accum.device, dtype=torch.float32)
torch.AcceleratorError: CUDA error: invalid argument

Qubitium · 2025-11-02T01:47:02Z

@avtc Amazingly simple fix. Looks like the NVIDIA GPU, at a very low level, has an internal background thread that does memory sweep and cleanup! 0.5 second delay. Genius! Btw, can you remove the second retry? I see from the logs that only 1 retry is enough. Is the second retry necessary?

Qubitium · 2025-11-02T01:50:40Z

@avtc Another question: have you tried zero delay, or is a time delay absolutely required for this 3090 OOM?

gptqmodel/quantization/gptq.py

Qubitium · 2025-11-02T12:12:17Z

@avtc Thanks for the investigation; merged. I made a minor adjustment to remove the second retry and reduce the sleep to 250 ms.
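
For readers following the thread, here is a minimal sketch of the retry pattern discussed above: try the transfer, and on failure sleep briefly and try exactly once more. This is not the code merged in the PR. The helper name `retry_once`, its parameters, and the choice to catch `RuntimeError` by default are assumptions made for illustration; only the single retry and the 250 ms sleep come from the comments above.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_once(
    fn: Callable[[], T],
    delay_s: float = 0.25,  # 250 ms, the sleep mentioned in the thread
    retry_on: Tuple[Type[BaseException], ...] = (RuntimeError,),  # assumption
) -> T:
    """Call fn(); on a matching error, sleep delay_s and call fn() one more time."""
    try:
        return fn()
    except retry_on:
        # Short pause before the single retry.
        time.sleep(delay_s)
        # A second failure is not caught here and propagates to the caller.
        return fn()


# Stand-in for a transfer that fails on the first call only.
calls = {"n": 0}


def flaky_copy() -> str:
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("CUDA error: invalid argument")
    return "ok"


result = retry_once(flaky_copy, delay_s=0.01)

# Hypothetical use at the call site shown in the traceback:
# tmp = retry_once(lambda: partial.to(device=result_accum.device, dtype=torch.float32))
```

Passing the exception types as a parameter keeps the sketch independent of any particular PyTorch version; callers can narrow `retry_on` to the specific error class they observe.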

retry partial.to to fix accelerate invalid argument error for first m…

9d6ee27

…oe layer for multi-GPU (>4) setups

Qubitium reviewed Nov 2, 2025

gptqmodel/quantization/gptq.py (outdated, resolved)

remove second loop, reduce delay time

c8f2a64

Qubitium merged commit ab2a743 into ModelCloud:main Nov 2, 2025
