

@avtc (Contributor) commented Nov 1, 2025

It works for me on 8x RTX 3090 on the first attempt.

I hit this error on the first layer with experts. It is similar to the one we overcame in an earlier version with a lock. For me, a retry works well: the error is thrown only on the first layer, so I do not see much reason for a lock there.

The original error trace fixed by this PR:

Traceback (most recent call last):
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/utils/threadx.py", line 484, in _run
    result = fn(*args, **kwargs)
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/looper/module_looper.py", line 1608, in _process_on_worker
    proc.process(module=nm)
    ~~~~~~~~~~~~^^^^^^^^^^^
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/looper/gptq_processor.py", line 162, in process
    wq, q_scales, q_zeros, q_g_idx, duration, avg_loss, damp_percent, nsamples = g.quantize()
                                                                                 ~~~~~~~~~~^^
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/quantization/gptq.py", line 602, in quantize
    self.finalize_hessian(target_device=target_device)
    ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/quantization/gptq.py", line 520, in finalize_hessian
    self._materialize_global_hessian(target_device=target_device)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/quantization/gptq.py", line 503, in _materialize_global_hessian
    tmp = partial.to(device=result_accum.device, dtype=torch.float32)
torch.AcceleratorError: CUDA error: invalid argument

@Qubitium (Collaborator) commented Nov 2, 2025

@avtc Amazingly simple fix. It looks like the NVIDIA GPU, at a very low level, has an internal background thread that does memory sweeping and cleanup, and the 0.5-second delay gives it time to finish. Genius! By the way, can you remove the second retry? The logs show that one retry is enough. Is the second retry necessary?

@Qubitium (Collaborator) commented Nov 2, 2025

@avtc Another question: have you tried zero delay, or is a time delay absolutely required for this 3090 OOM?

@Qubitium (Collaborator) commented Nov 2, 2025

@avtc Thanks for the investigation; merged. I made a minor adjustment to remove the second retry and reduced the sleep to 250 ms.
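The retry-with-delay approach discussed in this thread (catch the transient error, wait briefly, repeat the same call) can be sketched as a small generic helper. This is a hypothetical illustration, not the code merged in ab2a743: the name `call_with_retry` and its parameters are assumptions, and the defaults (one retry, 250 ms sleep) simply mirror the values described in the comment above. It uses only the standard library, so the demo simulates the failing transfer with a plain `RuntimeError`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retries: int = 1,
    delay_s: float = 0.25,
    retry_on: Tuple[Type[BaseException], ...] = (RuntimeError,),
) -> T:
    """Hypothetical helper: call fn(), retrying on transient errors.

    Makes at most 1 + retries attempts. After a failure whose type is in
    retry_on, it sleeps delay_s seconds and calls fn() again; if the final
    attempt also fails, the last exception is re-raised unchanged.
    Exceptions not listed in retry_on propagate immediately.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the original error
            # Assumption: a short pause lets pending device-side work settle.
            time.sleep(delay_s)
    raise AssertionError("unreachable")  # loop always returns or raises


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_copy() -> str:
        # Simulates a transfer that fails once, then succeeds.
        state["calls"] += 1
        if state["calls"] == 1:
            raise RuntimeError("CUDA error: invalid argument")
        return "copied"

    print(call_with_retry(flaky_copy), state["calls"])  # prints "copied 2"
```

At the call site from the traceback, a wrapper like this would presumably be used as `call_with_retry(lambda: partial.to(device=result_accum.device, dtype=torch.float32), retry_on=(torch.AcceleratorError,))`; whether the merged fix wraps exactly that line is an assumption. Passing the exception types explicitly keeps unrelated errors from being silently retried.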

@Qubitium merged commit ab2a743 into ModelCloud:main on Nov 2, 2025