Fix for #2116 When all modules of a layer excluded there is an error in forward replay #2117
base: main
Conversation
The error is a state bug in logbar, so I need to fix that; it is unrelated to gpt-qmodel. I will check and merge this PR soon!
@avtc There is an issue here. For example, if we have a normal moe layer:
Layer subsets (1) and optionally subset (2) are forced to execute serially when they could still execute under the faster data-parallelized path. Can you debug and print out the subsets and also the modules list when you now do the is_moe check?
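A minimal sketch of the kind of debug print being requested, assuming hypothetical names (`subsets` as a list of per-subset module-name lists and an `is_moe` flag); none of these identifiers are taken from the GPTQModel source:

```python
# Hypothetical debug helper: print the per-layer subsets and the flat module
# list at the point where the is_moe check is made. Names are illustrative.
def debug_print_subsets(layer_index: int, subsets: list[list[str]], is_moe: bool) -> None:
    flat_modules = [name for subset in subsets for name in subset]
    print(f"[layer {layer_index}] is_moe={is_moe}, subsets={len(subsets)}")
    for i, subset in enumerate(subsets):
        print(f"  subset {i}: {subset}")
    print(f"  modules: {flat_modules}")

# Example: an attention subset plus one expert subset of a typical MoE decoder layer.
debug_print_subsets(
    0,
    [
        ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj", "self_attn.o_proj"],
        ["mlp.experts.0.gate_proj", "mlp.experts.0.up_proj", "mlp.experts.0.down_proj"],
    ],
    is_moe=True,
)
```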
The current
I will partially revert and test. I do not see how to convert this to a draft, so please assume it is not finalized.
I have removed the extra logic from before the loop and only fixed the issue with the undefined variable. P.S. Sometimes I have encountered the error even with 4 GPUs: maybe it is related to the current fix...
This PR can be considered final.
@avtc Sorry for getting back to you so late. I wanted to refactor the life cycle. Please merge with master and re-apply your fix via the new lifecycle.
OK, I will prepare the changes and test them before pushing.
@Qubitium what is the proper fix for the case when a layer has no modules to quantize: run the forward pass and produce proper layer_inputs for the next layer, or skip both the forward pass and the forward replay and pass this layer's inputs through to the next layer?
@avtc If a full layer has no module to quantize, a simple forward() is enough, and its output is captured to be used as the next layer's input. So one forward pass over the entire layer, without any need for subset loops or micro forward loops; just the full layer, usually XXXDecodeLayer.forward(). So output = current_layer.forward() is enough, or sometimes just calling the layer callable. Assume layer 2 has no modules to quantize. At the beginning of the loop for layer 2, we have layer_output from the completed forward_replay() of layer 1. Pass this to layer 2 (as a whole) as layer_input and store the output, then immediately loop to layer 3 without any of the further subset work that is only necessary when we need to quantize part of a layer.
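A rough sketch of that skip path, assuming a plain PyTorch layer stack and hypothetical helper names (get_quantizable_modules, quantize_and_replay, the excluded set); the real GPTQModel looper is more involved, and this only illustrates the control flow described above:

```python
import torch
import torch.nn as nn

# Hypothetical helper: list the Linear submodules of a layer that are not excluded.
def get_quantizable_modules(layer: nn.Module, excluded: set[str]) -> list[str]:
    return [name for name, m in layer.named_modules()
            if isinstance(m, nn.Linear) and name not in excluded]

# Placeholder for the real subset-capture -> quantize -> forward_replay() path.
def quantize_and_replay(layer: nn.Module, layer_input: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        return layer(layer_input)

def process_layers(layers, layer_input: torch.Tensor, excluded: set[str]) -> torch.Tensor:
    for layer in layers:
        if not get_quantizable_modules(layer, excluded):
            # Entire layer excluded: one simple forward pass over the whole
            # layer is enough; its output becomes the next layer's input.
            # No subset loops, no micro forward loops, no forward_replay().
            with torch.no_grad():
                layer_input = layer(layer_input)
            continue
        # Otherwise run the usual quantization path, which also produces the
        # next layer's input.
        layer_input = quantize_and_replay(layer, layer_input)
    return layer_input
```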
@Qubitium please review the fix for #2116