fix problems in the FOEM data processing pipeline#2659

Merged
Qubitium merged 1 commit into ModelCloud:main from Xingyu-Zheng:main on Apr 2, 2026
Conversation

@Xingyu-Zheng
Contributor

When testing FOEM on Qwen3.5-35B-A3B, the error caused by reusing GPTAQ’s data processing pipeline occurs even earlier than the previous issue I encountered in gptqmodel/quantization/foem.py. In this case, simply setting alpha = 0 does not resolve the problem.

To address this, I added special handling of alpha in the processor to ensure that FOEM, when used alone, achieves the better generalization consistent with GPTQ.
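A minimal sketch of what such handling might look like. Note that the class and attribute names (`FoemProcessor`, `standalone`, `use_dual_stream`) are hypothetical, not the actual GPTQModel implementation: the idea is that when FOEM runs alone, the GPTAQ dual-stream path is disabled entirely rather than merely zeroing alpha.

```python
# Hypothetical sketch -- names are illustrative, not the actual GPTQModel API.
class FoemProcessor:
    def __init__(self, alpha: float = 0.0, standalone: bool = True):
        # When FOEM runs alone, skip the GPTAQ dual-stream path entirely,
        # rather than only setting alpha = 0, so no mismatched second-stream
        # inputs are ever captured.
        self.alpha = 0.0 if standalone else alpha
        self.use_dual_stream = (not standalone) and alpha != 0.0

    def process(self, x_fp, x_quant=None):
        if not self.use_dual_stream:
            # GPTQ-style single-stream update: ignore the quantized stream.
            return x_fp
        # GPTAQ-style update: blend the two streams, weighted by alpha.
        return x_fp + self.alpha * (x_quant - x_fp)
```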

@Qubitium
Collaborator

Qubitium commented Apr 2, 2026

@Xingyu-Zheng LGTM! Thanks.

@Qubitium Qubitium merged commit 1dfe865 into ModelCloud:main Apr 2, 2026
1 check passed
@Qubitium
Collaborator

Qubitium commented Apr 2, 2026

@Xingyu-Zheng I just remembered why GPTAQ had issues with MoE. Calibration data is fed to the model serially and becomes ordered input to each module, which generates output. GPTAQ processing assumed the captured input arrives in that same order. The problem with MoE routing is that an input of [a, b, c, e] may be seen by an MoE expert module as [b, e], but there was no safe way to match a captured input b in one stream to the captured input b in the other, if that makes any sense. I will check the code again, but this memory just came back about why GPTAQ never worked with MoE. I discovered this as soon as GPTAQ (previously called GPTQ v2) was merged, and after a short discussion with the author we both had no good solutions at that time.
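The mismatch can be seen in a toy example: the same top-k router applied to FP hidden states versus quantization-perturbed ones may send different token subsets to the same expert, so the per-expert captures from the two streams cannot be lined up index by index. NumPy stands in for the real model here; all shapes and the noise scale are illustrative.

```python
import numpy as np

def expert_tokens(hidden, router_w, expert_id, top_k=1):
    """Return indices of tokens routed to `expert_id` under top-k routing."""
    logits = hidden @ router_w                       # (tokens, experts)
    topk = np.argsort(logits, axis=-1)[:, -top_k:]   # per-token expert choices
    return [t for t in range(hidden.shape[0]) if expert_id in topk[t]]

rng = np.random.default_rng(0)
router_w = rng.normal(size=(8, 4))           # 8-dim hidden states, 4 experts
x_fp = rng.normal(size=(5, 8))               # tokens [a, b, c, d, e], FP stream
x_q = x_fp + 0.3 * rng.normal(size=(5, 8))   # same tokens with quantization error

for e in range(4):
    fp_set = expert_tokens(x_fp, router_w, e)
    q_set = expert_tokens(x_q, router_w, e)
    if fp_set != q_set:
        # The two streams disagree on which tokens this expert sees, so the
        # captured inputs for this expert cannot be matched positionally.
        print(f"expert {e}: FP stream sees tokens {fp_set}, quantized sees {q_set}")
```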

@Xingyu-Zheng
Contributor Author

@Qubitium I haven’t studied MoE models in depth, nor have I carefully gone through the implementation details in GPTQModel. However, here is my current hypothesis.

GPTAQ assumes a dual-stream data flow, where one stream corresponds to the FP model and the other to the progressively quantized model. As earlier layers become quantized, the routing decisions in later MoE layers may start to diverge between the two streams. For example, the FP model might route tokens {a, c} to expert 1, while the quantized model routes {b, d, e} to the same expert. As a result, when GPTAQ performs calibration on expert 1, the inputs $X$ and $\tilde{X}$ no longer match in either dimension or semantic meaning.

If this hypothesis is correct, there may be several possible solutions:

  1. Force the quantized branch to follow the FP router decisions, ignoring its own routing outputs. This seems like the most reliable approach, as it ensures strict alignment between the two data streams for each expert. Moreover, since the router itself does not directly modify the token representations, this should not affect semantic propagation.
  2. Disable top-k routing and send all tokens to the same expert (e.g., {a, b, c, d, e} to expert 1). However, I am unsure where expert weighting is applied in different MoE implementations. If the weighting differs between the FP and quantized branches, the representations may still be misaligned even if the dimensions match.
  3. Fallback to GPTQ for MoE layers, while continuing to apply GPTAQ to attention and other global components. This avoids dealing with routing inconsistencies altogether.

I should note that I am not deeply familiar with MoE mechanisms, so these are only preliminary thoughts. I hope they might still provide some useful insights.
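A rough sketch of option 1 (all names and shapes here are illustrative, not GPTQModel's actual API): the FP stream records its top-k routing decisions, and the quantized stream replays them instead of routing on its own logits, so each expert captures inputs for the same token set in both streams.

```python
import numpy as np

def route(hidden, router_w, top_k=1, forced_topk=None):
    """Top-k MoE routing; `forced_topk` overrides this stream's own choices."""
    logits = hidden @ router_w
    if forced_topk is not None:
        return forced_topk  # replay the FP stream's routing decisions
    return np.argsort(logits, axis=-1)[:, -top_k:]

rng = np.random.default_rng(1)
router_w = rng.normal(size=(8, 4))           # 8-dim hidden states, 4 experts
x_fp = rng.normal(size=(5, 8))               # FP stream
x_q = x_fp + 0.5 * rng.normal(size=(5, 8))   # quantized stream with error

fp_topk = route(x_fp, router_w)                     # FP stream routes freely
q_topk = route(x_q, router_w, forced_topk=fp_topk)  # quantized stream follows

# Per-expert token sets now align exactly between the two streams.
assert (fp_topk == q_topk).all()
```

Since the router does not itself transform the token representations, forcing the quantized stream onto the FP routing should only change which expert processes each token, not the captured activations themselves.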
