feat: comfykit awq w4a16 modulation (CORE-31) #13580

Open

HK416-TYPED wants to merge 3 commits into Comfy-Org:master from HK416-TYPED:feat/comfykit-awq-w4a16-modulation

Conversation

@HK416-TYPED

Wires comfy-kitchen's int4 quantization layouts through ComfyUI's MixedPrecisionOps so the existing UNETLoader path can dispatch optimized int4 kernels for two checkpoint families used by
Qwen-Image-Edit-2511 (and similar SVD-LoRA + AWQ-modulation models).

353978a9 SVDQuant W4A4 integration

  • quant_ops.py: register TensorCoreSVDQuantW4A4Layout when comfy-kitchen exposes it; gate the kitchen CUDA backend on cuda ≥ 13 (the optimized kitchen CUDA ops are validated against cu13+;
    older CUDA runtimes fall back to eager). A sketch of the guarded registration pattern follows this list.
  • ops.py: handle svdquant_w4a4 quant_format by loading weight_scale / proj_down / proj_up / smooth_factor into TensorCoreSVDQuantW4A4Layout.Params, with the img_mlp.net.2 /
    txt_mlp.net.2 fallback for act_unsigned (post-GELU u4.s4 MMA path).
  • Pairs with comfy-kitchen #36 (feat/svdquant-w4a4-kitchen-native).
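
A minimal, self-contained sketch of the guarded import/registration pattern described above, mirroring the existing MXFP8 handling. QUANT_ALGOS and the "comfy_tensor_layout" key are real names from this PR; the _LAYOUT_CLASSES dict and the exact entry shape are illustrative assumptions, not the literal quant_ops.py code:

import logging

QUANT_ALGOS: dict = {}       # illustrative stand-in for the module-level registry
_LAYOUT_CLASSES: dict = {}   # illustrative stand-in for the layout-class registry

try:
    import comfy_kitchen  # noqa: F401
    _CK_AVAILABLE = True
except ImportError:
    _CK_AVAILABLE = False

_CK_SVDQUANT_W4A4_AVAILABLE = False
if _CK_AVAILABLE:
    try:
        from comfy_kitchen.tensor import TensorCoreSVDQuantW4A4Layout as _CKSVDQuantW4A4Layout
        _CK_SVDQUANT_W4A4_AVAILABLE = True
    except ImportError:
        logging.info("comfy_kitchen does not expose the SVDQuant W4A4 layout; "
                     "int4 SVDQuant checkpoints will not be supported.")

if not _CK_SVDQUANT_W4A4_AVAILABLE:
    class _CKSVDQuantW4A4Layout:  # no-kitchen stub so later references still resolve
        pass

if _CK_SVDQUANT_W4A4_AVAILABLE:
    _LAYOUT_CLASSES["TensorCoreSVDQuantW4A4Layout"] = _CKSVDQuantW4A4Layout
    QUANT_ALGOS["svdquant_w4a4"] = {
        "comfy_tensor_layout": "TensorCoreSVDQuantW4A4Layout",
        # per-layer params named in this PR: weight_scale, proj_down, proj_up, smooth_factor
    }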

3ddcc095 AWQ W4A16 modulation integration

  • quant_ops.py: detect TensorCoreAWQW4A16Layout, stub for the no-kitchen fallback (mirrors W4A4 pattern), register the layout class, add awq_w4a16 to QUANT_ALGOS (storage int8 packed
    uint4, params {weight_scale, weight_zero}, default group_size=64).
  • ops.py: add the awq_w4a16 branch in MixedPrecisionOps.Linear._load_from_state_dict that constructs Params(scale, zeros, group_size, …) and wraps qweight into a QuantizedTensor;
    F.linear then dispatches to ck.gemv_awq_w4a16 via the layout's aten handlers (a sketch of this branch follows this list).
  • Targets the ~10 GB inflation in Qwen-Image-Edit kitchen-native checkpoints, where the modulation linears (img_mod.1 / txt_mod.1) currently dominate disk + VRAM because they're materialized
    as plain bf16 Linear during conversion.
  • Pairs with comfy-kitchen feat/awq-w4a16-modulation (companion PR coming after #36 lands).
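
A hedged sketch of what the awq_w4a16 branch could look like, patterned on the svdquant_w4a4 branch quoted in the review thread below. The state-dict key for the packed weight, the Params keywords beyond scale / zeros / group_size, and the QuantizedTensor constructor signature are assumptions for illustration, not this PR's literal code:

# Fragment of MixedPrecisionOps.Linear._load_from_state_dict (sketch only)
elif self.quant_format == "awq_w4a16":
    # AWQ W4A16: int8-packed uint4 weights + per-group scale / zero-point
    qweight = state_dict[prefix + "weight"]  # packed uint4 payload; key name assumed
    wscales = self._load_scale_param(state_dict, prefix, "weight_scale", device, manually_loaded_keys)
    wzeros = self._load_scale_param(state_dict, prefix, "weight_zero", device, manually_loaded_keys)
    if wscales is None or wzeros is None:
        raise ValueError(f"Missing AWQ W4A16 parameters for layer {layer_name}")

    params = layout_cls.Params(
        scale=wscales,
        zeros=wzeros,
        group_size=layer_conf.get("group_size", 64),   # default group_size = 64 per QUANT_ALGOS
        orig_dtype=MixedPrecisionOps._compute_dtype,
        orig_shape=(self.out_features, self.in_features),
    )
    # Wrap the packed weight so F.linear routes through the layout's aten
    # handlers to ck.gemv_awq_w4a16 (QuantizedTensor constructor assumed).
    self.weight = torch.nn.Parameter(QuantizedTensor(qweight, self.layout_type, params), requires_grad=False)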

Verification

  • Qwen-Image-Edit r96 ComfyUI E2E sampling: the kitchen-native AWQ checkpoint runs end-to-end, and the output is visually equivalent to the bf16-dequantized baseline (PSNR ~33 dB; the remaining
    differences sit at the bf16 ULP precision floor of the (nibble - 8) * scale + zero dequant chain). A reference dequant/PSNR sketch follows this list.
  • Stratified sampling across the 6 variant families (balanced/fast/mid/quality × ranks 32/64/96/128 × base/lightning4/lightning8): all variants sample successfully.
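
For context on the comparison above, a small self-contained reference for the dequant chain plus a generic PSNR helper (the ~33 dB figure was measured on the sampled images, not on raw weights). The nibble packing order and per-group broadcasting are assumptions about the checkpoint layout; only the (nibble - 8) * scale + zero form is taken from the note above:

import torch

def dequant_w4(qweight_i8: torch.Tensor, scale: torch.Tensor, zero: torch.Tensor,
               group_size: int = 64) -> torch.Tensor:
    # Unpack two uint4 nibbles per int8 byte (low nibble first; order assumed).
    lo = (qweight_i8 & 0x0F).to(torch.int32)
    hi = ((qweight_i8 >> 4) & 0x0F).to(torch.int32)
    nibbles = torch.stack((lo, hi), dim=-1).flatten(-2).float()   # [out, in]
    groups = nibbles.view(nibbles.shape[0], -1, group_size)
    # Dequant chain quoted in the verification note: (nibble - 8) * scale + zero.
    w = (groups - 8.0) * scale.unsqueeze(-1) + zero.unsqueeze(-1)
    return w.reshape(nibbles.shape).to(torch.bfloat16)

def psnr_db(reference: torch.Tensor, test: torch.Tensor) -> float:
    reference, test = reference.float(), test.float()
    mse = torch.mean((reference - test) ** 2)
    peak = reference.abs().max()
    return float(10.0 * torch.log10(peak ** 2 / mse))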

…major)

quant_ops.py: register TensorCoreSVDQuantW4A4Layout when comfy-kitchen exposes
it; gate the kitchen CUDA backend on cuda >= 13 (the optimized kitchen CUDA
ops are validated against cu13+ runtimes; on older cu the backend falls back
to eager).
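
A minimal sketch of how the cuda >= 13 gate above could be expressed, assuming it keys off torch.version.cuda; the PR's actual gating code is not shown here:

import torch

def _kitchen_cuda_backend_supported() -> bool:
    cuda_ver = torch.version.cuda          # e.g. "13.0"; None on CPU-only builds
    if cuda_ver is None:
        return False
    return int(cuda_ver.split(".")[0]) >= 13   # older runtimes fall back to the eager path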

ops.py: handle svdquant_w4a4 quant_format by loading weight_scale / proj_down /
proj_up / smooth_factor into TensorCoreSVDQuantW4A4Layout.Params, with the
img_mlp.net.2 / txt_mlp.net.2 fallback for act_unsigned. Targets the row-major
kitchen-native kernels on feat/svdquant-w4a4-kitchen-native; the verbatim
zgemm path is a sibling branch.

Wires comfy-kitchen's TensorCoreAWQW4A16Layout (introduced on
feat/awq-w4a16-modulation) into ComfyUI's MixedPrecisionOps so checkpoints
that tag modulation linears with comfy_quant.format = "awq_w4a16" get
their (qweight, weight_scale, weight_zero) loaded into the kitchen layout
class instead of being dequantized to bf16 plain Linear at conversion time.

quant_ops.py:
- detect TensorCoreAWQW4A16Layout availability and stub it out for the
  no-kitchen fallback (mirrors the SVDQuant W4A4 pattern)
- register the layout class + add "awq_w4a16" to QUANT_ALGOS
  (storage_t = int8 packed uint4, parameters = {weight_scale, weight_zero},
   default group_size = 64)

ops.py: add the awq_w4a16 branch in MixedPrecisionOps.Linear._load_from_state_dict
that constructs Params(scale, zeros, group_size, ...) and wraps qweight
into a QuantizedTensor — F.linear then dispatches to ck.gemv_awq_w4a16
via the layout's aten handlers.

Pairs with comfy-kitchen feat/awq-w4a16-modulation. Targets the ~10 GB
inflation in Qwen-Image-Edit kitchen-native checkpoints, where the
modulation linears (img_mod.1 / txt_mod.1) currently dominate disk + VRAM
because they're materialized as plain bf16 Linear during conversion.
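
A hedged sketch of the dispatch idea: a tensor subclass intercepts the linear call and hands it to the kitchen GEMV instead of a dequantized matmul. The PR routes this through the layout's aten handlers on QuantizedTensor; the sketch below uses __torch_function__ for brevity, and the ck.gemv_awq_w4a16 argument order, the comfy_kitchen import, and the attribute names are assumptions (only the kernel name appears in the PR text):

import torch
import torch.nn.functional as F
import comfy_kitchen as ck  # import name assumed

class AWQLinearWeightSketch(torch.Tensor):
    @staticmethod
    def __new__(cls, qweight, scale, zeros, group_size, out_features, in_features, dtype):
        # Wrapper subclass: no bf16 materialization, just metadata + packed payload.
        t = torch.Tensor._make_wrapper_subclass(
            cls, (out_features, in_features), dtype=dtype, device=qweight.device)
        t._qweight, t._scale, t._zeros, t._group_size = qweight, scale, zeros, group_size
        return t

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func is F.linear:
            x, w = args[0], args[1]
            bias = args[2] if len(args) > 2 else kwargs.get("bias")
            # Hand the GEMV to the kitchen kernel (argument order assumed).
            out = ck.gemv_awq_w4a16(x, w._qweight, w._scale, w._zeros, w._group_size)
            return out + bias if bias is not None else out
        return super().__torch_function__(func, types, args, kwargs)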

coderabbitai Bot commented Apr 27, 2026

📝 Walkthrough

This PR introduces support for two additional quantization formats—svdquant_w4a4 and awq_w4a16—by adding corresponding tensor layout classes in comfy/quant_ops.py with conditional kitchen-backed implementation. The comfy/ops.py changes extend state-dict loading to deserialize format-specific quantization parameters and construct appropriate layout metadata, while modifying inference behavior to conditionally quantize inputs based on layout declarations. Quantization metadata serialization is also extended to persist act_unsigned parameters when present.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description check: ✅ Passed. The description provides detailed technical context for both SVDQuant W4A4 and AWQ W4A16 integrations, including file changes, implementation details, and verification results that directly relate to the changeset.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Title check: ✅ Passed. The title partially describes the PR's main changes but omits the equally significant SVDQuant W4A4 integration aspect, focusing only on AWQ W4A16 modulation.


coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
comfy/ops.py (2)

951-956: Friendlier error when a checkpoint declares a format the kitchen build doesn't support.

With this PR, svdquant_w4a4 and awq_w4a16 are only registered into QUANT_ALGOS when the corresponding kitchen layout import succeeds. A user on an older comfy_kitchen that loads a Qwen-Image-Edit AWQ checkpoint will hit KeyError: 'awq_w4a16' at line 954 rather than seeing a message that points them at the actual problem. Consider raising a clear ValueError that lists the supported formats.

♻️ Suggested message
-                    qconfig = QUANT_ALGOS[self.quant_format]
+                    if self.quant_format not in QUANT_ALGOS:
+                        raise ValueError(
+                            f"Quantization format '{self.quant_format}' for layer {layer_name} "
+                            f"is not available in this build (supported: {sorted(QUANT_ALGOS.keys())}). "
+                            f"Update comfy_kitchen to enable it."
+                        )
+                    qconfig = QUANT_ALGOS[self.quant_format]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy/ops.py` around lines 951 - 956, The code assumes self.quant_format
exists in QUANT_ALGOS and will raise a KeyError; change the block to explicitly
check whether self.quant_format is in QUANT_ALGOS and if not raise a ValueError
that names the offending format (self.quant_format) and the layer (layer_name)
and lists supported formats (sorted QUANT_ALGOS.keys()), then continue to look
up qconfig = QUANT_ALGOS[self.quant_format], set self.layout_type and call
get_layout_class(self.layout_type) as before.

1156-1157: Optional micro-optim: cache layout_cls instead of resolving it on every forward.

get_layout_class(self.layout_type) runs on every forward() invocation per linear. It's cheap (dict lookup) but redundant — self.layout_type is set once at load time. Caching once during _load_from_state_dict would also let you compute layout_quantizes_input ahead of time, removing two attribute lookups from the inference hot path.

♻️ Sketch
-                    qconfig = QUANT_ALGOS[self.quant_format]
-                    self.layout_type = qconfig["comfy_tensor_layout"]
-                    layout_cls = get_layout_class(self.layout_type)
+                    qconfig = QUANT_ALGOS[self.quant_format]
+                    self.layout_type = qconfig["comfy_tensor_layout"]
+                    layout_cls = get_layout_class(self.layout_type)
+                    self._layout_quantizes_input = getattr(layout_cls, "QUANTIZES_INPUT", True)
-                    layout_cls = get_layout_class(self.layout_type)
-                    layout_quantizes_input = getattr(layout_cls, "QUANTIZES_INPUT", True)
-
-                    if layout_quantizes_input:
+                    if getattr(self, "_layout_quantizes_input", True):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy/ops.py` around lines 1156 - 1157, Cache the resolved layout class and
the boolean QUANTIZES_INPUT during model load instead of resolving them in
forward: in _load_from_state_dict (or wherever self.layout_type is set) call
get_layout_class(self.layout_type) once, store it on the instance (e.g.,
self._layout_cls) and compute self._layout_quantizes_input =
getattr(self._layout_cls, "QUANTIZES_INPUT", True); then update forward to use
self._layout_cls and self._layout_quantizes_input instead of calling
get_layout_class(self.layout_type) and getattr each invocation.
comfy/quant_ops.py (1)

60-83: Optional: align stub-fallback placement with MXFP8 for consistency.

_CKMxfp8Layout defines its stub via if not _CK_MXFP8_AVAILABLE: at module scope (lines 60-62), whereas the new layouts define the stub inside the inner except ImportError branch. Both are functionally equivalent (since lines 40-44 already cover the outer-_CK_AVAILABLE-False case), but mirroring the MXFP8 pattern would make the three blocks read identically.

♻️ Sketch of the consistency-aligned shape
 _CK_SVDQUANT_W4A4_AVAILABLE = False
 if _CK_AVAILABLE:
     try:
         from comfy_kitchen.tensor import TensorCoreSVDQuantW4A4Layout as _CKSVDQuantW4A4Layout
         _CK_SVDQUANT_W4A4_AVAILABLE = True
     except ImportError:
         logging.info("comfy_kitchen does not expose SVDQuant W4A4 layout; int4 SVDQuant checkpoints will not be supported.")
-        class _CKSVDQuantW4A4Layout:
-            pass
+
+if not _CK_SVDQUANT_W4A4_AVAILABLE and _CK_AVAILABLE:
+    class _CKSVDQuantW4A4Layout:
+        pass

(Same shape for AWQ.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy/quant_ops.py` around lines 60 - 83, The three layout stubs are
inconsistent: _CKMxfp8Layout is defined at module scope under "if not
_CK_MXFP8_AVAILABLE" while _CKSVDQuantW4A4Layout and _CKAWQW4A16Layout are
defined inside the except ImportError branches; make them consistent by
moving/adding their stub definitions to the same outer-pattern (i.e., define a
stub class under "if not _CK_SVDQUANT_W4A4_AVAILABLE:" and "if not
_CK_AWQ_W4A16_AVAILABLE:" at module scope like _CKMxfp8Layout) and keep the
try/except only responsible for setting the real import and toggling the
_CK_*_AVAILABLE flags, referencing the symbols _CKSVDQuantW4A4Layout and
_CKAWQW4A16Layout to locate the code to change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d527adc4-fdbb-40b5-980b-8ca75fe532ee

📥 Commits

Reviewing files that changed from the base of the PR and between 115f418 and 3ddcc09.

📒 Files selected for processing (2)
  • comfy/ops.py
  • comfy/quant_ops.py

Comment thread: comfy/ops.py
Comment on lines +1000 to +1028
elif self.quant_format == "svdquant_w4a4":
    # SVDQuant W4A4: per-group weight scales + low-rank correction
    # (proj_down, proj_up) + activation smoothing (smooth_factor)
    wscales = self._load_scale_param(state_dict, prefix, "weight_scale", device, manually_loaded_keys)
    proj_down = self._load_scale_param(state_dict, prefix, "proj_down", device, manually_loaded_keys)
    proj_up = self._load_scale_param(state_dict, prefix, "proj_up", device, manually_loaded_keys)
    smooth_factor = self._load_scale_param(state_dict, prefix, "smooth_factor", device, manually_loaded_keys)
    act_unsigned = bool(layer_conf.get("act_unsigned", False))

    # Early Qwen-Image conversion artifacts did not persist the
    # fused GELU -> fc2 unsigned-activation flag. Those layers
    # are the second linear in the feed-forward block.
    if not act_unsigned and (
        layer_name.endswith(".img_mlp.net.2") or layer_name.endswith(".txt_mlp.net.2")
    ):
        act_unsigned = True

    if any(t is None for t in (wscales, proj_down, proj_up, smooth_factor)):
        raise ValueError(f"Missing SVDQuant W4A4 parameters for layer {layer_name}")

    params = layout_cls.Params(
        scale=wscales,
        orig_dtype=MixedPrecisionOps._compute_dtype,
        orig_shape=(self.out_features, self.in_features),
        proj_down=proj_down,
        proj_up=proj_up,
        smooth_factor=smooth_factor,
        act_unsigned=act_unsigned,
    )

⚠️ Potential issue | 🟡 Minor

act_unsigned layer-name heuristic is a load-bearing workaround — please add a TODO and narrow the trigger.

The endswith(".img_mlp.net.2") or endswith(".txt_mlp.net.2") rule unconditionally flips act_unsigned to True for any model whose state_dict happens to use those submodule names — not just early Qwen-Image-Edit kitchen-native checkpoints. If a different topology (or future SVDQuant export) reuses the same names with signed activations, those layers will dispatch with the wrong activation domain and produce silently corrupted outputs (no exception, just bad samples). Two suggestions:

  1. Gate the override so it only triggers when the format truly requires a fused-GELU upstream signal (e.g., also check that act_unsigned is absent from layer_conf rather than merely False — a future exporter explicitly writing act_unsigned: false would currently still get overridden).
  2. Add a TODO referencing the Qwen-Image-Edit conversion path so this can be deleted once re-exported checkpoints carry the flag.
🛡️ Suggested narrowing
-                        act_unsigned = bool(layer_conf.get("act_unsigned", False))
-
-                        # Early Qwen-Image conversion artifacts did not persist the
-                        # fused GELU -> fc2 unsigned-activation flag. Those layers
-                        # are the second linear in the feed-forward block.
-                        if not act_unsigned and (
-                            layer_name.endswith(".img_mlp.net.2") or layer_name.endswith(".txt_mlp.net.2")
-                        ):
-                            act_unsigned = True
+                        # TODO(comfykit-awq): drop the layer-name heuristic once all SVDQuant
+                        # exporters persist `act_unsigned`. Only override when the flag is
+                        # absent from layer_conf, not when it's explicitly false.
+                        if "act_unsigned" in layer_conf:
+                            act_unsigned = bool(layer_conf["act_unsigned"])
+                        else:
+                            act_unsigned = layer_name.endswith(".img_mlp.net.2") or layer_name.endswith(".txt_mlp.net.2")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy/ops.py` around lines 1000 - 1028, The heuristic that forces
act_unsigned True for layers matching ".img_mlp.net.2" or ".txt_mlp.net.2" is
too broad; update the condition in the svdquant_w4a4 branch so the override only
runs when the layer_conf does not explicitly contain the act_unsigned key (e.g.,
check presence/absence in layer_conf) and preserve the original act_unsigned
when it is explicitly set to False, and add a one-line TODO comment referencing
the Qwen-Image-Edit conversion path so future re-exports can remove this
workaround; locate and modify the code around the symbols act_unsigned,
layer_conf, layer_name inside the svdquant_w4a4 handling block.

@alexisrolland changed the title from "Feat/comfykit awq w4a16 modulation(CORE-31)" to "feat: comfykit awq w4a16 modulation (CORE-31)" on Apr 27, 2026
