feat: comfykit awq w4a16 modulation (CORE-31)#13580
HK416-TYPED wants to merge 3 commits into Comfy-Org:master
Conversation
…major) quant_ops.py: register TensorCoreSVDQuantW4A4Layout when comfy-kitchen exposes it; gate the kitchen CUDA backend on cuda >= 13 (the optimized kitchen CUDA ops are validated against cu13+ runtimes; on older cu the backend falls back to eager). ops.py: handle svdquant_w4a4 quant_format by loading weight_scale / proj_down / proj_up / smooth_factor into TensorCoreSVDQuantW4A4Layout.Params, with the img_mlp.net.2 / txt_mlp.net.2 fallback for act_unsigned. Targets the row-major kitchen-native kernels on feat/svdquant-w4a4-kitchen-native; the verbatim zgemm path is a sibling branch.
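For orientation, here is a minimal eager sketch of the forward pass those parameters imply, following the published SVDQuant formulation; the shapes, names, and the exact smoothing/low-rank split are illustrative assumptions, not the kitchen kernel:

```python
import torch

def svdquant_w4a4_reference(x, w_residual_dq, proj_down, proj_up, smooth_factor):
    """Eager sketch of SVDQuant W4A4: smooth, low-rank branch, 4-bit residual.

    w_residual_dq stands in for the dequantized int4 residual weight
    (out_features, in_features); the real kernel keeps both the residual and
    the smoothed activations in int4 (the u4.s4 MMA path mentioned below).
    """
    x_s = x / smooth_factor                  # migrate activation outliers into the weight
    lowrank = (x_s @ proj_down) @ proj_up    # low-rank correction, kept in high precision
    residual = x_s @ w_residual_dq.T         # quantized int4 x int4 GEMM in the kernel
    return lowrank + residual
```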
Wires comfy-kitchen's TensorCoreAWQW4A16Layout (introduced on
feat/awq-w4a16-modulation) into ComfyUI's MixedPrecisionOps so checkpoints
that tag modulation linears with comfy_quant.format = "awq_w4a16" get
their (qweight, weight_scale, weight_zero) loaded into the kitchen layout
class instead of being dequantized to bf16 plain Linear at conversion time.
quant_ops.py:
- detect TensorCoreAWQW4A16Layout availability and stub it out for the
no-kitchen fallback (mirrors the SVDQuant W4A4 pattern)
- register the layout class + add "awq_w4a16" to QUANT_ALGOS
(storage_t = int8 packed uint4, parameters = {weight_scale, weight_zero},
default group_size = 64)
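Concretely, the registration described above amounts to something like the following sketch; the field names are taken from this description (the review diff further down confirms the comfy_tensor_layout key), but the exact schema in quant_ops.py may differ:

```python
import torch

QUANT_ALGOS: dict = {}  # defined in comfy/quant_ops.py; empty here for illustration

# Sketch of the "awq_w4a16" entry: int8 storage carrying packed uint4 nibbles,
# per-group affine parameters, and the kitchen layout class to dispatch to.
QUANT_ALGOS["awq_w4a16"] = {
    "storage_t": torch.int8,                        # packed uint4, two per byte
    "parameters": {"weight_scale", "weight_zero"},  # per-group scale and zero-point
    "group_size": 64,                               # default group along in_features
    "comfy_tensor_layout": "TensorCoreAWQW4A16Layout",
}
```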
ops.py: add the awq_w4a16 branch in MixedPrecisionOps.Linear._load_from_state_dict
that constructs Params(scale, zeros, group_size, ...) and wraps qweight
into a QuantizedTensor — F.linear then dispatches to ck.gemv_awq_w4a16
via the layout's aten handlers.
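Since the kernel itself lives in comfy-kitchen, here is a minimal eager sketch of the math the layout has to reproduce, assuming low-nibble-first packing and the (nibble - 8) * scale + zero dequant rule from the verification notes at the bottom of this PR; the helper name is hypothetical:

```python
import torch

def dequant_awq_w4a16(qweight, weight_scale, weight_zero, group_size=64, dtype=torch.bfloat16):
    """Eager reference: unpack uint4 nibbles, apply (nibble - 8) * scale + zero."""
    q = qweight.view(torch.uint8).to(torch.int32)           # reinterpret packed bytes
    low, high = q & 0xF, q >> 4                             # low-nibble-first (assumed)
    nibbles = torch.stack((low, high), dim=-1).flatten(-2)  # (out_features, in_features)
    scale = weight_scale.repeat_interleave(group_size, dim=1)
    zero = weight_zero.repeat_interleave(group_size, dim=1)
    return (nibbles - 8).to(dtype) * scale.to(dtype) + zero.to(dtype)

# F.linear on the wrapped QuantizedTensor should match this eager path:
# y = torch.nn.functional.linear(x, dequant_awq_w4a16(qweight, scale, zero))
```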
Pairs with comfy-kitchen feat/awq-w4a16-modulation. Targets the ~10 GB
inflation in Qwen-Image-Edit kitchen-native checkpoints, where the
modulation linears (img_mod.1 / txt_mod.1) currently dominate disk + VRAM
because they're materialized as plain bf16 Linear during conversion.
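Back-of-envelope for that claim, with assumed modulation dimensions (a 3072 -> 18432 linear; the real Qwen-Image-Edit shapes may differ):

```python
out_f, in_f, gs = 18432, 3072, 64
bf16_bytes = out_f * in_f * 2                                # plain bf16 Linear
w4_bytes = out_f * in_f // 2 + 2 * (out_f * in_f // gs) * 2  # packed uint4 + bf16 scale/zero
print(f"{bf16_bytes / w4_bytes:.1f}x smaller per layer")     # ~3.6x
```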
📝 Walkthrough
This PR introduces support for two additional quantization formats…
🚥 Pre-merge checks: ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 1
🧹 Nitpick comments (3)
comfy/ops.py (2)
951-956: Friendlier error when a checkpoint declares a format the kitchen build doesn't support.

With this PR, `svdquant_w4a4` and `awq_w4a16` are only registered into `QUANT_ALGOS` when the corresponding kitchen layout import succeeds. A user on an older `comfy_kitchen` that loads a Qwen-Image-Edit AWQ checkpoint will hit `KeyError: 'awq_w4a16'` at line 954 rather than a message pointing them at the actual problem. Consider raising a clear ValueError listing the supported formats.

♻️ Suggested message
```diff
-        qconfig = QUANT_ALGOS[self.quant_format]
+        if self.quant_format not in QUANT_ALGOS:
+            raise ValueError(
+                f"Quantization format '{self.quant_format}' for layer {layer_name} "
+                f"is not available in this build (supported: {sorted(QUANT_ALGOS.keys())}). "
+                f"Update comfy_kitchen to enable it."
+            )
+        qconfig = QUANT_ALGOS[self.quant_format]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@comfy/ops.py` around lines 951 - 956, The code assumes self.quant_format exists in QUANT_ALGOS and will raise a KeyError; change the block to explicitly check whether self.quant_format is in QUANT_ALGOS and if not raise a ValueError that names the offending format (self.quant_format) and the layer (layer_name) and lists supported formats (sorted QUANT_ALGOS.keys()), then continue to look up qconfig = QUANT_ALGOS[self.quant_format], set self.layout_type and call get_layout_class(self.layout_type) as before.
1156-1157: Optional micro-optim: cache `layout_cls` instead of resolving it on every forward.

`get_layout_class(self.layout_type)` runs on every `forward()` invocation per linear. It's cheap (dict lookup) but redundant — `self.layout_type` is set once at load time. Caching once during `_load_from_state_dict` would also let you compute `layout_quantizes_input` ahead of time, removing two attribute lookups from the inference hot path.

♻️ Sketch
```diff
-        qconfig = QUANT_ALGOS[self.quant_format]
-        self.layout_type = qconfig["comfy_tensor_layout"]
-        layout_cls = get_layout_class(self.layout_type)
+        qconfig = QUANT_ALGOS[self.quant_format]
+        self.layout_type = qconfig["comfy_tensor_layout"]
+        layout_cls = get_layout_class(self.layout_type)
+        self._layout_quantizes_input = getattr(layout_cls, "QUANTIZES_INPUT", True)
```

```diff
-        layout_cls = get_layout_class(self.layout_type)
-        layout_quantizes_input = getattr(layout_cls, "QUANTIZES_INPUT", True)
-
-        if layout_quantizes_input:
+        if getattr(self, "_layout_quantizes_input", True):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@comfy/ops.py` around lines 1156 - 1157, Cache the resolved layout class and the boolean QUANTIZES_INPUT during model load instead of resolving them in forward: in _load_from_state_dict (or wherever self.layout_type is set) call get_layout_class(self.layout_type) once, store it on the instance (e.g., self._layout_cls) and compute self._layout_quantizes_input = getattr(self._layout_cls, "QUANTIZES_INPUT", True); then update forward to use self._layout_cls and self._layout_quantizes_input instead of calling get_layout_class(self.layout_type) and getattr each invocation.

comfy/quant_ops.py (1)
60-83: Optional: align stub-fallback placement with MXFP8 for consistency.
`_CKMxfp8Layout` defines its stub via `if not _CK_MXFP8_AVAILABLE:` at module scope (lines 60-62), whereas the new layouts define the stub inside the inner `except ImportError` branch. Both are functionally equivalent (since lines 40-44 already cover the outer `_CK_AVAILABLE`-False case), but mirroring the MXFP8 pattern would make the three blocks read identically.

♻️ Sketch of the consistency-aligned shape
```diff
 _CK_SVDQUANT_W4A4_AVAILABLE = False
 if _CK_AVAILABLE:
     try:
         from comfy_kitchen.tensor import TensorCoreSVDQuantW4A4Layout as _CKSVDQuantW4A4Layout
         _CK_SVDQUANT_W4A4_AVAILABLE = True
     except ImportError:
         logging.info("comfy_kitchen does not expose SVDQuant W4A4 layout; int4 SVDQuant checkpoints will not be supported.")
-        class _CKSVDQuantW4A4Layout:
-            pass
+
+if not _CK_SVDQUANT_W4A4_AVAILABLE and _CK_AVAILABLE:
+    class _CKSVDQuantW4A4Layout:
+        pass
```

(Same shape for AWQ.)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@comfy/quant_ops.py` around lines 60 - 83, The three layout stubs are inconsistent: _CKMxfp8Layout is defined at module scope under "if not _CK_MXFP8_AVAILABLE" while _CKSVDQuantW4A4Layout and _CKAWQW4A16Layout are defined inside the except ImportError branches; make them consistent by moving/adding their stub definitions to the same outer-pattern (i.e., define a stub class under "if not _CK_SVDQUANT_W4A4_AVAILABLE:" and "if not _CK_AWQ_W4A16_AVAILABLE:" at module scope like _CKMxfp8Layout) and keep the try/except only responsible for setting the real import and toggling the _CK_*_AVAILABLE flags, referencing the symbols _CKSVDQuantW4A4Layout and _CKAWQW4A16Layout to locate the code to change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d527adc4-fdbb-40b5-980b-8ca75fe532ee
📒 Files selected for processing (2)
comfy/ops.py, comfy/quant_ops.py
```python
elif self.quant_format == "svdquant_w4a4":
    # SVDQuant W4A4: per-group weight scales + low-rank correction
    # (proj_down, proj_up) + activation smoothing (smooth_factor)
    wscales = self._load_scale_param(state_dict, prefix, "weight_scale", device, manually_loaded_keys)
    proj_down = self._load_scale_param(state_dict, prefix, "proj_down", device, manually_loaded_keys)
    proj_up = self._load_scale_param(state_dict, prefix, "proj_up", device, manually_loaded_keys)
    smooth_factor = self._load_scale_param(state_dict, prefix, "smooth_factor", device, manually_loaded_keys)
    act_unsigned = bool(layer_conf.get("act_unsigned", False))

    # Early Qwen-Image conversion artifacts did not persist the
    # fused GELU -> fc2 unsigned-activation flag. Those layers
    # are the second linear in the feed-forward block.
    if not act_unsigned and (
        layer_name.endswith(".img_mlp.net.2") or layer_name.endswith(".txt_mlp.net.2")
    ):
        act_unsigned = True

    if any(t is None for t in (wscales, proj_down, proj_up, smooth_factor)):
        raise ValueError(f"Missing SVDQuant W4A4 parameters for layer {layer_name}")

    params = layout_cls.Params(
        scale=wscales,
        orig_dtype=MixedPrecisionOps._compute_dtype,
        orig_shape=(self.out_features, self.in_features),
        proj_down=proj_down,
        proj_up=proj_up,
        smooth_factor=smooth_factor,
        act_unsigned=act_unsigned,
    )
```
act_unsigned layer-name heuristic is a load-bearing workaround — please add a TODO and narrow the trigger.
The endswith(".img_mlp.net.2") or endswith(".txt_mlp.net.2") rule unconditionally flips act_unsigned to True for any model whose state_dict happens to use those submodule names — not just early Qwen-Image-Edit kitchen-native checkpoints. If a different topology (or future SVDQuant export) reuses the same names with signed activations, those layers will dispatch with the wrong activation domain and produce silently corrupted outputs (no exception, just bad samples). Two suggestions:
- Gate the override so it only triggers when the format truly requires a fused-GELU upstream signal (e.g., also check that `act_unsigned` is absent from `layer_conf` rather than merely False; a future exporter explicitly writing `act_unsigned: false` would currently still get overridden).
- Add a `TODO` referencing the Qwen-Image-Edit conversion path so this can be deleted once re-exported checkpoints carry the flag.
🛡️ Suggested narrowing
```diff
-            act_unsigned = bool(layer_conf.get("act_unsigned", False))
-
-            # Early Qwen-Image conversion artifacts did not persist the
-            # fused GELU -> fc2 unsigned-activation flag. Those layers
-            # are the second linear in the feed-forward block.
-            if not act_unsigned and (
-                layer_name.endswith(".img_mlp.net.2") or layer_name.endswith(".txt_mlp.net.2")
-            ):
-                act_unsigned = True
+            # TODO(comfykit-awq): drop the layer-name heuristic once all SVDQuant
+            # exporters persist `act_unsigned`. Only override when the flag is
+            # absent from layer_conf, not when it's explicitly false.
+            if "act_unsigned" in layer_conf:
+                act_unsigned = bool(layer_conf["act_unsigned"])
+            else:
+                act_unsigned = layer_name.endswith(".img_mlp.net.2") or layer_name.endswith(".txt_mlp.net.2")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@comfy/ops.py` around lines 1000 - 1028, The heuristic that forces
act_unsigned True for layers matching ".img_mlp.net.2" or ".txt_mlp.net.2" is
too broad; update the condition in the svdquant_w4a4 branch so the override only
runs when the layer_conf does not explicitly contain the act_unsigned key (e.g.,
check presence/absence in layer_conf) and preserve the original act_unsigned
when it is explicitly set to False, and add a one-line TODO comment referencing
the Qwen-Image-Edit conversion path so future re-exports can remove this
workaround; locate and modify the code around the symbols act_unsigned,
layer_conf, layer_name inside the svdquant_w4a4 handling block.
Wires comfy-kitchen's int4 quantization layouts through ComfyUI's MixedPrecisionOps so the existing UNETLoader path can dispatch optimized int4 kernels for two checkpoint families used by
Qwen-Image-Edit-2511 (and similar SVD-LoRA + AWQ-modulation models).
353978a9: SVDQuant W4A4 integration
- quant_ops.py: register TensorCoreSVDQuantW4A4Layout when comfy-kitchen exposes it; gate the kitchen CUDA backend on cuda ≥ 13 (the optimized kitchen CUDA ops are validated against cu13+; older cu falls back to eager).
- ops.py: handle the svdquant_w4a4 quant_format by loading weight_scale / proj_down / proj_up / smooth_factor into TensorCoreSVDQuantW4A4Layout.Params, with the img_mlp.net.2 / txt_mlp.net.2 fallback for act_unsigned (post-GELU u4.s4 MMA path).
- Targets the kitchen-native kernels on feat/svdquant-w4a4-kitchen-native.

3ddcc095: AWQ W4A16 modulation integration
- quant_ops.py: detect TensorCoreAWQW4A16Layout, stub it for the no-kitchen fallback (mirrors the W4A4 pattern), register the layout class, add awq_w4a16 to QUANT_ALGOS (storage int8-packed uint4, params {weight_scale, weight_zero}, default group_size = 64).
- ops.py: add the awq_w4a16 branch in MixedPrecisionOps.Linear._load_from_state_dict that constructs Params(scale, zeros, group_size, …) and wraps qweight into a QuantizedTensor; F.linear then dispatches to ck.gemv_awq_w4a16 via the layout's aten handlers.
- Targets the ~10 GB inflation where the modulation linears (img_mod.1 / txt_mod.1) currently dominate disk + VRAM because they're materialized as plain bf16 Linear during conversion.
Pairs with comfy-kitchen feat/awq-w4a16-modulation (companion PR coming after #36 lands).

Verification
…precision floor of (nibble - 8) * scale + zero chains.
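A hypothetical shape for that precision-floor check, reusing the dequant_awq_w4a16 sketch from the description above; the symmetric round-to-nearest quantizer and the packing order here are assumptions, not the converter's actual choices:

```python
import torch

def quantize_awq_w4a16(w, group_size=64):
    """Toy quantizer producing (packed_qweight, weight_scale, weight_zero)."""
    out_f, in_f = w.shape
    g = w.view(out_f, in_f // group_size, group_size)
    zero = g.mean(dim=-1, keepdim=True)
    scale = ((g - zero).abs().amax(dim=-1, keepdim=True) / 7.0).clamp_min(1e-8)
    q = torch.clamp(torch.round((g - zero) / scale) + 8, 0, 15).to(torch.uint8)
    q = q.view(out_f, in_f)
    packed = (q[:, 0::2] | (q[:, 1::2] << 4)).view(torch.int8)  # low nibble first (assumed)
    return packed, scale.squeeze(-1), zero.squeeze(-1)

w = torch.randn(512, 512)
packed, scale, zero = quantize_awq_w4a16(w)
w_rt = dequant_awq_w4a16(packed, scale, zero, dtype=torch.float32)
# worst-case round-trip error is bounded by scale / 2 per element
assert (w - w_rt).abs().max() <= scale.max() / 2 + 1e-6
```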