quant: extract FP→int quantization into its own sub-package by Allenjin123 · Pull Request #13 · CrucibleComputingGroup/scmp_kernels

Allenjin123 · 2026-05-20T06:06:02Z

Summary

Move FP→int quant out of scmp_kernels/sc/kernels.py into a new scmp_kernels/quant/ sub-package so quant strategies can be explored independently of the SC matmul.
- quant/fused.py: Triton kernels fused_quant_kernel and fused_quant_bipolar_batched_kernel; host wrappers fused_quantize_bipolar, fused_quantize_bipolar_perrow, and fused_quantize_unipolar; plus _quant_dummy.
- quant/grouped.py: PyTorch helpers _grouped_symmetric_quant, _grouped_asymmetric_quant, and _grouped_symmetric_quant_batched.
sc/kernels.py re-exports all moved symbols under their original names — existing callers (scmp_diffusion/optimization_workspace/sc_patch.py, optimized_kernels.py, scmp_llm/calibrate_mp_thresholds.py) keep working unchanged.

No logic changes — strictly a mechanical move. The fused Triton kernels are byte-identical to before; only their module home changed.

Test plan

Verified on RTX PRO 6000 Blackwell (gl1810):

- 8/8 golden sc_matmul cases bit-exact vs pre-refactor reference (per_tensor/per_row/per_head × bipolar/unipolar × 2D/3D + chunk_d + low-prec)
- scmp_llm/kernels/tests/test_sc_smoke.py: 13/13 PASS
- per_row 128×128×128 sc_prec=8 stoc_len=256: 0.406 → 0.413 ms/it (+1.6%, within shared-GPU noise)
- Both new (from scmp_kernels.quant import ...) and legacy (from scmp_kernels.sc.kernels import ...) import paths return the same objects
- Simulated scmp_diffusion external import lists still resolve
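The "same objects on both import paths" check can be illustrated without the real package. The sketch below builds a throwaway package on disk (the names demo_pkg and fused_quantize are hypothetical stand-ins, not symbols from scmp_kernels) and confirms that a re-export binds the identical function object, not a copy.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Write a tiny package mimicking the quant/ + sc/ layout (hypothetical names)."""
    files = {
        "demo_pkg/__init__.py": "",
        "demo_pkg/quant/__init__.py": "from .fused import fused_quantize\n",
        "demo_pkg/quant/fused.py": "def fused_quantize(x):\n    return round(x)\n",
        "demo_pkg/sc/__init__.py": "",
        # The legacy module keeps the old import path alive via a re-export.
        "demo_pkg/sc/kernels.py": "from ..quant import fused_quantize\n",
    }
    for rel, src in files.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(src)


def same_object_on_both_paths() -> bool:
    """Import via the new and the legacy path and compare object identity."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp))
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            new = importlib.import_module("demo_pkg.quant")
            legacy = importlib.import_module("demo_pkg.sc.kernels")
            return new.fused_quantize is legacy.fused_quantize
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(tmp)
            for name in [m for m in sys.modules if m.startswith("demo_pkg")]:
                del sys.modules[name]


print(same_object_on_both_paths())
```

An identity check (`is`) is stricter than comparing behaviour: it would fail if the legacy module had kept its own copy of the function instead of re-exporting it.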

🤖 Generated with Claude Code

Move the Triton fused quant kernels (fused_quant_kernel, fused_quant_bipolar_batched_kernel + host wrappers) and the PyTorch grouped quant helpers out of sc/kernels.py into scmp_kernels/quant/, so quant strategies can be explored independently of the SC matmul.

sc/kernels.py re-exports the moved symbols under their original names, so existing callers (scmp_diffusion/optimization_workspace/sc_patch.py, optimized_kernels.py, scmp_llm calibration) keep working unchanged.

Verified on RTX PRO 6000 Blackwell (gl1810):
* 8/8 golden sc_matmul cases bit-exact vs pre-refactor reference
* per_row 128×128×128 sc_prec=8 stoc_len=256: 0.406 → 0.413 ms/it (+1.6%, within shared-GPU noise)
* scmp_llm/kernels/tests/test_sc_smoke.py: 13/13 PASS

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

+        offsets = pid * BLOCK + tl.arange(0, BLOCK)
+        total = rows * cols
+        mask = offsets < total
+        base = 0  # offsets already absolute


+from ..quant import (
+    fused_quant_kernel,
+    fused_quant_bipolar_batched_kernel,
+    fused_quantize_bipolar,
+    fused_quantize_bipolar_perrow,
+    fused_quantize_unipolar,
+    _quant_dummy,
+    _grouped_symmetric_quant,
+    _grouped_asymmetric_quant,
+    _grouped_symmetric_quant_batched,
+)


heroarmor

Reviewed — clean mechanical move, safe to merge. ✅

Verified line-by-line: the moved Triton/PyTorch code is byte-identical to the deletions, and sc/kernels.py re-exports all 9 symbols under their original names, so both external callers and the in-file call sites resolve fine.

No circular-import risk despite quant/fused.py referencing sc.kernels._resolve_rng_levels — that import is deferred inside the function body, so it works in either import order. Good call isolating it.
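The deferred-import point can be reproduced in isolation. The following is a minimal sketch, assuming a hypothetical two-module package (cyc_demo, sc_mod, quant_mod); the body of resolve_levels is invented for the demo, since the real _resolve_rng_levels logic is not shown in this PR. It imports the two modules in both orders and checks that the call still works.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# "sc" side: imports the quant module at load time (like the re-export does).
SC_SRC = '''
from .quant_mod import quantize

def resolve_levels(prec, levels=None):
    # Placeholder logic, assumed for this demo only.
    return levels if levels is not None else 2 ** prec
'''

# "quant" side: needs a helper from sc, but only looks it up at call time.
QUANT_SRC = '''
def _resolve_levels(prec, levels=None):
    from .sc_mod import resolve_levels as _impl  # deferred import
    return _impl(prec, levels)

def quantize(x, prec):
    return min(int(x), _resolve_levels(prec) - 1)
'''


def import_in_order(first: str, second: str) -> int:
    """Import the two modules in the given order, then call quantize."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "cyc_demo"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "sc_mod.py").write_text(SC_SRC)
        (pkg / "quant_mod.py").write_text(QUANT_SRC)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module(f"cyc_demo.{first}")
            importlib.import_module(f"cyc_demo.{second}")
            return sys.modules["cyc_demo.quant_mod"].quantize(300, 8)
        finally:
            sys.path.remove(tmp)
            for name in [m for m in sys.modules if m.startswith("cyc_demo")]:
                del sys.modules[name]


print(import_in_order("sc_mod", "quant_mod"))
print(import_in_order("quant_mod", "sc_mod"))
```

Because quant_mod has no module-level import of sc_mod, neither load order can observe a partially initialised module; the lookup happens only when _resolve_levels is actually called.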

Minor nit (non-blocking): after the move, sc/kernels.py has two now-unused imports:

- import functools (line 24): lru_cache left with _quant_dummy
- from triton.language.extra.cuda import libdevice (line 31): nearbyint only used in the moved kernels

Worth a one-line cleanup, but not worth holding the merge over.
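Leftover imports like these can be spotted mechanically. Below is a rough sketch using only the standard-library ast module; it is not the tool used in this review, and the sample source string is made up. Note that it would also flag intentional re-exports (such as the `from ..quant import (...)` block in sc/kernels.py), so those would need to be excluded by hand or via `__all__`.

```python
import ast


def unused_imports(source: str) -> list[str]:
    """Return imported names that are never referenced as a bare Name."""
    tree = ast.parse(source)
    imported = {}  # bound name -> line number of the import statement
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds "a"; "import a.b as c" binds "c".
                bound = alias.asname or alias.name.split(".")[0]
                imported[bound] = node.lineno
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported[alias.asname or alias.name] = node.lineno
    # Attribute access such as math.pi still contains Name("math").
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    return sorted(name for name in imported if name not in used)


SAMPLE = """
import functools
import math
from triton.language.extra.cuda import libdevice

def area(r):
    return math.pi * r * r
"""

print(unused_imports(SAMPLE))
```

The source is only parsed, never executed, so the sample can mention triton without it being installed.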

Copilot

Pull request overview

This PR refactors FP→int quantization code out of scmp_kernels.sc.kernels into a new scmp_kernels.quant subpackage, while keeping backward compatibility by re-exporting the moved symbols from the original module.

Changes:

- Added scmp_kernels/quant/ subpackage containing Triton fused quant kernels/wrappers and PyTorch grouped quant helpers.
- Updated scmp_kernels/sc/kernels.py to re-export quantization APIs from scmp_kernels.quant under their original names.
- Removed the inlined quantization implementations from scmp_kernels/sc/kernels.py (replaced with pointers to the new module locations).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File	Description
scmp_kernels/sc/kernels.py	Re-exports quantization symbols from `scmp_kernels.quant` and removes the inlined implementations.
scmp_kernels/quant/fused.py	New home for Triton fused quant kernels and host wrappers (plus batched bipolar fused-transpose kernel).
scmp_kernels/quant/grouped.py	New home for grouped symmetric/asymmetric PyTorch quant helpers.
scmp_kernels/quant/__init__.py	Public re-export surface for the new quant subpackage.

+    sc_prec: int,
+    compute_sum: bool = False,
+    rng_levels: Optional[int] = None,
+) -> tuple[torch.Tensor, float, float, torch.Tensor | None]:


+# Local import to avoid a circular dependency at module load.  `_resolve_rng_levels`
+# is about SC's RNG grid size, not quantization — it stays in sc.kernels.
+def _resolve_rng_levels(sc_prec: int, rng_levels: Optional[int]) -> int:
+    from ..sc.kernels import _resolve_rng_levels as _impl
+    return _impl(sc_prec, rng_levels)


+    inv_scale_ptr,     # (BH,) float32 — per-head inv_scale
+    q_max,             # int: 2^(sc_prec-1) - 1
+    q_min,             # int: -(2^(sc_prec-1))
+    max_rng_val,       # int: 2^sc_prec - 1
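The three integer bounds in the kernel-argument comments above follow directly from sc_prec. The sketch below works them out in plain Python; the helper names are hypothetical, and the round-then-clamp step is an assumed illustration of symmetric quantization, not a copy of the Triton kernel body (which this hunk does not show).

```python
def quant_ranges(sc_prec: int) -> tuple[int, int, int]:
    """Signed range and RNG ceiling for a given precision, per the comments above."""
    q_max = 2 ** (sc_prec - 1) - 1      # largest signed value
    q_min = -(2 ** (sc_prec - 1))       # smallest signed value
    max_rng_val = 2 ** sc_prec - 1      # largest unsigned value
    return q_max, q_min, max_rng_val


def quantize_symmetric(x: float, inv_scale: float, sc_prec: int) -> int:
    """Scale, round to nearest (ties to even), then clamp into [q_min, q_max]."""
    q_max, q_min, _ = quant_ranges(sc_prec)
    q = round(x * inv_scale)            # Python's round() breaks ties to even
    return max(q_min, min(q_max, q))


print(quant_ranges(8))                      # (127, -128, 255)
print(quantize_symmetric(1.0, 200.0, 8))    # 127 (clamped from 200)
```

For sc_prec=8 this gives the familiar int8 range of -128 to 127 alongside an unsigned ceiling of 255.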


heroarmor

Approving — clean mechanical extraction, byte-identical move, no circular-import risk, re-exports keep all callers working. The two unused imports (functools, libdevice) in sc/kernels.py are a non-blocking cleanup nit.

* quant: add SmoothQuant pre-quantization transform

Mathematically equivalent diagonal rescale along D that migrates per-channel activation outliers into the weight:

    Y = A @ B.T = (A / s) @ (B * s).T,   s_j = act_max[j]^a / w_max[j]^(1-a)

New helpers in scmp_kernels/quant/smoothquant.py:

    accumulate_act_scales   - per-channel max-abs aggregator for calibration
    compute_smooth_scales   - build s from calibrated stats + weight
    apply_smoothing         - apply diagonal rescale (2D and 3D)
    apply_smoothing_offline - bake s into the weight once

Wired into sc_matmul as an optional smooth_scales kwarg; default None preserves byte-for-byte legacy behavior. Works for all three granularities (per_tensor, per_row, per_head).

Tests in tests/test_smoothquant.py cover the math identity, calibration aggregator, alpha=0/1 closed forms, MSE improvement under simulated int8 quant for all three granularities (13-17x on synthetic outliers), and the sc_matmul kwarg-vs-manual equivalence (CUDA-only).

Reference: Xiao et al., "SmoothQuant: Accurate and Efficient PTQ for LLMs", ICML 2023.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: remove stale MIGRATION_PLAN.md

The quant sub-package extraction described by this plan landed in #13. The plan is now historical context only; remove from the working tree to avoid drift between the doc and the actual layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
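The rescale identity quoted in the SmoothQuant commit message can be checked numerically. This is a small NumPy sketch, not the code from scmp_kernels/quant/smoothquant.py: the function name smooth_scales_sketch, the eps guard, and the synthetic data are all assumptions made for the demo. A has shape (M, D), B has shape (N, D), and s has shape (D,).

```python
import numpy as np


def smooth_scales_sketch(act_max, weight, alpha=0.5, eps=1e-8):
    """s_j = act_max[j]^alpha / w_max[j]^(1 - alpha), one scale per channel j."""
    w_max = np.abs(weight).max(axis=0)  # per-channel max over the rows of B
    # eps guards against zero channels; an assumption, not from the PR.
    return np.maximum(act_max, eps) ** alpha / np.maximum(w_max, eps) ** (1.0 - alpha)


rng = np.random.default_rng(0)
A = rng.standard_normal((4, 8))   # activations, (M, D)
A[:, 3] *= 50.0                   # synthetic outlier channel
B = rng.standard_normal((6, 8))   # weights, (N, D)

act_max = np.abs(A).max(axis=0)   # stand-in for calibrated statistics
s = smooth_scales_sketch(act_max, B, alpha=0.5)

Y_ref = A @ B.T
Y_smooth = (A / s) @ (B * s).T    # the diagonal rescale cancels inside the sum over D

print(np.allclose(Y_ref, Y_smooth))
```

The identity holds because each term of the inner product picks up a factor 1/s_j from the activation and s_j from the weight. With alpha=1 the scale reduces to act_max, so every column of A / s has max-abs 1 and all of the dynamic range moves into the weight.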

github-code-quality Bot found potential problems May 20, 2026

Allenjin123 requested review from Copilot and heroarmor May 20, 2026 06:29

Copilot started reviewing on behalf of Allenjin123 May 20, 2026 06:30

heroarmor reviewed May 20, 2026

heroarmor mentioned this pull request May 20, 2026

quant: add SmoothQuant pre-quantization transform #15

Merged

3 tasks

Copilot AI reviewed May 20, 2026

heroarmor approved these changes May 20, 2026

heroarmor merged commit fa7387d into main May 20, 2026
2 checks passed

heroarmor merged 1 commit into main from refactor/extract-quant-module
