Migrate scmp_llm: SC + MP + application/Diffusion + evaluation + archived by heroarmor · Pull Request #1 · CrucibleComputingGroup/scmp_kernels

heroarmor · 2026-05-11T20:25:37Z

Summary

End-to-end migration of scmp_llm into scmp_kernels. Every Python file in scmp_llm (SC kernels, Q-DiT integration, evaluation tools, legacy reference code) now has a home here, with a working runtime path for the diffusion application.

What landed

SC kernels (`scmp_kernels/sc/`)

sc_triton.py (single-file replacement of the previous kernels.py+matmul.py split), sng.py, rng.py, lfsr_taps.py, config_helpers.py, constants.py — copied verbatim from scmp_llm/SC/, imports rewritten to relative.
Quant clipping fully deleted. Both unipolar (q_lo, q_hi = 2, q_max - 2) and bipolar (q_clip = q_norm - 2) margins are gone, including the q_clip/q_clip_min/q_norm Triton kernel parameters and _grouped_symmetric_quant's clip_margin kwarg. 8-bit unipolar now uses full [0, 255] (256 levels); 8-bit bipolar uses full [-127, 127].
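
The range change can be checked with a little arithmetic. The sketch below is illustrative only: `unipolar_range`, `bipolar_range`, and `levels` are hypothetical helpers, not functions from `sc_triton.py`, and it assumes the old margins were exactly the `2` quoted above.

```python
def unipolar_range(bits: int, margin: int = 0) -> tuple[int, int]:
    """Inclusive [q_lo, q_hi] for an unsigned code of `bits` bits."""
    q_max = (1 << bits) - 1  # 255 for 8 bits
    return margin, q_max - margin


def bipolar_range(bits: int, margin: int = 0) -> tuple[int, int]:
    """Inclusive symmetric [-q, q] for a signed code of `bits` bits."""
    q_max = (1 << (bits - 1)) - 1  # 127 for 8 bits
    return -(q_max - margin), q_max - margin


def levels(lo: int, hi: int) -> int:
    """Number of integer codes in the inclusive range [lo, hi]."""
    return hi - lo + 1


old_uni = unipolar_range(8, margin=2)  # old behaviour: margin of 2 at each end
new_uni = unipolar_range(8)            # new behaviour: no margin
old_bi = bipolar_range(8, margin=2)
new_bi = bipolar_range(8)

print(old_uni, levels(*old_uni))
print(new_uni, levels(*new_uni))
print(old_bi, levels(*old_bi))
print(new_bi, levels(*new_bi))
```

Under these assumptions the unipolar path goes from 252 to 256 codes and the bipolar path from ±125 to ±127.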

MP dispatch (`scmp_kernels/mp/`)

config.py ported from scmp_llm/SC/mp_config.py (762 lines).
__init__.py re-exports the public API: MPConfig, AdaptiveMPConfig, RangeMPConfig, RowAssignment, classify_rows_by_metric, adaptive_classify_rows, classify_groups_by_range, MPDistributionLogger, MetricProfiler.

Application: `application/Diffusion/`

91 source files copied from scmp_llm/Q-DiT/ (including models/evaluations/{evaluator.py, convert_npz.py, ...} — 92MB Inception checkpoint excluded; download separately).
All bare imports rewritten: from sc_triton/sng/config_helpers/mp_config import ... → from scmp_kernels.{sc,mp}.* import ....
Dead sys.path.insert(0, .../SC) shims removed throughout.
qdit/sc_integration/mp_config.py is now a thin re-export shim of scmp_kernels.mp.
application/{ViT,WorldModel}/ reserved as .gitkeep placeholders.
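
A rewrite of this kind can be scripted. The following is a minimal sketch, not the tool used for this PR: `MODULE_MAP`, `rewrite_line`, and `rewrite_source` are hypothetical names, and the exact target module for each bare import is an assumption based on the package layout described above.

```python
import re

# Assumed mapping from bare module names to their packaged locations.
MODULE_MAP = {
    "sc_triton": "scmp_kernels.sc.sc_triton",
    "sng": "scmp_kernels.sc.sng",
    "config_helpers": "scmp_kernels.sc.config_helpers",
    "mp_config": "scmp_kernels.mp",
}

# Match only single-name (non-dotted, non-relative) modules.
_FROM_RE = re.compile(r"^(\s*)from\s+(\w+)\s+import\s+(.*)$")
_IMPORT_RE = re.compile(r"^(\s*)import\s+(\w+)\s*$")


def rewrite_line(line: str) -> str:
    """Rewrite one bare import line; return other lines unchanged."""
    m = _FROM_RE.match(line)
    if m and m.group(2) in MODULE_MAP:
        indent, mod, names = m.groups()
        return f"{indent}from {MODULE_MAP[mod]} import {names}"
    m = _IMPORT_RE.match(line)
    if m and m.group(2) in MODULE_MAP:
        indent, mod = m.groups()
        pkg, _, leaf = MODULE_MAP[mod].rpartition(".")
        # Keep the original local name bound so later references still work.
        alias = "" if leaf == mod else f" as {mod}"
        return f"{indent}from {pkg} import {leaf}{alias}"
    return line


def rewrite_source(text: str) -> str:
    """Apply rewrite_line to every line of a source file."""
    return "\n".join(rewrite_line(line) for line in text.split("\n"))


print(rewrite_source("import os\nfrom sc_triton import sc_matmul\nimport sc_triton"))
```

A line-based regex is enough for simple one-line imports; multi-line parenthesised imports would need an AST- or tokenizer-based pass instead.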

Evaluation (`evaluation/`)

evaluation/Diffusion/: kid.py, compare_images.py, build_full_mosaic.py, build_sample_grids.py (from scmp_llm/evaluation/).
evaluation/Diffusion/imagenet_ref/: extract.py, parallel_npz.py, compute_fid_kid.py, compare_grid.py (from scmp_llm/imagenet256_ref/).
evaluation/{ViT,WorldModel}/ placeholders.

Archived (`archived/`)

archived/origin_cpu/: sc.py, sc_enable.py — original NumPy/PyTorch-CPU SC reference impls.
archived/bench/: bench_table_vs_compact.py, compare_cbsg.py, compare_enable.py, compare_matmul.py, compare_unarysim.py, test_kernel_opt.py.
archived/tools/: dse.py, noise_model_calibration.py.
Each subfolder has a README explaining contents and the bare-import caveat.

Repo hygiene

.gitignore for __pycache__/, *.py[cod], *.egg-info/, build/, dist/, .pytest_cache/.
README.md documents the new layout.
MIGRATION_PLAN.md deleted (executed; stale).

Coverage audit

| Source | Files | Funcs/Classes | Covered |
|---|---|---|---|
| `scmp_llm/SC/` (active migration: 6 files) | 6 | 127 | 127 |
| `scmp_llm/SC/` (archived: 10 files) | 10 | 46 | 46 |
| `scmp_llm/Q-DiT/` | 59 | 277 | 277 |
| `scmp_llm/evaluation/` | 4 | 15 | 15 |
| `scmp_llm/imagenet256_ref/` | 4 | – | 4 files |
| TOTAL | 83 | 465 + 4 | 100% |
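
The PR describes the symbol parity check as AST-based but does not show the script. A minimal sketch of such a check, assuming it compares only top-level `def`/`class` names between an old and a new source file (`top_level_symbols` and `missing_symbols` are hypothetical names):

```python
import ast


def top_level_symbols(source: str) -> set[str]:
    """Names of top-level functions and classes defined in `source`."""
    tree = ast.parse(source)
    return {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }


def missing_symbols(old_source: str, new_source: str) -> set[str]:
    """Symbols defined in the old file that are absent from the new one."""
    return top_level_symbols(old_source) - top_level_symbols(new_source)


# Toy inputs standing in for a pre-migration and post-migration file.
old = "def sc_matmul(a, b):\n    return a\n\nclass MPConfig:\n    pass\n"
new = (
    "class MPConfig:\n    pass\n\n"
    "def sc_matmul(a, b, rng_levels=None):\n    return a\n\n"
    "def extra():\n    pass\n"
)

print(missing_symbols(old, new))  # empty set: every old symbol still exists
```

This only checks that names exist; it says nothing about whether signatures or behaviour match.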

Known not-migrated (vit_sc-only; never existed in scmp_llm)

det_kernel_tuning
make_sobol_antithetic_config, make_sobol_altseed_config
auto_calibrator.py (RidgeFitter, auto_calibrate_mp, AutoMPBudgetLogger)
set_current_block_idx / _CURRENT_BLOCK_IDX global hook
FreeBoundaryMPConfig, fixed_levels extra field on AdaptiveMPConfig

These are vit_sc-side conveniences. Add when porting vit_sc into application/ViT/.

Verified

Imports: import scmp_kernels, from scmp_kernels.{sc,mp,sc.sc_triton,sc.sng,sc.config_helpers}, plus all qdit.sc_integration.* modules — clean under torch 2.10.0+cu128 / triton 3.6.0.
Symbol parity: every top-level def/class in the migrated scmp_llm files exists in the corresponding scmp_kernels file (AST-checked).
MP wiring: calibration scripts (calibrate_mp_thresholds.py etc.) write a JSON threshold table; runtime constructs AdaptiveMPConfig(threshold_table_path=...) at quant_sc_main.py:576; sc_attention.py / sc_mlp.py call adaptive_classify_rows per-forward and dispatch each row's precision through the rng_levels= arg of the SC Triton kernels.
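
To make the MP wiring concrete, here is a rough sketch of the flow (threshold table on disk, per-row classification, one precision level per row). It is not the real API: the JSON schema, the max-absolute-value metric, and the helper names `load_thresholds` and `classify_rows` are all assumptions, and the actual `AdaptiveMPConfig` and `adaptive_classify_rows` signatures are not shown in this PR.

```python
import bisect
import json
import tempfile
from pathlib import Path


def load_thresholds(path):
    """Load an assumed table: ascending thresholds plus one level per bucket."""
    table = json.loads(Path(path).read_text())
    thresholds, levels = table["thresholds"], table["levels"]
    if len(levels) != len(thresholds) + 1:
        raise ValueError("need exactly one more level than thresholds")
    return thresholds, levels


def classify_rows(rows, thresholds, levels):
    """Assign each row a level from the bucket its max |value| falls in."""
    assignment = []
    for row in rows:
        metric = max(abs(v) for v in row)  # assumed per-row metric
        assignment.append(levels[bisect.bisect_right(thresholds, metric)])
    return assignment


# Stand-in for the calibration step: write a small table, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    table_path = Path(tmp) / "thresholds.json"
    table_path.write_text(
        json.dumps({"thresholds": [0.5, 2.0], "levels": [16, 64, 256]})
    )
    thresholds, levels = load_thresholds(table_path)

rows = [[0.1, -0.2], [1.0, 0.3], [3.0, -4.0]]
assignment = classify_rows(rows, thresholds, levels)
print(assignment)  # [16, 64, 256]
```

In the real runtime the per-row result would be passed on to the SC Triton kernels through `rng_levels=`; that step is omitted here.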

Not verified (no GPU on the dev login node)

Actual Triton kernel execution / numerics. Run on a CUDA box:

PYTHONPATH=. python -m pytest tests/test_sc_smoke.py -v
PYTHONPATH=application/Diffusion python -c "import qdit.sc_integration.sc_attention"

Test plan

pytest tests/test_sc_smoke.py on a CUDA + Triton box.
Diffusion pilot: run a short FID sweep with --mp_config_path <calibrated_table.json> and confirm metric is in expected band given the wider quant range (256 levels vs 252).
Confirm vit_sc-only items (det_kernel_tuning, auto-calibrator) are acceptable as follow-up when ViT lands.

- replace kernels.py + matmul.py with single sc_triton.py (scmp_llm base; 4178 lines)
- overwrite sng.py / rng.py / lfsr_taps.py / config_helpers.py from scmp_llm (rng/lfsr_taps identical, sng minor diff)
- rewrite relative imports (from sng/rng/lfsr_taps/sc -> from .sng/.rng/.lfsr_taps/.constants)
- drop the quant margin clipping (q_lo/q_hi=2..253) so range is full 0..q_max
- sc/__init__.py: export sc_matmul from sc_triton (det_kernel_tuning was vit_sc-only; not in scmp_llm base)
- test_sc_smoke: drop det_kernel_tuning import

Note: scmp_llm's config_helpers.py is a subset of the previous file (no vit_sc Sobol antithetic/altseed variants).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Diffusion is the active integration target (this owner); ViT and WorldModel are left empty for other owners. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Source-only mirror of scmp_llm/Q-DiT (excludes __pycache__, models/evaluations FID inception checkpoint, build artifacts, results, *.nsys-rep, png). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- scmp_kernels/mp/config.py: ported from scmp_llm/SC/mp_config.py (762 lines; MPConfig, AdaptiveMPConfig, RangeMPConfig, RowAssignment, classify_rows_by_metric, adaptive_classify_rows, classify_groups_by_range, MPDistributionLogger, MetricProfiler)
- scmp_kernels/mp/__init__.py: re-export the public API
- application/Diffusion: rewrite bare `from sc_triton/sng/config_helpers/mp_config import ...` to `from scmp_kernels.{sc,mp}.* import ...` across sc_integration/*.py, scripts, tests
- application/Diffusion/qdit/sc_integration/mp_config.py: drop stale SC sys.path shim, re-export from scmp_kernels.mp
- .gitignore: __pycache__, *.pyc, build artifacts

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Remove the `SC_PATH = ... / SC; sys.path.insert(...)` blocks left over from when scmp_llm SC code lived on a separate sys.path (8 files: sc_integration/{sc_attention,sc_matmul,sc_mlp}.py, tests/test_noise_matmul_adapters.py, scripts/{owen_mode_sweep,sobol_variant_sweep,sobol_scramble_seed_sweep,debug_fixed_level_sanity}.py)
- Drop now-unused `import sys` / `from pathlib import Path` where the SC_PATH block was the sole user
- sobol_variant_sweep / sobol_scramble_seed_sweep: replace bare `import sc_triton` with `from scmp_kernels.sc import sc_triton` (these scripts monkey-patch sc_triton internals)
- README.md: reflect new layout (mp/ populated; application/ tree)
- tests/test_sc_smoke.py: update docstring reference
- Delete MIGRATION_PLAN.md (executed; stale)
- Untrack legacy *.pyc files that predate the .gitignore

Verified: full runtime import works under torch 2.10 / triton 3.6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The enable-signal SC matmul mechanism is in production via the four sc_matmul_enable*_triton entrypoints in scmp_kernels.sc.sc_triton. sc_enable.py is the original NumPy/PyTorch-CPU implementation it replaced; kept here for historical reference and as a numerical cross-check target for the Triton kernels. Not on the runtime path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…bench,tools}

- origin_cpu/sc.py: NumPy reference matmul_sc (joins existing sc_enable.py)
- bench/: bench_table_vs_compact, compare_cbsg, compare_enable, compare_matmul, compare_unarysim, test_kernel_opt
- tools/: dse, noise_model_calibration

All confirmed dead at migration time (no in-repo callers). Kept as historical reference and as starting points for regression benchmarks or resurrection. Each subfolder has a README explaining contents and the bare-import caveat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- evaluation/Diffusion/: kid.py, build_full_mosaic.py, build_sample_grids.py, compare_images.py (from scmp_llm/evaluation/)
- evaluation/{ViT,WorldModel}/: .gitkeep placeholders
- application/Diffusion/models/evaluations/: evaluator.py, convert_npz.py, evaluate.sh, requirements.txt, README.md, __init__.py (the 3 .py files I skipped during the original Q-DiT copy because the dir also held a 92MB FID inception .pb)
- Top-level README.md: document the new evaluation/ + archived/ layout

The Inception checkpoint (classify_image_graph_def.pb, 92MB) is NOT included; download it separately and place it alongside evaluator.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

From scmp_llm/imagenet256_ref/ (gitignored results dir). Used while building the FID/KID reference set:

- extract.py: NPZ -> PNG extractor
- parallel_npz.py: threaded PNG -> NPZ converter
- compute_fid_kid.py: cleanfid FID/KID against a pre-built reference
- compare_grid.py: side-by-side sweep diff

Hardcoded /scratch/.../scmp_llm/ paths at the top of each file need editing before re-running. Documented in the subfolder README.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The earlier clipping removal handled the unipolar path (q_lo/q_hi margin -> full [0, q_max]). Two bipolar quantization helpers still clipped to q_norm-2 (±125 instead of ±127 for 8-bit signed):

- fused_quantize_bipolar (sc/sc_triton.py:1749)
- _sc_matmul_bipolar (sc/sc_triton.py:3460)

Now both use the full symmetric range. Mirrors the same intent as the unipolar fix: drop the safety margin and use the full quantization range ([-127, 127] for 8-bit signed, i.e. 255 levels).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

After the earlier q_lo/q_hi and q_norm-2 removals, q_clip / q_norm were still threaded through the bipolar quantization API as redundant parameters. Same for _grouped_symmetric_quant's clip_margin kwarg. Now removed:

- fused_quant_bipolar_kernel: parameters (q_clip, q_clip_min, q_norm) -> (q_max).
- fused_quant_bipolar_perrow_kernel: (q_clip, q_norm) -> (q_max).
- fused_quantize_bipolar / fused_quantize_bipolar_perrow: drop local q_clip variable.
- _sc_matmul_bipolar: drop q_norm/q_clip; use q_max throughout.
- _grouped_symmetric_quant: drop clip_margin parameter (was always 0 at call sites).
- All 4 _grouped_symmetric_quant call sites: drop clip_margin=0.

Runtime smoke: full package + simplified _grouped_symmetric_quant import and run under torch 2.10 / triton 3.6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
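
For readers unfamiliar with the pattern, grouped symmetric quantization with a single `q_max` bound can be sketched as below. This is a plain-Python illustration under stated assumptions, not the Triton-backed `_grouped_symmetric_quant`: the function name, the list-based interface, and the per-group max-abs scaling rule are all hypothetical.

```python
def grouped_symmetric_quant(row, group_size, bits=8):
    """Quantize `row` in groups, each with its own scale, into [-q_max, q_max]."""
    q_max = (1 << (bits - 1)) - 1  # 127 for 8 bits; no clip margin
    codes, scales = [], []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        amax = max(abs(v) for v in group)
        # Assumed rule: map the largest magnitude in the group onto q_max.
        scale = amax / q_max if amax > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-q_max, min(q_max, round(v / scale))) for v in group)
    return codes, scales


codes, scales = grouped_symmetric_quant([0.1, -0.2, 4.0, -8.0], group_size=2)
print(codes, scales)
```

With one bound there is nothing left for a separate `q_clip` or `clip_margin` to express, which is consistent with those parameters having been redundant.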

Allenjin123 · 2026-05-11T23:41:43Z

We want a clear separation between the SC implementation and the application implementation, so no diffusion-related code should be included. We should also remove all legacy code (that is what the initial commit is doing), since we now primarily use agents to write and maintain the code. Deleting stale content will help reduce context length.

Please use this repository as a submodule for the other repository. Also, please commit changes directly to the existing repository instead of rebasing. Let's actually learn how to collaborate as a team! Thanks.

heroarmor and others added 5 commits May 11, 2026 16:13

scaffold application/ with Diffusion, ViT, WorldModel placeholders

2becf19

populate application/Diffusion with Q-DiT source

56515b3

heroarmor changed the title ~~Sync SC code from scmp_llm base; remove quant margin clipping~~ Migrate SC kernels + populate mp/ + add application/Diffusion May 11, 2026

heroarmor and others added 6 commits May 11, 2026 18:35

heroarmor changed the title ~~Migrate SC kernels + populate mp/ + add application/Diffusion~~ Migrate scmp_llm: SC + MP + application/Diffusion + evaluation + archived May 11, 2026

Allenjin123 closed this May 11, 2026

heroarmor mentioned this pull request May 12, 2026

Add MP module, drop quant-clipping margin, unify sc_matmul dispatcher #2

Merged

2 tasks

heroarmor mentioned this pull request May 20, 2026

sc/kernels: apply Owen scramble in the rescale branch (fixes halve x4.215 at 128 cycles) #16

Merged

3 tasks

heroarmor wants to merge 11 commits into CrucibleComputingGroup:main from heroarmor:sync-from-scmp_llm
