Migrate scmp_llm: SC + MP + application/Diffusion + evaluation + archived#1

Closed
heroarmor wants to merge 11 commits into CrucibleComputingGroup:main from heroarmor:sync-from-scmp_llm

heroarmor (Collaborator) commented May 11, 2026

Summary

End-to-end migration of scmp_llm into scmp_kernels. Every Python file in scmp_llm (SC kernels, Q-DiT integration, evaluation tools, legacy reference code) now has a home here, with a working runtime path for the diffusion application.

What landed

SC kernels (scmp_kernels/sc/)

  • sc_triton.py (single-file replacement of the previous kernels.py+matmul.py split), sng.py, rng.py, lfsr_taps.py, config_helpers.py, constants.py — copied verbatim from scmp_llm/SC/, imports rewritten to relative.
  • Quant clipping fully deleted. Both unipolar (q_lo, q_hi = 2, q_max - 2) and bipolar (q_clip = q_norm - 2) margins are gone, including the q_clip/q_clip_min/q_norm Triton kernel parameters and _grouped_symmetric_quant's clip_margin kwarg. 8-bit unipolar now uses full [0, 255] (256 levels); 8-bit bipolar uses full [-127, 127].
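To make the effect of removing the margins concrete, here is a minimal sketch of the integer ranges before and after. The helper names (`unipolar_range`, `bipolar_range`, `quantize_unipolar`) and the `margin` parameter are illustrative assumptions, not functions from sc_triton.py; only the margin of 2 and the resulting 8-bit ranges come from this PR.

```python
def unipolar_range(bits: int = 8, margin: int = 0) -> tuple[int, int]:
    """Inclusive integer range for unsigned quantization (hypothetical helper)."""
    q_max = (1 << bits) - 1              # 255 for 8-bit
    return margin, q_max - margin        # margin=2 gives the old (2, 253)


def bipolar_range(bits: int = 8, margin: int = 0) -> tuple[int, int]:
    """Inclusive symmetric range for signed quantization (hypothetical helper)."""
    q_norm = (1 << (bits - 1)) - 1       # 127 for 8-bit
    q_clip = q_norm - margin             # margin=2 gives the old 125
    return -q_clip, q_clip


def num_levels(lo: int, hi: int) -> int:
    """Number of distinct integer codes in the inclusive range [lo, hi]."""
    return hi - lo + 1


def quantize_unipolar(x: float, bits: int = 8, margin: int = 0) -> int:
    """Map x in [0, 1] to an integer code, clamped to the allowed range."""
    lo, hi = unipolar_range(bits, margin)
    q_max = (1 << bits) - 1
    return max(lo, min(hi, round(x * q_max)))


if __name__ == "__main__":
    for margin in (2, 0):
        u, b = unipolar_range(8, margin), bipolar_range(8, margin)
        print(f"margin={margin}: unipolar {u} ({num_levels(*u)} levels), "
              f"bipolar {b} ({num_levels(*b)} levels)")
```

Under these assumptions, an input of exactly 1.0 saturates at code 253 with the old margin and reaches 255 without it, which is the behavior change to keep in mind when comparing metrics against pre-migration runs.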

MP dispatch (scmp_kernels/mp/)

  • config.py ported from scmp_llm/SC/mp_config.py (762 lines).
  • __init__.py re-exports the public API: MPConfig, AdaptiveMPConfig, RangeMPConfig, RowAssignment, classify_rows_by_metric, adaptive_classify_rows, classify_groups_by_range, MPDistributionLogger, MetricProfiler.

Application: application/Diffusion/

  • 91 source files copied from scmp_llm/Q-DiT/ (including models/evaluations/{evaluator.py, convert_npz.py, ...} — 92MB Inception checkpoint excluded; download separately).
  • All bare imports rewritten: from sc_triton/sng/config_helpers/mp_config import ... → from scmp_kernels.{sc,mp}.* import ....
  • Dead sys.path.insert(0, .../SC) shims removed throughout.
  • qdit/sc_integration/mp_config.py is now a thin re-export shim of scmp_kernels.mp.
  • application/{ViT,WorldModel}/ reserved as .gitkeep placeholders.
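The import rewrite described in the list above could be scripted roughly as follows. This is a sketch under stated assumptions, not the tool used for this PR: `MODULE_MAP` and `rewrite_bare_imports` are hypothetical names, and mapping `mp_config` to the re-exporting `scmp_kernels.mp` package (rather than `scmp_kernels.mp.config`) is a guess. It handles only the simple `from X import ...` and `import X` forms.

```python
import re

# Assumed mapping from old bare module names to new package paths.
# Pointing mp_config at the re-exporting package is an assumption.
MODULE_MAP = {
    "sc_triton": "scmp_kernels.sc.sc_triton",
    "sng": "scmp_kernels.sc.sng",
    "config_helpers": "scmp_kernels.sc.config_helpers",
    "mp_config": "scmp_kernels.mp",
}

_NAMES = "|".join(map(re.escape, MODULE_MAP))
# [ \t] instead of \s so a match never spans or swallows a newline.
_FROM_RE = re.compile(rf"^([ \t]*)from[ \t]+({_NAMES})[ \t]+import\b", re.MULTILINE)
_IMPORT_RE = re.compile(rf"^([ \t]*)import[ \t]+({_NAMES})[ \t]*$", re.MULTILINE)


def rewrite_bare_imports(source: str) -> str:
    """Rewrite bare SC/MP imports to their package-qualified form."""
    # `from sng import X` -> `from scmp_kernels.sc.sng import X`
    source = _FROM_RE.sub(
        lambda m: f"{m.group(1)}from {MODULE_MAP[m.group(2)]} import", source
    )

    # `import sc_triton` -> `from scmp_kernels.sc import sc_triton`
    def _plain(m: re.Match) -> str:
        old = m.group(2)
        package, _, name = MODULE_MAP[old].rpartition(".")
        # Keep the old local name bound if the new module is named differently.
        alias = "" if name == old else f" as {old}"
        return f"{m.group(1)}from {package} import {name}{alias}"

    return _IMPORT_RE.sub(_plain, source)
```

A regex pass like this leaves `import X as Y` and multi-module import lines untouched, so a grep for the old bare names afterwards is still worthwhile.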

Evaluation (evaluation/)

  • evaluation/Diffusion/: kid.py, compare_images.py, build_full_mosaic.py, build_sample_grids.py (from scmp_llm/evaluation/).
  • evaluation/Diffusion/imagenet_ref/: extract.py, parallel_npz.py, compute_fid_kid.py, compare_grid.py (from scmp_llm/imagenet256_ref/).
  • evaluation/{ViT,WorldModel}/ placeholders.

Archived (archived/)

  • archived/origin_cpu/: sc.py, sc_enable.py — original NumPy/PyTorch-CPU SC reference impls.
  • archived/bench/: bench_table_vs_compact.py, compare_cbsg.py, compare_enable.py, compare_matmul.py, compare_unarysim.py, test_kernel_opt.py.
  • archived/tools/: dse.py, noise_model_calibration.py.
  • Each subfolder has a README explaining contents and the bare-import caveat.

Repo hygiene

  • .gitignore for __pycache__/, *.py[cod], *.egg-info/, build/, dist/, .pytest_cache/.
  • README.md documents the new layout.
  • MIGRATION_PLAN.md deleted (executed; stale).

Coverage audit

| Source | Files | Funcs/Classes | Covered |
|---|---|---|---|
| scmp_llm/SC/ (active migration: 6 files) | 6 | 127 | 127 |
| scmp_llm/SC/ (archived: 10 files) | 10 | 46 | 46 |
| scmp_llm/Q-DiT/ | 59 | 277 | 277 |
| scmp_llm/evaluation/ | 4 | 15 | 15 |
| scmp_llm/imagenet256_ref/ | 4 | | 4 files |
| TOTAL | 83 | 465 + 4 | 100% |
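A per-file function/class count like the one in this audit can be reproduced with Python's standard `ast` module. The sketch below is one possible way to do it, not the script used for this PR; `top_level_symbols` and `missing_symbols` are hypothetical names.

```python
import ast


def top_level_symbols(source: str) -> set[str]:
    """Names of top-level functions and classes defined in a module's source."""
    tree = ast.parse(source)
    return {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }


def missing_symbols(old_source: str, new_source: str) -> list[str]:
    """Top-level defs/classes present in the old module but absent from the new one."""
    return sorted(top_level_symbols(old_source) - top_level_symbols(new_source))


if __name__ == "__main__":
    old = "def matmul(a, b): ...\nclass Config: ...\n_X = 1\n"
    new = "def matmul(a, b): ...\n"
    print(missing_symbols(old, new))
```

Because it compares names only, a check of this kind confirms that nothing was dropped but says nothing about whether function bodies still match.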

Known not-migrated (vit_sc-only; never existed in scmp_llm)

  • det_kernel_tuning
  • make_sobol_antithetic_config, make_sobol_altseed_config
  • auto_calibrator.py (RidgeFitter, auto_calibrate_mp, AutoMPBudgetLogger)
  • set_current_block_idx / _CURRENT_BLOCK_IDX global hook
  • FreeBoundaryMPConfig, fixed_levels extra field on AdaptiveMPConfig

These are vit_sc-side conveniences. Add when porting vit_sc into application/ViT/.

Verified

  • Imports: import scmp_kernels, from scmp_kernels.{sc,mp,sc.sc_triton,sc.sng,sc.config_helpers}, plus all qdit.sc_integration.* modules — clean under torch 2.10.0+cu128 / triton 3.6.0.
  • Symbol parity: every top-level def/class in the migrated scmp_llm files exists in the corresponding scmp_kernels file (AST-checked).
  • MP wiring: calibration scripts (calibrate_mp_thresholds.py etc.) write a JSON threshold table; runtime constructs AdaptiveMPConfig(threshold_table_path=...) at quant_sc_main.py:576; sc_attention.py / sc_mlp.py call adaptive_classify_rows per-forward and dispatch each row's precision through the rng_levels= arg of the SC Triton kernels.
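The MP wiring described above (calibration writes a JSON threshold table, the runtime loads it and classifies rows on each forward pass) can be pictured with a small standalone sketch. Everything here is an assumption made for illustration: the JSON schema, the max-abs row metric, the level values, and the function names. The actual `AdaptiveMPConfig` and `adaptive_classify_rows` signatures are not shown in this PR and may differ.

```python
import bisect
import json
import tempfile
from pathlib import Path


def write_threshold_table(path: Path, thresholds: list[float], levels: list[int]) -> None:
    """Calibration side: persist ascending thresholds and one level per bucket."""
    if len(levels) != len(thresholds) + 1:
        raise ValueError("need exactly one level per bucket")
    if thresholds != sorted(thresholds):
        raise ValueError("thresholds must be ascending")
    path.write_text(json.dumps({"thresholds": thresholds, "levels": levels}))


def load_threshold_table(path: Path) -> tuple[list[float], list[int]]:
    """Runtime side: read the table back (assumed schema)."""
    table = json.loads(path.read_text())
    return table["thresholds"], table["levels"]


def classify_rows(rows: list[list[float]], thresholds: list[float],
                  levels: list[int]) -> list[int]:
    """Pick a precision level per row from its max-abs value (assumed metric)."""
    out = []
    for row in rows:
        metric = max(abs(v) for v in row)
        # bisect_right finds the bucket whose upper threshold exceeds the metric.
        out.append(levels[bisect.bisect_right(thresholds, metric)])
    return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        table_path = Path(tmp) / "thresholds.json"
        write_threshold_table(table_path, thresholds=[0.1, 0.5], levels=[64, 128, 256])
        thresholds, levels = load_threshold_table(table_path)
        rows = [[0.01, -0.05], [0.3, 0.2], [0.9, -1.2]]
        print(classify_rows(rows, thresholds, levels))
```

In the real runtime path the per-row result would feed the `rng_levels=` argument of the SC Triton kernels. Here it is just a list of integers.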

Not verified (no GPU on the dev login node)

  • Actual Triton kernel execution / numerics. Run on a CUDA box:
    PYTHONPATH=. python -m pytest tests/test_sc_smoke.py -v
    PYTHONPATH=application/Diffusion python -c "import qdit.sc_integration.sc_attention"

Test plan

  • pytest tests/test_sc_smoke.py on a CUDA + Triton box.
  • Diffusion pilot: run a short FID sweep with --mp_config_path <calibrated_table.json> and confirm metric is in expected band given the wider quant range (256 levels vs 252).
  • Confirm vit_sc-only items (det_kernel_tuning, auto-calibrator) are acceptable as follow-up when ViT lands.

heroarmor and others added 5 commits May 11, 2026 16:13
- replace kernels.py + matmul.py with single sc_triton.py (scmp_llm base; 4178 lines)
- overwrite sng.py / rng.py / lfsr_taps.py / config_helpers.py from scmp_llm (rng/lfsr_taps identical, sng minor diff)
- rewrite relative imports (from sng/rng/lfsr_taps/sc -> from .sng/.rng/.lfsr_taps/.constants)
- drop the quant margin clipping (q_lo/q_hi=2..253) so range is full 0..q_max
- sc/__init__.py: export sc_matmul from sc_triton (det_kernel_tuning was vit_sc-only; not in scmp_llm base)
- test_sc_smoke: drop det_kernel_tuning import

Note: scmp_llm's config_helpers.py is a subset of the previous file (no vit_sc Sobol antithetic/altseed variants).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Diffusion is the active integration target (this owner); ViT and WorldModel
are left empty for other owners.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Source-only mirror of scmp_llm/Q-DiT (excludes __pycache__, models/evaluations
FID inception checkpoint, build artifacts, results, *.nsys-rep, png).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- scmp_kernels/mp/config.py: ported from scmp_llm/SC/mp_config.py (762 lines; MPConfig, AdaptiveMPConfig, RangeMPConfig, RowAssignment, classify_rows_by_metric, adaptive_classify_rows, classify_groups_by_range, MPDistributionLogger, MetricProfiler)
- scmp_kernels/mp/__init__.py: re-export the public API
- application/Diffusion: rewrite bare `from sc_triton/sng/config_helpers/mp_config import ...` to `from scmp_kernels.{sc,mp}.* import ...` across sc_integration/*.py, scripts, tests
- application/Diffusion/qdit/sc_integration/mp_config.py: drop stale SC sys.path shim, re-export from scmp_kernels.mp
- .gitignore: __pycache__, *.pyc, build artifacts

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Remove the `SC_PATH = ... / SC; sys.path.insert(...)` blocks left over from when scmp_llm SC code lived on a separate sys.path (8 files: sc_integration/{sc_attention,sc_matmul,sc_mlp}.py, tests/test_noise_matmul_adapters.py, scripts/{owen_mode_sweep,sobol_variant_sweep,sobol_scramble_seed_sweep,debug_fixed_level_sanity}.py)
- Drop now-unused `import sys` / `from pathlib import Path` where the SC_PATH block was the sole user
- sobol_variant_sweep / sobol_scramble_seed_sweep: replace bare `import sc_triton` with `from scmp_kernels.sc import sc_triton` (these scripts monkey-patch sc_triton internals)
- README.md: reflect new layout (mp/ populated; application/ tree)
- tests/test_sc_smoke.py: update docstring reference
- Delete MIGRATION_PLAN.md (executed; stale)
- Untrack legacy *.pyc files that predate the .gitignore

Verified: full runtime import works under torch 2.10 / triton 3.6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
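One way to confirm that no `SC_PATH` / `sys.path.insert(...)` shim survives a cleanup like the one in the commit above is to scan each file's AST for calls that mutate `sys.path`. This is a sketch of such a check, not something included in the PR; `find_sys_path_mutations` is a hypothetical name.

```python
import ast


def find_sys_path_mutations(source: str) -> list[int]:
    """Line numbers of `sys.path.insert(...)` or `sys.path.append(...)` calls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if (
            isinstance(func, ast.Attribute)
            and func.attr in {"insert", "append"}
            and isinstance(func.value, ast.Attribute)   # the `.path` part
            and func.value.attr == "path"
            and isinstance(func.value.value, ast.Name)  # the `sys` part
            and func.value.value.id == "sys"
        ):
            hits.append(node.lineno)
    return sorted(hits)


if __name__ == "__main__":
    sample = (
        "import sys\n"
        "from pathlib import Path\n"
        "SC_PATH = Path(__file__).parent / 'SC'\n"
        "sys.path.insert(0, str(SC_PATH))\n"
    )
    print(find_sys_path_mutations(sample))
```

The source is only parsed, never executed, so the check is safe to run over scripts that would otherwise need a GPU or missing checkpoints to import.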
heroarmor changed the title from "Sync SC code from scmp_llm base; remove quant margin clipping" to "Migrate SC kernels + populate mp/ + add application/Diffusion" May 11, 2026
heroarmor and others added 6 commits May 11, 2026 18:35
The enable-signal SC matmul mechanism is in production via the four
sc_matmul_enable*_triton entrypoints in scmp_kernels.sc.sc_triton.

sc_enable.py is the original NumPy/PyTorch-CPU implementation it
replaced; kept here for historical reference and as a numerical
cross-check target for the Triton kernels. Not on the runtime path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…bench,tools}

- origin_cpu/sc.py: NumPy reference matmul_sc (joins existing sc_enable.py)
- bench/: bench_table_vs_compact, compare_cbsg, compare_enable, compare_matmul, compare_unarysim, test_kernel_opt
- tools/: dse, noise_model_calibration

All confirmed dead at migration time (no in-repo callers). Kept as
historical reference and as starting points for regression benchmarks
or resurrection. Each subfolder has a README explaining contents and
the bare-import caveat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- evaluation/Diffusion/: kid.py, build_full_mosaic.py, build_sample_grids.py, compare_images.py (from scmp_llm/evaluation/)
- evaluation/{ViT,WorldModel}/: .gitkeep placeholders
- application/Diffusion/models/evaluations/: evaluator.py, convert_npz.py, evaluate.sh, requirements.txt, README.md, __init__.py (the 3 .py files I skipped during the original Q-DiT copy because the dir also held a 92MB FID inception .pb)
- Top-level README.md: document the new evaluation/ + archived/ layout

The Inception checkpoint (classify_image_graph_def.pb, 92MB) is NOT included — download separately and place alongside evaluator.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
From scmp_llm/imagenet256_ref/ (gitignored results dir). Used while
building the FID/KID reference set:
- extract.py: NPZ -> PNG extractor
- parallel_npz.py: threaded PNG -> NPZ converter
- compute_fid_kid.py: cleanfid FID/KID against a pre-built reference
- compare_grid.py: side-by-side sweep diff

Hardcoded /scratch/.../scmp_llm/ paths at the top of each file need
editing before re-running. Documented in the subfolder README.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The earlier clipping removal handled the unipolar path (q_lo/q_hi
margin -> full [0, q_max]). Two bipolar quantization helpers still
clipped to q_norm-2 (±125 instead of ±127 for 8-bit signed):

  - fused_quantize_bipolar (sc/sc_triton.py:1749)
  - _sc_matmul_bipolar    (sc/sc_triton.py:3460)

Now both use the full symmetric range. This mirrors the intent of the
unipolar fix: drop the safety margin and use the full quantization
range ([-127, 127] for 8-bit signed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After the earlier q_lo/q_hi and q_norm-2 removals, q_clip / q_norm were
still threaded through the bipolar quantization API as redundant
parameters. Same for _grouped_symmetric_quant's clip_margin kwarg.
Now removed:

- fused_quant_bipolar_kernel: parameters (q_clip, q_clip_min, q_norm) -> (q_max).
- fused_quant_bipolar_perrow_kernel: (q_clip, q_norm) -> (q_max).
- fused_quantize_bipolar / fused_quantize_bipolar_perrow: drop local q_clip variable.
- _sc_matmul_bipolar: drop q_norm/q_clip; use q_max throughout.
- _grouped_symmetric_quant: drop clip_margin parameter (was always 0 at call sites).
- All 4 _grouped_symmetric_quant call sites: drop clip_margin=0.

Runtime smoke: full package + simplified _grouped_symmetric_quant
import and run under torch 2.10 / triton 3.6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
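For readers unfamiliar with the quantization step whose parameters were simplified in the commit above, the following pure-Python sketch shows grouped symmetric quantization with a single `q_max` bound and no clip margin. It is a hypothetical stand-in written for illustration; the real `_grouped_symmetric_quant` operates on tensors and its exact signature and scaling rule are not shown in this PR.

```python
def grouped_symmetric_quant_sketch(values: list[float], group_size: int,
                                   bits: int = 8) -> tuple[list[int], list[float]]:
    """Quantize each group with its own scale onto [-q_max, q_max].

    Hypothetical illustration only; not the actual _grouped_symmetric_quant.
    """
    q_max = (1 << (bits - 1)) - 1                    # 127 for 8-bit
    codes: list[int] = []
    scales: list[float] = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        peak = max(abs(v) for v in group)
        scale = peak / q_max if peak > 0 else 1.0    # avoid dividing by zero
        scales.append(scale)
        # No clip margin: the largest magnitude in a group maps to +/- q_max.
        codes.extend(max(-q_max, min(q_max, round(v / scale))) for v in group)
    return codes, scales


if __name__ == "__main__":
    codes, scales = grouped_symmetric_quant_sketch([1.0, -0.25, 0.0, 0.25], group_size=2)
    print(codes, scales)
```

With one bound instead of separate `q_clip` and `q_norm` values, the clamp and the scale are guaranteed to agree, which is the redundancy that commit removes.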
heroarmor changed the title from "Migrate SC kernels + populate mp/ + add application/Diffusion" to "Migrate scmp_llm: SC + MP + application/Diffusion + evaluation + archived" May 11, 2026
Allenjin123 (Contributor) commented

We want a clear separation between the SC implementation and the application implementation, so no diffusion-related code should be included. We should also remove all legacy code (which is what the initial commit does), since we now primarily use agents to write and maintain the code. Deleting stale content will help reduce context length.

Please use this repository as a submodule for the other repository. Also, please commit changes directly to the existing repository instead of rebasing. Let's actually learn how to collaborate as a team! Thanks.