Migrate scmp_llm: SC + MP + application/Diffusion + evaluation + archived #1
Closed
heroarmor wants to merge 11 commits into
- replace kernels.py + matmul.py with single sc_triton.py (scmp_llm base; 4178 lines)
- overwrite sng.py / rng.py / lfsr_taps.py / config_helpers.py from scmp_llm (rng/lfsr_taps identical, sng minor diff)
- rewrite relative imports (from sng/rng/lfsr_taps/sc -> from .sng/.rng/.lfsr_taps/.constants)
- drop the quant margin clipping (q_lo/q_hi=2..253) so range is full 0..q_max
- sc/__init__.py: export sc_matmul from sc_triton (det_kernel_tuning was vit_sc-only; not in scmp_llm base)
- test_sc_smoke: drop det_kernel_tuning import
Note: scmp_llm's config_helpers.py is a subset of the previous file (no vit_sc Sobol antithetic/altseed variants).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
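To make the effect of dropping the unipolar margin concrete, here is a minimal NumPy sketch. It is not the code in sc_triton.py: the function name `quantize_unipolar` and its `margin` parameter are hypothetical, and the input is assumed to be already normalized to [0, 1].

```python
import numpy as np

def quantize_unipolar(x, n_bits=8, margin=0):
    """Map values in [0, 1] to integer levels in [margin, q_max - margin].

    Hypothetical helper for illustration; not the project's API.
    """
    q_max = (1 << n_bits) - 1                      # 255 for 8 bits
    q = np.rint(np.asarray(x, dtype=np.float64) * q_max).astype(np.int64)
    return np.clip(q, margin, q_max - margin)

x = np.array([0.0, 0.004, 0.5, 0.996, 1.0])
old = quantize_unipolar(x, margin=2)   # previous behaviour: levels 2..253
new = quantize_unipolar(x, margin=0)   # after this commit: levels 0..q_max
```

With the margin, values near 0 and 1 collapse onto levels 2 and 253; without it, the endpoints 0 and 255 are reachable.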
Diffusion is the active integration target (this owner); ViT and WorldModel are left empty for other owners.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Source-only mirror of scmp_llm/Q-DiT (excludes __pycache__, models/evaluations FID inception checkpoint, build artifacts, results, *.nsys-rep, png).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- scmp_kernels/mp/config.py: ported from scmp_llm/SC/mp_config.py (762 lines; MPConfig, AdaptiveMPConfig, RangeMPConfig, RowAssignment, classify_rows_by_metric, adaptive_classify_rows, classify_groups_by_range, MPDistributionLogger, MetricProfiler)
- scmp_kernels/mp/__init__.py: re-export the public API
- application/Diffusion: rewrite bare `from sc_triton/sng/config_helpers/mp_config import ...` to `from scmp_kernels.{sc,mp}.* import ...` across sc_integration/*.py, scripts, tests
- application/Diffusion/qdit/sc_integration/mp_config.py: drop stale SC sys.path shim, re-export from scmp_kernels.mp
- .gitignore: __pycache__, *.pyc, build artifacts
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
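The commit above ports row-classification helpers (`classify_rows_by_metric`, `adaptive_classify_rows`) without showing them. As a rough illustration of what metric-based row classification can look like, the sketch below buckets rows by their max-abs value against a threshold table. Everything here is an assumption: the function name `classify_rows`, the choice of metric, and the JSON schema are hypothetical and may differ from scmp_kernels/mp/config.py.

```python
import json
import numpy as np

def classify_rows(x, thresholds, levels):
    """Assign each row of x a precision level from its max-abs value.

    thresholds: ascending metric boundaries, len(levels) - 1 entries.
    levels: one precision level per bucket (hypothetical meaning).
    """
    metric = np.abs(x).max(axis=1)              # one scalar per row
    bucket = np.digitize(metric, thresholds)    # bucket index 0..len(thresholds)
    return np.asarray(levels)[bucket]

# Hypothetical threshold table, as a calibration script might write it.
table = json.loads('{"thresholds": [0.5, 2.0], "levels": [4, 6, 8]}')
x = np.array([[0.1, -0.2], [1.0, 0.3], [-3.0, 0.5]])
row_levels = classify_rows(x, table["thresholds"], table["levels"])
```

Rows with larger dynamic range land in higher buckets, so a per-row level array like `row_levels` is the kind of value that could be handed to a kernel argument.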
- Remove the `SC_PATH = ... / SC; sys.path.insert(...)` blocks left over from when scmp_llm SC code lived on a separate sys.path (8 files: sc_integration/{sc_attention,sc_matmul,sc_mlp}.py, tests/test_noise_matmul_adapters.py, scripts/{owen_mode_sweep,sobol_variant_sweep,sobol_scramble_seed_sweep,debug_fixed_level_sanity}.py)
- Drop now-unused `import sys` / `from pathlib import Path` where the SC_PATH block was the sole user
- sobol_variant_sweep / sobol_scramble_seed_sweep: replace bare `import sc_triton` with `from scmp_kernels.sc import sc_triton` (these scripts monkey-patch sc_triton internals)
- README.md: reflect new layout (mp/ populated; application/ tree)
- tests/test_sc_smoke.py: update docstring reference
- Delete MIGRATION_PLAN.md (executed; stale)
- Untrack legacy *.pyc files that predate the .gitignore
Verified: full runtime import works under torch 2.10 / triton 3.6.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
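The import rewrites in this commit and the previous one were presumably done by hand or with an ad hoc script; neither is shown. A minimal sketch of how such a rewrite could be automated follows. The `MODULE_MAP` contents and the function name `rewrite_line` are assumptions for illustration, and the regexes only handle simple single-line imports.

```python
import re

# Bare module name -> package-qualified module (mapping assumed for illustration).
MODULE_MAP = {
    "sc_triton": "scmp_kernels.sc.sc_triton",
    "sng": "scmp_kernels.sc.sng",
    "config_helpers": "scmp_kernels.sc.config_helpers",
}

_FROM = re.compile(r"^(\s*)from\s+(\w+)\s+import\s+(.+)$")
_IMPORT = re.compile(r"^(\s*)import\s+(\w+)\s*$")

def rewrite_line(line):
    """Rewrite one bare import line; return other lines unchanged."""
    m = _FROM.match(line)
    if m and m.group(2) in MODULE_MAP:
        return f"{m.group(1)}from {MODULE_MAP[m.group(2)]} import {m.group(3)}"
    m = _IMPORT.match(line)
    if m and m.group(2) in MODULE_MAP:
        # `import sc_triton` -> `from scmp_kernels.sc import sc_triton`
        pkg, _, name = MODULE_MAP[m.group(2)].rpartition(".")
        return f"{m.group(1)}from {pkg} import {name}"
    return line

def rewrite_source(source):
    """Apply rewrite_line to every line of a source string."""
    return "\n".join(rewrite_line(line) for line in source.splitlines())
```

Rewriting `import sc_triton` as `from scmp_kernels.sc import sc_triton` keeps the local name `sc_triton` bound to the module, which matters for scripts that monkey-patch its internals.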
The enable-signal SC matmul mechanism is in production via the four sc_matmul_enable*_triton entrypoints in scmp_kernels.sc.sc_triton. sc_enable.py is the original NumPy/PyTorch-CPU implementation it replaced; kept here for historical reference and as a numerical cross-check target for the Triton kernels. Not on the runtime path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
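For readers unfamiliar with what a CPU reference for stochastic-computing matmul is checked against, here is a textbook unipolar sketch in NumPy. It is not the enable-signal mechanism and not the contents of sc_enable.py; the name `sc_matmul_ref`, the stream length, and the tolerance are assumptions chosen for illustration.

```python
import numpy as np

def sc_matmul_ref(A, B, n_bits=4096, seed=0):
    """Estimate A @ B for entries in [0, 1] using random unipolar bitstreams.

    Each value v becomes a stream whose bits are 1 with probability v;
    the AND of two independent streams has mean equal to the product.
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=np.float64)
    B = np.asarray(B, dtype=np.float64)
    sa = rng.random(A.shape + (n_bits,)) < A[..., None]   # (m, k, N)
    sb = rng.random(B.shape + (n_bits,)) < B[..., None]   # (k, n, N)
    prod = sa[:, :, None, :] & sb[None, :, :, :]          # (m, k, n, N)
    return prod.mean(axis=-1).sum(axis=1)                 # (m, n)

A = np.array([[0.2, 0.5, 0.9], [0.7, 0.1, 0.4]])
B = np.array([[0.3, 0.8], [0.6, 0.2], [0.5, 0.5]])
estimate = sc_matmul_ref(A, B)
exact = A @ B
max_abs_err = np.abs(estimate - exact).max()
```

A cross-check of a GPU kernel against a reference like this would compare `max_abs_err` against a tolerance that shrinks as the stream length grows.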
…bench,tools}
- origin_cpu/sc.py: NumPy reference matmul_sc (joins existing sc_enable.py)
- bench/: bench_table_vs_compact, compare_cbsg, compare_enable, compare_matmul, compare_unarysim, test_kernel_opt
- tools/: dse, noise_model_calibration
All confirmed dead at migration time (no in-repo callers). Kept as historical reference and as starting points for regression benchmarks or resurrection. Each subfolder has a README explaining contents and the bare-import caveat.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- evaluation/Diffusion/: kid.py, build_full_mosaic.py, build_sample_grids.py, compare_images.py (from scmp_llm/evaluation/)
- evaluation/{ViT,WorldModel}/: .gitkeep placeholders
- application/Diffusion/models/evaluations/: evaluator.py, convert_npz.py, evaluate.sh, requirements.txt, README.md, __init__.py (the 3 .py files I skipped during the original Q-DiT copy because the dir also held a 92MB FID inception .pb)
- Top-level README.md: document the new evaluation/ + archived/ layout
The Inception checkpoint (classify_image_graph_def.pb, 92MB) is NOT included — download separately and place alongside evaluator.py.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
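kid.py itself is not shown in this PR. For orientation, the sketch below computes the quantity KID is built on: an unbiased MMD² estimate with the cubic polynomial kernel k(x, y) = (x·y/d + 1)³ over feature vectors. It is an assumption that kid.py follows this standard definition; the function names are hypothetical, the features here are synthetic rather than Inception activations, and the usual averaging over random subsets is omitted.

```python
import numpy as np

def poly_kernel(X, Y):
    """Cubic polynomial kernel (x.y / d + 1)^3 between rows of X and Y."""
    d = X.shape[1]
    return (X @ Y.T / d + 1.0) ** 3

def mmd2_unbiased(X, Y):
    """Unbiased squared MMD between two feature sets of shape (n, d)."""
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = poly_kernel(X, X), poly_kernel(Y, Y), poly_kernel(X, Y)
    # Exclude the diagonal (i == j) terms for the within-set averages.
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * Kxy.mean()

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 16))
same = rng.normal(size=(200, 16))             # same distribution as `real`
shifted = rng.normal(loc=1.0, size=(200, 16)) # visibly different distribution
kid_same = mmd2_unbiased(real, same)
kid_shifted = mmd2_unbiased(real, shifted)
```

The estimate is close to zero when both sets come from the same distribution and grows as they diverge, which is why a regression in sample quality shows up as a higher KID.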
From scmp_llm/imagenet256_ref/ (gitignored results dir). Used while building the FID/KID reference set:
- extract.py: NPZ -> PNG extractor
- parallel_npz.py: threaded PNG -> NPZ converter
- compute_fid_kid.py: cleanfid FID/KID against a pre-built reference
- compare_grid.py: side-by-side sweep diff
Hardcoded /scratch/.../scmp_llm/ paths at the top of each file need editing before re-running. Documented in the subfolder README.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The earlier clipping removal handled the unipolar path (q_lo/q_hi margin -> full [0, q_max]). Two bipolar quantization helpers still clipped to q_norm-2 (±125 instead of ±127 for 8-bit signed):
- fused_quantize_bipolar (sc/sc_triton.py:1749)
- _sc_matmul_bipolar (sc/sc_triton.py:3460)
Now both use the full symmetric range. Mirrors the same intent as the unipolar fix: drop the safety margin and use the full quantization range.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After the earlier q_lo/q_hi and q_norm-2 removals, q_clip / q_norm were still threaded through the bipolar quantization API as redundant parameters. Same for _grouped_symmetric_quant's clip_margin kwarg. Now removed:
- fused_quant_bipolar_kernel: parameters (q_clip, q_clip_min, q_norm) -> (q_max).
- fused_quant_bipolar_perrow_kernel: (q_clip, q_norm) -> (q_max).
- fused_quantize_bipolar / fused_quantize_bipolar_perrow: drop local q_clip variable.
- _sc_matmul_bipolar: drop q_norm/q_clip; use q_max throughout.
- _grouped_symmetric_quant: drop clip_margin parameter (was always 0 at call sites).
- All 4 _grouped_symmetric_quant call sites: drop clip_margin=0.
Runtime smoke: full package + simplified _grouped_symmetric_quant import and run under torch 2.10 / triton 3.6.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
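Once the margin parameters are gone, a grouped symmetric quantizer needs only `q_max`. The NumPy sketch below shows that shape under stated assumptions: it is not the Triton-side `_grouped_symmetric_quant`, whose real signature is not shown in this PR, and the name `grouped_symmetric_quant`, the flat-input layout, and the per-group max-abs scale are all hypothetical.

```python
import numpy as np

def grouped_symmetric_quant(x, group_size, n_bits=8):
    """Per-group symmetric quantization to [-q_max, q_max], no clip margin.

    Hypothetical CPU illustration; x.size must be divisible by group_size.
    """
    q_max = (1 << (n_bits - 1)) - 1                     # 127 for 8 bits
    groups = np.asarray(x, dtype=np.float64).reshape(-1, group_size)
    absmax = np.abs(groups).max(axis=1, keepdims=True)
    scale = np.where(absmax > 0, absmax / q_max, 1.0)   # all-zero groups use scale 1
    q = np.clip(np.rint(groups / scale), -q_max, q_max).astype(np.int64)
    return q, scale

x = np.array([1.0, -0.5, 0.25, 0.0, -2.0, 2.0, 0.1, 0.0])
q, scale = grouped_symmetric_quant(x, group_size=4)
dequant_err = np.abs(q * scale - x.reshape(-1, 4)).max()
```

Because each group's largest magnitude maps to exactly ±`q_max`, the extreme levels ±127 are used rather than being reserved as a safety margin.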

Summary
End-to-end migration of `scmp_llm` into `scmp_kernels`. Every Python file in `scmp_llm` (SC kernels, Q-DiT integration, evaluation tools, legacy reference code) now has a home here, with a working runtime path for the diffusion application.

What landed

SC kernels (`scmp_kernels/sc/`)
- `sc_triton.py` (single-file replacement of the previous `kernels.py` + `matmul.py` split), `sng.py`, `rng.py`, `lfsr_taps.py`, `config_helpers.py`, `constants.py`: copied verbatim from `scmp_llm/SC/`, imports rewritten to relative.
- Unipolar (`q_lo, q_hi = 2, q_max - 2`) and bipolar (`q_clip = q_norm - 2`) margins are gone, including the `q_clip` / `q_clip_min` / `q_norm` Triton kernel parameters and `_grouped_symmetric_quant`'s `clip_margin` kwarg. 8-bit unipolar now uses full `[0, 255]` (256 levels); 8-bit bipolar uses full `[-127, 127]`.

MP dispatch (`scmp_kernels/mp/`)
- `config.py` ported from `scmp_llm/SC/mp_config.py` (762 lines).
- `__init__.py` re-exports the public API: `MPConfig`, `AdaptiveMPConfig`, `RangeMPConfig`, `RowAssignment`, `classify_rows_by_metric`, `adaptive_classify_rows`, `classify_groups_by_range`, `MPDistributionLogger`, `MetricProfiler`.

Application: `application/Diffusion/`
- Source-only mirror of `scmp_llm/Q-DiT/` (including `models/evaluations/{evaluator.py, convert_npz.py, ...}`; 92MB Inception checkpoint excluded, download separately).
- `from sc_triton/sng/config_helpers/mp_config import ...` → `from scmp_kernels.{sc,mp}.* import ...`.
- `sys.path.insert(0, .../SC)` shims removed throughout.
- `qdit/sc_integration/mp_config.py` is now a thin re-export shim of `scmp_kernels.mp`.
- `application/{ViT,WorldModel}/` reserved as `.gitkeep` placeholders.

Evaluation (`evaluation/`)
- `evaluation/Diffusion/`: `kid.py`, `compare_images.py`, `build_full_mosaic.py`, `build_sample_grids.py` (from `scmp_llm/evaluation/`).
- `evaluation/Diffusion/imagenet_ref/`: `extract.py`, `parallel_npz.py`, `compute_fid_kid.py`, `compare_grid.py` (from `scmp_llm/imagenet256_ref/`).
- `evaluation/{ViT,WorldModel}/` placeholders.

Archived (`archived/`)
- `archived/origin_cpu/`: `sc.py`, `sc_enable.py` (original NumPy/PyTorch-CPU SC reference implementations).
- `archived/bench/`: `bench_table_vs_compact.py`, `compare_cbsg.py`, `compare_enable.py`, `compare_matmul.py`, `compare_unarysim.py`, `test_kernel_opt.py`.
- `archived/tools/`: `dse.py`, `noise_model_calibration.py`.

Repo hygiene
- `.gitignore` for `__pycache__/`, `*.py[cod]`, `*.egg-info/`, `build/`, `dist/`, `.pytest_cache/`.
- `README.md` documents the new layout.
- `MIGRATION_PLAN.md` deleted (executed; stale).

Coverage audit
- `scmp_llm/SC/` (active migration: 6 files)
- `scmp_llm/SC/` (archived: 10 files)
- `scmp_llm/Q-DiT/`
- `scmp_llm/evaluation/`
- `scmp_llm/imagenet256_ref/`

Known not-migrated (vit_sc-only; never existed in scmp_llm)
- `det_kernel_tuning`
- `make_sobol_antithetic_config`, `make_sobol_altseed_config`
- `auto_calibrator.py` (`RidgeFitter`, `auto_calibrate_mp`, `AutoMPBudgetLogger`)
- `set_current_block_idx` / `_CURRENT_BLOCK_IDX` global hook
- `FreeBoundaryMPConfig`, `fixed_levels` extra field on `AdaptiveMPConfig`
These are vit_sc-side conveniences. Add when porting `vit_sc` into `application/ViT/`.

Verified
- `import scmp_kernels`, `from scmp_kernels.{sc,mp,sc.sc_triton,sc.sng,sc.config_helpers}`, plus all `qdit.sc_integration.*` modules: clean under torch 2.10.0+cu128 / triton 3.6.0.
- … `scmp_llm` files exists in the corresponding `scmp_kernels` file (AST-checked).
- … `calibrate_mp_thresholds.py` etc.) write a JSON threshold table; runtime constructs `AdaptiveMPConfig(threshold_table_path=...)` at `quant_sc_main.py:576`; `sc_attention.py` / `sc_mlp.py` call `adaptive_classify_rows` per-forward and dispatch each row's precision through the `rng_levels=` arg of the SC Triton kernels.

Not verified (no GPU on the dev login node)

Test plan
- `pytest tests/test_sc_smoke.py` on a CUDA + Triton box.
- … `--mp_config_path <calibrated_table.json>` and confirm metric is in expected band given the wider quant range (256 levels vs 252).
- … `det_kernel_tuning`, auto-calibrator) are acceptable as follow-up when ViT lands.
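
The "(AST-checked)" item under Verified does not say how the check was run. One plausible form, assuming it compares top-level function and class names between an original file and its migrated counterpart, is sketched below; the helper names `top_level_symbols` and `missing_symbols` are hypothetical, and the two source strings are toy stand-ins for real files.

```python
import ast

def top_level_symbols(source):
    """Return names of top-level functions and classes in Python source."""
    tree = ast.parse(source)
    return {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }

def missing_symbols(old_source, new_source):
    """Symbols defined in old_source that new_source does not define."""
    return sorted(top_level_symbols(old_source) - top_level_symbols(new_source))

# Toy inputs; a real run would read the old and migrated files from disk.
old_src = "class Config:\n    pass\n\ndef classify(x):\n    return x\n"
new_src = "class Config:\n    pass\n"
gaps = missing_symbols(old_src, new_src)
```

An empty result for every file pair would support the coverage claim without importing either package, which is useful on a node without a GPU.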