Bootstrap scmp_diffusion from scmp_llm by heroarmor · Pull Request #1 · CrucibleComputingGroup/scmp_diffusion

heroarmor · 2026-05-12T04:01:20Z

Initial bootstrap of `scmp_diffusion` from the Diffusion / Q-DiT / evaluation code that lived in the private `scmp_llm` staging repo. Consumes `scmp_kernels` PR #2 via git submodule.

Architecture

```
scmp_diffusion/ ← this PR
├── scmp_kernels/ ← git submodule → CrucibleComputingGroup/scmp_kernels
├── diffusion/ gaussian_diffusion + sampler (5 files, verbatim)
├── models/ DiT models + Inception evaluator wrapper (8 files, verbatim)
├── qdit/ Q-DiT quantisation + sc_integration (15 files, imports rewired)
├── scripts/ calibration / sweep / SLURM launchers (60 files)
├── tests/ unit tests (1 file)
├── evaluation/ FID / KID / sample-grid mosaics, ImageNet-256 ref-batch prep (10 files)
├── tools/ noise-model calibration utility (1 file)
└── utils/ download + logging (3 files)
```

`scmp_kernels` is a submodule pinned to its `add-mp-module` branch tip at SHA `fa6b5cd`. Once PR #2 lands on `CrucibleComputingGroup/scmp_kernels:main`, the submodule pin and URL will be flipped to org/main in a follow-up commit.
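The pin can be checked locally by comparing the SHA reported by `git submodule status` with the expected one. The following is a minimal sketch, assuming it runs from the repo root of a checkout with the submodule registered; the helper names (`parse_submodule_status`, `pin_matches`, `read_status`) are illustrative and not part of this PR.

```python
import subprocess

def parse_submodule_status(line: str) -> tuple[str, str, str]:
    """Split one `git submodule status` line into (state, sha, path)."""
    # The first character is a state flag: ' ' means the checkout matches the
    # recorded pin, '+' means it differs, '-' means the submodule is not
    # initialised, 'U' means merge conflicts.
    state, fields = line[0], line[1:].split()
    return state, fields[0], fields[1]

def pin_matches(line: str, expected_prefix: str) -> bool:
    """True when the submodule is in sync and its SHA starts with the prefix."""
    state, sha, _path = parse_submodule_status(line)
    return state == " " and sha.startswith(expected_prefix)

def read_status(path: str = "scmp_kernels") -> str:
    """Return the raw status line for one submodule (requires git on PATH)."""
    result = subprocess.run(
        ["git", "submodule", "status", path],
        capture_output=True, text=True, check=True,
    )
    # Only strip the trailing newline; the leading state character matters.
    return result.stdout.splitlines()[0]
```

For example, `pin_matches(read_status(), "fa6b5cd")` would be expected to return `True` on a checkout that has run `git submodule update --init`.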

Commits (5)

1. `chore: add scmp_kernels submodule + .gitignore` — bootstraps the submodule at `./scmp_kernels`, URL pointing at `heroarmor/scmp_kernels.git` (the fork with PR #2 in flight). Standard repo-hygiene .gitignore.

2. `import: Q-DiT application code with imports rewired to scmp_kernels` (93 files, +16,750/−1) — brings in `qdit/`, `diffusion/`, `models/`, `scripts/`, `tests/`, `utils/` from `scmp_llm/Q-DiT/`. All SC/MP-related bare imports rewired:

   - `from sc_triton import ...` → `from scmp_kernels.sc import sc_matmul, ...`
   - `from sng/config_helpers/mp_config import ...` → `from scmp_kernels.{sc.sng, sc.config_helpers, mp} import ...`
   - Removed every `sys.path.insert(..., 'SC')` shim.
   - Deleted dead `qdit/sc_integration/sc_matmul.py` (vestigial: `sc_matmul_qk` had zero callers).
   - `qdit/sc_integration/mp_config.py` is now a thin re-export shim of `scmp_kernels.mp` (local relative imports inside the package keep resolving).
   - Excluded artifacts: `__pycache__/`, `build/`, `Q_DiT.egg-info/`, `*.nsys-rep`, `*.png`, `logs_mp_sweep/`, `results/`, 92 MB Inception `.pb` checkpoint.

3. `refactor: route all SC matmul calls through the unified sc_matmul dispatcher` (10 files, +206/−450) — `sc_attention.py` (4 `_get_*_fn` dispatchers collapsed to one), `sc_mlp.py`, `noise_matmul.py` (4 adapter functions collapsed to one matching `sc_matmul`), 4 scripts + 1 test. Every call site now uses `sc_matmul(granularity=...)`. 53 distinct call sites verified to cover all 3 granularities + grouped + chunked + per-head variants.

4. `remove sc_enable=False legacy path — enable-signal is canonical` (21 files, +46/−91) — team policy: `sc_enable=True` is the only supported mode (`scmp_kernels` intentionally dropped packed-XNOR/AND in favour of enable-signal table-lookup). This commit:

   - Drops the `sc_enable` ctor arg + attribute from `SCController`.
   - Removes 8 `if self.sc_controller.sc_enable` gates in `sc_attention.py` and 1 in `sc_mlp.py`.
   - Drops `--sc_enable` from CLI (`quant_sc_main.py`) and from 15 shell-script launchers.
   - Drops `sc_enable=...` from `SCController` construction in `sc_modelutils.py`.
   - Catches up 3 legacy methods in `sc_attention.py` that the earlier rewrite had missed (they still had positional `max/min` args and would have crashed at runtime).

5. `migrate Gap 3 application-side aux code from scmp_llm` (11 files, +931/−1) — application-side tooling that lived in `scmp_llm/SC/`, `scmp_llm/evaluation/`, `scmp_llm/imagenet256_ref/`:

   - `tools/calibrate_noise_model.py` ← `SC/noise_model_calibration.py` (calibrates the surrogate consumed by `qdit/sc_integration/noise_matmul.py`).
   - `evaluation/{kid, build_full_mosaic, build_sample_grids, compare_images}.py` ← `evaluation/*.py`.
   - `evaluation/imagenet_ref/{extract, parallel_npz, compute_fid_kid, compare_grid}.py + run_openai_eval{,_v2}.sh` ← `imagenet256_ref/*`.
   - `.gitignore` extended for the 1.9 GB `VIRTUAL_imagenet256_labeled.npz` and 1.2 GB ImageNet `images/` artifacts that travel out-of-band.

Legacy code from `scmp_llm/SC/` (CPU reference `sc.py`/`sc_enable.py`, bench/compare scripts, DSE, in-file test helpers tied to packed-XNOR) is intentionally not migrated, per Allenjin123's "no legacy in scmp_kernels" rule extended to the app side.

Net effect — every `scmp_llm` Python file is accounted for

| Source in `scmp_llm` | New home | Status |
|---|---|---|
| `SC/sc_triton.py` | `scmp_kernels/sc/kernels.py` + `matmul.py` (PR #2) | migrated |
| `SC/{sng,rng,lfsr_taps,config_helpers}.py` | `scmp_kernels/sc/*` (PR #2) | migrated |
| `SC/mp_config.py` | `scmp_kernels/mp/config.py` (PR #2) | migrated |
| `SC/{sc,sc_enable,bench_*,compare_*,test_kernel_opt,dse}.py` | — | dropped (legacy) |
| `SC/noise_model_calibration.py` | `tools/calibrate_noise_model.py` | migrated (this PR) |
| `Q-DiT/{diffusion,models,qdit,scripts,tests,utils}/` | `{same}/` | migrated (this PR) |
| `evaluation/` | `evaluation/` | migrated (this PR) |
| `imagenet256_ref/` | `evaluation/imagenet_ref/` | migrated (this PR) |

Verified

- AST parses for every `.py` outside the submodule.
- Zero references to `sc_enable`, `sc_triton`, `bin_to_stoc_packed`, `xnor_matmul`, or any of the deprecated flat-API names anywhere in the app code.
- 53 `sc_matmul(granularity=...)` call sites exercise all granularities + group_a/group_b + chunk_d + rng_levels paths.
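The first two checks can be reproduced with the standard-library `ast` module. This is a sketch under stated assumptions, not the script that was actually used for this PR: the `DEPRECATED` set covers only the four names spelled out above (not the full flat-API list), and `deprecated_refs` / `scan_tree` are hypothetical helper names.

```python
import ast
from pathlib import Path

# Names listed as removed in the "Verified" section; extend as needed.
DEPRECATED = {"sc_enable", "sc_triton", "bin_to_stoc_packed", "xnor_matmul"}

def deprecated_refs(source: str) -> set[str]:
    """Return deprecated names used in `source`.

    Raises SyntaxError if the source does not parse, which doubles as the
    "AST parses" check.
    """
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [(node.module or "").split(".")[0]]
            names += [alias.name for alias in node.names]
        elif isinstance(node, ast.Name):
            names = [node.id]
        elif isinstance(node, ast.Attribute):
            names = [node.attr]
        elif isinstance(node, (ast.keyword, ast.arg)):
            names = [node.arg]  # None for a **kwargs keyword
        else:
            continue
        found.update(name for name in names if name in DEPRECATED)
    return found

def scan_tree(root: str, skip_dir: str = "scmp_kernels") -> dict[str, set[str]]:
    """Map each offending .py file under `root` to the names it still uses."""
    hits = {}
    for path in Path(root).rglob("*.py"):
        if skip_dir in path.parts:  # leave the submodule out of the scan
            continue
        refs = deprecated_refs(path.read_text(encoding="utf-8"))
        if refs:
            hits[str(path)] = refs
    return hits
```

An AST walk does not see string literals, comments, or shell scripts, so occurrences such as `getattr(args, 'sc_enable', False)` or a `--sc_enable` flag in a launcher would still need a plain text search.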

Not verified (no GPU on dev box)

- Actual Triton kernel execution end-to-end.
- Numerical reproducibility of pre-migration FID / KID results (the algorithm shift from packed-XNOR to enable-signal table-lookup means the numbers will not match historical `scmp_llm` results bit-for-bit, although the new method is correct and is the team-canonical algorithm).

Test plan

- On a CUDA + Triton + DiT-checkpoint box: `cd scmp_kernels && pytest tests/test_sc_smoke.py -v` (submodule smoke).
- `PYTHONPATH=. python -c "import qdit.sc_integration.sc_attention"` to confirm the Q-DiT import chain resolves.
- Run one `quant_sc_main` pass end-to-end on a small batch to verify the dispatcher reaches the kernels correctly.
- Re-run one calibration sweep + one FID eval to establish new baseline numbers under the unified API.
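The import-chain step can be widened from one module to several with a small helper. This is a sketch only; `check_imports` is a hypothetical name, and the module list in the usage note is taken from file paths mentioned in this PR rather than from a verified run.

```python
import importlib

def check_imports(module_names):
    """Try to import each module; return {name: error} for the ones that fail."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError or anything raised at import time
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures
```

Run from the repo root with `PYTHONPATH=.`, a call such as `check_imports(["qdit.sc_integration.sc_attention", "qdit.sc_integration.sc_mlp", "qdit.sc_integration.noise_matmul"])` should return an empty dict when the rewired imports resolve.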

🤖 Generated with Claude Code

Adds CrucibleComputingGroup/scmp_kernels as a submodule at ./scmp_kernels, pinned to heroarmor:add-mp-module branch tip (commit fbd7009). Will be re-pinned to org main once PR #2 lands.

- URL: https://github.com/heroarmor/scmp_kernels.git
- Path: ./scmp_kernels
- Initial pin: fbd7009 (add-mp-module branch: MP module + clipping removal + flat-API public surface for Q-DiT compatibility)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Brings the Diffusion application source code from scmp_llm/Q-DiT into this repo (93 files: qdit/, diffusion/, models/, scripts/, tests/, utils/, env files). All SC/MP-related bare imports are rewired to use the scmp_kernels submodule:

- `from sc_triton import sc_matmul, sc_matmul_grouped, ...` → `from scmp_kernels.sc import sc_matmul_per_tensor as sc_matmul, sc_matmul_grouped, ...`
- `from sng import RNGPool, SNGBank` → (deleted; only used by qdit/sc_integration/sc_matmul.py, which is itself deleted)
- `from config_helpers import ...` → `from scmp_kernels.sc.config_helpers import ...`
- `from mp_config import (...)` → `from scmp_kernels.mp import (...)`

Other changes:

- Deleted qdit/sc_integration/sc_matmul.py (vestigial: sc_matmul_qk had zero callers repo-wide; only path that needed xnor_matmul / bin_to_stoc_packed).
- Removed corresponding line and `__all__` entry in `qdit/sc_integration/__init__.py`.
- qdit/sc_integration/mp_config.py is now a thin re-export shim of scmp_kernels.mp, so local relative imports (`from .mp_config import …`) in sc_attention/sc_mlp/sc_controller still resolve.
- Removed every `sys.path.insert(0, ".../SC")` shim; package imports replace them.
- Excluded artifacts not migrated: `__pycache__/`, `build/`, `Q_DiT.egg-info/`, `*.nsys-rep`, `*.png`, `logs_mp_sweep/`, `results/`, and the 92MB Inception .pb checkpoint under models/evaluations/.

Verified:

- All rewired .py files parse (AST OK).
- All 8 sc_triton public names Q-DiT imports resolve through `scmp_kernels/sc/__init__.py` at the pinned submodule SHA.
- No remaining bare imports of sc_triton/sng/config_helpers/mp_config.
- No remaining sys.path.insert SC shims.

Not verified (no GPU on dev box):

- Actual kernel execution. Needs `pytest tests/` on a CUDA + Triton box.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
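The re-export shim pattern used for `mp_config.py` can be illustrated without the real packages. In the sketch below, `demo_kernels_mp` stands in for `scmp_kernels.mp`, `demo_mp_config` stands in for the shim module, and `MPConfig` is a made-up name; none of these are taken from the actual code. The point is only that a module whose body is a star import keeps the old import path alive while the definitions live elsewhere.

```python
import sys
import types

# Stand-in for the package that now owns the definitions (hypothetical contents).
kernels_mp = types.ModuleType("demo_kernels_mp")
kernels_mp.MPConfig = type("MPConfig", (), {"bits": 8})
kernels_mp.__all__ = ["MPConfig"]
sys.modules["demo_kernels_mp"] = kernels_mp

# The entire body of the shim: re-export the public names, define nothing new.
SHIM_SOURCE = (
    "from demo_kernels_mp import *\n"
    "from demo_kernels_mp import __all__\n"
)

shim = types.ModuleType("demo_mp_config")
exec(SHIM_SOURCE, shim.__dict__)
sys.modules["demo_mp_config"] = shim

# Callers that still import from the old location get the very same object.
from demo_mp_config import MPConfig

assert MPConfig is kernels_mp.MPConfig
```

Because the shim holds no definitions of its own, there is a single source of truth, and the shim can be deleted once every caller imports from the new location directly.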

…patcher

scmp_kernels:

- submodule bumped to fa6b5cd which removes the flat-API duplicates (sc_matmul_per_tensor, sc_matmul_mlp, sc_matmul_grouped, sc_matmul_enable_triton, sc_matmul_enable_triton_mlp, sc_matmul_grouped_enable_triton, sc_matmul_enable_batched_bipolar) and extends sc_matmul with group_a / group_b / rng_levels kwargs.

qdit/sc_integration/sc_attention.py:

- Collapses four `_get_*_fn` dispatchers (sc_matmul, mlp, av_grouped, batched_bipolar) into one `_get_matmul_fn` that returns either the real Triton sc_matmul or the noisy surrogate.
- All call sites updated to pass `granularity=` (per_tensor, per_row, or per_head) instead of relying on which specialised function was returned. ~140 lines of branching/positional-arg plumbing removed.
- Per-tensor max/min positional args are dropped; sc_matmul computes those internally.

qdit/sc_integration/sc_mlp.py:

- Same collapse: single `_get_matmul_fn`; all MLP linear paths now call `sc_matmul(..., granularity="per_row", chunk_d=..., group_a=..., group_b=..., rng_levels=...)`.

qdit/sc_integration/noise_matmul.py:

- Four signature-specific adapters (noisy_sc_matmul, noisy_sc_matmul_mlp, noisy_sc_matmul_grouped, noisy_sc_matmul_enable_batched_bipolar) collapsed into one noisy_sc_matmul whose signature mirrors scmp_kernels.sc.sc_matmul. Per-row scaling is derived from (granularity, group_a, group_b).

scripts/{debug_fixed_level_sanity, owen_mode_sweep, sobol_scramble_seed_sweep, sobol_variant_sweep, calibrate_mp_thresholds}.py and tests/test_noise_matmul_adapters.py:

- All flat-API call sites rewritten to `sc_matmul(..., granularity=...)`.
- Pre-computed q_maxs / q_mins arguments dropped where sc_matmul computes them internally.
- Test names follow the new API (test_sc_matmul_per_tensor_vs_noisy etc.).

Verified: every modified file parses (AST); no remaining import or call-site references to the deleted flat-API names anywhere outside the rewritten docstrings.

Not verified (no GPU on dev box): kernel execution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
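The shape of this consolidation (one selector, two implementations with the same keyword interface, scaling derived from `granularity` instead of passed in) can be shown with a toy in pure Python. This is not the `scmp_kernels` API: the function names are invented, only two granularities are modelled, and the Gaussian noise in the surrogate is an assumption made for illustration.

```python
import random

def row_scales(a, granularity):
    """One scale per row of `a`, derived from the data rather than passed in."""
    if granularity == "per_tensor":
        peak = max(abs(v) for row in a for v in row)
        return [peak] * len(a)
    if granularity == "per_row":
        return [max(abs(v) for v in row) for row in a]
    raise ValueError(f"unsupported granularity: {granularity!r}")

def toy_matmul(a, b, *, granularity="per_tensor"):
    """Stand-in for the kernel path: exact product behind the shared keywords."""
    row_scales(a, granularity)  # reject unknown granularities early
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def noisy_toy_matmul(a, b, *, granularity="per_tensor", sigma=0.01, seed=0):
    """Surrogate with the same call shape; the noise model is an assumption."""
    rng = random.Random(seed)
    exact = toy_matmul(a, b, granularity=granularity)
    scales = row_scales(a, granularity)
    return [[v + rng.gauss(0.0, sigma * s) for v in row]
            for row, s in zip(exact, scales)]

def get_matmul_fn(use_surrogate):
    """Single selector: callers never need to know which variant they got."""
    return noisy_toy_matmul if use_surrogate else toy_matmul
```

A call site then reads `get_matmul_fn(flag)(a, b, granularity="per_row")` regardless of the backend, which is the property that lets specialised dispatchers be deleted.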

Team policy: scmp_kernels intentionally dropped the packed-XNOR / packed-AND algorithm in favor of the enable-signal table-lookup path. There is no longer an "off" mode for SC matmul; every SC call goes through the enable-signal kernels. The sc_enable flag's only remaining behavior was to gate a (now dead) algorithm switch, so it's removed.

qdit/sc_integration/sc_controller.py

- Drop the ``sc_enable`` constructor arg and the ``self.sc_enable`` attribute. Drop the repr field.

qdit/sc_integration/sc_attention.py

- 8 ``if self.sc_controller.sc_enable`` gates removed:
  - 4 inside ``_sc_linear_dynamic_mp`` / ``_sc_linear_combined_mp`` / ``_sc_linear``, which were conditionally adding ``rng_levels`` to kwargs; now passed unconditionally (``_rng_levels`` returns None unless fixed-level mode is set, same behavior).
  - 2 ``sc_enable and sc_mode == "bipolar"`` → just ``sc_mode == "bipolar"`` (per-head bipolar fast path always picks the batched kernel when mode allows).
- ``_rng_levels`` body collapsed (both branches returned None already).
- Also catches up these three legacy methods that the earlier rewrite missed: dropped positional ``x.max().item(), x.min().item(), …`` arguments and switched to ``granularity=`` kwargs so calls go through the unified ``sc_matmul`` dispatcher (would have failed at runtime).

qdit/sc_integration/sc_mlp.py

- ``_rng_levels`` body collapsed; no other sc_enable usage.

qdit/sc_integration/sc_modelutils.py

- Drop ``sc_enable=getattr(args, 'sc_enable', False)`` from the SCController constructor call.

scripts/quant_sc_main.py

- Drop ``--sc_enable`` CLI flag and its mentions in the run-name builder + logging.

scripts/calibrate_mp_thresholds.py

- ``_resolve_level_rng_levels`` body collapsed (was returning None in both branches).

15 shell scripts (batch_mp_sweep, `calib_*`, `run_*gpu*`, `bench_*`, `owen_*`, unit_test, etc.)

- Drop ``--sc_enable`` argument from every script.

Verified: zero remaining references to ``sc_enable`` anywhere in scmp_diffusion (outside the submodule). All edited Python files parse.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
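One practical consequence of dropping the CLI flag is that any launcher still passing it will fail at argument parsing, which is why the flag had to be removed from every launcher in the same commit. The sketch below assumes the script builds its CLI with `argparse` (the `args` namespace in the message suggests so, but this is not confirmed here); the `--sc_mode` option and the helper names are illustrative.

```python
import argparse

def build_parser():
    """A parser without --sc_enable, mirroring the state after this commit."""
    parser = argparse.ArgumentParser(prog="quant_sc_main.py")
    parser.add_argument("--sc_mode", default="bipolar")  # illustrative option
    return parser

def stale_flags(argv):
    """Return the arguments the parser no longer recognises."""
    _known, unknown = build_parser().parse_known_args(argv)
    return unknown
```

With this parser, `stale_flags(["--sc_mode", "bipolar", "--sc_enable"])` reports `["--sc_enable"]`, whereas a plain `parse_args` call on the same list would exit with an "unrecognized arguments" error.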

scmp_llm/SC/ legacy/bench files (3a/3b/3c/3g): dropped, not migrated. These include sc.py and sc_enable.py (NumPy/PyTorch-CPU SC reference impls, superseded by Triton kernels), the `bench_*` / `compare_*` / test_kernel_opt.py comparison scripts (most tied to the deprecated packed-XNOR/AND path resolved in Gap 1), dse.py (kernel DSE), and the matmul_sc_triton / test_all_configs / benchmark_comparison test helpers that lived inside sc_triton.py (also packed-XNOR-bound).

The following ARE application-side and migrate cleanly:

tools/

- calibrate_noise_model.py ← scmp_llm/SC/noise_model_calibration.py

Calibrates the closed-form noise surrogate that qdit/sc_integration/noise_matmul.py consumes.

evaluation/

- kid.py ← scmp_llm/evaluation/kid.py
- build_full_mosaic.py ← scmp_llm/evaluation/build_full_mosaic.py
- build_sample_grids.py ← scmp_llm/evaluation/build_sample_grids.py
- compare_images.py ← scmp_llm/evaluation/compare_images.py

FID/KID + sample-grid + side-by-side comparison helpers used in result reporting.

evaluation/imagenet_ref/

- extract.py ← scmp_llm/imagenet256_ref/extract.py
- parallel_npz.py ← scmp_llm/imagenet256_ref/parallel_npz.py
- compute_fid_kid.py ← scmp_llm/imagenet256_ref/compute_fid_kid.py
- compare_grid.py ← scmp_llm/imagenet256_ref/compare_grid.py
- run_openai_eval.sh ← scmp_llm/imagenet256_ref/run_openai_eval.sh
- run_openai_eval_v2.sh ← scmp_llm/imagenet256_ref/run_openai_eval_v2.sh

ImageNet-256 reference-batch prep (images + FID statistics). Binary artifacts (1.9 GB VIRTUAL_imagenet256_labeled.npz, 1.2 GB images/) deliberately excluded and added to .gitignore.

No SC/MP bare imports in any migrated file (all clean stdlib + torch + numpy + PIL + cleanfid). AST parses for every .py.

After this commit, every Python file that lived in scmp_llm is either

- migrated to scmp_kernels (the 6 active kernel files), or
- migrated to scmp_diffusion (Q-DiT + evaluation + tools + ImageNet ref), or
- intentionally dropped per team policy (CPU refs, legacy benchmarks, packed-XNOR/AND-bound tests, DSE).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Bootstraps the new scmp_diffusion repository by migrating Q-DiT/Diffusion application code from the private scmp_llm staging repo. All SC/MP-related imports are rewired to consume the new scmp_kernels package (added as a git submodule). Files are migrated largely verbatim from their original locations; the only non-trivial code changes happen in the SC integration layer (dispatcher consolidation, removal of sc_enable=False legacy path) which were already covered in earlier scmp_llm review cycles.

Changes:

- Adds scmp_kernels as a git submodule and rewires all SC imports to the new scmp_kernels.sc / scmp_kernels.mp public surface (with qdit/sc_integration/mp_config.py as a thin re-export shim).
- Imports Q-DiT application code (diffusion/, models/, qdit/, scripts/, tests/, utils/) along with calibration tooling and evaluation scripts (evaluation/, tools/).
- Drops the sc_enable=False legacy code path and the dead sc_matmul_qk integration.

Reviewed changes

Copilot reviewed 101 out of 107 changed files in this pull request and generated no comments.


The PR touches ~100 files across many directories; below is a grouped summary rather than per-file since the migration is mostly verbatim copies.

| File / Group | Description |
|---|---|
| `.gitmodules`, `.gitignore` | Adds `scmp_kernels` submodule (pinned to fork) and ignore rules for large artifacts. |
| `setup.py`, `requirements.txt`, `qdit_*` env files | Packaging metadata and pinned conda/pip environment snapshots. |
| `diffusion/`, `models/` | Verbatim DiT + gaussian-diffusion code from upstream. |
| `qdit/`, `qdit/sc_integration/` | Quantization + SC integration; `mp_config.py` is a re-export shim, `__init__.py` updated to expose new API. |
| `scripts/` (60+ files) | Calibration, sweep, SLURM, and diagnostic scripts; most are operator-facing shell/python wrappers with hard-coded scratch paths. |
| `scripts/eval/` | Clean-fid / OpenAI evaluator wrappers and `pngs_to_npz` packer. |
| `evaluation/`, `evaluation/imagenet_ref/` | FID/KID/mosaic tools and ImageNet-256 reference batch helpers. |
| `tests/test_noise_matmul_adapters.py` | Single unit test comparing the noisy surrogate against the real SC kernels through `sc_matmul`. |
| `utils/`, `tools/` | Logger / DiT-checkpoint download helpers; noise-model calibration entry point (not shown in diff). |

A few minor observations (not worth blocking on, since the script files are operator-side tooling with environment-specific paths baked in throughout):

- `scripts/test.sh` line 5 still points to `/home/kangqi/scmp_llm/results/...` and `./evaluations/evaluate.sh` (the directory is `models/evaluations/` in this repo), so it will not run as-is. Same story for many `sbatch_*.sb` files referencing `/scratch/.../scmp_llm/...`.
- `scripts/unit_test.sh` runs `quant_sc_main.py` rather than the `tests/` pytest suite, so the name doesn't match its content.
- `scripts/quant_main.sh` invokes `python -u quant_main.py` (no `scripts/` prefix), which won't resolve from the repo root.
- `models/evaluations/requirements.txt` pins `tensorflow-gpu>=2.0`, which is no longer published on PyPI as of TF 2.12+; the canonical install is `tensorflow[and-cuda]` (as the new `scripts/eval/README.md` correctly documents).

Given these are bootstrap-time hard-coded paths inherited verbatim from scmp_llm and the PR description explicitly calls out that none of this has been GPU-verified end-to-end, I'm not filing per-line comments — they are out of scope for an initial migration PR whose goal is to land the code rather than make every operator script portable.


heroarmor and others added 5 commits May 11, 2026 22:02

heroarmor mentioned this pull request May 12, 2026

Bump scmp_kernels to consolidated-triton-kernels tip #2

Merged

Allenjin123 requested review from Allenjin123 and Copilot May 13, 2026 06:18

Copilot started reviewing on behalf of Allenjin123 May 13, 2026 06:18 View session

Copilot AI reviewed May 13, 2026


Allenjin123 approved these changes May 13, 2026


Allenjin123 merged commit 5cf24ca into CrucibleComputingGroup:main May 13, 2026
4 checks passed

Allenjin123 merged 5 commits into CrucibleComputingGroup:main from heroarmor:bootstrap-from-scmp_llm
