Bootstrap scmp_diffusion from scmp_llm #1
Adds CrucibleComputingGroup/scmp_kernels as a submodule at ./scmp_kernels,
pinned to the heroarmor:add-mp-module branch tip (commit fbd7009). Will be
re-pinned to org main once PR #2 lands.
  URL:         https://github.com/heroarmor/scmp_kernels.git
  Path:        ./scmp_kernels
  Initial pin: fbd7009 (add-mp-module branch: MP module + clipping removal +
               flat-API public surface for Q-DiT compatibility)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brings the Diffusion application source code from scmp_llm/Q-DiT into
this repo (93 files: qdit/, diffusion/, models/, scripts/, tests/,
utils/, env files). All SC/MP-related bare imports are rewired to use
the scmp_kernels submodule:
    from sc_triton import sc_matmul, sc_matmul_grouped, ...
        → from scmp_kernels.sc import sc_matmul_per_tensor as sc_matmul,
          sc_matmul_grouped, ...
    from sng import RNGPool, SNGBank
        → (deleted: only used by qdit/sc_integration/sc_matmul.py,
          which is itself deleted)
    from config_helpers import ...
        → from scmp_kernels.sc.config_helpers import ...
    from mp_config import (...)
        → from scmp_kernels.mp import (...)
Other changes:
- Deleted qdit/sc_integration/sc_matmul.py (vestigial — sc_matmul_qk
had zero callers repo-wide; only path that needed xnor_matmul /
bin_to_stoc_packed).
- Removed corresponding line and __all__ entry in
qdit/sc_integration/__init__.py.
- qdit/sc_integration/mp_config.py is now a thin re-export shim of
scmp_kernels.mp, so local relative imports (from .mp_config import …)
in sc_attention/sc_mlp/sc_controller still resolve.
- Removed every sys.path.insert(0, ".../SC") shim; package imports
replace them.
- Excluded artifacts not migrated: __pycache__/, build/, Q_DiT.egg-info/,
*.nsys-rep, *.png, logs_mp_sweep/, results/, and the 92MB Inception
.pb checkpoint under models/evaluations/.
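
A minimal sketch of the re-export shim pattern described for qdit/sc_integration/mp_config.py. To stay runnable without the submodule it uses a stand-in backend module; `fake_scmp_kernels_mp`, `mp_config_shim`, and `MPConfig` are hypothetical names chosen for illustration, not the real scmp_kernels.mp API. The real shim is assumed to be little more than a star import from scmp_kernels.mp.

```python
import sys
import types

# Stand-in for scmp_kernels.mp (hypothetical contents, illustration only).
backend = types.ModuleType("fake_scmp_kernels_mp")
exec(
    "class MPConfig:\n"
    "    def __init__(self, bits=8):\n"
    "        self.bits = bits\n"
    "__all__ = ['MPConfig']\n",
    backend.__dict__,
)
sys.modules["fake_scmp_kernels_mp"] = backend

# The shim: a module whose whole body is a re-export of the backend.
# In the repo this would presumably read `from scmp_kernels.mp import *`.
shim = types.ModuleType("mp_config_shim")
exec("from fake_scmp_kernels_mp import *\n", shim.__dict__)
sys.modules["mp_config_shim"] = shim

# Existing importers keep their old import line and get the backend object.
from mp_config_shim import MPConfig

print(MPConfig is backend.MPConfig)
```

Because the shim only forwards names, callers that still write `from .mp_config import …` receive the same objects as callers that import from the package directly, so the two import styles cannot drift apart.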
Verified:
- All rewired .py files parse (AST OK).
- All 8 sc_triton public names Q-DiT imports resolve through
scmp_kernels/sc/__init__.py at the pinned submodule SHA.
- No remaining bare imports of sc_triton/sng/config_helpers/mp_config.
- No remaining sys.path.insert SC shims.
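
The checks above could be automated along these lines. This is a sketch under assumptions, not the script that was actually run: `find_legacy_references` and `LEGACY_MODULES` are hypothetical names, and the scan flags every `sys.path.insert` call rather than only the ones that point at an SC directory. `ast.parse` raises `SyntaxError` on a file that does not parse, so the same pass doubles as the "AST OK" check.

```python
import ast
from typing import List, Tuple

# Top-level module names that should no longer be imported bare.
LEGACY_MODULES = {"sc_triton", "sng", "config_helpers", "mp_config"}


def find_legacy_references(source: str) -> List[Tuple[int, str]]:
    """Return (line, description) pairs for legacy imports and sys.path.insert calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in LEGACY_MODULES:
                    findings.append((node.lineno, f"import {alias.name}"))
        elif isinstance(node, ast.ImportFrom):
            # level == 0 is an absolute import; relative imports such as
            # `from .mp_config import X` are left alone.
            top = (node.module or "").split(".")[0]
            if node.level == 0 and top in LEGACY_MODULES:
                findings.append((node.lineno, f"from {node.module} import ..."))
        elif isinstance(node, ast.Call):
            func = node.func
            if (
                isinstance(func, ast.Attribute)
                and func.attr == "insert"
                and isinstance(func.value, ast.Attribute)
                and func.value.attr == "path"
                and isinstance(func.value.value, ast.Name)
                and func.value.value.id == "sys"
            ):
                findings.append((node.lineno, "sys.path.insert(...)"))
    return sorted(findings)


sample = (
    "import sys\n"
    "sys.path.insert(0, '../SC')\n"
    "from sc_triton import sc_matmul\n"
    "from .mp_config import MPConfig\n"
    "from scmp_kernels.sc import sc_matmul\n"
)
for lineno, what in find_legacy_references(sample):
    print(f"line {lineno}: {what}")
```

Running the function over each migrated .py file and asserting an empty result would reproduce the two "no remaining" checks without executing any kernel code.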
Not verified (no GPU on dev box):
- Actual kernel execution. Needs `pytest tests/` on a CUDA + Triton box.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…patcher
scmp_kernels:
- submodule bumped to fa6b5cd which removes the flat-API duplicates
(sc_matmul_per_tensor, sc_matmul_mlp, sc_matmul_grouped,
sc_matmul_enable_triton, sc_matmul_enable_triton_mlp,
sc_matmul_grouped_enable_triton, sc_matmul_enable_batched_bipolar)
and extends sc_matmul with group_a / group_b / rng_levels kwargs.
qdit/sc_integration/sc_attention.py:
- Collapses four _get_*_fn dispatchers (sc_matmul, mlp, av_grouped,
batched_bipolar) into one _get_matmul_fn that returns either the
real Triton sc_matmul or the noisy surrogate.
- All call sites updated to pass granularity= (per_tensor, per_row,
or per_head) instead of relying on which specialised function was
returned. ~140 lines of branching/positional-arg plumbing removed.
- Per-tensor max/min positional args are dropped — sc_matmul computes
those internally.
qdit/sc_integration/sc_mlp.py:
- Same collapse: single _get_matmul_fn; all MLP linear paths now call
sc_matmul(..., granularity="per_row", chunk_d=..., group_a=...,
group_b=..., rng_levels=...).
qdit/sc_integration/noise_matmul.py:
- Four signature-specific adapters (noisy_sc_matmul,
noisy_sc_matmul_mlp, noisy_sc_matmul_grouped,
noisy_sc_matmul_enable_batched_bipolar) collapsed into one
noisy_sc_matmul whose signature mirrors scmp_kernels.sc.sc_matmul.
Per-row scaling is derived from (granularity, group_a, group_b).
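
The shape of this consolidation can be illustrated with a pure-Python sketch. Everything here is hypothetical: `exact_matmul` stands in for the real kernel, `surrogate_matmul` replaces the noise model with deterministic rounding so the result is reproducible, the max-absolute-value scale rule is an assumption, and only the per_tensor and per_row granularities are covered (per_head, group_a, and group_b are omitted because the commit message does not define their semantics).

```python
from typing import Callable, List

Matrix = List[List[float]]


def row_scales(a: Matrix, granularity: str) -> List[float]:
    """One scale per row of `a`, derived from the granularity (assumed rule: max |v|)."""
    if granularity == "per_tensor":
        shared = max(abs(v) for row in a for v in row)
        return [shared] * len(a)
    if granularity == "per_row":
        return [max(abs(v) for v in row) for row in a]
    raise ValueError(f"unsupported granularity in this sketch: {granularity!r}")


def exact_matmul(a: Matrix, b: Matrix, *, granularity: str = "per_tensor") -> Matrix:
    """Stand-in for the real kernel: plain matmul, granularity is accepted but unused."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def surrogate_matmul(a: Matrix, b: Matrix, *, granularity: str = "per_tensor",
                     levels: int = 4) -> Matrix:
    """Stand-in surrogate with the same call shape: round `a` to `levels` steps per scale."""
    scales = row_scales(a, granularity)
    rounded = [
        [round(v / s * levels) / levels * s if s else 0.0 for v in row]
        for row, s in zip(a, scales)
    ]
    return exact_matmul(rounded, b)


def get_matmul_fn(use_surrogate: bool) -> Callable[..., Matrix]:
    """Single dispatcher: both returned callables accept granularity= as a keyword."""
    return surrogate_matmul if use_surrogate else exact_matmul


a = [[1.0, 0.5], [0.1, 0.2]]
identity = [[1.0, 0.0], [0.0, 1.0]]
matmul = get_matmul_fn(use_surrogate=True)
print(matmul(a, identity, granularity="per_tensor"))
print(matmul(a, identity, granularity="per_row"))
```

With one shared scale the small second row is rounded coarsely, while per-row scaling gives it its own range. Since both callables take the same keyword arguments, call sites do not need to know which one the dispatcher returned.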
scripts/{debug_fixed_level_sanity, owen_mode_sweep, sobol_scramble_seed_sweep,
sobol_variant_sweep, calibrate_mp_thresholds}.py
tests/test_noise_matmul_adapters.py:
- All flat-API call sites rewritten to sc_matmul(..., granularity=...).
- Pre-computed q_maxs / q_mins arguments dropped where sc_matmul
computes them internally.
- Test names follow the new API (test_sc_matmul_per_tensor_vs_noisy etc.).
Verified: every modified file parses (AST); no remaining import or
call-site references to the deleted flat-API names anywhere outside
the rewritten docstrings.
Not verified (no GPU on dev box): kernel execution.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Team policy: scmp_kernels intentionally dropped the packed-XNOR / packed-AND
algorithm in favor of the enable-signal table-lookup path. There is no
longer an "off" mode for SC matmul — every SC call goes through the
enable-signal kernels. The sc_enable flag's only remaining behavior was
to gate a (now dead) algorithm switch, so it's removed.
qdit/sc_integration/sc_controller.py
- Drop the ``sc_enable`` constructor arg and the ``self.sc_enable``
attribute. Drop the repr field.
qdit/sc_integration/sc_attention.py
- 8 ``if self.sc_controller.sc_enable`` gates removed:
• 4 inside ``_sc_linear_dynamic_mp`` / ``_sc_linear_combined_mp`` /
``_sc_linear`` — was conditionally adding ``rng_levels`` to kwargs;
now passed unconditionally (``_rng_levels`` returns None unless
fixed-level mode is set, same behavior).
• 2 ``sc_enable and sc_mode == "bipolar"`` → just
``sc_mode == "bipolar"`` (per-head bipolar fast path always picks
the batched kernel when mode allows).
- ``_rng_levels`` body collapsed (both branches returned None already).
- Also catches up these three legacy methods that the earlier rewrite
missed: dropped positional ``x.max().item(), x.min().item(), …``
arguments and switched to ``granularity=`` kwargs so calls go through
the unified ``sc_matmul`` dispatcher (would have failed at runtime).
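
Why dropping the gate is behavior-preserving can be shown with a small sketch. It assumes the kernel's `rng_levels` keyword defaults to None, so passing None explicitly equals omitting it; `Controller`, `fake_kernel`, and both `build_kwargs_*` helpers are hypothetical stand-ins, not code from sc_attention.py.

```python
from typing import Any, Dict, Optional


class Controller:
    """Hypothetical stand-in for the controller after sc_enable was removed."""

    def __init__(self, fixed_levels: Optional[int] = None):
        self.fixed_levels = fixed_levels


def _rng_levels(ctrl: Controller) -> Optional[int]:
    # None unless fixed-level mode is set (as stated in the commit message).
    return ctrl.fixed_levels


def fake_kernel(*, granularity: str, rng_levels: Optional[int] = None) -> Dict[str, Any]:
    """Stand-in kernel that just reports the arguments it received."""
    return {"granularity": granularity, "rng_levels": rng_levels}


def build_kwargs_before(ctrl: Controller, sc_enable: bool) -> Dict[str, Any]:
    kwargs: Dict[str, Any] = {"granularity": "per_row"}
    if sc_enable:  # the gate that this commit removes
        kwargs["rng_levels"] = _rng_levels(ctrl)
    return kwargs


def build_kwargs_after(ctrl: Controller) -> Dict[str, Any]:
    # rng_levels is now passed unconditionally.
    return {"granularity": "per_row", "rng_levels": _rng_levels(ctrl)}


for ctrl in (Controller(), Controller(fixed_levels=8)):
    old = fake_kernel(**build_kwargs_before(ctrl, sc_enable=True))
    new = fake_kernel(**build_kwargs_after(ctrl))
    print(old == new)
```

The only case where the two builders differ is `sc_enable=False` combined with fixed-level mode, which is exactly the legacy path this commit retires.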
qdit/sc_integration/sc_mlp.py
- ``_rng_levels`` body collapsed; no other sc_enable usage.
qdit/sc_integration/sc_modelutils.py
- Drop ``sc_enable=getattr(args, 'sc_enable', False)`` from the
SCController constructor call.
scripts/quant_sc_main.py
- Drop ``--sc_enable`` CLI flag and its mentions in the run-name
builder + logging.
scripts/calibrate_mp_thresholds.py
- ``_resolve_level_rng_levels`` body collapsed (was returning None
in both branches).
15 shell scripts (batch_mp_sweep, calib_*, run_*gpu*, bench_*, owen_*,
unit_test, etc.)
- Drop ``--sc_enable`` argument from every script.
Verified: zero remaining references to ``sc_enable`` anywhere in
scmp_diffusion (outside the submodule). All edited Python files parse.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
scmp_llm/SC/ legacy/bench files (3a/3b/3c/3g): dropped — not migrated.
These include sc.py and sc_enable.py (NumPy/PyTorch-CPU SC reference
impls, superseded by Triton kernels), the bench_* / compare_* /
test_kernel_opt.py comparison scripts (most tied to the deprecated
packed-XNOR/AND path resolved in Gap 1), dse.py (kernel DSE), and the
matmul_sc_triton / test_all_configs / benchmark_comparison test
helpers that lived inside sc_triton.py (also packed-XNOR-bound).
The following ARE application-side and migrate cleanly:
tools/
calibrate_noise_model.py ← scmp_llm/SC/noise_model_calibration.py
Calibrates the closed-form noise surrogate that
qdit/sc_integration/noise_matmul.py consumes.
evaluation/
kid.py ← scmp_llm/evaluation/kid.py
build_full_mosaic.py ← scmp_llm/evaluation/build_full_mosaic.py
build_sample_grids.py ← scmp_llm/evaluation/build_sample_grids.py
compare_images.py ← scmp_llm/evaluation/compare_images.py
FID/KID + sample-grid + side-by-side comparison
helpers used in result reporting.
evaluation/imagenet_ref/
extract.py ← scmp_llm/imagenet256_ref/extract.py
parallel_npz.py ← scmp_llm/imagenet256_ref/parallel_npz.py
compute_fid_kid.py ← scmp_llm/imagenet256_ref/compute_fid_kid.py
compare_grid.py ← scmp_llm/imagenet256_ref/compare_grid.py
run_openai_eval.sh ← scmp_llm/imagenet256_ref/run_openai_eval.sh
run_openai_eval_v2.sh ← scmp_llm/imagenet256_ref/run_openai_eval_v2.sh
ImageNet-256 reference-batch prep (images +
FID statistics). Binary artifacts (1.9 GB
VIRTUAL_imagenet256_labeled.npz, 1.2 GB images/)
deliberately excluded — added to .gitignore.
No SC/MP bare imports in any migrated file (all clean stdlib + torch +
numpy + PIL + cleanfid). AST parses for every .py.
After this commit, every Python file that lived in scmp_llm is either
- migrated to scmp_kernels (the 6 active kernel files), or
- migrated to scmp_diffusion (Q-DiT + evaluation + tools + ImageNet ref), or
- intentionally dropped per team policy (CPU refs, legacy benchmarks,
packed-XNOR/AND-bound tests, DSE).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
Bootstraps the new scmp_diffusion repository by migrating Q-DiT/Diffusion application code from the private scmp_llm staging repo. All SC/MP-related imports are rewired to consume the new scmp_kernels package (added as a git submodule). Files are migrated largely verbatim from their original locations; the only non-trivial code changes happen in the SC integration layer (dispatcher consolidation, removal of sc_enable=False legacy path) which were already covered in earlier scmp_llm review cycles.
Changes:
- Adds `scmp_kernels` as a git submodule and rewires all SC imports to the new `scmp_kernels.sc` / `scmp_kernels.mp` public surface (with `qdit/sc_integration/mp_config.py` as a thin re-export shim).
- Imports Q-DiT application code (`diffusion/`, `models/`, `qdit/`, `scripts/`, `tests/`, `utils/`) along with calibration tooling and evaluation scripts (`evaluation/`, `tools/`).
- Drops the `sc_enable=False` legacy code path and the dead `sc_matmul_qk` integration.
Reviewed changes
Copilot reviewed 101 out of 107 changed files in this pull request and generated no comments.
The PR touches ~100 files across many directories; below is a grouped summary rather than per-file since the migration is mostly verbatim copies.
| File / Group | Description |
|---|---|
| `.gitmodules`, `.gitignore` | Adds scmp_kernels submodule (pinned to fork) and ignore rules for large artifacts. |
| `setup.py`, `requirements.txt`, `qdit_*` env files | Packaging metadata and pinned conda/pip environment snapshots. |
| `diffusion/`, `models/` | Verbatim DiT + gaussian-diffusion code from upstream. |
| `qdit/`, `qdit/sc_integration/` | Quantization + SC integration; `mp_config.py` is a re-export shim, `__init__.py` updated to expose new API. |
| `scripts/` (60+ files) | Calibration, sweep, SLURM, and diagnostic scripts; most are operator-facing shell/python wrappers with hard-coded scratch paths. |
| `scripts/eval/` | Clean-fid / OpenAI evaluator wrappers and `pngs_to_npz` packer. |
| `evaluation/`, `evaluation/imagenet_ref/` | FID/KID/mosaic tools and ImageNet-256 reference batch helpers. |
| `tests/test_noise_matmul_adapters.py` | Single unit test comparing the noisy surrogate against the real SC kernels through `sc_matmul`. |
| `utils/`, `tools/` | Logger / DiT-checkpoint download helpers; noise-model calibration entry point (not shown in diff). |
A few minor observations (not worth blocking on, since the script files are operator-side tooling with environment-specific paths baked in throughout):
scripts/test.shline 5 still points to/home/kangqi/scmp_llm/results/...and./evaluations/evaluate.sh(the directory ismodels/evaluations/in this repo), so it will not run as-is. Same story for manysbatch_*.sbfiles referencing/scratch/.../scmp_llm/....scripts/unit_test.shrunsquant_sc_main.pyrather than thetests/pytest suite — the name doesn't match its content.scripts/quant_main.shinvokespython -u quant_main.py(noscripts/prefix), which won't resolve from the repo root.models/evaluations/requirements.txtpinstensorflow-gpu>=2.0, which is no longer published on PyPI as of TF 2.12+; the canonical install istensorflow[and-cuda](as the newscripts/eval/README.mdcorrectly documents).
Given these are bootstrap-time hard-coded paths inherited verbatim from scmp_llm and the PR description explicitly calls out that none of this has been GPU-verified end-to-end, I'm not filing per-line comments — they are out of scope for an initial migration PR whose goal is to land the code rather than make every operator script portable.
Initial bootstrap of `scmp_diffusion` from the Diffusion / Q-DiT / evaluation code that lived in the private `scmp_llm` staging repo. Consumes `scmp_kernels` PR #2 via git submodule.
Architecture
```
scmp_diffusion/ ← this PR
├── scmp_kernels/ ← git submodule → CrucibleComputingGroup/scmp_kernels
├── diffusion/ gaussian_diffusion + sampler (5 files, verbatim)
├── models/ DiT models + Inception evaluator wrapper (8 files, verbatim)
├── qdit/ Q-DiT quantisation + sc_integration (15 files, imports rewired)
├── scripts/ calibration / sweep / SLURM launchers (60 files)
├── tests/ unit tests (1 file)
├── evaluation/ FID / KID / sample-grid mosaics, ImageNet-256 ref-batch prep (10 files)
├── tools/ noise-model calibration utility (1 file)
└── utils/ download + logging (3 files)
```
`scmp_kernels` is a submodule pinned to its `add-mp-module` branch tip at SHA `fa6b5cd`. Once PR #2 lands on `CrucibleComputingGroup/scmp_kernels:main`, the submodule pin and URL will be flipped to org/main in a follow-up commit.
Commits (5)
1. `chore: add scmp_kernels submodule + .gitignore` — bootstraps the submodule at `./scmp_kernels`, URL pointing at `heroarmor/scmp_kernels.git` (the fork with PR #2 in flight). Standard repo-hygiene .gitignore.
2. `import: Q-DiT application code with imports rewired to scmp_kernels` (93 files, +16,750/−1) — brings in `qdit/`, `diffusion/`, `models/`, `scripts/`, `tests/`, `utils/` from `scmp_llm/Q-DiT/`. All SC/MP-related bare imports rewired:
3. `refactor: route all SC matmul calls through the unified sc_matmul dispatcher` (10 files, +206/−450) — `sc_attention.py` (4 `get*_fn` dispatchers collapsed to one), `sc_mlp.py`, `noise_matmul.py` (4 adapter functions collapsed to one matching `sc_matmul`), 4 scripts + 1 test. Every call site now uses `sc_matmul(granularity=...)`. 53 distinct call sites verified to cover all 3 granularities + grouped + chunked + per-head variants.
4. `remove sc_enable=False legacy path — enable-signal is canonical` (21 files, +46/−91) — team policy: `sc_enable=True` is the only supported mode (`scmp_kernels` intentionally dropped packed-XNOR/AND in favour of enable-signal table-lookup). This commit:
5. `migrate Gap 3 application-side aux code from scmp_llm` (11 files, +931/−1) — application-side tooling that lived in `scmp_llm/SC/`, `scmp_llm/evaluation/`, `scmp_llm/imagenet256_ref/`:
Legacy code from `scmp_llm/SC/` (CPU reference `sc.py`/`sc_enable.py`, bench/compare scripts, DSE, in-file test helpers tied to packed-XNOR) is intentionally not migrated, per Allenjin123's "no legacy in scmp_kernels" rule extended to the app side.
Net effect — every `scmp_llm` Python file is accounted for
Verified
Not verified (no GPU on dev box)
Test plan
🤖 Generated with Claude Code