Bootstrap scmp_diffusion from scmp_llm #1

Merged
Allenjin123 merged 5 commits into CrucibleComputingGroup:main from heroarmor:bootstrap-from-scmp_llm
May 13, 2026

Conversation

@heroarmor
Contributor

Initial bootstrap of `scmp_diffusion` from the Diffusion / Q-DiT / evaluation code that lived in the private `scmp_llm` staging repo. Consumes `scmp_kernels` PR #2 via git submodule.

Architecture

```
scmp_diffusion/ ← this PR
├── scmp_kernels/ ← git submodule → CrucibleComputingGroup/scmp_kernels
├── diffusion/ gaussian_diffusion + sampler (5 files, verbatim)
├── models/ DiT models + Inception evaluator wrapper (8 files, verbatim)
├── qdit/ Q-DiT quantisation + sc_integration (15 files, imports rewired)
├── scripts/ calibration / sweep / SLURM launchers (60 files)
├── tests/ unit tests (1 file)
├── evaluation/ FID / KID / sample-grid mosaics, ImageNet-256 ref-batch prep (10 files)
├── tools/ noise-model calibration utility (1 file)
└── utils/ download + logging (3 files)
```

`scmp_kernels` is a submodule pinned to its `add-mp-module` branch tip at SHA `fa6b5cd`. Once PR #2 lands on `CrucibleComputingGroup/scmp_kernels:main`, the submodule pin and URL will be flipped to org/main in a follow-up commit.
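Until the pin is flipped, it can be handy to check which owner the submodule URL currently points at. The following is a minimal sketch, assuming `.gitmodules` has the usual git layout; the sample text is built from the path and URL quoted in the first commit message, and the helper names `submodule_urls` and `points_at_owner` are hypothetical. Note that the pinned SHA itself is recorded in the repository tree (the gitlink entry), not in `.gitmodules`, so this only covers the URL half of the follow-up.

```python
import configparser

# Assumed .gitmodules content, reconstructed from the path and URL in the
# first commit message; the real file may differ.
SAMPLE_GITMODULES = (
    '[submodule "scmp_kernels"]\n'
    "\tpath = scmp_kernels\n"
    "\turl = https://github.com/heroarmor/scmp_kernels.git\n"
)


def submodule_urls(gitmodules_text):
    """Map submodule name -> url from the text of a .gitmodules file."""
    # interpolation=None so a literal '%' in a URL is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(gitmodules_text)
    urls = {}
    for section in parser.sections():
        # Section headers look like: submodule "scmp_kernels"
        if section.startswith('submodule "') and section.endswith('"'):
            name = section[len('submodule "'):-1]
            urls[name] = parser[section].get("url")
    return urls


def points_at_owner(gitmodules_text, name, owner):
    """True if the named submodule's URL is under github.com/<owner>/."""
    url = submodule_urls(gitmodules_text).get(name) or ""
    # Accept both the https form and the ssh (git@github.com:owner/) form.
    return f"github.com/{owner}/" in url or f"github.com:{owner}/" in url
```

With the sample above, `points_at_owner(SAMPLE_GITMODULES, "scmp_kernels", "CrucibleComputingGroup")` is false; it should become true once the follow-up commit lands.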

Commits (5)

1. `chore: add scmp_kernels submodule + .gitignore` — bootstraps the submodule at `./scmp_kernels`, URL pointing at `heroarmor/scmp_kernels.git` (the fork with PR #2 in flight). Standard repo-hygiene .gitignore.

2. `import: Q-DiT application code with imports rewired to scmp_kernels` (93 files, +16,750/−1) — brings in `qdit/`, `diffusion/`, `models/`, `scripts/`, `tests/`, `utils/` from `scmp_llm/Q-DiT/`. All SC/MP-related bare imports rewired:

  • `from sc_triton import ...` → `from scmp_kernels.sc import sc_matmul, ...`
  • `from sng/config_helpers/mp_config import ...` → `from scmp_kernels.{sc.sng, sc.config_helpers, mp} import ...`
  • Removed every `sys.path.insert(..., 'SC')` shim.
  • Deleted dead `qdit/sc_integration/sc_matmul.py` (vestigial — `sc_matmul_qk` had zero callers).
  • `qdit/sc_integration/mp_config.py` is now a thin re-export shim of `scmp_kernels.mp` (local relative imports inside the package keep resolving).
  • Excluded artifacts: `__pycache__/`, `build/`, `Q_DiT.egg-info/`, `*.nsys-rep`, `*.png`, `logs_mp_sweep/`, `results/`, and the 92 MB Inception `.pb` checkpoint.

3. `refactor: route all SC matmul calls through the unified sc_matmul dispatcher` (10 files, +206/−450): `sc_attention.py` (four `_get_*_fn` dispatchers collapsed into one), `sc_mlp.py`, `noise_matmul.py` (four adapter functions collapsed into one whose signature matches `sc_matmul`), 5 scripts and 1 test, plus the submodule bump. Every call site now uses `sc_matmul(granularity=...)`. 53 distinct call sites were checked to cover all 3 granularities plus the grouped, chunked, and per-head variants.

4. `remove sc_enable=False legacy path — enable-signal is canonical` (21 files, +46/−91) — team policy: `sc_enable=True` is the only supported mode (`scmp_kernels` intentionally dropped packed-XNOR/AND in favour of enable-signal table-lookup). This commit:

  • Drops the `sc_enable` ctor arg + attribute from `SCController`.
  • Removes 8 `if self.sc_controller.sc_enable` gates in `sc_attention.py` and 1 in `sc_mlp.py`.
  • Drops `--sc_enable` from CLI (`quant_sc_main.py`) and from 15 shell-script launchers.
  • Drops `sc_enable=...` from `SCController` construction in `sc_modelutils.py`.
  • Catches up 3 legacy methods in `sc_attention.py` that the earlier rewrite had missed (still had positional `max/min` args — would have crashed at runtime).

5. `migrate Gap 3 application-side aux code from scmp_llm` (11 files, +931/−1) — application-side tooling that lived in `scmp_llm/SC/`, `scmp_llm/evaluation/`, `scmp_llm/imagenet256_ref/`:

  • `tools/calibrate_noise_model.py` ← `SC/noise_model_calibration.py` (calibrates the surrogate consumed by `qdit/sc_integration/noise_matmul.py`).
  • `evaluation/{kid, build_full_mosaic, build_sample_grids, compare_images}.py` ← `evaluation/*.py`.
  • `evaluation/imagenet_ref/{extract, parallel_npz, compute_fid_kid, compare_grid}.py + run_openai_eval{,_v2}.sh` ← `imagenet256_ref/*`.
  • `.gitignore` extended for the 1.9 GB `VIRTUAL_imagenet256_labeled.npz` and 1.2 GB ImageNet `images/` artifacts that travel out-of-band.

Legacy code from `scmp_llm/SC/` (CPU reference `sc.py`/`sc_enable.py`, bench/compare scripts, DSE, in-file test helpers tied to packed-XNOR) is intentionally not migrated, per Allenjin123's "no legacy in scmp_kernels" rule extended to the app side.
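The dispatcher consolidation in commit 3 can be illustrated with a toy example. This is only a sketch of the pattern (one entry point, a `granularity` keyword, ranges computed internally instead of passed positionally); it is not the `scmp_kernels.sc.sc_matmul` implementation, whose full signature is not shown in this PR. `toy_matmul` and `_RANGE_FNS` are hypothetical names, the arithmetic is exact rather than stochastic, and `per_head`, grouping, and chunking are left out.

```python
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]
Range = Tuple[float, float]


def _per_tensor_ranges(x: Matrix) -> List[Range]:
    # One (min, max) pair shared by every row.
    flat = [v for row in x for v in row]
    return [(min(flat), max(flat))] * len(x)


def _per_row_ranges(x: Matrix) -> List[Range]:
    # One (min, max) pair per row.
    return [(min(row), max(row)) for row in x]


# Hypothetical registry: one entry per supported granularity.
_RANGE_FNS: Dict[str, Callable[[Matrix], List[Range]]] = {
    "per_tensor": _per_tensor_ranges,
    "per_row": _per_row_ranges,
}


def toy_matmul(a: Matrix, b: Matrix, *, granularity: str = "per_tensor"):
    """Toy stand-in for a unified dispatcher (exact arithmetic, no SC)."""
    if granularity not in _RANGE_FNS:
        raise ValueError(f"unknown granularity: {granularity!r}")
    # Ranges are derived here, so callers no longer pass max/min themselves.
    ranges = _RANGE_FNS[granularity](a)
    cols = list(zip(*b))
    product = [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]
    return product, ranges
```

The design point is that a call site selects behaviour with a keyword (`toy_matmul(a, b, granularity="per_row")`) rather than by choosing among several specialised functions, which is what allows the separate `_get_*_fn` selectors to collapse into one.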

Net effect — every `scmp_llm` Python file is accounted for

| Source in `scmp_llm` | New home | Status |
| --- | --- | --- |
| `SC/sc_triton.py` | `scmp_kernels/sc/kernels.py` + `matmul.py` (PR #2) | migrated |
| `SC/{sng,rng,lfsr_taps,config_helpers}.py` | `scmp_kernels/sc/*` (PR #2) | migrated |
| `SC/mp_config.py` | `scmp_kernels/mp/config.py` (PR #2) | migrated |
| `SC/{sc,sc_enable,bench_*,compare_*,test_kernel_opt,dse}.py` | | dropped (legacy) |
| `SC/noise_model_calibration.py` | `tools/calibrate_noise_model.py` | migrated (this PR) |
| `Q-DiT/{diffusion,models,qdit,scripts,tests,utils}/` | `{same}/` | migrated (this PR) |
| `evaluation/` | `evaluation/` | migrated (this PR) |
| `imagenet256_ref/` | `evaluation/imagenet_ref/` | migrated (this PR) |

Verified

  • AST parses for every `.py` outside the submodule.
  • Zero references to `sc_enable`, `sc_triton`, `bin_to_stoc_packed`, `xnor_matmul`, or any of the deprecated flat-API names anywhere in the app code.
  • 53 `sc_matmul(granularity=...)` call sites exercise all granularities + group_a/group_b + chunk_d + rng_levels paths.
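The first two checks above are cheap to repeat without a GPU. A minimal sketch, assuming the checks were simple static scans (the PR does not say how they were actually run): `audit_tree` is a hypothetical helper that parses every `.py` file outside the submodule directory and reports lines mentioning the four deprecated names spelled out above. It does not cover shell scripts or the unnamed flat-API functions.

```python
import ast
import re
from pathlib import Path

# Deprecated names listed in the "Verified" section.
DEPRECATED = ("sc_enable", "sc_triton", "bin_to_stoc_packed", "xnor_matmul")


def audit_tree(root, skip_dirs=("scmp_kernels",)):
    """Return (parse_failures, deprecated_hits) for .py files under root."""
    root = Path(root)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, DEPRECATED)) + r")\b")
    parse_failures, hits = [], []
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root)
        if rel.parts[0] in skip_dirs:
            continue  # the submodule is audited in its own repo
        text = path.read_text(encoding="utf-8")
        try:
            ast.parse(text, filename=str(path))
        except SyntaxError as exc:
            parse_failures.append((rel.as_posix(), exc.lineno))
        # Plain text scan, so comments and docstrings are reported too.
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((rel.as_posix(), lineno, line.strip()))
    return parse_failures, hits
```

Run from the repository root as `audit_tree(".")`; both returned lists should be empty if the claims above hold. Because the scan is textual, a hit inside a docstring still needs a human to judge whether it matters.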

Not verified (no GPU on dev box)

  • Actual Triton kernel execution end-to-end.
  • Numerical reproducibility of pre-migration FID / KID results. The algorithm shift from packed-XNOR to enable-signal table-lookup means the numbers will not match historical `scmp_llm` results bit-exactly, but the method is correct and is the team-canonical algorithm.

Test plan

  • On a CUDA + Triton + DiT-checkpoint box: `cd scmp_kernels && pytest tests/test_sc_smoke.py -v` (submodule smoke).
  • `PYTHONPATH=. python -c "import qdit.sc_integration.sc_attention"` — confirm Q-DiT import chain resolves.
  • Run one quant_sc_main pass end-to-end on a small batch to verify the dispatcher reaches the kernels correctly.
  • Re-run one calibration sweep + one FID eval to establish new baseline numbers under the unified API.
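The import-chain step of the test plan can be wrapped so that a failure is reported rather than aborting the run. A minimal sketch using only the standard library; `check_imports` is a hypothetical helper, and the module name in the usage note is the one from the test plan. Importing the real module will presumably still need the project's runtime dependencies installed.

```python
import importlib


def check_imports(module_names):
    """Try importing each module; return {name: None or an error string}."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # ImportError or anything raised at import time
            results[name] = f"{type(exc).__name__}: {exc}"
    return results
```

From the repository root with `PYTHONPATH=.`, `check_imports(["qdit.sc_integration.sc_attention"])` should map the module to `None` when the chain resolves; any other value is the error to investigate.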

🤖 Generated with Claude Code

heroarmor and others added 5 commits May 11, 2026 22:02
Adds CrucibleComputingGroup/scmp_kernels as a submodule at ./scmp_kernels,
pinned to heroarmor:add-mp-module branch tip (commit fbd7009). Will be
re-pinned to org main once PR #2 lands.

URL: https://github.com/heroarmor/scmp_kernels.git
Path: ./scmp_kernels
Initial pin: fbd7009 (add-mp-module branch — MP module + clipping removal
+ flat-API public surface for Q-DiT compatibility)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brings the Diffusion application source code from scmp_llm/Q-DiT into
this repo (93 files: qdit/, diffusion/, models/, scripts/, tests/,
utils/, env files). All SC/MP-related bare imports are rewired to use
the scmp_kernels submodule:

  from sc_triton import sc_matmul, sc_matmul_grouped, ...
    → from scmp_kernels.sc import sc_matmul_per_tensor as sc_matmul,
                                  sc_matmul_grouped, ...
  from sng import RNGPool, SNGBank          → (deleted — only used by
                                                qdit/sc_integration/sc_matmul.py,
                                                which is itself deleted)
  from config_helpers import ...            → from scmp_kernels.sc.config_helpers import ...
  from mp_config import (...)               → from scmp_kernels.mp import (...)

Other changes:
  - Deleted qdit/sc_integration/sc_matmul.py (vestigial — sc_matmul_qk
    had zero callers repo-wide; only path that needed xnor_matmul /
    bin_to_stoc_packed).
  - Removed corresponding line and __all__ entry in
    qdit/sc_integration/__init__.py.
  - qdit/sc_integration/mp_config.py is now a thin re-export shim of
    scmp_kernels.mp, so local relative imports (from .mp_config import …)
    in sc_attention/sc_mlp/sc_controller still resolve.
  - Removed every sys.path.insert(0, ".../SC") shim; package imports
    replace them.
  - Excluded artifacts not migrated: __pycache__/, build/, Q_DiT.egg-info/,
    *.nsys-rep, *.png, logs_mp_sweep/, results/, and the 92MB Inception
    .pb checkpoint under models/evaluations/.

Verified:
  - All rewired .py files parse (AST OK).
  - All 8 sc_triton public names Q-DiT imports resolve through
    scmp_kernels/sc/__init__.py at the pinned submodule SHA.
  - No remaining bare imports of sc_triton/sng/config_helpers/mp_config.
  - No remaining sys.path.insert SC shims.

Not verified (no GPU on dev box):
  - Actual kernel execution. Needs `pytest tests/` on a CUDA + Triton box.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…patcher

scmp_kernels:
  - submodule bumped to fa6b5cd which removes the flat-API duplicates
    (sc_matmul_per_tensor, sc_matmul_mlp, sc_matmul_grouped,
    sc_matmul_enable_triton, sc_matmul_enable_triton_mlp,
    sc_matmul_grouped_enable_triton, sc_matmul_enable_batched_bipolar)
    and extends sc_matmul with group_a / group_b / rng_levels kwargs.

qdit/sc_integration/sc_attention.py:
  - Collapses four _get_*_fn dispatchers (sc_matmul, mlp, av_grouped,
    batched_bipolar) into one _get_matmul_fn that returns either the
    real Triton sc_matmul or the noisy surrogate.
  - All call sites updated to pass granularity= (per_tensor, per_row,
    or per_head) instead of relying on which specialised function was
    returned. ~140 lines of branching/positional-arg plumbing removed.
  - Per-tensor max/min positional args are dropped — sc_matmul computes
    those internally.

qdit/sc_integration/sc_mlp.py:
  - Same collapse: single _get_matmul_fn; all MLP linear paths now call
    sc_matmul(..., granularity="per_row", chunk_d=..., group_a=...,
    group_b=..., rng_levels=...).

qdit/sc_integration/noise_matmul.py:
  - Four signature-specific adapters (noisy_sc_matmul,
    noisy_sc_matmul_mlp, noisy_sc_matmul_grouped,
    noisy_sc_matmul_enable_batched_bipolar) collapsed into one
    noisy_sc_matmul whose signature mirrors scmp_kernels.sc.sc_matmul.
    Per-row scaling is derived from (granularity, group_a, group_b).

scripts/{debug_fixed_level_sanity, owen_mode_sweep, sobol_scramble_seed_sweep,
         sobol_variant_sweep, calibrate_mp_thresholds}.py
tests/test_noise_matmul_adapters.py:
  - All flat-API call sites rewritten to sc_matmul(..., granularity=...).
  - Pre-computed q_maxs / q_mins arguments dropped where sc_matmul
    computes them internally.
  - Test names follow the new API (test_sc_matmul_per_tensor_vs_noisy etc.).

Verified: every modified file parses (AST); no remaining import or
call-site references to the deleted flat-API names anywhere outside
the rewritten docstrings.

Not verified (no GPU on dev box): kernel execution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Team policy: scmp_kernels intentionally dropped the packed-XNOR / packed-AND
algorithm in favor of the enable-signal table-lookup path. There is no
longer an "off" mode for SC matmul — every SC call goes through the
enable-signal kernels. The sc_enable flag's only remaining behavior was
to gate a (now dead) algorithm switch, so it's removed.

qdit/sc_integration/sc_controller.py
  - Drop the ``sc_enable`` constructor arg and the ``self.sc_enable``
    attribute. Drop the repr field.

qdit/sc_integration/sc_attention.py
  - 8 ``if self.sc_controller.sc_enable`` gates removed:
      • 4 inside ``_sc_linear_dynamic_mp`` / ``_sc_linear_combined_mp`` /
        ``_sc_linear`` — was conditionally adding ``rng_levels`` to kwargs;
        now passed unconditionally (``_rng_levels`` returns None unless
        fixed-level mode is set, same behavior).
      • 2 ``sc_enable and sc_mode == "bipolar"`` → just
        ``sc_mode == "bipolar"`` (per-head bipolar fast path always picks
        the batched kernel when mode allows).
  - ``_rng_levels`` body collapsed (both branches returned None already).
  - Also catches up these three legacy methods that the earlier rewrite
    missed: dropped positional ``x.max().item(), x.min().item(), …``
    arguments and switched to ``granularity=`` kwargs so calls go through
    the unified ``sc_matmul`` dispatcher (would have failed at runtime).

qdit/sc_integration/sc_mlp.py
  - ``_rng_levels`` body collapsed; no other sc_enable usage.

qdit/sc_integration/sc_modelutils.py
  - Drop ``sc_enable=getattr(args, 'sc_enable', False)`` from the
    SCController constructor call.

scripts/quant_sc_main.py
  - Drop ``--sc_enable`` CLI flag and its mentions in the run-name
    builder + logging.

scripts/calibrate_mp_thresholds.py
  - ``_resolve_level_rng_levels`` body collapsed (was returning None
    in both branches).

15 shell scripts (batch_mp_sweep, calib_*, run_*gpu*, bench_*, owen_*,
unit_test, etc.)
  - Drop ``--sc_enable`` argument from every script.

Verified: zero remaining references to ``sc_enable`` anywhere in
scmp_diffusion (outside the submodule). All edited Python files parse.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
scmp_llm/SC/ legacy/bench files (3a/3b/3c/3g): dropped — not migrated.
These include sc.py and sc_enable.py (NumPy/PyTorch-CPU SC reference
impls, superseded by Triton kernels), the bench_* / compare_* /
test_kernel_opt.py comparison scripts (most tied to the deprecated
packed-XNOR/AND path resolved in Gap 1), dse.py (kernel DSE), and the
matmul_sc_triton / test_all_configs / benchmark_comparison test
helpers that lived inside sc_triton.py (also packed-XNOR-bound).

The following ARE application-side and migrate cleanly:

tools/
  calibrate_noise_model.py            ← scmp_llm/SC/noise_model_calibration.py
                                        Calibrates the closed-form noise surrogate that
                                        qdit/sc_integration/noise_matmul.py consumes.

evaluation/
  kid.py                              ← scmp_llm/evaluation/kid.py
  build_full_mosaic.py                ← scmp_llm/evaluation/build_full_mosaic.py
  build_sample_grids.py               ← scmp_llm/evaluation/build_sample_grids.py
  compare_images.py                   ← scmp_llm/evaluation/compare_images.py
                                        FID/KID + sample-grid + side-by-side comparison
                                        helpers used in result reporting.

evaluation/imagenet_ref/
  extract.py                          ← scmp_llm/imagenet256_ref/extract.py
  parallel_npz.py                     ← scmp_llm/imagenet256_ref/parallel_npz.py
  compute_fid_kid.py                  ← scmp_llm/imagenet256_ref/compute_fid_kid.py
  compare_grid.py                     ← scmp_llm/imagenet256_ref/compare_grid.py
  run_openai_eval.sh                  ← scmp_llm/imagenet256_ref/run_openai_eval.sh
  run_openai_eval_v2.sh               ← scmp_llm/imagenet256_ref/run_openai_eval_v2.sh
                                        ImageNet-256 reference-batch prep (images +
                                        FID statistics). Binary artifacts (1.9 GB
                                        VIRTUAL_imagenet256_labeled.npz, 1.2 GB images/)
                                        deliberately excluded — added to .gitignore.

No SC/MP bare imports in any migrated file (all clean stdlib + torch +
numpy + PIL + cleanfid). AST parses for every .py.

After this commit, every Python file that lived in scmp_llm is either
- migrated to scmp_kernels (the 6 active kernel files), or
- migrated to scmp_diffusion (Q-DiT + evaluation + tools + ImageNet ref), or
- intentionally dropped per team policy (CPU refs, legacy benchmarks,
  packed-XNOR/AND-bound tests, DSE).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Bootstraps the new scmp_diffusion repository by migrating Q-DiT/Diffusion application code from the private scmp_llm staging repo. All SC/MP-related imports are rewired to consume the new scmp_kernels package (added as a git submodule). Files are migrated largely verbatim from their original locations; the only non-trivial code changes happen in the SC integration layer (dispatcher consolidation, removal of sc_enable=False legacy path) which were already covered in earlier scmp_llm review cycles.

Changes:

  • Adds scmp_kernels as a git submodule and rewires all SC imports to the new scmp_kernels.sc / scmp_kernels.mp public surface (with qdit/sc_integration/mp_config.py as a thin re-export shim).
  • Imports Q-DiT application code (diffusion/, models/, qdit/, scripts/, tests/, utils/) along with calibration tooling and evaluation scripts (evaluation/, tools/).
  • Drops the sc_enable=False legacy code path and the dead sc_matmul_qk integration.

Reviewed changes

Copilot reviewed 101 out of 107 changed files in this pull request and generated no comments.

The PR touches ~100 files across many directories; below is a grouped summary rather than per-file since the migration is mostly verbatim copies.

| File / Group | Description |
| --- | --- |
| `.gitmodules`, `.gitignore` | Adds scmp_kernels submodule (pinned to fork) and ignore rules for large artifacts. |
| `setup.py`, `requirements.txt`, `qdit_*` env files | Packaging metadata and pinned conda/pip environment snapshots. |
| `diffusion/`, `models/` | Verbatim DiT + gaussian-diffusion code from upstream. |
| `qdit/`, `qdit/sc_integration/` | Quantization + SC integration; `mp_config.py` is a re-export shim, `__init__.py` updated to expose new API. |
| `scripts/` (60+ files) | Calibration, sweep, SLURM, and diagnostic scripts; most are operator-facing shell/python wrappers with hard-coded scratch paths. |
| `scripts/eval/` | Clean-fid / OpenAI evaluator wrappers and `pngs_to_npz` packer. |
| `evaluation/`, `evaluation/imagenet_ref/` | FID/KID/mosaic tools and ImageNet-256 reference batch helpers. |
| `tests/test_noise_matmul_adapters.py` | Single unit test comparing the noisy surrogate against the real SC kernels through `sc_matmul`. |
| `utils/`, `tools/` | Logger / DiT-checkpoint download helpers; noise-model calibration entry point (not shown in diff). |

A few minor observations (not worth blocking on, since the script files are operator-side tooling with environment-specific paths baked in throughout):

  • scripts/test.sh line 5 still points to /home/kangqi/scmp_llm/results/... and ./evaluations/evaluate.sh (the directory is models/evaluations/ in this repo), so it will not run as-is. Same story for many sbatch_*.sb files referencing /scratch/.../scmp_llm/....
  • scripts/unit_test.sh runs quant_sc_main.py rather than the tests/ pytest suite — the name doesn't match its content.
  • scripts/quant_main.sh invokes python -u quant_main.py (no scripts/ prefix), which won't resolve from the repo root.
  • models/evaluations/requirements.txt pins tensorflow-gpu>=2.0, which is no longer published on PyPI as of TF 2.12+; the canonical install is tensorflow[and-cuda] (as the new scripts/eval/README.md correctly documents).

Given these are bootstrap-time hard-coded paths inherited verbatim from scmp_llm and the PR description explicitly calls out that none of this has been GPU-verified end-to-end, I'm not filing per-line comments — they are out of scope for an initial migration PR whose goal is to land the code rather than make every operator script portable.
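For a later cleanup pass, the hard-coded paths mentioned above could be inventoried mechanically. A minimal sketch, assuming the `/home/` and `/scratch/` prefixes quoted in the observations are the ones of interest; `find_hardcoded_paths` and the suffix list are hypothetical choices, not part of this PR.

```python
import re
from pathlib import Path

# Absolute paths under the two prefixes named in the review comment.
ABS_PATH = re.compile(r"(?<![\w.])/(?:home|scratch)/[^\s\"']+")


def find_hardcoded_paths(root, suffixes=(".sh", ".sb", ".py")):
    """Return (relative_file, line_number, path) for each hard-coded path."""
    root = Path(root)
    findings = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in ABS_PATH.finditer(line):
                findings.append(
                    (path.relative_to(root).as_posix(), lineno, match.group(0))
                )
    return findings
```

Calling `find_hardcoded_paths("scripts")` from the repository root would list candidate lines to parameterise; it is a text match only, so paths assembled from variables are not detected.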


@Allenjin123 Allenjin123 merged commit 5cf24ca into CrucibleComputingGroup:main May 13, 2026
4 checks passed

3 participants