Add halve_bipolar_stoc_len flag + repo housekeeping by Allenjin123 · Pull Request #12 · CrucibleComputingGroup/scmp_kernels

Allenjin123 · 2026-05-20T05:36:52Z

Summary

sc/matmul: add opt-in halve_bipolar_stoc_len kwarg to sc_matmul. When True and mode="bipolar", halves any unset stoc_len / rng_levels to 2**(sc_prec-1), matching the uSystolic / HUB sign-magnitude cycle-halving optimization (wu-hpca2022). Default False preserves all existing behavior.
repo: add a standard Python .gitignore and git rm --cached 10 stale .pyc files that had been committed under scmp_kernels/__pycache__/.

Motivation

Bipolar mode is already sign-magnitude with q_max = 2**(sc_prec-1) - 1, so the magnitude only carries sc_prec-1 bits. The default stoc_len = 2**sc_prec therefore runs ~2× more cycles than the magnitude grid needs. The flag exposes the paper's optimization as a single switch; legacy callers see zero behavior change.

Tradeoffs measured (per-tensor, sc_prec=8, Sobol RNG)

Path	Cycles	MSE excess vs default	Wallclock
`halve=False` (default)	256	1.00×	1.00×
`halve=True`	128	~3.5×	~0.5–0.6× at 1024×2048 scale

The 2× speedup materializes at large problem sizes where the SC inner loop dominates kernel time. The MSE inflation is a real tradeoff (per-D-shared Sobol's variance scales worse than textbook MC); it's the cost the user opts into by setting the flag.

What it does NOT change

mode="unipolar": flag is a no-op.
Any call with explicit stoc_len or rng_levels: those user values are preserved.
Kernel internals (sc/kernels.py, sc/sng.py): unchanged.

Forwarding correctness

The flag's rng_levels override is forwarded through all five dispatch entry points: per_tensor, per_row, per_row+chunk_d, per_row+3D, per_head. (The per_tensor forwarding fixed a latent dispatch bug where the caller dropped rng_levels — pre-flag this was harmless because the kwarg was always None, but it would have caused per_tensor users to silently get the wrong halved-cycle behavior.)

Test plan

Existing tests/ suite unaffected (flag default-off; legacy path byte-identical).
Manual: halve=False reproduces baseline MSE; halve=True halves stoc_len and gives ~2× wallclock speedup at 1024×2048.
Manual: per_tensor halved produces M2-shaped excess ratio (~3.5×) — confirms rng_levels reaches the kernel.
Reviewer: confirm the API addition is consistent with the project's kwarg style (single kebab-less boolean, default False).

Add a standard Python .gitignore (__pycache__, *.pyc, egg-info, build/dist, common editor/OS files) and git rm --cached the .pyc files that were previously committed under scmp_kernels/.

…halving When True and mode='bipolar', overrides any unset stoc_len / rng_levels to 2^(sc_prec-1). Bipolar magnitudes only carry sc_prec-1 bits of info (q_max = 2^(sc_prec-1) - 1), so halving the stream and RNG grid preserves the quantization resolution while approximately halving cycles, matching the wu-hpca2022 sign-magnitude optimization. Default False preserves legacy behavior. Forwards rng_levels through all five dispatch paths (per_tensor, per_row, per_row+chunk_d, per_row+3D, per_head).

Copilot

Pull request overview

Adds an opt-in API flag to reduce stochastic stream length/RNG grid in bipolar mode (cycle-halving optimization) while preserving existing default behavior, plus minor repository housekeeping.

Changes:

Added halve_bipolar_stoc_len: bool = False to sc_matmul; when enabled in mode="bipolar", it overrides unset stoc_len/rng_levels to 2 ** (sc_prec - 1).
Fixed per_tensor dispatch to forward rng_levels into _sc_matmul_per_tensor.
Added a standard Python .gitignore to prevent committing bytecode/caches and common build/editor artifacts.

Reviewed changes

Copilot reviewed 1 out of 12 changed files in this pull request and generated 1 comment.

File	Description
`scmp_kernels/sc/matmul.py`	Adds the cycle-halving flag and forwards `rng_levels` through the per-tensor dispatch path.
`.gitignore`	Ignores Python cache/bytecode and common local build/editor/OS files.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+    if halve_bipolar_stoc_len and mode == "bipolar":
+        halved = 2 ** (sc_prec - 1)
+        if stoc_len is None:
+            stoc_len = halved
+        if rng_levels is None:


heroarmor

LGTM — approving.

Verified statically (no torch/CUDA in my env, so I did not re-run the GPU suite — relying on static review + your stated manual results):

Override correctly gated on halve_bipolar_stoc_len and mode == "bipolar", only touches unset (None) values, preserves explicit user stoc_len/rng_levels, no-op for unipolar. Default False keeps the legacy path byte-identical.
rng_levels is forwarded through all five dispatch entry points (per_tensor / per_row+chunk_d / per_row 3D batched / per_row / per_head).
Confirmed the latent per_tensor bug: _sc_matmul_per_tensor already accepts rng_levels but the caller was dropping it — harmless before the flag, would have silently broken halving for per_tensor users. Good catch.
Math checks out: bipolar q_max = 2^(sc_prec-1) - 1 ⇒ magnitude spans 2^(sc_prec-1) levels, so the halved grid loses no resolution.
.gitignore + removing the committed .pyc files is welcome housekeeping.

The ~3.5× MSE inflation is a clearly-documented, opt-in tradeoff behind a default-off flag, so no concern there. Nice clean change.

Allenjin123 added 2 commits May 20, 2026 01:34

repo: add .gitignore and untrack stale __pycache__ artifacts

7a0012a

Add a standard Python .gitignore (__pycache__, *.pyc, egg-info, build/dist, common editor/OS files) and git rm --cached the .pyc files that were previously committed under scmp_kernels/.

Allenjin123 requested review from Copilot and heroarmor May 20, 2026 05:45

Copilot started reviewing on behalf of Allenjin123 May 20, 2026 05:45 View session

Copilot AI reviewed May 20, 2026

View reviewed changes

Comment thread scmp_kernels/sc/matmul.py

Comment on lines +121 to +125

if halve_bipolar_stoc_len and mode == "bipolar":

halved = 2 ** (sc_prec - 1)

if stoc_len is None:

stoc_len = halved

if rng_levels is None:

heroarmor approved these changes May 20, 2026

View reviewed changes

heroarmor merged commit e950e26 into main May 20, 2026
2 checks passed

heroarmor mentioned this pull request May 20, 2026

sc/kernels: apply Owen scramble family at full stoc_len too #14

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add halve_bipolar_stoc_len flag + repo housekeeping#12

Add halve_bipolar_stoc_len flag + repo housekeeping#12
heroarmor merged 2 commits into
mainfrom
feature/halve-bipolar-stoc-len

Allenjin123 commented May 20, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

heroarmor left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Allenjin123 commented May 20, 2026

Summary

Motivation

Tradeoffs measured (per-tensor, sc_prec=8, Sobol RNG)

What it does NOT change

Forwarding correctness

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

heroarmor left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants