Skip to content

Add halve_bipolar_stoc_len flag + repo housekeeping#12

Merged
heroarmor merged 2 commits into
mainfrom
feature/halve-bipolar-stoc-len
May 20, 2026
Merged

Add halve_bipolar_stoc_len flag + repo housekeeping#12
heroarmor merged 2 commits into
mainfrom
feature/halve-bipolar-stoc-len

Conversation

@Allenjin123
Copy link
Copy Markdown
Contributor

Summary

  • sc/matmul: add opt-in halve_bipolar_stoc_len kwarg to sc_matmul. When True and mode="bipolar", halves any unset stoc_len / rng_levels to 2**(sc_prec-1), matching the uSystolic / HUB sign-magnitude cycle-halving optimization (wu-hpca2022). Default False preserves all existing behavior.
  • repo: add a standard Python .gitignore and git rm --cached 10 stale .pyc files that had been committed under scmp_kernels/__pycache__/.

Motivation

Bipolar mode is already sign-magnitude with q_max = 2**(sc_prec-1) - 1, so the magnitude only carries sc_prec-1 bits. The default stoc_len = 2**sc_prec therefore runs ~2× more cycles than the magnitude grid needs. The flag exposes the paper's optimization as a single switch; legacy callers see zero behavior change.

Tradeoffs measured (per-tensor, sc_prec=8, Sobol RNG)

Path Cycles MSE excess vs default Wallclock
halve=False (default) 256 1.00× 1.00×
halve=True 128 ~3.5× ~0.5–0.6× at 1024×2048 scale

The 2× speedup materializes at large problem sizes where the SC inner loop dominates kernel time. The MSE inflation is a real tradeoff (per-D-shared Sobol's variance scales worse than textbook MC); it's the cost the user opts into by setting the flag.

What it does NOT change

  • mode="unipolar": flag is a no-op.
  • Any call with explicit stoc_len or rng_levels: those user values are preserved.
  • Kernel internals (sc/kernels.py, sc/sng.py): unchanged.

Forwarding correctness

The flag's rng_levels override is forwarded through all five dispatch entry points: per_tensor, per_row, per_row+chunk_d, per_row+3D, per_head. (The per_tensor forwarding fixed a latent dispatch bug where the caller dropped rng_levels — pre-flag this was harmless because the kwarg was always None, but it would have caused per_tensor users to silently get the wrong halved-cycle behavior.)

Test plan

  • Existing tests/ suite unaffected (flag default-off; legacy path byte-identical).
  • Manual: halve=False reproduces baseline MSE; halve=True halves stoc_len and gives ~2× wallclock speedup at 1024×2048.
  • Manual: per_tensor halved produces M2-shaped excess ratio (~3.5×) — confirms rng_levels reaches the kernel.
  • Reviewer: confirm the API addition is consistent with the project's kwarg style (single kebab-less boolean, default False).

Add a standard Python .gitignore (__pycache__, *.pyc, egg-info, build/dist,
common editor/OS files) and git rm --cached the .pyc files that were
previously committed under scmp_kernels/.
…halving

When True and mode='bipolar', overrides any unset stoc_len / rng_levels
to 2^(sc_prec-1). Bipolar magnitudes only carry sc_prec-1 bits of info
(q_max = 2^(sc_prec-1) - 1), so halving the stream and RNG grid preserves
the quantization resolution while approximately halving cycles, matching
the wu-hpca2022 sign-magnitude optimization.

Default False preserves legacy behavior. Forwards rng_levels through all
five dispatch paths (per_tensor, per_row, per_row+chunk_d, per_row+3D,
per_head).
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an opt-in API flag to reduce stochastic stream length/RNG grid in bipolar mode (cycle-halving optimization) while preserving existing default behavior, plus minor repository housekeeping.

Changes:

  • Added halve_bipolar_stoc_len: bool = False to sc_matmul; when enabled in mode="bipolar", it overrides unset stoc_len/rng_levels to 2 ** (sc_prec - 1).
  • Fixed per_tensor dispatch to forward rng_levels into _sc_matmul_per_tensor.
  • Added a standard Python .gitignore to prevent committing bytecode/caches and common build/editor artifacts.

Reviewed changes

Copilot reviewed 1 out of 12 changed files in this pull request and generated 1 comment.

File Description
scmp_kernels/sc/matmul.py Adds the cycle-halving flag and forwards rng_levels through the per-tensor dispatch path.
.gitignore Ignores Python cache/bytecode and common local build/editor/OS files.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread scmp_kernels/sc/matmul.py
Comment on lines +121 to +125
if halve_bipolar_stoc_len and mode == "bipolar":
halved = 2 ** (sc_prec - 1)
if stoc_len is None:
stoc_len = halved
if rng_levels is None:
Copy link
Copy Markdown
Collaborator

@heroarmor heroarmor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — approving.

Verified statically (no torch/CUDA in my env, so I did not re-run the GPU suite — relying on static review + your stated manual results):

  • Override correctly gated on halve_bipolar_stoc_len and mode == "bipolar", only touches unset (None) values, preserves explicit user stoc_len/rng_levels, no-op for unipolar. Default False keeps the legacy path byte-identical.
  • rng_levels is forwarded through all five dispatch entry points (per_tensor / per_row+chunk_d / per_row 3D batched / per_row / per_head).
  • Confirmed the latent per_tensor bug: _sc_matmul_per_tensor already accepts rng_levels but the caller was dropping it — harmless before the flag, would have silently broken halving for per_tensor users. Good catch.
  • Math checks out: bipolar q_max = 2^(sc_prec-1) - 1 ⇒ magnitude spans 2^(sc_prec-1) levels, so the halved grid loses no resolution.
  • .gitignore + removing the committed .pyc files is welcome housekeeping.

The ~3.5× MSE inflation is a clearly-documented, opt-in tradeoff behind a default-off flag, so no concern there. Nice clean change.

@heroarmor heroarmor merged commit e950e26 into main May 20, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants