sc/kernels: apply Owen scramble family at full stoc_len too #14

Closed
heroarmor wants to merge 1 commit into main from feat/scramble-family-full-length

@heroarmor
Collaborator

Summary

_prepare_rng_prefix previously skipped the per-dim Owen XOR when stoc_len == 2**sc_prec (full-length Sobol), applying the scramble only to truncated prefixes. This re-applies the scramble family at full length too.

The skip was wrong for the enable-signal matmul. That path reads joint counts |{t : rng_a[d,t]<ba AND rng_b[d,t]<bb}|, which depend on the per-d trajectory — not just the marginal count count(r<v)=v (the only quantity invariant under XOR over a Sobol permutation). With make_sobol_simple_config broadcasting the same Sobol-Q/Sobol-K pair across all D dims, skipping the scramble at full length left every dim with an identical joint trajectory, so SC noise across D accumulated instead of averaging out.
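
The marginal-versus-joint distinction can be checked numerically. The sketch below is illustrative only: it uses a bit-reversed counter and a plain counter as stand-ins for the Sobol-Q/Sobol-K pair (an assumption, not the sequences the kernel generates), and every helper name is hypothetical.

```python
N_BITS = 8            # stands in for sc_prec=8
N = 1 << N_BITS       # full-length stoc_len = 256


def bit_reverse(x, bits=N_BITS):
    """Reverse the low `bits` bits of x (scaled van der Corput radical inverse)."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (x & 1)
        x >>= 1
    return out


# Stand-in streams: each is a permutation of 0..N-1, like one full-length Sobol dim.
rng_a = [bit_reverse(t) for t in range(N)]
rng_b = list(range(N))


def xor_scramble(rng, mask):
    """XOR every sample with a fixed mask; a permutation stays a permutation."""
    return [r ^ mask for r in rng]


def marginal_count(rng, v):
    """count(r < v) over the whole stream."""
    return sum(r < v for r in rng)


def joint_count(ra, rb, va, vb):
    """|{t : ra[t] < va AND rb[t] < vb}|, the quantity the enable-signal path reads."""
    return sum(1 for x, y in zip(ra, rb) if x < va and y < vb)


masks = [0x00, 0x35, 0xA7]   # arbitrary example masks
va, vb = 100, 77             # arbitrary example thresholds

for m in masks:
    scrambled = xor_scramble(rng_a, m)
    print(f"mask={m:#04x}  marginal={marginal_count(scrambled, va)}  "
          f"joint={joint_count(scrambled, rng_b, va, vb)}")
```

The marginal column always equals `va`, because XOR with a constant maps a permutation of 0..N-1 onto another permutation. The joint column is free to change with the mask, because it depends on which time steps `t` satisfy both comparisons at once.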

Always scrambling — selecting counter / bitrev / random via SC_OWEN_MODE inside _owen_scramble — restores per-dim decorrelation at stoc_len=256 (sc_prec=8).
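
As a rough illustration of what per-dim scrambling changes, the sketch below broadcasts one base pair across D dims and then XORs each dim with its own mask. The stand-in streams, the mask choice (distinct values drawn with a seeded `random.Random`), and all helper names are assumptions made for illustration; this is not the repository's `_owen_scramble`, and it does not model the counter / bitrev / random modes.

```python
import random

N_BITS = 8
N = 1 << N_BITS   # full-length stoc_len = 256
D = 8             # small illustrative number of dims


def bit_reverse(x, bits=N_BITS):
    """Reverse the low `bits` bits of x."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (x & 1)
        x >>= 1
    return out


def joint_count(ra, rb, va, vb):
    """|{t : ra[t] < va AND rb[t] < vb}| for one dim."""
    return sum(1 for x, y in zip(ra, rb) if x < va and y < vb)


# One base pair (stand-ins for Sobol-Q / Sobol-K), each a permutation of 0..N-1.
base_a = [bit_reverse(t) for t in range(N)]
base_b = list(range(N))

# Broadcast case: every dim shares the same trajectory pair.
plain_a = [base_a] * D
plain_b = [base_b] * D

# Scrambled case: one distinct XOR mask per dim and per stream.
masks_a = random.Random(0).sample(range(N), D)
masks_b = random.Random(1).sample(range(N), D)
scr_a = [[r ^ m for r in base_a] for m in masks_a]
scr_b = [[r ^ m for r in base_b] for m in masks_b]

va, vb = 100, 77  # same operand values in every dim, to isolate the RNG effect
plain_counts = [joint_count(plain_a[d], plain_b[d], va, vb) for d in range(D)]
scr_counts = [joint_count(scr_a[d], scr_b[d], va, vb) for d in range(D)]

print("ideal va*vb/N :", va * vb / N)
print("broadcast     :", plain_counts)
print("scrambled     :", scr_counts)
```

With the broadcast pair every dim reports the same joint count, so for equal inputs any deviation from `va*vb/N` has the same sign and size in all D dims and adds up in a sum over d. With per-dim masks the counts are no longer forced to be equal.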

Relation to prior work

Supersedes the closed #11, rebased onto main so it also carries the halve_bipolar_stoc_len flag + housekeeping from #12.

Effect (expected, Llama-3.1-8B-Instruct, wikitext-2 test, ctx=1024 stride=512, per_row, sc_prec=8)

| config | PPL | xFP16 |
| --- | --- | --- |
| FP16 baseline | 6.7711 | 1.000 |
| INT8 per_row deterministic | 6.9328 | x1.024 |
| SC sl=256 (no scramble at full length, prior) | 7.9383 | x1.172 |
| SC sl=256 (scramble family applied) | to be re-measured | expect ≈ sl=128 floor |
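
The xFP16 column is the perplexity divided by the FP16 baseline. A minimal sketch that recomputes it from the measured rows above (the dictionary keys are just labels; no new measurements are introduced):

```python
FP16_PPL = 6.7711  # FP16 baseline from the table

measured = {
    "INT8 per_row deterministic": 6.9328,
    "SC sl=256 (no scramble at full length, prior)": 7.9383,
}

# Ratio of each configuration's perplexity to the FP16 baseline.
ratios = {name: ppl / FP16_PPL for name, ppl in measured.items()}

for name, ratio in ratios.items():
    print(f"{name}: x{ratio:.3f}")
```

The same ratio can be applied to the re-measured sl=256 number once it is available.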

Test plan

  • Re-run sl=256 per_row PPL with SC_OWEN_MODE=counter and SC_OWEN_MODE=bitrev; confirm both drop from x1.172 toward the sl=128 / INT8 floor.
  • SC_DISABLE_OWEN=1 (or SC_OWEN_MODE=off) reproduces the prior unscrambled sl=256 number — confirms the gate is the only behavioral change.
  • Truncated stoc_len < 256 numbers unchanged (that path already scrambled).
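
One way to reason about the environment combinations in the test plan is a small resolver like the sketch below. It is hypothetical: the function name, the precedence of `SC_DISABLE_OWEN` over `SC_OWEN_MODE`, and the `counter` default are assumptions, not behavior confirmed by this PR.

```python
import os

VALID_MODES = ("counter", "bitrev", "random", "off")


def resolve_owen_mode(env=None, default="counter"):
    """Return the effective scramble mode for an environment mapping.

    Assumptions (not confirmed by the PR): SC_DISABLE_OWEN=1 takes precedence
    over SC_OWEN_MODE, and `default` applies when SC_OWEN_MODE is unset.
    """
    env = os.environ if env is None else env
    if env.get("SC_DISABLE_OWEN", "0") == "1":
        return "off"
    mode = env.get("SC_OWEN_MODE", default).strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"unknown SC_OWEN_MODE: {mode!r}")
    return mode


# The configurations named in the test plan, expressed as environment overrides.
test_matrix = [
    {"SC_OWEN_MODE": "counter"},
    {"SC_OWEN_MODE": "bitrev"},
    {"SC_DISABLE_OWEN": "1"},
    {"SC_OWEN_MODE": "off"},
]

for overrides in test_matrix:
    print(overrides, "->", resolve_owen_mode(overrides))
```

Under these assumptions the last two entries resolve to the same unscrambled path, which is the configuration expected to reproduce the prior sl=256 number.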

🤖 Generated with Claude Code

_prepare_rng_prefix previously skipped the per-dim Owen XOR when
stoc_len == 2**sc_prec (full-length Sobol), applying it only to
truncated prefixes. The skip is wrong for the enable-signal matmul:
that path reads JOINT counts |{t : rng_a[d,t]<ba AND rng_b[d,t]<bb}|,
which depend on the per-d trajectory, not just the marginal count
count(r<v)=v (the only thing invariant under XOR over a Sobol
permutation).

With make_sobol_simple_config broadcasting the same Sobol-Q/Sobol-K
pair across all D dims, skipping the scramble at full length left
every dim with an identical joint trajectory, so SC noise across D
accumulated instead of averaging out. Always scrambling — selecting
counter / bitrev / random via SC_OWEN_MODE inside _owen_scramble —
restores per-dim decorrelation at stoc_len=256 (sc_prec=8).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Allenjin123
Contributor

let's discuss tomorrow on how to do it first
