sc/kernels: apply Owen scramble at full stoc_len too #11

Closed
heroarmor wants to merge 1 commit into main from fix/owen-scramble-at-full-length

@heroarmor
Collaborator

Summary

_prepare_rng_prefix previously skipped the per-dim XOR scramble when stoc_len == 2**sc_prec (full-length Sobol), with the comment "no scramble is needed". That reasoning holds only for the marginal count, count(r<v)=v, which is invariant under XOR because XOR maps a full-length Sobol permutation to another permutation.

The enable-signal matmul actually reads joint counts |{t : rng_a[d,t]<ba AND rng_b[d,t]<bb}| which depend on the per-d trajectory, not just marginals. With make_sobol_simple_config broadcasting the same Sobol-Q/Sobol-K pair across all D dims, skipping the scramble at full length left every dim with an identical joint trajectory, so SC noise across D accumulated instead of averaging out.
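The marginal-versus-joint distinction can be checked on a toy example. The sketch below is not the repository's code: `bit_reverse`, `base_a`, `base_b`, and `MASKS` are hypothetical stand-ins (a van der Corput sequence and a Gray-code permutation replace the real Sobol-Q/Sobol-K pair, and the masks are hand-picked). It assumes only that each stream is a full permutation of [0, N).

```python
PREC = 3          # toy precision; the PR uses sc_prec=8
N = 1 << PREC     # full stream length and number of levels
D = 4             # number of dims sharing the same base pair

def bit_reverse(t, bits):
    """Reverse the low `bits` bits of t (van der Corput index map)."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (t & 1)
        t >>= 1
    return out

# Hypothetical stand-ins for the Sobol-Q / Sobol-K streams: two permutations of [0, N).
base_a = [bit_reverse(t, PREC) for t in range(N)]
base_b = [t ^ (t >> 1) for t in range(N)]

def marginal_count(seq, v):
    """count(r < v) over one stream."""
    return sum(r < v for r in seq)

def joint_count(a, b, ba, bb):
    """|{t : a[t] < ba AND b[t] < bb}|, the quantity the enable signal reads."""
    return sum(ra < ba and rb < bb for ra, rb in zip(a, b))

def joint_table(a, b):
    """Joint counts for every threshold pair (ba, bb)."""
    return tuple(joint_count(a, b, ba, bb)
                 for ba in range(N + 1) for bb in range(N + 1))

# Hand-picked (mask_a, mask_b) per dim, for illustration only.
MASKS = [(0, 0), (1, 0), (2, 5), (7, 3)]

# No scramble: every dim reuses the same pair, so every joint table is the same.
unscrambled = [joint_table(base_a, base_b) for _ in range(D)]
# Per-dim XOR: marginals are preserved, but the joint tables can differ.
scrambled = [joint_table([r ^ ma for r in base_a], [r ^ mb for r in base_b])
             for ma, mb in MASKS]

distinct_unscrambled = len(set(unscrambled))
distinct_scrambled = len(set(scrambled))
print(distinct_unscrambled, distinct_scrambled)
```

Without the scramble all D dims collapse to a single joint table, while the XOR masks leave every marginal at count(r<v)=v yet produce more than one joint table. The sketch says nothing about how much the averaging helps in practice; that is what the PPL runs measure.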

Effect

Llama-3.1-8B-Instruct PPL on wikitext-2 test (ctx=1024, stride=512, per_row, sc_prec=8):

config                       PPL      xFP16
FP16 baseline                6.7711   x1.000
INT8 per_row deterministic   6.9328   x1.024
SC sl=128 (Owen applied)     7.1771   x1.060
SC sl=256 (no Owen, prior)   7.9383   x1.172

SC sl=256 was worse than its own deterministic INT8 floor. After this change sl=256 also runs through Owen scramble; validation run pending.

Why this is safe

  • _owen_scramble only depends on prefix.shape[0] (D); it doesn't care whether the input is a prefix or the full sequence.
  • XOR is a bijection on [0, base_levels), so each per-d sequence is still a permutation of [0, base_levels). Marginals (and therefore k_table) are unchanged.
  • The fixed-level rescale branch (grid_levels != base_levels) is untouched.
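The first two points can be illustrated with a minimal sketch of a per-dim XOR scramble. `xor_scramble_rows` is a hypothetical helper written for this example, not the repository's `_owen_scramble`; it uses plain Python lists instead of tensors and assumes `base_levels` is a power of two, which is what makes XOR with a mask below `base_levels` a bijection on [0, base_levels).

```python
import random

def xor_scramble_rows(rows, base_levels, seed=0):
    """Hypothetical per-dim XOR scramble: one random mask per row (dim).

    Only the number of rows determines the masks, so the same call works
    for a truncated prefix and for the full-length sequence.
    """
    if base_levels <= 0 or base_levels & (base_levels - 1):
        raise ValueError("base_levels must be a power of two")
    rng = random.Random(seed)
    masks = [rng.randrange(base_levels) for _ in rows]
    return [[r ^ m for r in row] for row, m in zip(rows, masks)]

base_levels = 8
full = list(range(base_levels))          # any permutation of [0, base_levels)
rows = [full[:] for _ in range(4)]       # the same stream broadcast to 4 dims

scrambled_full = xor_scramble_rows(rows, base_levels)
scrambled_prefix = xor_scramble_rows([row[:5] for row in rows], base_levels)

# Each full-length row is still a permutation, so count(r < v) == v holds.
for row in scrambled_full:
    assert sorted(row) == full
    assert all(sum(r < v for r in row) == v for v in range(base_levels + 1))
```

With the same seed and the same number of rows, the prefix result equals the first elements of the full-length result, which is the sense in which the scramble does not care whether it receives a prefix or the whole sequence.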

Status

Proposed fix. Owen-at-full-length validation run (SC sl=256, full wikitext-2) is currently in flight; results will be posted in a follow-up comment.

Test plan

  • SC sl=256 per_row PPL drops from x1.172 toward x1.024 (INT8 floor) or better
  • SC sl=128 per_row PPL unchanged (it already went through Owen)
  • RULER VT @ 1K still scores 100 at sl>=96 (sanity)

🤖 Generated with Claude Code

When `stoc_len == 2**sc_prec`, `_prepare_rng_prefix` previously skipped
the per-dim XOR scramble. The comment claimed scrambling was "not
needed" at full length because the marginal `count(r<v)=v` over a
Sobol-N permutation is invariant under XOR.

That reasoning only covers the marginal. The enable-signal matmul reads
joint counts `|{t: rng_a[d,t]<ba AND rng_b[d,t]<bb}|`, which depend on
the per-d (rng_a, rng_b) *trajectory*, not just marginals. With the
default `make_sobol_simple_config` broadcasting the same Sobol-Q/Sobol-K
pair across all D dims, skipping the scramble at full length left every
dim with an identical joint trajectory. SC noise across D then
accumulated as a single biased estimator instead of averaging across
independent estimators.

Effect on Llama-3.1-8B-Instruct PPL (wikitext-2 test, ctx=1024 stride=512,
per_row, sc_prec=8):

  FP16                  6.7711   x1.000
  INT8 per_row det.     6.9328   x1.024   (deterministic floor)
  SC sl=128 (Owen)      7.1771   x1.060
  SC sl=256 (no Owen)   7.9383   x1.172   <- worse than INT8 floor

After this change SC at sl=256 also goes through `_owen_scramble`, giving
each dim a distinct XOR mask and recovering cross-D averaging. The
prefix-vs-full distinction is no longer load-bearing, so the `is_prefix`
guard is removed entirely.

Owen scramble itself is unchanged; only the gate is widened.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

This PR updates the SC enable-table RNG preparation logic so that the per-dimension Owen XOR scramble is applied even when using the full-length Sobol stream (stoc_len == 2**sc_prec). This aims to reduce cross-dimension correlation in joint (rng_a, rng_b) trajectories, improving noise averaging behavior across D dimensions.

Changes:

  • Always apply _owen_scramble() in _prepare_rng_prefix() when grid_levels == 2**sc_prec, including for full-length streams.
  • Remove the previous “no scramble needed at full length” special-casing and replace it with updated rationale in comments.

Comment on lines +908 to +913
     prefix = rng[:, :stoc_len].contiguous() if stoc_len < rng.shape[1] else rng
     if grid_levels == base_levels:
-        # Fixed-level path: if we're truncating a longer Sobol sequence, apply
-        # Owen scramble to break the prefix stratification artifact. When the
-        # sequence is used in full (non-truncated), no scramble is needed.
-        if is_prefix:
-            return _owen_scramble(prefix, base_levels)
-        return prefix
+        # Per-dim Owen XOR — even at full length, this decorrelates the joint
+        # (rng_a, rng_b) trajectory across dimensions; without it, all D dims
+        # share the same joint, and SC noise accumulates instead of averaging.
+        return _owen_scramble(prefix, base_levels)

@heroarmor
Collaborator Author

Per Allen's recommendation, dropping this implementation approach. Closing the PR; remote branch fix/owen-scramble-at-full-length is left intact so the diff stays referenceable. The PPL data motivating the investigation (INT8 floor + sl=128 vs sl=256 anomaly) lives in scmp_llm#5; the alternative fix direction is TBD.
