Skip to content

Use Singleton's stable trig recurrence for twiddle factors#120

Merged
dannys4 merged 2 commits into
mainfrom
an/singleton-twiddles
Apr 27, 2026
Merged

Use Singleton's stable trig recurrence for twiddle factors#120
dannys4 merged 2 commits into
mainfrom
an/singleton-twiddles

Conversation

@andreasnoack
Copy link
Copy Markdown
Member

This reduces the error brought up in #118. With this PR, the error grows with the square root of the problem size instead of proportionally. See plot below. This is still faster than logarithmically, which a table look-up approach could achieve, but the changes required for this PR are pretty minor, whereas I suspect the table look-up would require a larger refactoring. That can be done later, but I suspect it might increase the planning costs noticeably, so there might be a tradeoff. (Personally, I rarely use my plan more than twice.) The code is written by Claude based on my instructions.

Screenshot 2026-04-27 at 23 00 32

andreasnoack and others added 2 commits April 27, 2026 23:16
Replace the `w *= step` recurrence in every FFT kernel with Singleton's
(1967) trigonometric recurrence:
    c_{k+1} = c_k - (α c_k + β s_k)
    s_{k+1} = s_k - (α s_k - β c_k)
where `α = 2 sin²(θ/2)` and `β = sin(θ)` are precomputed once per
inner loop. Writing the correction first — `c - (αc + βs)` rather
than `(1-α)c - βs` — keeps the update in the high significand bits,
giving RMS error ~ √k · ε instead of the naive k · ε.

The only trig call per kernel is the Singleton setup
(`sincospi(freq / 2)`), so the inner loop stays ~8 flops per step,
comparable to the naive complex multiply and an order of magnitude
cheaper than a fresh per-iteration `cispi`. Kernels converted:
`fft_composite!`, both `fft_dft!` methods, `fft_pow2_radix4!`,
`fft_pow3!`. Recursive sub-calls now pass `w_sub = cispi(dir · 2 /
m)` so the sub-tree starts with a < 1 ULP phase.

Measured ComplexF32 forward-error relative to a ComplexF64 reference:
~1.3 ULP at N = 256, ~6.6 ULP at N = 16384 (vs ~4000 ULP on main);
roughly 3× worse than fresh SLEEF `cispi` per step but at a small
fraction of the cost.

Generic: works for any real `T` with `sincospi`, including `Float16`,
`BigFloat`, and user-defined types. Direction is recovered from
`sign(imag(w))` with a real-w fallback (N = 2).

Adds `test/onedim/accuracy.jl` as a regression guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cover the full power-of-2 ladder from 2^4 (16) up to 2^18 (262144) —
including both even powers (= powers of 4, which recurse to the
N = 4 base in `fft_pow2_radix4!`) and odd powers (which recurse to
N = 2) — plus powers of 3 from 3^1 up to 3^9 (19683). Bounds are set
~2-3× above the worst ULP ratio observed over 5 seeds on aarch64;
the pre-fix naive recurrence still fails these comfortably at any
size above a few hundred.

Factored the per-seed worst-relerr computation into a helper so the
two category blocks stay readable. Total test runtime is ~0.5s.
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.82%. Comparing base (cf52fa1) to head (5d89095).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #120      +/-   ##
==========================================
+ Coverage   98.76%   98.82%   +0.05%     
==========================================
  Files           4        5       +1     
  Lines         569      594      +25     
==========================================
+ Hits          562      587      +25     
  Misses          7        7              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Collaborator

@dannys4 dannys4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! thanks for the PR, and I'm glad that we can at least make a pretty drastic improvement with such a minor change. I wish I had the bandwidth to do more on this front.

@dannys4 dannys4 added this pull request to the merge queue Apr 27, 2026
Merged via the queue into main with commit 56bfdbc Apr 27, 2026
9 checks passed
@dannys4 dannys4 deleted the an/singleton-twiddles branch April 27, 2026 21:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants