Use Singleton's stable trig recurrence for twiddle factors#120
Merged
Conversation
Replace the `w *= step` recurrence in every FFT kernel with Singleton's
(1967) trigonometric recurrence:
c_{k+1} = c_k - (α c_k + β s_k)
s_{k+1} = s_k - (α s_k - β c_k)
where `α = 2 sin²(θ/2)` and `β = sin(θ)` are precomputed once per
inner loop. Writing the correction first — `c - (αc + βs)` rather
than `(1-α)c - βs` — keeps the update in the high significand bits,
giving RMS error ~ √k · ε instead of the naive k · ε.
The only trig call per kernel is the Singleton setup
(`sincospi(freq / 2)`), so the inner loop stays ~8 flops per step,
comparable to the naive complex multiply and an order of magnitude
cheaper than a fresh per-iteration `cispi`. Kernels converted:
`fft_composite!`, both `fft_dft!` methods, `fft_pow2_radix4!`,
`fft_pow3!`. Recursive sub-calls now pass `w_sub = cispi(dir · 2 /
m)` so the sub-tree starts with a < 1 ULP phase.
Measured ComplexF32 forward-error relative to a ComplexF64 reference:
~1.3 ULP at N = 256, ~6.6 ULP at N = 16384 (vs ~4000 ULP on main);
roughly 3× worse than fresh SLEEF `cispi` per step but at a small
fraction of the cost.
Generic: works for any real `T` with `sincospi`, including `Float16`,
`BigFloat`, and user-defined types. Direction is recovered from
`sign(imag(w))` with a real-w fallback (N = 2).
Adds `test/onedim/accuracy.jl` as a regression guard.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cover the full power-of-2 ladder from 2^4 (16) up to 2^18 (262144) — including both even powers (= powers of 4, which recurse to the N = 4 base in `fft_pow2_radix4!`) and odd powers (which recurse to N = 2) — plus powers of 3 from 3^1 up to 3^9 (19683). Bounds are set ~2-3× above the worst ULP ratio observed over 5 seeds on aarch64; the pre-fix naive recurrence still fails these comfortably at any size above a few hundred. Factored the per-seed worst-relerr computation into a helper so the two category blocks stay readable. Total test runtime is ~0.5s.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #120 +/- ##
==========================================
+ Coverage 98.76% 98.82% +0.05%
==========================================
Files 4 5 +1
Lines 569 594 +25
==========================================
+ Hits 562 587 +25
Misses 7 7 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
dannys4
approved these changes
Apr 27, 2026
Collaborator
dannys4
left a comment
There was a problem hiding this comment.
This looks great! thanks for the PR, and I'm glad that we can at least make a pretty drastic improvement with such a minor change. I wish I had the bandwidth to do more on this front.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This reduces the error brought up in #118. With this PR, the error grows with the square root of the problem size instead of proportionally. See plot below. This is still faster than logarithmically, which a table look-up approach could achieve, but the changes required for this PR are pretty minor, whereas I suspect the table look-up would require a larger refactoring. That can be done later, but I suspect it might increase the planning costs noticeably, so there might be a tradeoff. (Personally, I rarely use my plan more than twice.) The code is written by Claude based on my instructions.