Skip to content

PERF: Use ABI-guaranteed SIMD baselines for redistribution-safe FFTW builds#6007

Merged
hjmjohnson merged 2 commits intoInsightSoftwareConsortium:mainfrom
hjmjohnson:fftw-simd-redistributable
Apr 8, 2026
Merged

PERF: Use ABI-guaranteed SIMD baselines for redistribution-safe FFTW builds#6007
hjmjohnson merged 2 commits intoInsightSoftwareConsortium:mainfrom
hjmjohnson:fftw-simd-redistributable

Conversation

@hjmjohnson
Copy link
Copy Markdown
Member

@hjmjohnson hjmjohnson commented Apr 3, 2026

Summary

Addresses seanm's review of PR #6006: `check_c_source_runs` probed the build host's CPU at configure time, making distributed FFTW binaries unsafe on machines that lack the detected SIMD extensions (SIGILL crash). This is incompatible with conda, pip/PyPI, and manylinux redistribution workflows.

Detection policy (compile-time only, never runtime)

Platform Extension Default Rationale
x86_64 / AMD64 SSE, SSE2 ON Mandated by the AMD64 ABI — every 64-bit x86 CPU has them
aarch64 / arm64 NEON ON Mandated by the AArch64 ABI — every arm64 CPU has it
x86_64 AVX OFF by default Auto-ON when compiler already targets AVX (e.g. `-march=native`, `/arch:AVX`)
x86_64 AVX2 OFF by default Auto-ON when compiler already targets AVX2 (e.g. `-mavx2`, `/arch:AVX2`)
macOS universal all SIMD OFF Single configure cannot target both arm64 and x86_64 simultaneously

AVX/AVX2 detection uses `check_c_source_compiles` (not `_runs`) against the predefined macros `AVX`/`AVX2`, which are only defined when the compiler is actually generating those instructions. Cache variables are unset before each probe so detection re-runs every configure when compiler flags change.

Impact on redistribution platforms

  • conda/pip x86_64: SSE+SSE2 always ON (previously OFF without runtime probe passing)
  • conda/pip arm64: NEON always ON (unchanged)
  • AVX2 on build host: ON only when compiler targets it (previously ON for all build hosts with AVX2 hardware)

Follows up on / closes #6006.

Test plan

  • Configure on x86_64 Linux without extra flags → SSE/SSE2=ON, AVX/AVX2=OFF
  • Configure on x86_64 with `-DCMAKE_C_FLAGS=-mavx2` → AVX/AVX2=ON
  • Configure on arm64 → NEON=ON
  • Configure with `CMAKE_OSX_ARCHITECTURES="arm64;x86_64"` → all SIMD=OFF
  • Re-configure after adding `-march=native` — AVX detection re-runs (not stale)

🤖 Generated with Claude Code

@github-actions github-actions bot added type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots type:Performance Improvement in terms of compilation or execution time labels Apr 3, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 3, 2026

Greptile Summary

This PR replaces a runtime CPU probe (check_c_source_runs + __builtin_cpu_supports) with a redistribution-safe detection policy: SSE/SSE2 are unconditionally ON for x86_64 (ABI-mandated), NEON is unconditionally ON for aarch64, and AVX/AVX2 are enabled only when the compiler's pre-defined __AVX__/__AVX2__ macros are present. A macOS universal-binary guard disables all SIMD defaults when CMAKE_OSX_ARCHITECTURES names more than one arch.

  • The check_c_source_compiles result for _fftw_compiler_targets_avx is cached as a CMake cache entry. If a user first configures without -march=native (caching it as 0) and later adds the flag, the stale cached value prevents re-detection, and option(FFTW_ENABLE_AVX) will not update either since it is already in the cache — effectively making the advertised "auto-enable" a one-shot event. Adding unset(_fftw_compiler_targets_avx CACHE) before each probe (or a clear warning comment) would make the behaviour match user expectations.

Confidence Score: 4/5

Safe to merge with a known UX rough edge: the auto-detection of AVX/AVX2 is a one-shot operation due to CMake check-variable caching, which may surprise users who change CMAKE_C_FLAGS on an existing build tree.

The core redistribution-safety goal (replacing _runs with _compiles / ABI baselines) is correct and well-reasoned. One P1 finding remains: the stale-cache double-lock means the auto-enable AVX behaviour silently does nothing on a reconfigure, contrary to the stated semantics. This does not produce incorrect FFTW binaries but fails to upgrade them when the user's intent changes.

CMake/itkExternal_FFTW.cmake — the two check_c_source_compiles blocks for AVX/AVX2 and their interaction with option() caching.

Important Files Changed

Filename Overview
CMake/itkExternal_FFTW.cmake Replaces runtime CPU-probing (check_c_source_runs) with ABI-guaranteed SSE/SSE2/NEON defaults and compile-time AVX/AVX2 detection via check_c_source_compiles; adds macOS universal-binary guard. Correct for redistribution safety, with one stale-cache edge case for users who change CMAKE_C_FLAGS between configure runs.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[cmake configure] --> B{ITK_USE_SYSTEM_FFTW?}
    B -- yes --> Z[find_package FFTW]
    B -- no --> C{APPLE AND CMAKE_OSX_ARCHITECTURES count > 1?}
    C -- yes universal2 --> D[All SIMD OFF + status message]
    C -- no --> E{CMAKE_SYSTEM_PROCESSOR}
    E -- aarch64/arm64/ARM64 --> F[NEON=ON, SSE/AVX=OFF]
    E -- x86_64/AMD64 --> G[SSE=ON, SSE2=ON]
    G --> H[check_c_source_compiles __AVX__ macro]
    H -- defined --> I[_fftw_default_avx=ON]
    H -- not defined --> J[_fftw_default_avx=OFF]
    I --> K[check_c_source_compiles __AVX2__ macro]
    J --> K
    K -- defined --> L[_fftw_default_avx2=ON]
    K -- not defined --> M[_fftw_default_avx2=OFF]
    E -- i686/i386 --> N[All SIMD OFF]
    E -- other --> N
    F --> O[option FFTW_ENABLE_* cached defaults]
    L --> O
    M --> O
    D --> O
    N --> O
    O --> P[ExternalProject_Add fftwf/fftwd with -DENABLE_* flags]
Loading

Reviews (1): Last reviewed commit: "PERF: Use ABI-guaranteed SIMD baselines ..." | Re-trigger Greptile

@hjmjohnson hjmjohnson requested a review from seanm April 3, 2026 11:03
hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request Apr 3, 2026
check_c_source_compiles stores its result in the CMake cache by variable
name.  Without unsetting first, a subsequent configure that adds
-march=native to CMAKE_C_FLAGS would silently reuse the stale cached
0, leaving FFTW_ENABLE_AVX at its initial default even though the
compiler is now generating AVX instructions.

Adding unset(_fftw_compiler_targets_avx CACHE) and
unset(_fftw_compiler_targets_avx2 CACHE) before each probe forces the
compile check to re-run on every configure, ensuring the auto-detected
default always reflects the current toolchain flags.  The option()
caching semantics are unchanged: FFTW_ENABLE_AVX/AVX2 are only updated
from the detected default when not already present in the CMake cache,
so explicit user overrides are preserved.

Addresses greptile P1 finding on PR InsightSoftwareConsortium#6007.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@hjmjohnson
Copy link
Copy Markdown
Member Author

Addressed the greptile P1 finding: added unset(_fftw_compiler_targets_avx CACHE) and unset(_fftw_compiler_targets_avx2 CACHE) before each check_c_source_compiles probe.

This ensures the AVX/AVX2 compiler capability check re-runs on every cmake configure, correctly reflecting the current CMAKE_C_FLAGS. Previously, a user who configured without -march=native (caching the result as 0) and then added the flag on a subsequent configure would silently get the stale detection.

The option() caching semantics are intentionally preserved: FFTW_ENABLE_AVX / FFTW_ENABLE_AVX2 are only auto-set from the detected default when not already present in the CMake cache, so explicit user overrides (-DFFTW_ENABLE_AVX2=ON) survive reconfigures. The commit message documents this for users who need to force re-evaluation.

hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request Apr 3, 2026
check_c_source_compiles stores its result in the CMake cache by variable
name.  Without unsetting first, a subsequent configure that adds
-march=native to CMAKE_C_FLAGS would silently reuse the stale cached
0, leaving FFTW_ENABLE_AVX at its initial default even though the
compiler is now generating AVX instructions.

Adding unset(_fftw_compiler_targets_avx CACHE) and
unset(_fftw_compiler_targets_avx2 CACHE) before each probe forces the
compile check to re-run on every configure, ensuring the auto-detected
default always reflects the current toolchain flags.  The option()
caching semantics are unchanged: FFTW_ENABLE_AVX/AVX2 are only updated
from the detected default when not already present in the CMake cache,
so explicit user overrides are preserved.

Addresses greptile P1 finding on PR InsightSoftwareConsortium#6007.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@hjmjohnson hjmjohnson force-pushed the fftw-simd-redistributable branch from 71aa996 to 5bc12c2 Compare April 3, 2026 12:26
…builds

Addresses seanm's review of ITK PR InsightSoftwareConsortium#6006: the previous
check_c_source_runs approach probed the BUILD HOST's CPU at configure
time, producing FFTW binaries that require the build machine's exact CPU
and SIGILL on any machine that lacks the detected SIMD extensions.  This
is unsafe for redistributed binary packages (conda, pip/PyPI, manylinux
Docker images) where build and target machines differ.

New detection policy (compile-time only, never runtime):

  x86_64 / AMD64:
    SSE and SSE2 are mandated by the AMD64 ABI -- every 64-bit x86 CPU
    supports them regardless of age.  Both are enabled by default.
    Safe for all manylinux2014 / manylinux_2_28 / conda x86_64 builds.

  aarch64 / arm64:
    NEON is mandated by the AArch64 ABI -- every arm64 CPU has it.
    Enabled by default.  Safe for all conda / manylinux aarch64 builds.

  AVX / AVX2 (Sandy Bridge 2011 / Haswell 2013 required):
    NOT universally available; default OFF for redistribution safety.
    Auto-enabled only when the compiler is already generating those
    instructions -- i.e. when the user passed -march=native, -mavx2,
    /arch:AVX2, or similar.  Detected via check_c_source_compiles
    (not _runs) which tests what the compiler targets, not what the
    build host's CPU can execute.  This implements seanm's recommended
    "the compiler knows what CPU it's compiling for" approach.
    The AVX/AVX2 cache variables are unset before each probe so that
    detection re-runs on every configure when compiler flags change
    (e.g. user later adds -march=native).

  macOS universal binary (CMAKE_OSX_ARCHITECTURES with >1 entry):
    SIMD defaults disabled; a single configure pass cannot produce
    correct per-slice codelets for both arm64 and x86_64.

This change is a strict improvement on the previous behaviour for the
most important redistribution platforms:
  - conda/pip on x86_64: SSE+SSE2 always ON (was OFF without runtime probe)
  - conda/pip on arm64:  NEON always ON (unchanged)
  - AVX2 on build host:  ON only when compiler targets it (was ON always)

Closes InsightSoftwareConsortium#6006 (follow-up addressing seanm review)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@hjmjohnson hjmjohnson force-pushed the fftw-simd-redistributable branch from 5bc12c2 to 3662401 Compare April 3, 2026 15:03
Copy link
Copy Markdown
Member

@dzenanz dzenanz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good on a glance. Somebody else should review too.

Replace the redundant `!defined(__AVX__) || !__AVX__` pattern with the
conventional `#ifndef __AVX__` form.  All major compilers (GCC, Clang,
MSVC /arch:AVX) define __AVX__ as 1 (never 0) when AVX is active, so
the `|| !__AVX__` branch is dead code.

Addresses Greptile P2 review comment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@hjmjohnson hjmjohnson merged commit c4fcb5e into InsightSoftwareConsortium:main Apr 8, 2026
19 of 21 checks passed
hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request Apr 9, 2026
Cherry-pick of PR InsightSoftwareConsortium#6007 (commits 3662401, 934faaa) from main.

Enable FFTW SIMD codelets using ABI-guaranteed baselines:
  - x86_64: SSE + SSE2 (required by AMD64 ABI) -- default ON
  - aarch64: NEON (required by AArch64 ABI) -- default ON
  - AVX/AVX2: default OFF unless compiler already targets them

MSVC is excluded (FFTW SIMD codelets use GCC/Clang inline assembly).

Simplifies AVX detection guards to #ifndef form.

Closes InsightSoftwareConsortium#6025.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots type:Performance Improvement in terms of compilation or execution time

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants