PERF: Use ABI-guaranteed SIMD baselines for redistribution-safe FFTW builds #6007
| Filename | Overview |
|---|---|
| CMake/itkExternal_FFTW.cmake | Replaces runtime CPU probing (`check_c_source_runs`) with ABI-guaranteed SSE/SSE2/NEON defaults and compile-time AVX/AVX2 detection via `check_c_source_compiles`; adds a macOS universal-binary guard. Correct for redistribution safety, with one stale-cache edge case for users who change `CMAKE_C_FLAGS` between configure runs. |
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[cmake configure] --> B{ITK_USE_SYSTEM_FFTW?}
B -- yes --> Z[find_package FFTW]
B -- no --> C{APPLE AND CMAKE_OSX_ARCHITECTURES count > 1?}
C -- yes universal2 --> D[All SIMD OFF + status message]
C -- no --> E{CMAKE_SYSTEM_PROCESSOR}
E -- aarch64/arm64/ARM64 --> F[NEON=ON, SSE/AVX=OFF]
E -- x86_64/AMD64 --> G[SSE=ON, SSE2=ON]
G --> H[check_c_source_compiles __AVX__ macro]
H -- defined --> I[_fftw_default_avx=ON]
H -- not defined --> J[_fftw_default_avx=OFF]
I --> K[check_c_source_compiles __AVX2__ macro]
J --> K
K -- defined --> L[_fftw_default_avx2=ON]
K -- not defined --> M[_fftw_default_avx2=OFF]
E -- i686/i386 --> N[All SIMD OFF]
E -- other --> N
F --> O[option FFTW_ENABLE_* cached defaults]
L --> O
M --> O
D --> O
N --> O
O --> P[ExternalProject_Add fftwf/fftwd with -DENABLE_* flags]
```
Reviews (1): Last reviewed commit: "PERF: Use ABI-guaranteed SIMD baselines ..."
check_c_source_compiles stores its result in the CMake cache under the result variable's name. Without unsetting it first, a later configure that adds -march=native to CMAKE_C_FLAGS silently reuses the stale cached failure result. That leaves FFTW_ENABLE_AVX at its initial default even though the compiler now generates AVX instructions.

Adding unset(_fftw_compiler_targets_avx CACHE) and unset(_fftw_compiler_targets_avx2 CACHE) before each probe forces the compile check to re-run on every configure. As a result, the auto-detected default always reflects the current toolchain flags.

The option() caching semantics are unchanged. FFTW_ENABLE_AVX/AVX2 take the detected default only when they are not already in the CMake cache, so explicit user overrides are preserved.

Addresses Greptile P1 finding on PR InsightSoftwareConsortium#6007.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Addressed the Greptile P1 finding: the cached AVX/AVX2 probe results are now unset before each check, so the compiler capability check re-runs on every configure.
Force-pushed from 71aa996 to 5bc12c2.
…builds

Addresses seanm's review of ITK PR InsightSoftwareConsortium#6006. The previous check_c_source_runs approach probed the BUILD HOST's CPU at configure time. That produced FFTW binaries that require the build machine's exact CPU and crash with SIGILL on any machine lacking the detected SIMD extensions. This is unsafe for redistributed binary packages (conda, pip/PyPI, manylinux Docker images), where the build and target machines differ.

New detection policy (compile-time only, never runtime):

- x86_64 / AMD64: SSE and SSE2 are mandated by the AMD64 ABI, so every 64-bit x86 CPU supports them regardless of age. Both are enabled by default. This is safe for all manylinux2014 / manylinux_2_28 / conda x86_64 builds.
- aarch64 / arm64: NEON is mandated by the AArch64 ABI, so every arm64 CPU has it. It is enabled by default. This is safe for all conda / manylinux aarch64 builds.
- AVX / AVX2: these are NOT universally available (AVX requires a Sandy Bridge-era CPU from 2011, AVX2 a Haswell-era CPU from 2013), so they default to OFF for redistribution safety.
  - They are auto-enabled only when the compiler is already generating those instructions, i.e. when the user passed -march=native, -mavx2, /arch:AVX2, or similar.
  - Detection uses check_c_source_compiles (not _runs), which tests what the compiler targets rather than what the build host's CPU can execute. This implements seanm's recommended "the compiler knows what CPU it's compiling for" approach.
  - The AVX/AVX2 cache variables are unset before each probe, so detection re-runs on every configure when compiler flags change (e.g. the user later adds -march=native).
- macOS universal binary (CMAKE_OSX_ARCHITECTURES with more than one entry): SIMD defaults are disabled, because a single configure pass cannot produce correct per-slice codelets for both arm64 and x86_64.

This change is a strict improvement on the previous behaviour for the most important redistribution platforms:

- conda/pip on x86_64: SSE+SSE2 always ON (was OFF without the runtime probe)
- conda/pip on arm64: NEON always ON (unchanged)
- AVX2 on the build host: ON only when the compiler targets it (previously ON whenever the build host supported it)

Closes InsightSoftwareConsortium#6006 (follow-up addressing seanm's review)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Force-pushed from 5bc12c2 to 3662401.
dzenanz left a comment:

Looks good at a glance. Somebody else should review too.
Replace the redundant `!defined(__AVX__) || !__AVX__` pattern with the conventional `#ifndef __AVX__` form. All major compilers (GCC, Clang, and MSVC with `/arch:AVX`) define `__AVX__` as 1 (never 0) when AVX is enabled, so the `|| !__AVX__` branch is dead code.

Addresses Greptile P2 review comment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Merged commit c4fcb5e into InsightSoftwareConsortium:main.
Cherry-pick of PR InsightSoftwareConsortium#6007 (commits 3662401, 934faaa) from main.

Enable FFTW SIMD codelets using ABI-guaranteed baselines:

- x86_64: SSE + SSE2 (required by the AMD64 ABI), default ON
- aarch64: NEON (required by the AArch64 ABI), default ON
- AVX/AVX2: default OFF unless the compiler already targets them

MSVC is excluded (FFTW SIMD codelets use GCC/Clang inline assembly). Simplifies the AVX detection guards to the `#ifndef` form.

Closes InsightSoftwareConsortium#6025.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Addresses seanm's review of PR #6006: `check_c_source_runs` probed the build host's CPU at configure time, making distributed FFTW binaries unsafe on machines that lack the detected SIMD extensions (SIGILL crash). This is incompatible with conda, pip/PyPI, and manylinux redistribution workflows.
Detection policy (compile-time only, never runtime)
AVX/AVX2 detection uses `check_c_source_compiles` (not `_runs`) against the predefined macros `__AVX__`/`__AVX2__`, which the compiler defines only when it is generating code for those instruction sets. Cache variables are unset before each probe, so detection re-runs on every configure when compiler flags change.
Impact on redistribution platforms
Follows up on / closes #6006.
Test plan
🤖 Generated with Claude Code