Description
Not too common on Rust-based projects, but it would be great to have a "universal" x86-64 build that works on any modern machine (x86-64-v3, which targets Haswell 2013 and Excavator 2015 onwards) and takes advantage of the broadly available SIMD at that level and below.
Personally I use this:
RUSTFLAGS="-C target-cpu=x86-64-v3" \
CARGO_PROFILE_RELEASE_OPT_LEVEL="3" \
CARGO_PROFILE_RELEASE_LTO="fat" \
CARGO_PROFILE_RELEASE_CODEGEN_UNITS="1" \
cargo build --release
(I used target-cpu=native on MY machine for the lolz).
These builds shouldn't be the focus while the project is in development, but rather once a version becomes stable. Given there are benchmarks, it would be great to see a preview of the performance differences to prepare accordingly, or just revisit this later.
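To double-check which target features the `x86-64-v3` flag actually switches on, rustc can print its cfg values. A small sketch: the helper name `list_target_features` is made up for illustration, and it only filters whatever `rustc --print cfg` output it receives on stdin.

```shell
# Hypothetical helper: read `rustc --print cfg` output on stdin and
# print only the enabled target feature names, sorted, one per line.
list_target_features() {
  sed -n 's/^target_feature="\(.*\)"$/\1/p' | sort
}

# Usage (needs rustc installed, so it is left commented out here):
#   rustc --print cfg -C target-cpu=x86-64-v3 | list_target_features
#   rustc --print cfg -C target-cpu=native    | list_target_features
```

Comparing the two lists shows what `native` adds on a given machine on top of the v3 baseline.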
SIMD?
A quick AI prompt says the following:
| Microarchitecture Level | Instruction Set Extension | Core Functionality | Direct Benefit for PHPantom / Mago | Impact Status |
| --- | --- | --- | --- | --- |
| x86-64-v3 Specific | BMI1 & BMI2 (Bit Manipulation Instructions) | Adds advanced, single-cycle instructions for bit-field extraction, packing, bit trailing/leading zero counts, and arbitrary bit selection. | Massive. Rust uses bit manipulation heavily for tracking token state, syntax validation masks, and memory layouts. It streamlines the evaluation of complex logical boolean expressions within the linter. | Highly Beneficial |
| | AVX2 (Advanced Vector Extensions 2) | Expands integer vector operations to full 256-bit SIMD registers. | High. While Mago skips the math features, the compiler uses AVX2 to vectorize string operations. It allows PHPantom to scan multiple bytes of text simultaneously to locate syntax boundaries (like commas, braces, and strings). | Highly Beneficial |
| | LZCNT (Leading Zero Count) | Counts the number of leading zero bits in an integer in a single clock cycle. | High. Vital for fast low-level data structure alignment and hash map indexing, which Rust's compiler leverages heavily during the project-indexing phase. | Highly Beneficial |
| | AVX | 256-bit floating-point SIMD vector operations. | None. These are the floating-point vector roots. As a text parsing application, Mago rarely interacts with decimal math. | No Impact |
| | FMA3 (Fused Multiply-Add) | Computes $(A \times B) + C$ in a single instruction with infinite precision. | None. Specifically engineered for deep learning, matrix algebra, and graphics. Not used in static analysis. | No Impact |
| | F16C | Hardware conversion instructions between 16-bit and 32-bit floats. | None. Unused by compilers and static checkers. | No Impact |
| | MOVBE | Reverses the byte order of data during a register-to-memory move (Big-Endian to Little-Endian). | Minimal/None. Mainstream PHP code bases are read sequentially as standard UTF-8 text on standard x86 systems; byte swapping is rarely required. | No Impact |
| x86-64-v2 Inherited | POPCNT (Population Count) | Counts the number of bits set to 1 in a register in a single cycle. | High. Essential for Rust's internal data structures, bit-sets, and sparse arrays used to hold syntax trees efficiently in memory. | Highly Beneficial |
| | SSE4.2 | Adds advanced string and text processing instructions (e.g., PCMPESTRI). | High. Directly accelerates string searching and character matching, allowing quicker detection of PHP keywords (function, class, public) within source text streams. | Highly Beneficial |
| | SSE4.1, SSSE3, SSE3 | Standard 128-bit media and basic alignment extensions. | Moderate. Provides basic, ubiquitous building blocks for memory copying (memcpy) and buffer clearing that Rust utilizes implicitly. | Beneficial |
| | CMPXCHG16B | Allows atomic compare-and-swap operations on 16-byte (128-bit) values. | Moderate. Critical for parallel execution. PHPantom utilizes this to safely coordinate background indexing threads without heavy locking overhead. | Beneficial |
| | LAHF/SAHF | Loads/stores status flags into the CPU's AH register. | Minimal. Helps optimize simple scalar branch logic conditionals inside deep loop constructs. | Minor Benefit |
Use case
The main difference between x86-64-v3 and x86-64-v4 is AVX-512. Of this group of SIMD extensions (yes, it's a group), only a few are relevant (again, with AI helping on this one):
| AVX-512 Subset | Core Functionality | Specific Value to Mago / PHPantom | Impact Status |
| --- | --- | --- | --- |
| AVX-512BW | 512-bit Byte/Word vectors | Scans 64 bytes of PHP text simultaneously; radically accelerates whitespace stripping and symbol searching. | Massive / Revolutionary |
| Opmask Registers | Native vector predication | Eliminates scalar loop tails; handles unaligned PHP keywords and variable names without dropping out of SIMD. | Massive / Revolutionary |
| AVX-512CD | Conflict detection | Accelerates string interning (mago-atom) and deduplication during the workspace-indexing phase. | Highly Beneficial |
| Ternary Logic | 3-input bitwise operations in 1 cycle | Condenses multiple syntax checking/masking steps inside the linter into single-cycle executions. | Highly Beneficial |
| AVX-512F / DQ | Ultra-wide Floating-Point math | Unused. Static analyzers do not perform heavy decimal matrix math. | No Impact |
The reason it's better not to use x86-64-v4, apart from historical compatibility, is that it requires either Zen 4 (2022) or Zen 5 processors, or specifically Intel 10th gen (2019) or 11th gen. Later consumer Intel CPUs do not have AVX-512, meaning a v4 build would crash with an illegal-instruction error on most Intel CPUs, including ones released in the last few years.
I don't know if the performance uplift from full AVX-512 support for Zen 4/5 and Intel 10th/11th gen users is big enough to warrant another build for "latest AMD CPU, Intel 10th/11th gen". Even then, AMD CPU owners could build PHPantom in less than 5 minutes with all target optimisations (target-cpu=native). I get that the AI says "revolutionary performance", but it's inferring, not benchmarking.
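Since the real answer has to come from benchmarks rather than inference, one way to prepare is to build each level into its own target directory and run the existing benchmarks against each result. A rough sketch: `build_variant` and `DRY_RUN` are hypothetical names of mine, only `RUSTFLAGS` is varied, and `CARGO_TARGET_DIR` is used just so the builds don't overwrite each other.

```shell
# Hypothetical helper: build one release binary for the given target-cpu
# into target/<cpu>. With DRY_RUN=1 it only prints the command it would run.
build_variant() {
  cpu="$1"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "RUSTFLAGS=\"-C target-cpu=$cpu\" CARGO_TARGET_DIR=\"target/$cpu\" cargo build --release"
  else
    RUSTFLAGS="-C target-cpu=$cpu" CARGO_TARGET_DIR="target/$cpu" cargo build --release
  fi
}

# Usage (left commented out; a v4 binary only runs on an AVX-512 machine):
#   for cpu in x86-64-v2 x86-64-v3 x86-64-v4; do build_variant "$cpu"; done
```

Each binary then ends up under `target/<cpu>/release/`, so the same benchmark input can be run against all of them on one machine.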
Proposed solution
RUSTFLAGS="-C target-cpu=x86-64-v3" \
CARGO_PROFILE_RELEASE_OPT_LEVEL="3" \
CARGO_PROFILE_RELEASE_LTO="fat" \
CARGO_PROFILE_RELEASE_CODEGEN_UNITS="1" \
cargo build --release
Until the project is considered stable, the usual cargo build --release should be enough. A safe approach would be to use x86-64-v2, since I doubt anyone with a processor well over a decade old would run PHPantom at all.
Alternatives considered
x86-64-v1: MMX, SSE, and SSE2. Anything still running from 2003/2004.
x86-64-v2: SSE3, SSE4, POPCNT. Anything still running from 2008/2011.
I believe v3 would give a better performance uplift than v2.
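For anyone unsure which of these levels their own machine reaches, the CPU flags that Linux exposes in /proc/cpuinfo are enough to tell. A minimal sketch, assuming the usual Linux flag names (for example `abm` for LZCNT, `pni` for SSE3, `lahf_lm` for LAHF/SAHF); the function names `has_all` and `x86_64_level` are made up for this example.

```shell
# Return success if every flag after the first argument appears as a whole
# word in the space-separated flag list given as the first argument.
has_all() {
  flags=" $1 "
  shift
  for f in "$@"; do
    case "$flags" in
      *" $f "*) ;;
      *) return 1 ;;
    esac
  done
  return 0
}

# Print the highest x86-64 level (0 to 4) implied by a list of CPU flags.
x86_64_level() {
  cpu_flags="$1"
  has_all "$cpu_flags" lm cmov cx8 fpu fxsr mmx syscall sse2 || { echo 0; return; }
  has_all "$cpu_flags" cx16 lahf_lm popcnt pni sse4_1 sse4_2 ssse3 || { echo 1; return; }
  has_all "$cpu_flags" avx avx2 bmi1 bmi2 f16c fma abm movbe xsave || { echo 2; return; }
  has_all "$cpu_flags" avx512f avx512bw avx512cd avx512dq avx512vl || { echo 3; return; }
  echo 4
}

# On Linux, report the level of the current machine.
if [ -r /proc/cpuinfo ]; then
  x86_64_level "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
fi
```

A result of 3 or higher means a v3 build will run; anything lower would need the v2 or default build.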
Code example
In the workflow, the flag would be added like this (or equivalent):
- name: Build
  env:
    # Conditionally set RUSTFLAGS if the target starts with x86_64
    RUSTFLAGS: ${{ startsWith(matrix.target, 'x86_64') && '-C target-cpu=x86-64-v3' || '' }}
  run: cargo build --release --target ${{ matrix.target }}
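The same condition can be reproduced outside GitHub Actions, for example in a local release script. A small sketch; `rustflags_for_target` is a hypothetical helper name, and it mirrors the expression above by returning an empty string for non-x86_64 targets.

```shell
# Hypothetical helper: print the RUSTFLAGS to use for a given target triple.
# x86_64 targets get the v3 baseline, everything else gets no extra flags.
rustflags_for_target() {
  case "$1" in
    x86_64*) echo "-C target-cpu=x86-64-v3" ;;
    *)       echo "" ;;
  esac
}

# Usage (left commented out, since it needs cargo and the target installed):
#   target=x86_64-unknown-linux-gnu
#   RUSTFLAGS="$(rustflags_for_target "$target")" cargo build --release --target "$target"
```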