@anonrig anonrig commented Nov 15, 2025

It seems the SSSE3 path makes URL parsing roughly 3-5% faster (median time per URL) than the SSE2 baseline.

1. BasicBench_AdaURL_href (Full URL parsing with href getter)

| Metric | SSE2 Baseline (median) | SSSE3 Optimized (median) | Improvement |
| --- | --- | --- | --- |
| Time per URL | 249.276 ns | 237.648 ns | 4.7% faster |
| Throughput | 327.212 M/s | 354.032 M/s | 8.2% higher |
| URLs per second | 3.767M | 4.076M | 8.2% more |
| Time per byte | 3.056 ns | 2.825 ns | 7.6% faster |
| Cycles per URL | 974.434 | 964.386 | 1.0% fewer |
| Instructions per URL | 3.218k | 3.190k | 0.9% fewer |

2. BasicBench_AdaURL_aggregator_href (Aggregator parsing)

| Metric | SSE2 Baseline (median) | SSSE3 Optimized (median) | Improvement |
| --- | --- | --- | --- |
| Time per URL | 155.386 ns | 150.837 ns | 2.9% faster |
| Throughput | 517.071 M/s | 526.278 M/s | 1.8% higher |
| URLs per second | 5.953M | 6.059M | 1.8% more |
| Time per byte | 1.934 ns | 1.900 ns | 1.8% faster |
| Cycles per URL | 619.107 | 603.857 | 2.5% fewer |
| Instructions per URL | 2.209k | 2.178k | 1.4% fewer |

@anonrig anonrig requested a review from lemire November 15, 2025 20:04

@lemire lemire left a comment

This is reasonable, but note that unless the library is compiled with SSSE3 enabled, the new path will not be used. And ada will typically not be compiled with SSSE3 enabled (unless you are a Gentoo user or something of the sort), so the optimization will often go unused. (Note that whether the compiler targets SSSE3 by default depends on the system. So there are definitely systems where your optimization will work out of the box. I think SUSE and Red Hat, maybe?)
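To make the compile-time dependence concrete, here is a minimal sketch of how such a path is usually gated. It relies on the `__SSSE3__` macro that GCC and clang predefine when the target includes SSSE3; the names `kCompiledWithSsse3` and `selected_path` are hypothetical and are not taken from ada.

```cpp
#include <string_view>

// GCC and clang predefine __SSSE3__ only when the translation unit is
// compiled for a target that includes SSSE3 (for example with -mssse3).
// Without such a flag or distro default, the macro is absent and the
// SSSE3 branch is never compiled in, whatever the CPU supports at run time.
#if defined(__SSSE3__)
constexpr bool kCompiledWithSsse3 = true;
#else
constexpr bool kCompiledWithSsse3 = false;
#endif

// Hypothetical helper: reports which code path this build would contain.
constexpr std::string_view selected_path() {
  return kCompiledWithSsse3 ? "ssse3" : "sse2-baseline";
}
```

Printing `selected_path()` from a build made with the distribution's default flags is a quick way to see whether the optimized path is active there.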

One possibility, if we don't want to do the runtime dispatching manually (as in simdutf and simdjson), would be to specialize the feature for GCC/clang. (So no Visual Studio support.)

Then you can use __builtin_cpu_supports.

https://clang.llvm.org/docs/LanguageExtensions.html

https://gcc.gnu.org/onlinedocs/gcc/x86-Built-in-Functions.html

It is likely that Copilot/Claude/... can do it for you pretty close to correctly.

One downside of such an optimization is that it makes testing more difficult because you have to test two or more paths depending on what the CPU supports. In a library like simdjson or simdutf, we do everything manually.
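One common way to keep several paths honest is a differential test that runs each path on the same inputs and compares against a simple reference. The sketch below is only an illustration: all names are hypothetical, and `candidate_has_tabs_or_newline` is a stand-in for whichever optimized path is under test, not ada's actual code.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Reference path: plain byte-by-byte scan.
inline bool reference_has_tabs_or_newline(std::string_view s) {
  for (char c : s) {
    if (c == '\t' || c == '\n' || c == '\r') return true;
  }
  return false;
}

// Stand-in for an optimized path (assumption: a real test would call the
// SIMD implementation here).
inline bool candidate_has_tabs_or_newline(std::string_view s) {
  return s.find_first_of("\t\n\r") != std::string_view::npos;
}

// Inputs of every length from 0 to 48, with a special byte at every
// position, so that 16-byte block boundaries and short tails are covered.
inline std::vector<std::string> boundary_inputs() {
  std::vector<std::string> out;
  for (std::size_t len = 0; len <= 48; ++len) {
    out.emplace_back(len, 'a');  // input without any special byte
    for (std::size_t pos = 0; pos < len; ++pos) {
      for (char special : {'\t', '\n', '\r'}) {
        std::string s(len, 'a');
        s[pos] = special;
        out.push_back(std::move(s));
      }
    }
  }
  return out;
}

// Counts the inputs on which the two implementations disagree.
template <typename Ref, typename Cand>
std::size_t count_disagreements(Ref ref, Cand cand) {
  std::size_t disagreements = 0;
  for (const std::string& input : boundary_inputs()) {
    if (ref(input) != cand(input)) ++disagreements;
  }
  return disagreements;
}
```

A test run would assert that `count_disagreements(reference_has_tabs_or_newline, candidate_has_tabs_or_newline)` is zero, once per path. The harder part, as noted above, is arranging for CI machines or build flags that actually exercise each path.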

lemire commented Nov 15, 2025

OK, so according to Grok, I was possibly too pessimistic:

Ubuntu, starting with version 24.04 LTS (Noble Numbat) released in April 2024, sets the x86-64 baseline to x86-64-v2 in GCC. This includes SSSE3, along with SSE3, POPCNT, and CMPXCHG16B. The change was made to align with hardware realities, as SSSE3 has been standard since Intel Core 2 (Merom) and AMD K10 processors. Earlier Ubuntu LTS releases like 22.04 use the older x86-64 baseline (requiring -mssse3 for SSSE3), but 24.04 and later assume it by default.

Debian follows a similar path. Debian 13 (Trixie), expected to release in mid-2025, adopts the x86-64-v2 baseline in its GCC packages, including SSSE3. Debian 12 (Bookworm) still uses the original x86-64 baseline, so SSSE3 requires explicit flags there.

Fedora has been more aggressive. Since Fedora 38 in 2023, the default GCC target is x86-64-v3, which encompasses SSSE3 plus additional instructions like SSE4.1, SSE4.2, and others. This means even earlier Fedora releases generate SSSE3 code by default.

Arch Linux, being rolling-release, updated its GCC defaults to x86-64-v3 around mid-2023, assuming SSSE3 and beyond.

Other distros like openSUSE Tumbleweed and Gentoo (with default profiles) also use x86-64-v3 or higher baselines in recent versions. To confirm for a specific setup, check the output of gcc -v or gcc -march=native -Q --help=target on the target system; the enabled instructions will list "ssse3" under the default march if assumed.

Assuming that this is broadly correct, then your optimization is well justified.

lemire commented Nov 15, 2025

Rocky Linux 9 uses x86-64-v2, which includes SSSE3. But I cannot verify the claim that Ubuntu defaults to SSSE3 support. I think that Grok was hallucinating.

anonrig commented Nov 15, 2025

Any downsides to landing it? @lemire

@anonrig anonrig merged commit c38f67a into main Nov 16, 2025
50 checks passed
@anonrig anonrig deleted the yagiz/add-ssse3-optimization branch November 16, 2025 00:05