Skip to content

perf: add hex-simd benchmarks and optimize SIMD implementations#35

Closed
zerosnacks wants to merge 1 commit intoDaniPopes:masterfrom
zerosnacks:zerosnacks/hex-simd-bench
Closed

perf: add hex-simd benchmarks and optimize SIMD implementations#35
zerosnacks wants to merge 1 commit intoDaniPopes:masterfrom
zerosnacks:zerosnacks/hex-simd-bench

Conversation

@zerosnacks
Copy link
Copy Markdown

@zerosnacks zerosnacks commented Jan 24, 2026

Adds hex-simd to benchmarks (closes #25) and optimizes SIMD implementations to match/exceed hex-simd performance.

Benchmark results

CPU: Intel Core i9-13900K (x86_64, AVX2)

check

Size const-hex (before) const-hex (after) hex-simd faster-hex
32B 2.47 ns 2.44 ns 1.71 ns 3.36 ns
256B 17.58 ns 16.69 ns 13.50 ns 23.36 ns
2KB 150.6 ns 81.69 ns 113 ns 146.7 ns
16KB 1.13 µs 570.6 ns 885.6 ns 1.13 µs
128KB 8.96 µs 5.50 µs 6.99 µs 10.2 µs
1MB 68.73 µs 46.64 µs 53.29 µs 71.92 µs

encode

Size const-hex (before) const-hex (after) hex-simd faster-hex
32B 8.50 ns 8.98 ns 6.91 ns 12.03 ns
256B 23.69 ns 15.61 ns 16.22 ns 21.08 ns
2KB 88.69 ns 72.69 ns 107.3 ns 110.6 ns
16KB 599.6 ns 548.1 ns 845.6 ns 726.6 ns
128KB 4.42 µs 3.75 µs 6.64 µs 7.59 µs
1MB 47.43 µs 47.96 µs 53.70 µs 77.20 µs

encode_to_slice

Size const-hex (before) const-hex (after) hex-simd faster-hex
32B 1.90 ns 1.73 ns 1.71 ns 4.09 ns
256B 10.49 ns 8.28 ns 12.64 ns 13.38 ns
2KB 77.63 ns 60.10 ns 103.3 ns 85.44 ns
16KB 613.1 ns 539.9 ns 825.6 ns 653.9 ns
128KB 5.28 µs 5.11 µs 6.61 µs 5.22 µs
1MB 55.39 µs 55.40 µs 52.73 µs 51.51 µs

decode

Size const-hex (before) const-hex (after) hex-simd faster-hex
32B 28.69 ns 16.05 ns 9.46 ns 13.71 ns
256B 46.75 ns 26.16 ns 35.16 ns 43.66 ns
2KB 256.6 ns 161.8 ns 252.6 ns 270.4 ns
16KB 1.97 µs 1.28 µs 1.94 µs 2.03 µs
128KB 15.54 µs 9.62 µs 14.80 µs 15.98 µs
1MB 118.2 µs 82.16 µs 123.8 µs 135.8 µs

decode_to_slice

Size const-hex (before) const-hex (after) hex-simd faster-hex
32B 5.57 ns 7.20 ns 3.62 ns 6.49 ns
256B 30.77 ns 21.58 ns 28.66 ns 36.10 ns
2KB 249.3 ns 163.1 ns 235.1 ns 250.9 ns
16KB 1.96 µs 1.28 µs 1.97 µs 1.98 µs
128KB 15.55 µs 10.11 µs 14.85 µs 15.65 µs
1MB 118.2 µs 81.89 µs 124.5 µs 122 µs

format

Size const-hex (before) const-hex (after) std
32B 13.95 ns 7.30 ns 549.6 ns
256B 20.35 ns 18.96 ns 3.81 µs
2KB 121.1 ns 108.5 ns 30.7 µs
16KB 1.22 µs 1.10 µs 135.5 µs
128KB 11.11 µs 11.04 µs 1.06 ms
1MB 151.1 µs 161.4 µs 8.45 ms

Changes

  • Keep original 6-comparison SSE2 algorithm for small inputs (addresses, hashes)
  • Add AVX2 check with signed overflow trick for larger inputs ≥128 bytes (Muła & Langdale)
  • Double encode throughput (32→64 bytes/iter) via new AVX2 path
  • Add 13 new edge case tests covering SIMD boundaries and all byte values

Adds hex-simd to benchmarks (closes DaniPopes#25) and significantly improves
performance of check and encode operations.

Performance optimizations:
- Replace check algorithm with signed overflow trick (3x faster)
- Add AVX2 check path processing 32 bytes per iteration
- Double encode throughput (32→64 bytes/iter) via new AVX2 path

Testing:
- Add 13 new edge case tests covering SIMD boundaries and all byte values

References:
- http://0x80.pl/notesen/2022-01-17-validating-hex-parse.html

Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019bf1ee-2643-76af-8139-99d7c2c02486
@zerosnacks zerosnacks force-pushed the zerosnacks/hex-simd-bench branch from 36017f8 to 64f908a Compare January 24, 2026 21:57
@zerosnacks zerosnacks marked this pull request as ready for review January 24, 2026 22:01
@DaniPopes
Copy link
Copy Markdown
Owner

Split into #38 (benchmarks) and #39 (optimizations). Thanks @zerosnacks!

@DaniPopes DaniPopes closed this Feb 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add hex-simd to benches

2 participants