perf: optimize SIMD check and encode implementations by DaniPopes · Pull Request #39 · DaniPopes/const-hex

DaniPopes · 2026-02-25T07:54:55Z

Replace check algorithm with signed overflow trick for large inputs (≥128 bytes). Add AVX2 check path processing 32 bytes per iteration. Double encode throughput (32→64 bytes/iter) via new AVX2 path.
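The signed overflow trick rests on a standard observation, sketched here as background rather than taken from the PR's code. SSE2 and AVX2 provide signed byte comparisons (`pcmpgtb`) but no unsigned ones. An unsigned range test can therefore be turned into a single signed comparison by adding a bias that wraps the valid range down to the bottom of the signed range. Here i8 reinterprets a byte as a two's-complement signed value:

```latex
\begin{aligned}
&x,\ lo,\ hi \in \{0,\dots,255\},\quad lo \le hi,\quad hi - lo < 255,\quad d = (x - lo) \bmod 256 \\
&t = \mathrm{i8}\big((x + 128 - lo) \bmod 256\big) = d - 128 \\
&x \in [lo, hi] \iff d \le hi - lo \iff t < (hi - lo + 1) - 128
\end{aligned}
```

For hex input, the digits '0'..='9' (lo = 0x30, hi = 0x39) give a bias of 128 − lo = 0x50 and a threshold of 10 − 128 = −118. For the letters, the input is first case-folded with x | 0x20 so that 'A'..='F' land on 'a'..='f' (lo = 0x61, hi = 0x66). That gives a bias of 0x1F and a threshold of 6 − 128 = −122. A byte is valid if and only if either comparison holds.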

References:

- http://0x80.pl/notesen/2022-01-17-validating-hex-parse.html

Split from #35, credit to @zerosnacks.
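An encode chunk that consumes 32 input bytes (one AVX2 register) emits 64 hex characters, because every byte becomes two digits. The per-chunk data flow can be sketched portably. This sketch assumes a lowercase alphabet and emulates the SIMD nibble table lookup (`pshufb`-style) with an ordinary 16-entry array. `encode_chunk32` and `encode` are hypothetical names, not const-hex's API.

```rust
use std::convert::TryInto;

/// Lowercase hex alphabet; a SIMD version would typically hold this in a 16-byte shuffle table.
const HEX_LOWER: &[u8; 16] = b"0123456789abcdef";

/// Hypothetical helper: encode one 32-byte input chunk into 64 output bytes.
/// Mirrors the SIMD data flow: split each byte into high/low nibbles, look each
/// nibble up in the 16-entry table, then interleave the high and low results.
fn encode_chunk32(input: &[u8; 32], out: &mut [u8; 64]) {
    let mut hi = [0u8; 32];
    let mut lo = [0u8; 32];
    for i in 0..32 {
        hi[i] = HEX_LOWER[(input[i] >> 4) as usize]; // shift, then table lookup
        lo[i] = HEX_LOWER[(input[i] & 0x0f) as usize]; // mask, then table lookup
    }
    for i in 0..32 {
        // Interleave: high-nibble digit first, then low-nibble digit.
        out[2 * i] = hi[i];
        out[2 * i + 1] = lo[i];
    }
}

/// Hypothetical driver: full 32-byte chunks first, then a scalar tail.
fn encode(input: &[u8]) -> String {
    let mut out = vec![0u8; input.len() * 2];
    let mut chunks = input.chunks_exact(32);
    let mut o = 0;
    for chunk in &mut chunks {
        let src: &[u8; 32] = chunk.try_into().unwrap();
        let dst: &mut [u8; 64] = (&mut out[o..o + 64]).try_into().unwrap();
        encode_chunk32(src, dst);
        o += 64;
    }
    // Remaining 0..=31 bytes, one at a time.
    for &b in chunks.remainder() {
        out[o] = HEX_LOWER[(b >> 4) as usize];
        out[o + 1] = HEX_LOWER[(b & 0x0f) as usize];
        o += 2;
    }
    String::from_utf8(out).expect("hex output is ASCII")
}
```

For example, `encode(&[0xde, 0xad, 0xbe, 0xef])` yields `"deadbeef"`.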


codspeed-hq · 2026-02-25T07:56:09Z

Merging this PR will degrade performance by 30.01%

⚡ 12 improved benchmarks
❌ 4 regressed benchmarks
✅ 20 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Status | Benchmark | `BASE` | `HEAD` | Efficiency |
|:-:|---|--:|--:|--:|
| ❌ | `bench1_32b` | 321.9 ns | 460 ns | -30.01% |
| ⚡ | `bench5_128k` | 206.2 µs | 151.7 µs | +35.89% |
| ⚡ | `bench3_2k` | 3.5 µs | 2.8 µs | +26.51% |
| ⚡ | `bench4_16k` | 26 µs | 19.3 µs | +34.71% |
| ❌ | `bench2_256b` | 812.5 ns | 915 ns | -11.2% |
| ⚡ | `bench3_2k` | 7.8 µs | 7.1 µs | +10.14% |
| ❌ | `bench1_32b` | 429.4 ns | 493.1 ns | -12.9% |
| ⚡ | `bench3_2k` | 5.6 µs | 4.8 µs | +16.34% |
| ⚡ | `bench5_128k` | 314.6 µs | 260.1 µs | +20.94% |
| ⚡ | `bench6_1m` | 2.5 ms | 2.1 ms | +21.24% |
| ⚡ | `bench4_16k` | 39.7 µs | 32.9 µs | +20.46% |
| ⚡ | `bench6_1m` | 2.5 ms | 2.1 ms | +21.27% |
| ⚡ | `bench6_1m` | 1.6 ms | 1.2 ms | +36.07% |
| ❌ | `bench1_32b` | 1.3 µs | 1.5 µs | -10.07% |
| ⚡ | `bench4_16k` | 41.6 µs | 34.9 µs | +19.16% |
| ⚡ | `bench5_128k` | 312 µs | 257.5 µs | +21.17% |

Comparing dani/optimize-simd (852c862) with master (023646a)
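The Efficiency column appears to be BASE/HEAD − 1, where a positive value means HEAD is faster. Recomputing from the rounded timings shown reproduces the reported percentages to within rounding. For example, 812.5 / 915 − 1 ≈ −11.20%. The headline "degrade performance by 30.01%" matches the single worst row (`bench1_32b`, 321.9 ns → 460 ns) rather than an aggregate. The formula below is inferred from the table, not taken from CodSpeed documentation:

```rust
/// Efficiency in percent, inferred from the table as BASE/HEAD - 1
/// (> 0: HEAD is faster, < 0: HEAD is slower). Not an official CodSpeed formula.
fn efficiency_pct(base: f64, head: f64) -> f64 {
    (base / head - 1.0) * 100.0
}

/// Most negative efficiency across (base, head) timing pairs, which is what the
/// "degrade performance by X%" headline appears to report.
/// Both values in a pair must use the same unit (e.g. both ns).
fn worst_pct(rows: &[(f64, f64)]) -> f64 {
    rows.iter()
        .map(|&(base, head)| efficiency_pct(base, head))
        .fold(f64::INFINITY, f64::min)
}
```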


Extract check_chunk_sse2 and use it directly in check_avx2's remainder handler instead of calling check_sse2, which goes through chunks_exact. Since the AVX2 remainder is at most 31 bytes, it holds at most one full 16-byte chunk (plus up to 15 trailing bytes), so no loop is needed.
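The remainder reasoning can be made concrete with a portable sketch. Here 32-byte chunks stand in for AVX2 iterations, a single optional 16-byte step stands in for check_chunk_sse2, and a scalar loop handles the last 0 to 15 bytes. The per-lane test applies the signed-bias comparisons to plain integers instead of intrinsics. `is_hex_lane`, `check_chunk16`, `check_chunk32` and `check` are hypothetical names, and their signatures are not const-hex's.

```rust
/// Per-lane test using the signed-bias trick: adding 0x80 - lo wraps the range
/// [lo, hi] onto -128..=-128 + (hi - lo), so one signed `<` checks membership.
fn is_hex_lane(x: u8) -> bool {
    let digit = (x.wrapping_add(0x50) as i8) < -118; // '0'..='9' (lo = 0x30)
    let alpha = ((x | 0x20).wrapping_add(0x1f) as i8) < -122; // 'a'..='f' / 'A'..='F' (lo = 0x61)
    digit | alpha
}

/// Stand-in for one 16-byte SSE2 chunk: AND all lanes without early exit,
/// like reducing a comparison mask with movemask.
fn check_chunk16(chunk: &[u8]) -> bool {
    debug_assert_eq!(chunk.len(), 16);
    chunk.iter().fold(true, |ok, &b| ok & is_hex_lane(b))
}

/// Stand-in for one 32-byte AVX2 chunk.
fn check_chunk32(chunk: &[u8]) -> bool {
    debug_assert_eq!(chunk.len(), 32);
    chunk.iter().fold(true, |ok, &b| ok & is_hex_lane(b))
}

fn check(input: &[u8]) -> bool {
    let mut chunks = input.chunks_exact(32);
    if !chunks.by_ref().all(check_chunk32) {
        return false;
    }
    // The remainder is at most 31 bytes: at most one full 16-byte chunk, so no loop.
    let rem = chunks.remainder();
    let tail = if rem.len() >= 16 {
        let (head, tail) = rem.split_at(16);
        if !check_chunk16(head) {
            return false;
        }
        tail
    } else {
        rem
    };
    // Final 0..=15 bytes.
    tail.iter().all(|&b| is_hex_lane(b))
}
```

A `while rem.len() >= 16` loop in place of the `if` would run at most once for any input, which is the point the commit message makes.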

DaniPopes mentioned this pull request Feb 25, 2026

perf: add hex-simd benchmarks and optimize SIMD implementations #35

Closed

DaniPopes added 19 commits February 25, 2026 17:46

fix: use signed overflow trick for SSE2 check, clean up tests

612c134

Apply Muła & Langdale signed overflow trick to SSE2 check path too, reducing from 10 to 5 operations per chunk. Remove length threshold dispatch since both paths now use the same efficient algorithm.
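Applied to a 16-byte SSE2 chunk, the signed-bias idea needs only a case-folding OR, two biased adds, two signed compares, an OR to combine them, and a movemask. The sketch below uses real `std::arch::x86_64` SSE2 intrinsics (SSE2 is baseline on x86_64) with a portable fallback on other targets. `sse2_check16` is a hypothetical name, and the sketch is not claimed to match the commit's exact instruction sequence or operation count.

```rust
#[cfg(target_arch = "x86_64")]
fn sse2_check16(chunk: &[u8; 16]) -> bool {
    use std::arch::x86_64::*;
    // SSE2 is part of the x86_64 baseline, so these intrinsics are always available here.
    unsafe {
        let x = _mm_loadu_si128(chunk.as_ptr() as *const __m128i);
        // '0'..='9': bias by 0x80 - 0x30 so valid digits land on -128..=-119.
        let digit = _mm_cmplt_epi8(_mm_add_epi8(x, _mm_set1_epi8(0x50)), _mm_set1_epi8(-118));
        // 'a'..='f' / 'A'..='F': fold case, bias by 0x80 - 0x61 so valid letters land on -128..=-123.
        let folded = _mm_or_si128(x, _mm_set1_epi8(0x20));
        let alpha = _mm_cmplt_epi8(_mm_add_epi8(folded, _mm_set1_epi8(0x1f)), _mm_set1_epi8(-122));
        // Valid lanes are 0xFF; all 16 are valid iff every sign bit is set.
        _mm_movemask_epi8(_mm_or_si128(digit, alpha)) == 0xFFFF
    }
}

#[cfg(not(target_arch = "x86_64"))]
fn sse2_check16(chunk: &[u8; 16]) -> bool {
    // Portable fallback with the same per-lane arithmetic.
    chunk.iter().all(|&x| {
        ((x.wrapping_add(0x50) as i8) < -118) | (((x | 0x20).wrapping_add(0x1f) as i8) < -122)
    })
}
```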

refactor: cascade check impls avx2 → sse2 → scalar

575f510

docs: document signed overflow bias trick with derivations

90bb4d9

perf: add SSSE3 encode path, cascade encode avx2 → ssse3 → scalar

d604779

docs: fix attribution comment scope

f0362c2

refactor: pass __m128i by value in check_chunk_sse2

fc7649e

refactor: use check_one_unaligned_chunk for AVX2 remainder

cf2c0ac

clean

3682bd0

refactor: rename encode_bytes{16,32} to encode{16,32}

7a10f5c

refactor: rename encode_sse2 to encode_ssse3

3aec18e

refactor: rename encode chunk fns to encode_chunk_{avx2,ssse3}

57e3236

refactor: reorder check fns grouped by target feature

497f8cc

perf: avoid SSSE3 loop in AVX2 encode remainder

556ea36

unoutline

655c087

perf: skip remainder check for exact chunk multiples

e27abcd

more empty fast paths

cbc85f7

refactor: reorder one_chunk fns below with_ fns, add debug assertions

c8e42cf

refactor: remove unnecessary closure type hints

852c862

DaniPopes merged commit eab3daf into master Mar 2, 2026
20 of 21 checks passed

DaniPopes deleted the dani/optimize-simd branch March 2, 2026 07:02

DaniPopes mentioned this pull request Mar 2, 2026

perf: single-pass AVX2 decode with validation #41

Merged

DaniPopes merged 20 commits into master from dani/optimize-simd