Optimize SparkHex function using lookup table #19585

bhanreddy1973 · 2025-12-31T16:41:43Z

Which issue does this PR close?

Closes apache#15986.

Rationale for this change

The current hex implementation calls write!(&mut s, "{b:02x}") for each byte, which adds significant per-byte overhead from Rust's formatting machinery (the format string is parsed at compile time, but every call still goes through core::fmt).
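For reference, the format-based approach described above can be sketched roughly as follows. The function name `hex_encode_fmt` and the `lowercase` flag are assumptions made for illustration; this is not the actual DataFusion code.

```rust
use std::fmt::Write;

/// Hypothetical stand-in for the format-based encoder described above.
/// Name and signature are illustrative assumptions.
fn hex_encode_fmt(bytes: &[u8], lowercase: bool) -> String {
    let mut s = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        // Every iteration makes a separate call into the formatting code.
        if lowercase {
            write!(&mut s, "{b:02x}").unwrap();
        } else {
            write!(&mut s, "{b:02X}").unwrap();
        }
    }
    s
}
```

Under these assumptions, `hex_encode_fmt(&[0xde, 0xad], false)` returns `"DEAD"`, at the cost of one formatting call per input byte.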

What changes are included in this PR?

Replaced the slow format-based hex encoding with pre-computed lookup tables (HEX_UPPER/HEX_LOWER). Each byte value maps directly to its two-character hex representation via simple array indexing.
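A minimal sketch of the lookup-table idea is shown here, assuming a 256-entry table of two-byte entries. Only the names HEX_UPPER and HEX_LOWER come from this PR; the table layout, the helper `build_table`, and the function `hex_encode_lut` are hypothetical and may differ from the merged code.

```rust
/// Builds a 256-entry table mapping each byte value to its two ASCII hex digits.
/// Hypothetical helper; the PR may define its tables differently.
const fn build_table(digits: &[u8; 16]) -> [[u8; 2]; 256] {
    let mut table = [[0u8; 2]; 256];
    let mut i = 0;
    while i < 256 {
        // High nibble first, then low nibble.
        table[i] = [digits[i >> 4], digits[i & 0x0f]];
        i += 1;
    }
    table
}

const HEX_LOWER: [[u8; 2]; 256] = build_table(b"0123456789abcdef");
const HEX_UPPER: [[u8; 2]; 256] = build_table(b"0123456789ABCDEF");

/// Encodes `bytes` as hex using one pre-sized allocation and one table lookup per byte.
fn hex_encode_lut(bytes: &[u8], lowercase: bool) -> String {
    let table = if lowercase { &HEX_LOWER } else { &HEX_UPPER };
    let mut out = Vec::with_capacity(bytes.len() * 2);
    for &b in bytes {
        out.extend_from_slice(&table[b as usize]);
    }
    // Checked conversion: this sketch does not rely on `unsafe`.
    String::from_utf8(out).expect("table contains only ASCII hex digits")
}
```

With this sketch, `hex_encode_lut(b"Spark", false)` returns `"537061726B"`. The checked `String::from_utf8` call is a choice made for this sketch, not a statement about what the PR does.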

Benefits:

Eliminates per-byte formatting overhead
Uses a single pre-sized allocation
Simple array indexing instead of function calls per byte

How are these changes tested?

All existing tests pass:

test_hex_int64
test_spark_hex_int64
test_dictionary_hex_utf8
test_dictionary_hex_int64
test_dictionary_hex_binary

Are these changes safe?

Yes - the lookup tables contain only valid ASCII hex characters, so the conversion is safe. The behavior is identical to before, just faster.
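One way to back up the safety claim is an exhaustive check over all 256 byte values. The sketch below assumes a lowercase table laid out as two-byte entries; that layout and the names `build_lower` and `table_matches_format` are hypothetical. It checks that each entry is ASCII hex and equals the `{:02x}` output.

```rust
use std::fmt::Write;

/// Hypothetical lowercase table builder; the PR's table layout may differ.
const fn build_lower() -> [[u8; 2]; 256] {
    let digits = b"0123456789abcdef";
    let mut table = [[0u8; 2]; 256];
    let mut i = 0;
    while i < 256 {
        table[i] = [digits[i >> 4], digits[i & 0x0f]];
        i += 1;
    }
    table
}

const HEX_LOWER: [[u8; 2]; 256] = build_lower();

/// Returns true when every table entry is ASCII hex and matches `{:02x}` formatting.
fn table_matches_format() -> bool {
    (0..=255u8).all(|b| {
        let entry = HEX_LOWER[b as usize];
        let mut expected = String::new();
        write!(&mut expected, "{b:02x}").unwrap();
        entry.iter().all(|c| c.is_ascii_hexdigit()) && &entry[..] == expected.as_bytes()
    })
}
```

A check like this could sit alongside the existing tests as a guard against table typos; it is not part of the test list above.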

Replace slow write!() format-based hex encoding with a pre-computed lookup table for significant performance improvement.

The previous implementation used write!(&mut s, "{b:02x}") for each byte, which incurs per-byte formatting overhead. The new implementation uses const lookup tables (HEX_UPPER/HEX_LOWER) that map each byte value directly to its two-character hex representation.

Closes apache#15986

github-actions bot added the spark label Dec 31, 2025
