@bhanreddy1973

Which issue does this PR close?

Closes #15986

Rationale for this change

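The current hex function implementation calls write!(&mut s, "{b:02x}") for each byte. The format string itself is processed at compile time, but every call still goes through Rust's runtime formatting machinery (fmt::Arguments and a dynamically dispatched Write), which adds significant per-byte overhead.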

What changes are included in this PR?

Replaced the slow format-based hex encoding with pre-computed lookup tables (HEX_UPPER/HEX_LOWER). Each byte value maps directly to its two-character hex representation via simple array indexing.

Benefits:

  • Eliminates the per-byte formatting overhead of write!()
  • Uses a single pre-sized allocation
  • Simple array indexing instead of function calls per byte
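
To illustrate the approach, here is a minimal sketch of a lookup-table encoder. It is not the code from this PR: only the table names HEX_UPPER and HEX_LOWER come from the description above, while the 256-entry layout, the build_table helper, and the hex_encode signature are assumptions made for the example.

```rust
/// Hypothetical helper: builds a 256-entry table at compile time that maps
/// each byte value to its two ASCII hex digits.
const fn build_table(digits: &[u8; 16]) -> [[u8; 2]; 256] {
    let mut table = [[0u8; 2]; 256];
    let mut i = 0;
    while i < 256 {
        // High nibble first, then low nibble.
        table[i] = [digits[i >> 4], digits[i & 0x0f]];
        i += 1;
    }
    table
}

// Table names taken from the PR description; the layout is an assumption.
const HEX_LOWER: [[u8; 2]; 256] = build_table(b"0123456789abcdef");
const HEX_UPPER: [[u8; 2]; 256] = build_table(b"0123456789ABCDEF");

/// Hypothetical encoder: one pre-sized allocation, one table lookup per byte.
fn hex_encode(bytes: &[u8], lowercase: bool) -> String {
    let table = if lowercase { &HEX_LOWER } else { &HEX_UPPER };
    let mut out = Vec::with_capacity(bytes.len() * 2);
    for &b in bytes {
        out.extend_from_slice(&table[b as usize]);
    }
    // The tables hold only ASCII hex digits, so this check cannot fail.
    String::from_utf8(out).expect("hex digits are ASCII")
}
```

For example, hex_encode(&[0x00, 0xff, 0x1a], true) yields "00ff1a". The sketch uses the checked String::from_utf8 conversion; whether the actual change uses a checked or unchecked conversion is not stated in this description.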

How are these changes tested?

All existing tests pass:

  • test_hex_int64
  • test_spark_hex_int64
  • test_dictionary_hex_utf8
  • test_dictionary_hex_int64
  • test_dictionary_hex_binary

Are these changes safe?

Yes - the lookup tables contain only valid ASCII hex characters, so the encoded bytes are always valid UTF-8 and the conversion to a string is safe. The output is identical to before, just produced faster.
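
One way to check the "identical behavior" claim independently is to compare a table-based encoder against the format-based baseline for every possible byte value. The sketch below is self-contained and uses hypothetical function names (hex_via_fmt, hex_via_table, outputs_match); it uses a compact 16-entry digit table for brevity rather than the tables from this PR.

```rust
use std::fmt::Write;

// Baseline: per-byte formatting, as described for the previous implementation.
fn hex_via_fmt(bytes: &[u8], lowercase: bool) -> String {
    let mut s = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        if lowercase {
            write!(&mut s, "{b:02x}").unwrap();
        } else {
            write!(&mut s, "{b:02X}").unwrap();
        }
    }
    s
}

// Hypothetical table-based encoder using a 16-entry digit table.
fn hex_via_table(bytes: &[u8], lowercase: bool) -> String {
    let digits: &[u8; 16] = if lowercase {
        b"0123456789abcdef"
    } else {
        b"0123456789ABCDEF"
    };
    let mut out = Vec::with_capacity(bytes.len() * 2);
    for &b in bytes {
        out.push(digits[(b >> 4) as usize]); // high nibble
        out.push(digits[(b & 0x0f) as usize]); // low nibble
    }
    String::from_utf8(out).expect("hex digits are ASCII")
}

// Returns true if both encoders agree on all 256 byte values in both cases.
fn outputs_match() -> bool {
    let all: Vec<u8> = (0..=255u8).collect();
    [true, false]
        .iter()
        .all(|&lc| hex_via_fmt(&all, lc) == hex_via_table(&all, lc))
}
```

Because a byte has only 256 possible values, this exhaustive comparison covers the whole input domain of the per-byte mapping.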

Replace slow write!() format-based hex encoding with a
pre-computed lookup table for significant performance improvement.

The previous implementation used write!(&mut s, "{b:02x}") for each
byte, which goes through the runtime formatting machinery on every
call. The new implementation uses const lookup tables
(HEX_UPPER/HEX_LOWER) that map each byte value directly to its
two-character hex representation.

Closes apache#15986

github-actions bot added the spark label on Dec 31, 2025

Successfully merging this pull request may close these issues.

[datafusion-spark] Optimize hex function