API, Spark: Add direct UUID string-to-ByteBuffer conversion #16003

@abhishek593

Feature Request / Improvement

Resolve the TODO in SparkValueWriters.UUIDWriter by adding a direct UUID string-to-ByteBuffer conversion path that avoids intermediate object allocations.

Problem

The current Spark UUID write path creates two unnecessary intermediate objects per row:

  1. s.toString() - allocates a String and decodes the UTF-8 bytes
  2. UUID.fromString() - parses the string (how heavy this is, e.g. splitting on "-" and parsing each hex field, depends on the JDK version) to create a UUID that is immediately decomposed back into two longs for the ByteBuffer.
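
For illustration, a direct conversion could look roughly like the sketch below. It is not the existing Iceberg code: the class name UuidBytes and the method toByteBuffer are hypothetical, and it assumes the caller can get the raw ASCII bytes of the string without first building a String. It reads the 36-byte canonical form (8-4-4-4-12 hex digits) and writes the two 64-bit halves straight into a big-endian 16-byte buffer.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Iceberg or Spark.
final class UuidBytes {
  private UuidBytes() {}

  /** Converts the 36-byte canonical UUID text into 16 big-endian bytes. */
  static ByteBuffer toByteBuffer(byte[] ascii) {
    if (ascii.length != 36) {
      throw new IllegalArgumentException("Invalid UUID length: " + ascii.length);
    }

    long msb = 0L;
    long lsb = 0L;
    for (int i = 0; i < 36; i++) {
      if (i == 8 || i == 13 || i == 18 || i == 23) {
        // Dash positions in the canonical 8-4-4-4-12 layout.
        if (ascii[i] != '-') {
          throw new IllegalArgumentException("Expected '-' at index " + i);
        }
        continue;
      }
      int digit = hexValue(ascii[i]);
      if (i < 18) {
        msb = (msb << 4) | digit; // first 16 hex digits
      } else {
        lsb = (lsb << 4) | digit; // last 16 hex digits
      }
    }

    // ByteBuffer is big-endian by default; absolute puts leave position at 0.
    ByteBuffer buffer = ByteBuffer.allocate(16);
    buffer.putLong(0, msb);
    buffer.putLong(8, lsb);
    return buffer;
  }

  private static int hexValue(byte b) {
    if (b >= '0' && b <= '9') {
      return b - '0';
    }
    if (b >= 'a' && b <= 'f') {
      return b - 'a' + 10;
    }
    if (b >= 'A' && b <= 'F') {
      return b - 'A' + 10;
    }
    throw new IllegalArgumentException("Invalid hex character: " + (char) b);
  }
}
```

This sketch is stricter than UUID.fromString(), which also accepts some non-canonical inputs such as shortened fields, so a real change would need to decide whether to keep a fallback for those. It also allocates a new ByteBuffer per call for simplicity; a real writer could pass in a reused buffer instead.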

Query engine

None

Willingness to contribute

  • I can contribute this improvement/feature independently
  • I would be willing to contribute this improvement/feature with guidance from the Iceberg community
  • I cannot contribute this improvement/feature at this time

    Labels

    improvement (PR that improves existing functionality)
