[lance] Support timestamp_ltz by normalizing timezone strings for Arrow-Rust compatibility #7703

Closed
0dunay0 wants to merge 1 commit into apache:master from 0dunay0:fix/lance-timestamp-ltz-timezone

0dunay0 commented Apr 26, 2026

Summary

Fixes #6648.

When using the Lance format with TIMESTAMP WITH LOCAL TIME ZONE, Lance's Rust-side schema parser fails on timezone strings like GMT-10:00 with the error "Unsupported timestamp type: timestamp:us:GMT-10:00". This happens because Arrow-Rust only accepts IANA timezone names (e.g. America/New_York) or standard UTC offset formats (e.g. +05:30, -10:00, Z), but Java's ZoneId.systemDefault().toString() can produce the GMT-prefixed form (e.g. GMT-10:00).

The fix uses ZoneId.normalized() before converting to string. This converts GMT-10:00 to -10:00, GMT+05:30 to +05:30, UTC to Z, and leaves IANA names like America/New_York unchanged. All of these are formats that Arrow-Rust accepts.
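The normalization described above can be checked in isolation with a minimal sketch. The class and the helper name toArrowTimezone are hypothetical and are not taken from the PR; only ZoneId.of, ZoneId.normalized(), and toString() are real java.time calls.

```java
import java.time.ZoneId;

// Illustrative sketch only; the class and helper names are hypothetical,
// not the code in ArrowFieldTypeConversion.
class ZoneIdNormalizationSketch {

    // Normalize first, then stringify. For fixed-offset zones, normalized()
    // returns a ZoneOffset, whose toString() drops the "GMT"/"UTC" prefix.
    static String toArrowTimezone(ZoneId zone) {
        return zone.normalized().toString();
    }

    public static void main(String[] args) {
        // The same sample IDs the description lists.
        String[] ids = {"GMT-10:00", "GMT+05:30", "UTC", "America/New_York"};
        for (String id : ids) {
            ZoneId zone = ZoneId.of(id);
            System.out.println(zone + " -> " + toArrowTimezone(zone));
        }
    }
}
```

Region-based zones such as America/New_York have non-fixed rules, so normalized() returns them unchanged.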

Changes:

  • In ArrowFieldTypeConversion, call .normalized() on the ZoneId before .toString() when creating Arrow Timestamp types
  • In LanceFileFormat, remove the UnsupportedOperationException that blocked LOCAL_ZONED_TIMESTAMP entirely
  • Add timestamp_ltz coverage to LanceFileFormatTest (validation) and LanceFileFormatReadWriteTest (round-trip read/write)

0dunay0 commented Apr 27, 2026

Closing this PR. While investigating, I found that the root cause of #6648 goes deeper than the timezone string format.

Lance v0.39.0 has a bug in its type serialization. It serializes timestamp types as colon-delimited strings like timestamp:us:+05:00, then deserializes by splitting on :. Any timezone containing a colon (which includes all UTC offset formats like +05:00, -10:00, +05:30) produces too many segments and causes a Rust panic/SIGABRT. This bug is still present on Lance's main branch.

IANA timezone names like America/New_York work fine since they don't contain colons, but there's no colon-free format that covers all timezone offsets. So timestamp_ltz can't be fully supported in Lance until this is fixed upstream.
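The segment-count problem can be illustrated with a small Java sketch. This is not Lance's parser (which is written in Rust); it only mimics the "split on :" behaviour described above. The three-field layout (kind, unit, timezone) is an assumption inferred from the example string, and all names here are hypothetical.

```java
// Illustrative sketch only: mimics the colon-splitting described in the
// comment above. It is NOT Lance's actual (Rust) implementation.
class ColonSplitSketch {

    // Naive deserialization step: split the serialized type on every ':'.
    static String[] splitNaively(String logicalType) {
        return logicalType.split(":");
    }

    // Assumption: a well-formed timestamp type has exactly three fields
    // (kind, unit, timezone), as in "timestamp:us:America/New_York".
    static boolean hasExpectedFieldCount(String logicalType) {
        return splitNaively(logicalType).length == 3;
    }

    // One possible way to keep a colon-bearing timezone intact: cap the
    // split at three parts. This is a hypothetical remedy, not a description
    // of any upstream fix.
    static String[] splitWithLimit(String logicalType) {
        return logicalType.split(":", 3);
    }

    public static void main(String[] args) {
        String[] samples = {"timestamp:us:America/New_York", "timestamp:us:+05:00"};
        for (String s : samples) {
            System.out.println(s
                + " naive fields=" + splitNaively(s).length
                + " limited timezone=" + splitWithLimit(s)[2]);
        }
    }
}
```

With the naive split, an offset such as +05:00 is cut in two, so the string yields four fields instead of three, while an IANA name passes through intact.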

I'll open a separate PR for the ZoneId.normalized() change in ArrowFieldTypeConversion, which benefits other Arrow consumers but is unrelated to #6648.

0dunay0 closed this Apr 27, 2026

Development

Successfully merging this pull request may close these issues.

[Feature] Paimon lance format should support timestamp_ltz