-
Notifications
You must be signed in to change notification settings - Fork 25
feat: add support for fluss tables using zstd compression #56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
plus implement deserialize for timestamps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request adds support for Fluss temporal data types (Time, Timestamp, TimestampLTz) with zstd compression by enabling the ipc_compression feature in the Arrow dependency. The implementation includes Arrow type conversions based on precision values and JSON serialization/deserialization for these temporal types.
Key changes:
- Enabled
ipc_compressionfeature for Arrow dependency to support zstd compression - Implemented Arrow type conversions for Time, Timestamp, and TimestampLTz with precision-based TimeUnit selection
- Added JSON serialization/deserialization support for temporal types with precision handling
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| Cargo.toml | Added ipc_compression feature to arrow dependency for zstd compression support |
| crates/fluss/src/record/arrow.rs | Added TimeUnit import, temporal type builders, and to_arrow_type implementations for Time/Timestamp/TimestampLTz |
| crates/fluss/src/metadata/json_serde.rs | Added JSON serialization/deserialization for Time/Timestamp/TimestampLTz types with precision handling |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
crates/fluss/src/record/arrow.rs
Outdated
| 1..=3 => ArrowDataType::Time32(TimeUnit::Millisecond), | ||
| 4..=6 => ArrowDataType::Time64(TimeUnit::Microsecond), | ||
| 7..=9 => ArrowDataType::Time64(TimeUnit::Nanosecond), | ||
| _ => ArrowDataType::Time32(TimeUnit::Second), // fallback |
Copilot
AI
Nov 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fallback comment is misleading. According to TimeType::MAX_PRECISION (9), precision values greater than 9 should be considered invalid rather than silently falling back to Time32(Second). Consider either adding validation to reject invalid precision values, or updating the comment to clarify this is handling out-of-range precision values (e.g., '// fallback for invalid precision > 9').
| _ => ArrowDataType::Time32(TimeUnit::Second), // fallback | |
| _ => ArrowDataType::Time32(TimeUnit::Second), // fallback for invalid precision > 9 |
crates/fluss/src/record/arrow.rs
Outdated
| 1..=3 => TimeUnit::Millisecond, | ||
| 4..=6 => TimeUnit::Microsecond, | ||
| 7..=9 => TimeUnit::Nanosecond, | ||
| _ => TimeUnit::Microsecond, // default is 6, so microseconds |
Copilot
AI
Nov 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fallback branches in both Timestamp and TimestampLTz match arms (lines 566 and 577) use Microsecond with the comment 'default is 6, so microseconds'. This duplicates the logic. Consider extracting a helper function like precision_to_time_unit(precision: u32) -> TimeUnit to avoid code duplication and ensure consistency between Timestamp and TimestampLTz conversions.
crates/fluss/src/record/arrow.rs
Outdated
| 7..=9 => TimeUnit::Nanosecond, | ||
| _ => TimeUnit::Microsecond, // default is 6, so microseconds | ||
| }; | ||
| // Use UTC as the timezone for local timezone timestamps |
Copilot
AI
Nov 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment 'Use UTC as the timezone for local timezone timestamps' is confusing. TimestampLTz represents 'TIMESTAMP WITH LOCAL TIME ZONE', which typically means the timestamp should be stored in UTC but interpreted in the local timezone. The implementation is correct, but the comment should clarify that UTC is used for storage while the local timezone is applied during interpretation/display.
| // Use UTC as the timezone for local timezone timestamps | |
| // Store timestamps in UTC; local timezone is applied during interpretation/display |
| obj.insert(Self::FIELD_NAME_PRECISION.to_string(), json!(_type.precision())); | ||
| } | ||
| DataType::Timestamp(_type) => { | ||
| todo!() | ||
| obj.insert(Self::FIELD_NAME_PRECISION.to_string(), json!(_type.precision())); | ||
| } | ||
| DataType::TimestampLTz(_type) => { | ||
| todo!() | ||
| obj.insert(Self::FIELD_NAME_PRECISION.to_string(), json!(_type.precision())); |
Copilot
AI
Nov 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The three consecutive match arms (Time, Timestamp, TimestampLTz) have identical implementations - all insert precision into the JSON object. These can be combined into a single match arm pattern: DataType::Time(_type) | DataType::Timestamp(_type) | DataType::TimestampLTz(_type) => { obj.insert(Self::FIELD_NAME_PRECISION.to_string(), json!(_type.precision())); }
Add ipc_compression feature to arrow dependency