[rust] integration tests for MAP dt + restructure ITs for complex dts#560
Conversation
|
@charlesdong1991 @leekeiabstraction @luoyuxia PTAL 🙏 , I need this as a preparation to refactor complex types to use typed writers, it's mostly tests consolidation and new map tests. Appreciate if you can take a quick look |
| Ok(()) | ||
| } | ||
|
|
||
| // FlussArray carries no schema; nested row/map elements need the typed |
There was a problem hiding this comment.
will be fixed with refactoring, now I need this for test to pass
There was a problem hiding this comment.
Pull request overview
This PR expands and consolidates Rust integration test coverage for complex/compound data types (notably MAP) while restructuring datatype integration tests to reduce repeated table-creation overhead. It also adjusts Arrow→Fluss conversion logic for ARRAY/MAP decoding and improves nested MAP writing support in the column writer.
Changes:
- Consolidate log-table compound datatype integration coverage (ARRAY/ROW/MAP, including nested combinations) and add projection coverage over compound types.
- Expand KV-table integration coverage for compound types and partial updates (ensuring absent columns are preserved), plus add an “all supported datatypes” KV test.
- Update row conversion/writer internals to better handle lossy Arrow type mapping (e.g., TIMESTAMP_LTZ) and nested MAP/ROW elements.
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| crates/fluss/tests/integration/log_table.rs | Consolidates/expands log-table integration tests for compound + scalar datatypes, adds compound-type projection coverage, and adds extensive MAP/ARRAY/ROW scenarios. |
| crates/fluss/tests/integration/kv_table.rs | Extends KV integration tests for compound types, partial updates, partitioned lookups, and adds a comprehensive datatype coverage test. |
| crates/fluss/src/row/column.rs | Removes strict Arrow field type equality checks for list/map element types to accommodate lossy Arrow typing; updates related unit test expectations. |
| crates/fluss/src/row/column_writer.rs | Fixes nested MAP element writing by decoding row/map elements from FlussArray using typed accessors before writing. |
| crates/fluss/src/row/binary_map.rs | Extends FlussMapWriter to support writing ROW values into MAPs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@luoyuxia addressed Copilot comments, added couple of helpers to help with boundaries and shared schema between log and kv for "all datatypes" tests, extended kv tests to include more precisions(it was required to unify). |
|
Merging to continue working on typed writers refactoring |
charlesdong1991
left a comment
There was a problem hiding this comment.
very minor comments, no blocker for me
| }); | ||
| } | ||
|
|
||
| // `to_arrow_type` is lossy (e.g. TIMESTAMP_LTZ → plain Arrow Timestamp); |
There was a problem hiding this comment.
so to confirm since not mentioned in PR description: i guess this will be a behaviour, correct? that users won't know schema mismatch when calling get_array on position
There was a problem hiding this comment.
Intentional. The old strict check tripped on lossy Arrow round-trips (e.g. TIMESTAMP_LTZ -> plain Arrow Timestamp) and rejected valid schemas.
Real shape mismatches still error from the per-element conversion below
| ] | ||
| } | ||
|
|
||
| #[derive(Default)] |
There was a problem hiding this comment.
nice idea on ColumnPlan that helps reduce redundancy a lot!
Just a nit: do we need derive[Default] for internal test only feature?
There was a problem hiding this comment.
well, it was easier this way, another option - inline Self { cols: vec![], index: HashMap::new(), sections: vec![] }, but it's all the same. Arguably derive is a bit cleaner overall
|
|
||
| /// KV upsert + lookup against a schema covering every supported data type. | ||
| #[tokio::test] | ||
| async fn all_supported_datatypes() { |
There was a problem hiding this comment.
Very nit: somehow i am a bit conservative to have such huge test, i think now if something breaks, developers will see all unrelated assertions and not sure if there is meaningful tie with the failing column index? We can revisit in future if encounter such issue in development
There was a problem hiding this comment.
Each row's assertions include the column name in their failure messages, so a break should at least point to the offending column.
but we can revisit later if it becomes a concern 👍
closes #549
Added Map integration tests, consolidated dt tests to avoid overhead in table creations and exponential tests per complex dts.