Skip to content

[rust] integration tests for MAP dt + restructure ITs for complex dts#560

Merged
fresh-borzoni merged 2 commits into
apache:mainfrom
fresh-borzoni:feat/map-integration-tests-549
May 25, 2026
Merged

[rust] integration tests for MAP dt + restructure ITs for complex dts#560
fresh-borzoni merged 2 commits into
apache:mainfrom
fresh-borzoni:feat/map-integration-tests-549

Conversation

@fresh-borzoni
Copy link
Copy Markdown
Member

closes #549

Added Map integration tests, consolidated dt tests to avoid overhead in table creations and exponential tests per complex dts.

@fresh-borzoni fresh-borzoni changed the title [rust] integration tests for MAP dt + restructure tests around comple… [rust] integration tests for MAP dt + restructure ITs for complex dts May 23, 2026
@fresh-borzoni
Copy link
Copy Markdown
Member Author

@charlesdong1991 @leekeiabstraction @luoyuxia PTAL 🙏 , I need this as a preparation to refactor complex types to use typed writers, it's mostly tests consolidation and new map tests.

Appreciate if you can take a quick look

Ok(())
}

// FlussArray carries no schema; nested row/map elements need the typed
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will be fixed with refactoring, now I need this for test to pass

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR expands and consolidates Rust integration test coverage for complex/compound data types (notably MAP) while restructuring datatype integration tests to reduce repeated table-creation overhead. It also adjusts Arrow→Fluss conversion logic for ARRAY/MAP decoding and improves nested MAP writing support in the column writer.

Changes:

  • Consolidate log-table compound datatype integration coverage (ARRAY/ROW/MAP, including nested combinations) and add projection coverage over compound types.
  • Expand KV-table integration coverage for compound types and partial updates (ensuring absent columns are preserved), plus add an “all supported datatypes” KV test.
  • Update row conversion/writer internals to better handle lossy Arrow type mapping (e.g., TIMESTAMP_LTZ) and nested MAP/ROW elements.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
crates/fluss/tests/integration/log_table.rs Consolidates/expands log-table integration tests for compound + scalar datatypes, adds compound-type projection coverage, and adds extensive MAP/ARRAY/ROW scenarios.
crates/fluss/tests/integration/kv_table.rs Extends KV integration tests for compound types, partial updates, partitioned lookups, and adds a comprehensive datatype coverage test.
crates/fluss/src/row/column.rs Removes strict Arrow field type equality checks for list/map element types to accommodate lossy Arrow typing; updates related unit test expectations.
crates/fluss/src/row/column_writer.rs Fixes nested MAP element writing by decoding row/map elements from FlussArray using typed accessors before writing.
crates/fluss/src/row/binary_map.rs Extends FlussMapWriter to support writing ROW values into MAPs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/fluss/tests/integration/log_table.rs Outdated
Comment thread crates/fluss/tests/integration/kv_table.rs
@fresh-borzoni
Copy link
Copy Markdown
Member Author

@luoyuxia addressed Copilot comments, added couple of helpers to help with boundaries and shared schema between log and kv for "all datatypes" tests, extended kv tests to include more precisions(it was required to unify).
PTAL 🙏

@fresh-borzoni
Copy link
Copy Markdown
Member Author

Merging to continue working on typed writers refactoring

@fresh-borzoni fresh-borzoni merged commit 840a41a into apache:main May 25, 2026
10 checks passed
Copy link
Copy Markdown
Contributor

@charlesdong1991 charlesdong1991 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very minor comments, no blocker for me

});
}

// `to_arrow_type` is lossy (e.g. TIMESTAMP_LTZ → plain Arrow Timestamp);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so to confirm since not mentioned in PR description: i guess this will be a behaviour, correct? that users won't know schema mismatch when calling get_array on position

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional. The old strict check tripped on lossy Arrow round-trips (e.g. TIMESTAMP_LTZ -> plain Arrow Timestamp) and rejected valid schemas.

Real shape mismatches still error from the per-element conversion below

]
}

#[derive(Default)]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice idea on ColumnPlan that helps reduce redundancy a lot!

Just a nit: do we need derive[Default] for internal test only feature?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, it was easier this way, another option - inline Self { cols: vec![], index: HashMap::new(), sections: vec![] }, but it's all the same. Arguably derive is a bit cleaner overall


/// KV upsert + lookup against a schema covering every supported data type.
#[tokio::test]
async fn all_supported_datatypes() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nit: somehow i am a bit conservative to have such huge test, i think now if something breaks, developers will see all unrelated assertions and not sure if there is meaningful tie with the failing column index? We can revisit in future if encounter such issue in development

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each row's assertions include the column name in their failure messages, so a break should at least point to the offending column.

but we can revisit later if it becomes a concern 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[rust] Integration tests for map dt

3 participants