-
Notifications
You must be signed in to change notification settings - Fork 13
Support ColumnPruning #57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@luoyuxia PTAL if u have time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds column projection (pruning) support to the Fluss Rust client, enabling users to fetch only specific columns from tables to reduce network transfer costs. The implementation supports both server-side projection pushdown and client-side projection, with the ability to project by column indices or names.
Key Changes:
- Added projection support to the scanner API with
project()andproject_by_name()methods - Implemented custom Arrow IPC message parsing to support projection at the record batch level
- Enhanced
ReadContextto handle different projection modes (pushdown vs. client-side)
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
crates/fluss/src/record/arrow.rs |
Added parse_ipc_message() method for custom IPC parsing, modified records() to support projection, and extended ReadContext with projection capabilities |
crates/fluss/src/client/table/scanner.rs |
Added project() and project_by_name() methods to TableScan, threaded projection fields through LogScanner and LogFetcher, and updated fetch request generation to include projection information |
crates/examples/src/example_projection.rs |
New example demonstrating column projection usage with both full scan and projected scan scenarios |
crates/examples/src/example_table.rs |
Updated bootstrap server address for consistency |
crates/examples/Cargo.toml |
Added env_logger dependency and registered the new projection example |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let (projection_enabled, projected_fields) = if let Some(fields) = &self.projected_fields { | ||
| if fields.is_empty() { | ||
| (false, vec![]) | ||
| } else { |
Copilot
AI
Nov 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Converting usize to i32 using as i32 could lead to data loss or incorrect values on 64-bit systems where usize can exceed i32::MAX. This could cause issues when projecting columns with very high indices. Consider either:
- Using a bounded type for column indices (e.g., validating they're within i32 range)
- Changing the protobuf field type to support larger indices
- Adding an explicit check and error if the index exceeds i32::MAX
| } else { | |
| } else { | |
| // Check for out-of-range indices before converting | |
| if let Some(&idx) = fields.iter().find(|&&i| i > i32::MAX as usize) { | |
| // Return early with error if any index is too large | |
| return HashMap::new(); // Or, if possible, return Err(Error::msg(...)) | |
| } |
crates/fluss/src/record/arrow.rs
Outdated
| data: &'a [u8], | ||
| ) -> Option<(arrow::ipc::RecordBatch<'a>, Buffer, arrow::ipc::MetadataVersion)> { | ||
| const CONTINUATION_MARKER: u32 = 0xFFFFFFFF; | ||
|
|
Copilot
AI
Nov 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Trailing whitespace detected. Please remove it to maintain code cleanliness.
crates/fluss/src/record/arrow.rs
Outdated
| if continuation != CONTINUATION_MARKER { | ||
| return None; | ||
| } | ||
|
|
Copilot
AI
Nov 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Trailing whitespace detected. Please remove it to maintain code cleanliness.
| let projected_schema = arrow_schema::Schema::new( | ||
| projected_fields | ||
| .iter() | ||
| .map(|&idx| full_arrow_schema.field(idx).clone()) | ||
| .collect::<Vec<_>>(), | ||
| ); |
Copilot
AI
Nov 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the bounds checking issue in ReadContext::to_arrow_metadata, there's no validation when creating the projected schema here. If projected_fields contains an invalid index, calling full_arrow_schema.field(idx) will panic. While TableScan::project validates indices, this code path could still be reached with invalid indices if the projection is set up through other means.
| let projected_schema = arrow_schema::Schema::new( | ||
| projected_fields | ||
| .iter() | ||
| .map(|&idx| full_arrow_schema.field(idx).clone()) | ||
| .collect::<Vec<_>>(), | ||
| ); |
Copilot
AI
Nov 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code for building a projected schema is duplicated from the logic in ReadContext::to_arrow_metadata (lines 652-658 in arrow.rs). This duplication creates a maintenance burden and increases the risk of inconsistencies. Consider extracting this logic into a shared helper method.
| let projected_schema = arrow_schema::Schema::new( | |
| projected_fields | |
| .iter() | |
| .map(|&idx| full_arrow_schema.field(idx).clone()) | |
| .collect::<Vec<_>>(), | |
| ); | |
| let projected_schema = build_projected_schema(&full_arrow_schema, &projected_fields); |
crates/fluss/src/record/arrow.rs
Outdated
| if data.len() < 8 { | ||
| return None; | ||
| } | ||
|
|
Copilot
AI
Nov 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Trailing whitespace detected. Please remove it to maintain code cleanliness.
| let (projection_enabled, _) = if !projected_fields.is_empty() { | ||
| (true, projected_fields.clone()) | ||
| } else { | ||
| (false, vec![]) | ||
| }; |
Copilot
AI
Nov 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The check for empty projected fields (line 190) and subsequent handling creates unnecessary complexity. If projected_fields is Some(vec![]) (an empty vector), it should probably be treated the same as None - i.e., fetch all fields. The current logic sets projection_enabled = false for empty vectors, but it's unclear why an empty projection list would ever be intentionally created. Consider either preventing empty vectors from being set in the first place (e.g., in the project method) or simplifying this logic.
luoyuxia
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhaohaidao Thanks for the pr. Left minor comments. PTAL.
Also, seem this pr doesn't include reordeing the project fields returned by server for final output. See java impl https://github.com/apache/fluss/blob/ae84521aaaef5448a0bc5a63fc83e6ca536ca452/fluss-common/src/main/java/org/apache/fluss/record/LogRecordReadContext.java#L84
don't forget to create an issue to track it. It's critical when project fields is out of orders like [3, 2, 1]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pub async fn main() -> Result<()> { | ||
| let mut config = Config::parse(); | ||
| config.bootstrap_server = Some("127.0.0.1:56405".to_string()); | ||
| config.bootstrap_server = Some("127.0.0.1:9123".to_string()); |
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hardcoded bootstrap server changed from port 56405 to 9123. This change appears to be unrelated to the column pruning feature and may have been committed accidentally. If this is an intentional configuration change for development/testing, it should be explained in the PR description or reverted to avoid affecting other developers' local environments.
| config.bootstrap_server = Some("127.0.0.1:9123".to_string()); | |
| config.bootstrap_server = Some("127.0.0.1:56405".to_string()); |
crates/fluss/src/record/arrow.rs
Outdated
| return None; | ||
| } | ||
|
|
||
| // Calculate reordering indexes to transform from sorted order to user-requested order |
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .expect() call could panic if there's a mismatch between projected_fields and projection_in_order. While this should never happen with the current code logic (since both are derived from the same source in create_read_context), this represents a potentially unrecoverable error condition. Consider:
- Adding debug assertions to validate invariants in
with_projection_pushdown():
debug_assert_eq!(projected_fields.len(), projection_in_order.len());
debug_assert!(projected_fields.iter().all(|&f| projection_in_order.contains(&f)));- Or, returning a
Resultfromreordering_indexes()instead of using.expect()to make error handling more explicit and recoverable.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@luoyuxia Thank you for your concise and elegant suggestions. Comments are addressed. PTAL if u have time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@zhaohaidao Hi, I append a minior commit to improve reordering. PTAL. |
@luoyuxia Thanks, LGTM |
|
@zhaohaidao Thanks for updating. Left few comments. Don't forget rebase main branch to resovle conflicts. |
Thanks for reminding. The comments are addressed. PTAL if u have time |
luoyuxia
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Purpose
Support ColumnPruning to save significant network costs
Brief change log
Tests
API and Format
Documentation