-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Add TableSchema helper to encapsulate file schema + partition fields #18178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a TableSchema helper struct to better encapsulate file schema and partition field information within FileScanConfig. The refactoring consolidates the previously separate file_schema and table_partition_cols fields into a single table_schema field of type TableSchema, which manages both components and provides accessor methods.
Key changes:
- Introduced
TableSchemastruct indatafusion/common/src/dfschema.rsto encapsulate file schema, partition columns, and the combined table schema - Refactored
FileScanConfigto useTableSchemainstead of separatefile_schemaandtable_partition_colsfields - Updated all references throughout the codebase to use accessor methods (
file_schema(),table_partition_cols())
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| datafusion/common/src/dfschema.rs | Adds new TableSchema struct with file schema, partition columns, and combined table schema |
| datafusion/common/src/lib.rs | Exports TableSchema from the dfschema module |
| datafusion/datasource/src/file_scan_config.rs | Refactors FileScanConfig to use TableSchema and adds accessor methods |
| datafusion/datasource/src/file_stream.rs | Updates field access to use table_partition_cols() method |
| datafusion/datasource-parquet/src/source.rs | Updates field access to use file_schema() and table_partition_cols() methods |
| datafusion/proto/src/physical_plan/to_proto.rs | Updates field access to use accessor methods |
| datafusion-testing | Updates subproject commit reference |
Comments suppressed due to low confidence (1)
datafusion/datasource/src/file_scan_config.rs:1
- [nitpick] The new
TableSchemastruct and its purpose should be documented in the module-level documentation to help users understand when and how to use it versus directly working with schemas.
// Licensed to the Apache Software Foundation (ASF) under one
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
0bce854 to
467fad2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to update the pin in datafusion-testing? I suspect this was not intended
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @adriangb -- I think this is a significant improvement 🙏
I think we need to fix the datafusion-testing pin before merging this PR, otherwise the rest of the comments aren't necessary in my mind
I do think moving this structure into datafusion-datasource would align it better with its use
datafusion/common/src/dfschema.rs
Outdated
| /// | ||
| /// This struct also holds a full table schema to be able to cheaply hand out | ||
| /// references to any one of the representations without needing to reconstruct them. | ||
| #[derive(Debug, Clone)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good structure
It seems pretty specific to the DataSource code, so I suggest putting it in the datasource module. Maybe alongside FileScanConfig in datafusion/datasource/src/file_scan_config.rs or its own module datafusion/datasource/src/table_schema.rs
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…pache#18178) Hoping this helps with apache#14993 --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…pache#18178) Hoping this helps with apache#14993 --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Hoping this helps with #14993