@adriangb

Hoping this helps with #14993

@github-actions bot added the common, proto, and datasource labels Oct 20, 2025
@adriangb requested a review from Copilot October 20, 2025 15:17

Copilot AI left a comment

Pull Request Overview

This PR introduces a TableSchema helper struct to better encapsulate file schema and partition field information within FileScanConfig. The refactoring consolidates the previously separate file_schema and table_partition_cols fields into a single table_schema field of type TableSchema, which manages both components and provides accessor methods.

Key changes:

  • Introduced TableSchema struct in datafusion/common/src/dfschema.rs to encapsulate file schema, partition columns, and the combined table schema
  • Refactored FileScanConfig to use TableSchema instead of separate file_schema and table_partition_cols fields
  • Updated all references throughout the codebase to use accessor methods (file_schema(), table_partition_cols())
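The shape described above can be sketched roughly as follows. This is an illustrative sketch, not the code merged in the PR: `Field` and `Schema` are simplified, hypothetical stand-ins for the Arrow types, the `field` and `schema_of` helpers exist only for the example, and the ordering of the combined schema (file columns first, partition columns appended) is an assumption.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for the Arrow `Field` and `Schema` types; the real
// ones also carry data types, nullability, and metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<Arc<Field>>,
}

/// File schema plus partition columns, with the combined table schema
/// computed once at construction time.
#[derive(Debug, Clone)]
pub struct TableSchema {
    file_schema: Arc<Schema>,
    table_partition_cols: Vec<Arc<Field>>,
    table_schema: Arc<Schema>,
}

impl TableSchema {
    pub fn new(file_schema: Arc<Schema>, table_partition_cols: Vec<Arc<Field>>) -> Self {
        // Assumption: the table schema is the file columns followed by the
        // partition columns.
        let mut fields = file_schema.fields.clone();
        fields.extend(table_partition_cols.iter().cloned());
        Self {
            file_schema,
            table_partition_cols,
            table_schema: Arc::new(Schema { fields }),
        }
    }

    pub fn file_schema(&self) -> &Arc<Schema> {
        &self.file_schema
    }

    pub fn table_partition_cols(&self) -> &Vec<Arc<Field>> {
        &self.table_partition_cols
    }

    pub fn table_schema(&self) -> &Arc<Schema> {
        &self.table_schema
    }
}

/// Reduced to the single field relevant here; the real struct has many more.
pub struct FileScanConfig {
    pub table_schema: TableSchema,
}

impl FileScanConfig {
    // Accessors delegate to `TableSchema`, replacing direct field access.
    pub fn file_schema(&self) -> &Arc<Schema> {
        self.table_schema.file_schema()
    }

    pub fn table_partition_cols(&self) -> &Vec<Arc<Field>> {
        self.table_schema.table_partition_cols()
    }
}

// Hypothetical helpers used only to keep the example short.
pub fn field(name: &str) -> Arc<Field> {
    Arc::new(Field { name: name.to_string() })
}

pub fn schema_of(names: &[&str]) -> Arc<Schema> {
    Arc::new(Schema { fields: names.iter().map(|n| field(n)).collect() })
}
```

With this layout, call sites that previously read `config.file_schema` or `config.table_partition_cols` directly would call `config.file_schema()` or `config.table_partition_cols()` instead.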

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| datafusion/common/src/dfschema.rs | Adds new TableSchema struct with file schema, partition columns, and combined table schema |
| datafusion/common/src/lib.rs | Exports TableSchema from the dfschema module |
| datafusion/datasource/src/file_scan_config.rs | Refactors FileScanConfig to use TableSchema and adds accessor methods |
| datafusion/datasource/src/file_stream.rs | Updates field access to use table_partition_cols() method |
| datafusion/datasource-parquet/src/source.rs | Updates field access to use file_schema() and table_partition_cols() methods |
| datafusion/proto/src/physical_plan/to_proto.rs | Updates field access to use accessor methods |
| datafusion-testing | Updates subproject commit reference |
Comments suppressed due to low confidence (1)

datafusion/datasource/src/file_scan_config.rs:1

  • [nitpick] The new TableSchema struct and its purpose should be documented in the module-level documentation to help users understand when and how to use it versus directly working with schemas.
// Licensed to the Apache Software Foundation (ASF) under one

@github-actions bot added the core and substrait labels Oct 20, 2025

Did you mean to update the pin in datafusion-testing? I suspect this was not intended

@alamb left a comment

Thanks @adriangb -- I think this is a significant improvement 🙏

I think we need to fix the datafusion-testing pin before merging this PR; otherwise, the rest of the comments aren't necessary in my mind.

I do think moving this structure into datafusion-datasource would align it better with its use.

///
/// This struct also holds a full table schema to be able to cheaply hand out
/// references to any one of the representations without needing to reconstruct them.
#[derive(Debug, Clone)]

This is a good structure

It seems pretty specific to the DataSource code, so I suggest putting it in the datasource module, maybe alongside FileScanConfig in datafusion/datasource/src/file_scan_config.rs or in its own module, datafusion/datasource/src/table_schema.rs.
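For context on the doc comment quoted above, here is a minimal sketch of why storing a precomputed full table schema makes handing out references cheap. It assumes the schemas are held behind `Arc`; the `Schema` type, the constructor signature, and the `shares_allocation` helper are hypothetical and chosen only for illustration.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a schema: just a list of column names.
#[derive(Debug)]
pub struct Schema {
    pub column_names: Vec<String>,
}

#[derive(Debug, Clone)]
pub struct TableSchema {
    file_schema: Arc<Schema>,
    table_schema: Arc<Schema>,
}

impl TableSchema {
    /// Builds both representations once, up front.
    pub fn new(file_columns: &[&str], partition_columns: &[&str]) -> Self {
        let file: Vec<String> = file_columns.iter().map(|c| c.to_string()).collect();
        let mut table = file.clone();
        table.extend(partition_columns.iter().map(|c| c.to_string()));
        Self {
            file_schema: Arc::new(Schema { column_names: file }),
            table_schema: Arc::new(Schema { column_names: table }),
        }
    }

    // Each accessor returns a reference to an already-built schema; nothing
    // is recomputed per call.
    pub fn file_schema(&self) -> &Arc<Schema> {
        &self.file_schema
    }

    pub fn table_schema(&self) -> &Arc<Schema> {
        &self.table_schema
    }
}

/// Returns true when both handles point at the same heap allocation.
pub fn shares_allocation(a: &Arc<Schema>, b: &Arc<Schema>) -> bool {
    Arc::ptr_eq(a, b)
}
```

Cloning the `Arc` returned by an accessor, or cloning the whole `TableSchema`, only increments reference counts, so every handle obtained this way points at the same underlying schema.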

adriangb and others added 5 commits October 21, 2025 09:49
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@github-actions bot removed the common label Oct 21, 2025
@adriangb added this pull request to the merge queue Oct 21, 2025
Merged via the queue into apache:main with commit 8d54e7b Oct 21, 2025
28 checks passed
@adriangb deleted the table-schema branch October 21, 2025 16:28
adriangb added a commit to pydantic/datafusion that referenced this pull request Oct 27, 2025
…pache#18178)

Hoping this helps with apache#14993

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
tobixdev pushed a commit to tobixdev/datafusion that referenced this pull request Nov 2, 2025
…pache#18178)

Hoping this helps with apache#14993

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Labels

  • core: Core DataFusion crate
  • datasource: Changes to the datasource crate
  • proto: Related to proto crate
  • substrait: Changes to the substrait crate
