Skip to content

refactor: cache schema_without_virtual_columns and remove TableSchema::with_virtual_columns#22600

Merged
adriangb merged 1 commit into
apache:mainfrom
mbutrovich:virtual-columns-followup
May 28, 2026
Merged

refactor: cache schema_without_virtual_columns and remove TableSchema::with_virtual_columns#22600
adriangb merged 1 commit into
apache:mainfrom
mbutrovich:virtual-columns-followup

Conversation

@mbutrovich
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Followup to #22026 — addresses two review comments from @adriangb that landed after the PR was already in the merge queue.

Rationale for this change

Two pieces of feedback on the TableSchema API introduced in #22026:

  1. feat: Plumb Parquet virtual columns (row_number) through TableSchema and ParquetOpener #22026 (comment)TableSchema::with_virtual_columns shouldn't exist as a non-deprecated counterpart to the already-deprecated TableSchema::with_table_partition_cols. Callers should use TableSchemaBuilder directly. Every existing call site already does, so the method has no users.
  2. feat: Plumb Parquet virtual columns (row_number) through TableSchema and ParquetOpener #22026 (comment)schema_without_virtual_columns was rebuilding the schema on every call. It can be computed once at construction and returned by reference, matching the convention used by every other accessor on TableSchema (file_schema, table_schema, table_partition_cols, virtual_columns all return &).

What changes are included in this PR?

  • Remove TableSchema::with_virtual_columns. Callers must use TableSchemaBuilder::with_virtual_columns (no in-tree callers needed updating).
  • Cache schema_without_virtual_columns on TableSchema, computed once in TableSchemaBuilder::build. When there are no virtual columns the cached field shares the same Arc as table_schema.
  • Accessor schema_without_virtual_columns(&self) now returns &SchemaRef instead of an owned SchemaRef, matching the rest of the struct's accessors. Updated the one in-tree caller in datasource-parquet/src/source.rs.

Are these changes tested?

Yes — covered by the existing table_schema unit tests and the datasource-parquet test suite, all of which still pass. The change is a refactor with no behavioral difference: the cached schema produced in build() is byte-for-byte identical to what the previous accessor allocated on each call.

Are there any user-facing changes?

Two API changes against the TableSchema surface added in #22026 (which has not been released):

  • TableSchema::with_virtual_columns removed. Use TableSchemaBuilder::with_virtual_columns instead.
  • TableSchema::schema_without_virtual_columns return type changed from SchemaRef to &SchemaRef. Callers that need an owned value should Arc::clone the result.

@mbutrovich mbutrovich requested a review from adriangb May 28, 2026 16:59
@github-actions github-actions Bot added the datasource Changes to the datasource crate label May 28, 2026
@github-actions
Copy link
Copy Markdown

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion-datasource v53.1.0 (current)
       Built [  40.309s] (current)
     Parsing datafusion-datasource v53.1.0 (current)
      Parsed [   0.035s] (current)
    Building datafusion-datasource v53.1.0 (baseline)
       Built [  35.680s] (baseline)
     Parsing datafusion-datasource v53.1.0 (baseline)
      Parsed [   0.035s] (baseline)
    Checking datafusion-datasource v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.340s] 222 checks: 221 pass, 1 fail, 0 warn, 30 skip

--- failure inherent_method_missing: pub method removed or renamed ---

Description:
A publicly-visible method or associated fn is no longer available under its prior name. It may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/inherent_method_missing.ron

Failed in:
  TableSchema::with_virtual_columns, previously in file /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/3cf7effc87b9920734a7bd585f7256ccb157c4b2/datafusion/datasource/src/table_schema.rs:184
  TableSchema::with_virtual_columns, previously in file /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/3cf7effc87b9920734a7bd585f7256ccb157c4b2/datafusion/datasource/src/table_schema.rs:184

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  78.701s] datafusion-datasource
    Building datafusion-datasource-parquet v53.1.0 (current)
       Built [  41.557s] (current)
     Parsing datafusion-datasource-parquet v53.1.0 (current)
      Parsed [   0.030s] (current)
    Building datafusion-datasource-parquet v53.1.0 (baseline)
       Built [  41.340s] (baseline)
     Parsing datafusion-datasource-parquet v53.1.0 (baseline)
      Parsed [   0.031s] (baseline)
    Checking datafusion-datasource-parquet v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.193s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [  84.655s] datafusion-datasource-parquet

@github-actions github-actions Bot added the auto detected api change Auto detected API change label May 28, 2026
Copy link
Copy Markdown
Contributor

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@adriangb adriangb enabled auto-merge May 28, 2026 17:44
@adriangb adriangb added this pull request to the merge queue May 28, 2026
Merged via the queue into apache:main with commit 69786d8 May 28, 2026
35 checks passed
@mbutrovich mbutrovich deleted the virtual-columns-followup branch May 28, 2026 20:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto detected api change Auto detected API change datasource Changes to the datasource crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants