refactor: cache schema_without_virtual_columns and remove TableSchema::with_virtual_columns #22600
Merged
Reviewer note: cargo-semver-checks reported that the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).
Which issue does this PR close?
Follow-up to #22026: addresses two review comments from @adriangb that landed after the PR was already in the merge queue.
Rationale for this change
Two pieces of feedback on the `TableSchema` API introduced in #22026:

- `TableSchema::with_virtual_columns` shouldn't exist as a non-deprecated counterpart to the already-deprecated `TableSchema::with_table_partition_cols`. Callers should use `TableSchemaBuilder` directly. Every existing call site already does, so the method has no users.
- `schema_without_virtual_columns` was rebuilding the schema on every call. It can be computed once at construction and returned by reference, matching the convention used by every other accessor on `TableSchema` (`file_schema`, `table_schema`, `table_partition_cols`, and `virtual_columns` all return `&`).

What changes are included in this PR?
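The caching pattern can be sketched roughly as follows. This is a simplified model, not the DataFusion code: `Field`, `Schema`, and `SchemaRef` are hypothetical stand-ins for the arrow types (in arrow, `SchemaRef` is an alias for `Arc<Schema>`), and `TableSchema`/`TableSchemaBuilder` keep only what is needed to show the cache. The plain `columns` list stands in for the file and partition columns together.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for arrow's `Field`, `Schema`, and `SchemaRef`.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
}

#[derive(Debug, PartialEq)]
struct Schema {
    fields: Vec<Field>,
}

type SchemaRef = Arc<Schema>;

// Simplified `TableSchema`: only the pieces needed to illustrate the cache.
#[derive(Debug)]
struct TableSchema {
    table_schema: SchemaRef,
    virtual_columns: Vec<Field>,
    // Computed once in `build` instead of on every accessor call.
    schema_without_virtual_columns: SchemaRef,
}

impl TableSchema {
    fn table_schema(&self) -> &SchemaRef {
        &self.table_schema
    }

    fn virtual_columns(&self) -> &[Field] {
        &self.virtual_columns
    }

    // Returns the cached schema by reference, like the other accessors.
    fn schema_without_virtual_columns(&self) -> &SchemaRef {
        &self.schema_without_virtual_columns
    }
}

// Simplified builder; `columns` models file + partition columns.
struct TableSchemaBuilder {
    columns: Vec<Field>,
    virtual_columns: Vec<Field>,
}

impl TableSchemaBuilder {
    fn new(columns: Vec<Field>) -> Self {
        Self { columns, virtual_columns: Vec::new() }
    }

    fn with_virtual_columns(mut self, virtual_columns: Vec<Field>) -> Self {
        self.virtual_columns = virtual_columns;
        self
    }

    fn build(self) -> TableSchema {
        // Full table schema: regular columns followed by virtual columns.
        let mut all_fields = self.columns.clone();
        all_fields.extend(self.virtual_columns.iter().cloned());
        let table_schema: SchemaRef = Arc::new(Schema { fields: all_fields });

        let schema_without_virtual_columns = if self.virtual_columns.is_empty() {
            // No virtual columns: both schemas are the same, so share the Arc.
            Arc::clone(&table_schema)
        } else {
            // Otherwise allocate the reduced schema exactly once, here.
            Arc::new(Schema { fields: self.columns })
        };

        TableSchema {
            table_schema,
            virtual_columns: self.virtual_columns,
            schema_without_virtual_columns,
        }
    }
}
```

Sharing the `Arc` when there are no virtual columns avoids a second allocation for an identical schema, and because the accessor hands out a reference, repeated calls do no work at all.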
- Removed `TableSchema::with_virtual_columns`. Callers must use `TableSchemaBuilder::with_virtual_columns` instead (no in-tree callers needed updating).
- Cached `schema_without_virtual_columns` on `TableSchema`, computed once in `TableSchemaBuilder::build`. When there are no virtual columns, the cached field shares the same `Arc` as `table_schema`.
- `schema_without_virtual_columns(&self)` now returns `&SchemaRef` instead of an owned `SchemaRef`, matching the rest of the struct's accessors. Updated the one in-tree caller in `datasource-parquet/src/source.rs`.

Are these changes tested?
Yes: covered by the existing `table_schema` unit tests and the `datasource-parquet` test suite, all of which still pass. The change is a refactor with no behavioral difference: the cached schema produced in `build()` is identical to the one the previous accessor allocated on each call.

Are there any user-facing changes?
Two API changes against the `TableSchema` surface added in #22026 (which has not been released):

- `TableSchema::with_virtual_columns` removed. Use `TableSchemaBuilder::with_virtual_columns` instead.
- `TableSchema::schema_without_virtual_columns` return type changed from `SchemaRef` to `&SchemaRef`. Callers that need an owned value should `Arc::clone` the result.
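For downstream code, the return-type change mainly matters to callers that stored the result. A minimal sketch of both cases follows, again with hypothetical stand-in types; `ScanConfig`, `build_scan_config`, and `column_count` are illustrative names, not DataFusion APIs.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for arrow's `Schema`/`SchemaRef` and a reduced `TableSchema`.
#[derive(Debug, PartialEq)]
struct Schema {
    field_names: Vec<String>,
}

type SchemaRef = Arc<Schema>;

struct TableSchema {
    schema_without_virtual_columns: SchemaRef,
}

impl TableSchema {
    // New signature: borrow the cached value instead of returning an owned one.
    fn schema_without_virtual_columns(&self) -> &SchemaRef {
        &self.schema_without_virtual_columns
    }
}

// Hypothetical downstream type that needs to own its schema.
struct ScanConfig {
    schema: SchemaRef,
}

// Read-only callers can use the borrowed `&SchemaRef` directly; no change needed.
fn column_count(table_schema: &TableSchema) -> usize {
    table_schema.schema_without_virtual_columns().field_names.len()
}

// Callers that store the schema clone the `Arc` explicitly. This increments a
// reference count; it does not copy the underlying `Schema`.
fn build_scan_config(table_schema: &TableSchema) -> ScanConfig {
    ScanConfig {
        schema: Arc::clone(table_schema.schema_without_virtual_columns()),
    }
}
```

Because `Arc::clone` only bumps a reference count, recovering an owned value is much cheaper than the per-call schema rebuild the old accessor performed.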