[SPARK-57057][SQL] Target a specific branch on SELECT / INSERT via temporal clause and session config #56103
Open
viirya wants to merge 2 commits into
bad8a24 to
f5e3d5a
Compare
… DDL
Add a SupportsBranching mix-in interface for DSv2 tables so data sources
can expose table branching through standard Spark SQL DDL:
ALTER TABLE t CREATE [OR REPLACE] BRANCH [IF NOT EXISTS] name
[AS OF VERSION <id>]
ALTER TABLE t DROP BRANCH [IF EXISTS] name
ALTER TABLE t FAST FORWARD branch TO target
SHOW BRANCHES (FROM|IN) t
The interface defines createBranch / dropBranch / fastForward /
listBranches operations and a TableBranch value type. Reads and writes
against a specific branch are not part of this change.
Includes parser grammar, logical plans, analyzer dispatch through
ResolvedTable, exec nodes, an in-memory implementation on InMemoryTable
for testing, and unit + integration tests.
Co-authored-by: Claude Code
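The four operations described in this commit can be pictured with a small toy model. This is only a sketch in Python, not Spark or DSv2 code: `ToyBranchingTable`, its method signatures, the integer version ids, and the rule that fast-forward may only move a branch to an equal or newer version are all assumptions made for illustration. The actual SupportsBranching signatures are not shown in this PR text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableBranch:
    """Toy value type: a branch name pinned to a version id (assumed shape)."""
    name: str
    version: int


class ToyBranchingTable:
    """Toy stand-in for a table that supports branching (not Spark code)."""

    def __init__(self, current_version=0):
        self.current_version = current_version
        self._branches = {}

    def create_branch(self, name, version=None, replace=False, if_not_exists=False):
        # Models: CREATE [OR REPLACE] BRANCH [IF NOT EXISTS] name [AS OF VERSION <id>]
        if name in self._branches and not replace:
            if if_not_exists:
                return self._branches[name]
            raise ValueError(f"branch already exists: {name}")
        pinned = self.current_version if version is None else version
        self._branches[name] = TableBranch(name, pinned)
        return self._branches[name]

    def drop_branch(self, name, if_exists=False):
        # Models: DROP BRANCH [IF EXISTS] name; returns whether a branch was removed.
        if name not in self._branches:
            if if_exists:
                return False
            raise KeyError(f"no such branch: {name}")
        del self._branches[name]
        return True

    def fast_forward(self, branch, target):
        # Models: FAST FORWARD branch TO target.
        # Assumption: only forward moves (equal or newer version) are allowed.
        src = self._branches[branch]
        dst = self._branches[target]
        if dst.version < src.version:
            raise ValueError(f"cannot fast-forward {branch} backwards to {target}")
        self._branches[branch] = TableBranch(branch, dst.version)

    def list_branches(self):
        # Models: SHOW BRANCHES (FROM|IN) t, sorted by name for stable output.
        return sorted(self._branches.values(), key=lambda b: b.name)
```

A real data source would track snapshot ancestry rather than plain integers; the integer comparison here only stands in for that check.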
…mporal clause and session config
Extend the temporal clause so reads and writes can target a named branch
on a SupportsBranching data source:
SELECT * FROM t FOR BRANCH 'dev'
SELECT * FROM t VERSION AS OF BRANCH 'dev'
INSERT INTO t FOR BRANCH 'dev' SELECT ...
INSERT OVERWRITE t FOR BRANCH 'dev' SELECT ...
Branch is the only temporal variant allowed on writes; VERSION / TIMESTAMP
writes remain rejected (existing constraint, now caught at parse time with
a clear message).
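As a rough illustration of the write-side rule, the check can be thought of as a filter over the kind of temporal clause attached to the statement. The sketch below is a toy Python model, not the actual parser code; the class names `VersionClause`, `TimestampClause`, `BranchClause` and the function `check_write_clause` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionClause:
    """Toy model of VERSION AS OF <id> (hypothetical name)."""
    version: str


@dataclass(frozen=True)
class TimestampClause:
    """Toy model of TIMESTAMP AS OF <ts> (hypothetical name)."""
    timestamp: str


@dataclass(frozen=True)
class BranchClause:
    """Toy model of FOR BRANCH 'name' (hypothetical name)."""
    branch: str


def check_write_clause(clause):
    """Accept no clause or a branch clause on a write; reject anything else."""
    if clause is None or isinstance(clause, BranchClause):
        return
    kind = type(clause).__name__
    raise ValueError(f"{kind} is not allowed on writes; only a branch clause is")
```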
Also add `spark.sql.defaultBranch`, a session config that routes reads
and writes to the given branch when no explicit FOR BRANCH clause is
present. An explicit clause always overrides the config. Tables that do
not implement SupportsBranching ignore the config silently; an explicit
FOR BRANCH on such a table is a hard error.
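The precedence rules in the paragraph above can be summarized as a small decision function. This is a sketch in Python under stated assumptions, not the analyzer code: `effective_branch`, its parameters, and `BranchNotSupportedError` (standing in for the analysis-time error) are hypothetical names.

```python
from typing import Optional


class BranchNotSupportedError(Exception):
    """Toy stand-in for the hard error raised on non-branching tables."""


def effective_branch(explicit_branch: Optional[str],
                     default_branch: str,
                     table_supports_branching: bool) -> Optional[str]:
    """Return the branch to route to, or None for the table's normal path."""
    if explicit_branch is not None:
        # An explicit FOR BRANCH clause always wins, and is a hard error
        # on a table that cannot serve branches.
        if not table_supports_branching:
            raise BranchNotSupportedError(
                f"table does not support branching: FOR BRANCH '{explicit_branch}'")
        return explicit_branch
    if default_branch and table_supports_branching:
        # The session default applies only when set and only to branching tables.
        return default_branch
    # Empty config, or a non-branching table: the config is ignored silently.
    return None
```

For example, with the session default set to `'staging'`, a query carrying `FOR BRANCH 'dev'` resolves to `dev`, a query without a clause resolves to `staging`, and a table without branching support is read as usual.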
Implementation:
* SupportsBranching gains loadBranch(name): Table.
* TimeTravelSpec gains AsOfBranch(branch, isExplicit).
* RelationTimeTravel carries an optional branch.
* UnresolvedRelation carries the branch on writes via a reserved
internal option (mirrors the REQUIRED_WRITE_PRIVILEGES pattern), so
the NamedRelation slot in InsertIntoStatement / OverwriteByExpression
remains intact.
* CatalogV2Util.getTable composes loadTable + loadBranch, lifting the
"no time travel on writes" assertion only for the branch case.
* RelationResolution applies the default branch only on the persistent
relation path (temp views unaffected).
* InMemoryTable stores per-branch data in independent InMemoryTable
instances so reads and writes through loadBranch are isolated.
Co-authored-by: Claude Code
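The composition of loadTable and loadBranch, together with per-branch isolation, can be sketched with a toy catalog. This is illustrative Python only and does not reflect the real CatalogV2Util or InMemoryTable code: `ToyTable`, `ToyCatalog`, `get_table`, and the choice to copy the parent's rows when a branch is created are all assumptions.

```python
class ToyTable:
    """Toy table whose branches are independent ToyTable instances."""

    def __init__(self, name, supports_branching=True):
        self.name = name
        self.rows = []
        self.supports_branching = supports_branching
        self._branches = {}

    def create_branch(self, branch):
        # Assumption: a new branch starts as a copy of the parent's current rows.
        child = ToyTable(f"{self.name}@{branch}")
        child.rows = list(self.rows)
        self._branches[branch] = child
        return child

    def load_branch(self, branch):
        if branch not in self._branches:
            raise KeyError(f"no such branch: {branch}")
        return self._branches[branch]


class ToyCatalog:
    """Toy catalog mapping table names to ToyTable objects."""

    def __init__(self):
        self._tables = {}

    def register(self, table):
        self._tables[table.name] = table

    def load_table(self, name):
        return self._tables[name]


def get_table(catalog, name, branch=None):
    """Load the table first, then narrow to the branch if one was resolved."""
    table = catalog.load_table(name)
    if branch is None:
        return table
    if not table.supports_branching:
        raise ValueError(f"table {name} does not support branching")
    return table.load_branch(branch)
```

Because each branch is its own instance, appending to `get_table(catalog, "t", "dev").rows` leaves the rows of `get_table(catalog, "t")` untouched, which is the isolation property the in-memory test implementation relies on.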
What changes were proposed in this pull request?
Extend the temporal clause so reads and writes can target a named branch on a SupportsBranching data source:
SELECT * FROM t FOR BRANCH 'dev'
SELECT * FROM t VERSION AS OF BRANCH 'dev'
INSERT INTO t FOR BRANCH 'dev' SELECT ...
INSERT OVERWRITE t FOR BRANCH 'dev' SELECT ...
BRANCH is the only temporal variant allowed on writes. VERSION AS OF <int> and TIMESTAMP AS OF <ts> on writes remain rejected (existing Spark constraint, now caught at parse time with a clearer error).
Also add a new session config, spark.sql.defaultBranch. When non-empty, every read and write against a SupportsBranching table is routed to the named branch. Tables that do not implement SupportsBranching silently ignore the config. An explicit FOR BRANCH clause always overrides the config.
Precedence (highest first):
1. FOR BRANCH / VERSION AS OF BRANCH in the query.
2. spark.sql.defaultBranch.
Implementation:
* SupportsBranching gains loadBranch(name): Table.
* TimeTravelSpec gains AsOfBranch(branch, isExplicit).
* RelationTimeTravel carries an optional branch field.
* UnresolvedRelation carries the branch on writes via a reserved internal option key BRANCH_AS_OF (mirrors the existing REQUIRED_WRITE_PRIVILEGES pattern). This preserves the NamedRelation slot in InsertIntoStatement / OverwriteByExpression without requiring a structural change to those nodes.
* CatalogV2Util.getTable composes loadTable + loadBranch, lifting the "no time travel on writes" assertion only for the branch case.
* RelationResolution applies the default branch only on the persistent-relation path; temp views are unaffected.
* InMemoryTable.loadBranch returns an independent InMemoryTable instance per branch so reads and writes are isolated end-to-end in tests.
Why are the changes needed?
SPARK-57056 lets a data source declare named branches and provides DDL to manage them, but offers no way to read from or write to a specific branch. Without this PR, branches are effectively metadata that only other systems can write to. This PR closes the loop so a Spark user can read from and write to a named branch directly in SQL, and switch entire sessions to a branch via a config setting (useful for staging / CI environments).
Does this PR introduce any user-facing change?
Yes:
* SELECT accepts FOR BRANCH 'name' and VERSION AS OF BRANCH 'name' (and the SYSTEM_VERSION synonym).
* INSERT INTO / INSERT OVERWRITE accept a temporalClause between the table identifier and the rest of the statement; only the branch variant is allowed (others raise a parse-time error).
* New session config spark.sql.defaultBranch (default empty string; no change in behavior unless set).
Data sources that do not implement SupportsBranching:
* Silently ignore spark.sql.defaultBranch.
* Reject an explicit FOR BRANCH clause with AnalysisException.
How was this patch tested?
* PlanParserSuite: extended the "as of syntax" test with four new cases covering FOR BRANCH, VERSION AS OF BRANCH, SYSTEM_VERSION AS OF BRANCH, and FOR VERSION AS OF BRANCH.
* SupportsBranchingSuite: 8 new end-to-end tests covering SELECT FOR BRANCH, INSERT FOR BRANCH, INSERT OVERWRITE FOR BRANCH, the equivalence of FOR BRANCH and VERSION AS OF BRANCH, spark.sql.defaultBranch precedence, the silent-ignore behavior on non-branching tables, and the explicit-clause hard-error behavior.
* Existing tests in DataSourceV2SQLSuiteV1Filter continue to pass, with no regression in the VERSION AS OF / TIMESTAMP AS OF paths.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.7)