[SPARK-57057][SQL] Target a specific branch on SELECT / INSERT via temporal clause and session config by viirya · Pull Request #56103 · apache/spark

viirya · 2026-05-25T17:59:21Z

What changes were proposed in this pull request?

Builds on #56102 (SPARK-57056). This PR's diff includes the SPARK-57056 commit. After SPARK-57056 merges, this PR will be rebased and shrink to its own commit.

Extend the temporal clause so reads and writes can target a named branch on a SupportsBranching data source:

SELECT * FROM t FOR BRANCH 'dev'
SELECT * FROM t VERSION AS OF BRANCH 'dev'
SELECT * FROM t SYSTEM_VERSION AS OF BRANCH 'dev'

INSERT INTO t FOR BRANCH 'dev' SELECT ...
INSERT OVERWRITE t FOR BRANCH 'dev' SELECT ...
INSERT INTO t FOR BRANCH 'dev' tableAlias REPLACE WHERE | ON ...
INSERT INTO t FOR BRANCH 'dev' tableAlias REPLACE USING ...

BRANCH is the only temporal variant allowed on writes. VERSION AS OF <int> and TIMESTAMP AS OF <ts> on writes remain rejected (existing Spark constraint, now caught at parse time with a clearer error).
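
The write-side rule above can be illustrated with a small, self-contained Python model. This is only a sketch: `TemporalClause`, `ParseError`, and `check_write_clause` are hypothetical names invented for illustration and are not part of Spark's parser, and the error message text is made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TemporalClause:
    """Hypothetical stand-in for a parsed temporal clause; not a Spark class."""
    kind: str   # "branch", "version", or "timestamp"
    value: str


class ParseError(Exception):
    """Hypothetical stand-in for a parse-time error."""


def check_write_clause(clause: Optional[TemporalClause]) -> Optional[str]:
    """Return the branch a write should target, or None if no clause was given.

    Only the branch variant is accepted on writes; the version and
    timestamp variants are rejected before any analysis happens.
    """
    if clause is None:
        return None
    if clause.kind != "branch":
        raise ParseError(
            f"{clause.kind.upper()} AS OF is not allowed on writes; "
            "only the BRANCH variant is"
        )
    return clause.value
```

Under these assumptions, `check_write_clause(TemporalClause("branch", "dev"))` returns the branch name, while a `"version"` or `"timestamp"` clause raises before the statement is analyzed.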

Also add a new session config:

spark.sql.defaultBranch

When non-empty, every read and write against a SupportsBranching table is routed to the named branch. Tables that do not implement SupportsBranching silently ignore the config. An explicit FOR BRANCH clause always overrides the config.

Precedence:

1. Explicit FOR BRANCH / VERSION AS OF BRANCH in the query.
2. spark.sql.defaultBranch.
3. Today's behavior (no branch targeting).
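
As a rough illustration of this precedence, the following Python sketch models how a branch could be chosen for one relation. It is a toy model under stated assumptions, not the analyzer code: `resolve_branch` and `AnalysisError` are hypothetical names, and the boolean `table_supports_branching` stands in for a check that the table implements SupportsBranching.

```python
from typing import Optional


class AnalysisError(Exception):
    """Hypothetical stand-in for Spark's AnalysisException."""


def resolve_branch(
    explicit_branch: Optional[str],
    default_branch: str,
    table_supports_branching: bool,
) -> Optional[str]:
    """Pick the branch to target, or None for today's behavior."""
    if explicit_branch is not None:
        # 1. An explicit clause always wins, and it is a hard error on a
        #    table that cannot honor it.
        if not table_supports_branching:
            raise AnalysisError("table does not support branching")
        return explicit_branch
    if default_branch and table_supports_branching:
        # 2. The session config applies only to branching tables.
        return default_branch
    # 3. No branch targeting; non-branching tables ignore the config.
    return None
```

In this model an empty `default_branch` string means the config is unset, matching the default described in the PR.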

Implementation:

SupportsBranching gains loadBranch(name): Table.
TimeTravelSpec gains AsOfBranch(branch, isExplicit). RelationTimeTravel carries an optional branch field.
UnresolvedRelation carries the branch on writes via a reserved internal option key BRANCH_AS_OF (mirrors the existing REQUIRED_WRITE_PRIVILEGES pattern). This preserves the NamedRelation slot in InsertIntoStatement / OverwriteByExpression without requiring a structural change to those nodes.
CatalogV2Util.getTable composes loadTable + loadBranch, lifting the "no time travel on writes" assertion only for the branch case.
RelationResolution applies the default branch only on the persistent-relation path; temp views are unaffected.
InMemoryTable.loadBranch returns an independent InMemoryTable instance per branch so reads and writes are isolated end-to-end in tests.
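
The composition of loadTable and loadBranch, together with the per-branch isolation of the in-memory test table, can be sketched in Python as follows. `ToyTable` and `get_table` are hypothetical names used only for illustration. Seeding a new branch with a copy of the current rows is an assumption of this sketch, not something the PR states.

```python
from typing import Dict, List, Optional


class ToyTable:
    """Hypothetical in-memory table; each branch is an independent ToyTable."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: List[tuple] = []
        self._branches: Dict[str, "ToyTable"] = {}

    def create_branch(self, branch: str) -> None:
        # Assumption of this sketch: a new branch starts from a copy of
        # the rows currently in the table.
        child = ToyTable(f"{self.name}@{branch}")
        child.rows = list(self.rows)
        self._branches[branch] = child

    def load_branch(self, branch: str) -> "ToyTable":
        # The same instance is returned for the same branch name, so a
        # write through one lookup is visible to a later read.
        if branch not in self._branches:
            raise KeyError(f"no such branch: {branch}")
        return self._branches[branch]


def get_table(
    catalog: Dict[str, ToyTable], name: str, branch: Optional[str] = None
) -> ToyTable:
    """Load the table, then narrow to a branch only when one is requested."""
    table = catalog[name]
    return table if branch is None else table.load_branch(branch)
```

Because each branch holds its own row list, appending to the table returned by `get_table(catalog, "sales", "dev")` leaves the rows of the main table untouched.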

Why are the changes needed?

SPARK-57056 lets a data source declare named branches and provides DDL to manage them, but offers no way to actually read from or write to a specific branch. Without this PR, Spark can create and manage branches, but only other systems can read or write the data in them. This PR closes the loop so a Spark user can:

INSERT INTO sales FOR BRANCH 'experimental' SELECT ...;
SELECT total FROM sales FOR BRANCH 'experimental';

and switch entire sessions to a branch via a config setting (useful for staging / CI environments).

Does this PR introduce any user-facing change?

Yes:

New temporal-clause variants: FOR BRANCH 'name' and VERSION AS OF BRANCH 'name' (and the SYSTEM_VERSION synonym).
INSERT statements accept an optional temporalClause between the table identifier and the rest of the statement; only the branch variant is allowed (others raise a parse-time error).
New session config spark.sql.defaultBranch (default empty string — no change in behavior unless set).

Data sources that do not implement SupportsBranching:

Silently ignore spark.sql.defaultBranch.
Reject an explicit FOR BRANCH clause with AnalysisException.

How was this patch tested?

PlanParserSuite: extended the "as of syntax" test with four new cases covering FOR BRANCH, VERSION AS OF BRANCH, SYSTEM_VERSION AS OF BRANCH, and FOR VERSION AS OF BRANCH.
SupportsBranchingSuite: 8 new end-to-end tests covering SELECT FOR BRANCH, INSERT FOR BRANCH, INSERT OVERWRITE FOR BRANCH, the equivalence of FOR BRANCH and VERSION AS OF BRANCH, spark.sql.defaultBranch precedence, the silent-ignore behavior on non-branching tables, and the explicit-clause hard-error behavior.
Pre-existing time-travel tests in DataSourceV2SQLSuiteV1Filter continue to pass — no regression in the VERSION AS OF / TIMESTAMP AS OF paths.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.7)

… DDL

Add a SupportsBranching mix-in interface for DSv2 tables so data sources can expose table branching through standard Spark SQL DDL:

ALTER TABLE t CREATE [OR REPLACE] BRANCH [IF NOT EXISTS] name [AS OF VERSION <id>]
ALTER TABLE t DROP BRANCH [IF EXISTS] name
ALTER TABLE t FAST FORWARD branch TO target
SHOW BRANCHES (FROM|IN) t

The interface defines createBranch / dropBranch / fastForward / listBranches operations and a TableBranch value type. Reads and writes against a specific branch are not part of this change.

Includes parser grammar, logical plans, analyzer dispatch through ResolvedTable, exec nodes, an in-memory implementation on InMemoryTable for testing, and unit + integration tests.

Co-authored-by: Claude Code

…mporal clause and session config

Extend the temporal clause so reads and writes can target a named branch on a SupportsBranching data source:

SELECT * FROM t FOR BRANCH 'dev'
SELECT * FROM t VERSION AS OF BRANCH 'dev'
INSERT INTO t FOR BRANCH 'dev' SELECT ...
INSERT OVERWRITE t FOR BRANCH 'dev' SELECT ...

Branch is the only temporal variant allowed on writes; VERSION / TIMESTAMP writes remain rejected (existing constraint, now caught at parse time with a clear message).

Also add `spark.sql.defaultBranch`, a session config that routes reads and writes to the given branch when no explicit FOR BRANCH clause is present. An explicit clause always overrides the config. Tables that do not implement SupportsBranching ignore the config silently; an explicit FOR BRANCH on such a table is a hard error.

Implementation:

* SupportsBranching gains loadBranch(name): Table.
* TimeTravelSpec gains AsOfBranch(branch, isExplicit).
* RelationTimeTravel carries an optional branch.
* UnresolvedRelation carries the branch on writes via a reserved internal option (mirrors the REQUIRED_WRITE_PRIVILEGES pattern), so the NamedRelation slot in InsertIntoStatement / OverwriteByExpression remains intact.
* CatalogV2Util.getTable composes loadTable + loadBranch, lifting the "no time travel on writes" assertion only for the branch case.
* RelationResolution applies the default branch only on the persistent relation path (temp views unaffected).
* InMemoryTable stores per-branch data in independent InMemoryTable instances so reads and writes through loadBranch are isolated.

Co-authored-by: Claude Code

viirya force-pushed the SPARK-57057 branch 3 times, most recently from bad8a24 to f5e3d5a (May 26, 2026 03:50)

viirya added 2 commits May 25, 2026 21:28

viirya force-pushed the SPARK-57057 branch from f5e3d5a to 1d87227 (May 26, 2026 04:31)

viirya wants to merge 2 commits into apache:master from viirya:SPARK-57057
