[SPARK-57057][SQL] Target a specific branch on SELECT / INSERT via temporal clause and session config #56103

Open
viirya wants to merge 2 commits into apache:master from viirya:SPARK-57057

@viirya viirya commented May 25, 2026

What changes were proposed in this pull request?

Builds on #56102 (SPARK-57056). This PR's diff currently includes the SPARK-57056 commit. After SPARK-57056 merges, this PR will be rebased and its diff will shrink to its own commit.

Extend the temporal clause so reads and writes can target a named branch on a SupportsBranching data source:

SELECT * FROM t FOR BRANCH 'dev'
SELECT * FROM t VERSION AS OF BRANCH 'dev'
SELECT * FROM t SYSTEM_VERSION AS OF BRANCH 'dev'

INSERT INTO t FOR BRANCH 'dev' SELECT ...
INSERT OVERWRITE t FOR BRANCH 'dev' SELECT ...
INSERT INTO t FOR BRANCH 'dev' tableAlias REPLACE WHERE | ON ...
INSERT INTO t FOR BRANCH 'dev' tableAlias REPLACE USING ...

BRANCH is the only temporal variant allowed on writes. VERSION AS OF <int> and TIMESTAMP AS OF <ts> on writes remain rejected (existing Spark constraint, now caught at parse time with a clearer error).
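
As a rough illustration of the write-side rule above, the check can be modeled in a few lines of Python. This is a standalone sketch, not the parser code in this PR: `TemporalClause`, `ParseError`, `check_write_clause`, and the error message are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


class ParseError(Exception):
    """Hypothetical stand-in for the parse-time error raised by Spark."""


@dataclass(frozen=True)
class TemporalClause:
    kind: str   # "BRANCH", "VERSION", or "TIMESTAMP" (hypothetical tags)
    value: str


def check_write_clause(clause: Optional[TemporalClause]) -> Optional[str]:
    """Return the target branch for a write, or None when no clause is given."""
    if clause is None:
        return None
    if clause.kind != "BRANCH":
        # VERSION / TIMESTAMP on writes stay rejected; only BRANCH is accepted.
        raise ParseError(f"{clause.kind} AS OF is not allowed on writes; only BRANCH is")
    return clause.value
```

The point of rejecting at parse time is that the user sees the error before any catalog lookup happens.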

Also add a new session config:

spark.sql.defaultBranch

When non-empty, every read and write against a SupportsBranching table is routed to the named branch. Tables that do not implement SupportsBranching silently ignore the config. An explicit FOR BRANCH clause always overrides the config.

Precedence:

  1. Explicit FOR BRANCH / VERSION AS OF BRANCH in the query.
  2. spark.sql.defaultBranch.
  3. Today's behavior (no branch targeting).
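
The precedence list above can be sketched as a small function. This is a minimal model, assuming the explicit clause is passed as `None` when absent and the config value is an empty string when unset; `effective_branch` is a hypothetical name, not a function in this PR.

```python
from typing import Optional


def effective_branch(explicit_branch: Optional[str], default_branch: str) -> Optional[str]:
    """Pick the branch a query targets, or None for no branch targeting."""
    # 1. An explicit FOR BRANCH / VERSION AS OF BRANCH clause always wins.
    if explicit_branch is not None:
        return explicit_branch
    # 2. Otherwise fall back to spark.sql.defaultBranch when it is non-empty.
    if default_branch:
        return default_branch
    # 3. Otherwise keep today's behavior: no branch targeting.
    return None
```

For example, `effective_branch("dev", "staging")` picks `"dev"`, while `effective_branch(None, "")` picks no branch at all.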

Implementation:

  • SupportsBranching gains loadBranch(name): Table.
  • TimeTravelSpec gains AsOfBranch(branch, isExplicit). RelationTimeTravel carries an optional branch field.
  • UnresolvedRelation carries the branch on writes via a reserved internal option key BRANCH_AS_OF (mirrors the existing REQUIRED_WRITE_PRIVILEGES pattern). This preserves the NamedRelation slot in InsertIntoStatement / OverwriteByExpression without requiring a structural change to those nodes.
  • CatalogV2Util.getTable composes loadTable + loadBranch, lifting the "no time travel on writes" assertion only for the branch case.
  • RelationResolution applies the default branch only on the persistent-relation path; temp views are unaffected.
  • InMemoryTable.loadBranch returns an independent InMemoryTable instance per branch so reads and writes are isolated end-to-end in tests.
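
To make the loadTable + loadBranch composition and the per-branch isolation concrete, here is a toy Python model. It is a sketch under stated assumptions, not the Scala/Java code in this PR: `TableModel`, `create_branch`, `load_branch`, and `get_table` are hypothetical names, and seeding a new branch with a copy of the current rows is an assumption made only for the example.

```python
from typing import Dict, List, Optional


class TableModel:
    """Hypothetical stand-in for a table that supports branching."""

    def __init__(self, name: str, rows: Optional[List[tuple]] = None) -> None:
        self.name = name
        self.rows: List[tuple] = list(rows or [])
        self._branches: Dict[str, "TableModel"] = {}

    def create_branch(self, branch: str) -> None:
        # Assumption: a new branch starts from a copy of the current rows.
        self._branches[branch] = TableModel(f"{self.name}@{branch}", self.rows)

    def load_branch(self, branch: str) -> "TableModel":
        # Each branch maps to its own instance, so writes do not leak across branches.
        if branch not in self._branches:
            raise KeyError(f"no such branch: {branch}")
        return self._branches[branch]


def get_table(catalog: Dict[str, TableModel], ident: str,
              branch: Optional[str] = None) -> TableModel:
    """Compose the two lookups: load the table, then optionally its branch."""
    table = catalog[ident]
    return table if branch is None else table.load_branch(branch)


catalog = {"sales": TableModel("sales", [(1, 10.0)])}
catalog["sales"].create_branch("dev")
# A write routed to the 'dev' branch leaves the main table untouched.
get_table(catalog, "sales", "dev").rows.append((2, 20.0))
```

Because every branch is a separate instance, a read through `get_table(catalog, "sales")` after the write above still sees only the original row.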

Why are the changes needed?

SPARK-57056 lets a data source declare named branches and provides DDL to manage them, but offers no way to actually read from or write to a specific branch. Without this PR, Spark can manage branches but only other systems can read or write their data. This PR closes the loop so a Spark user can:

INSERT INTO sales FOR BRANCH 'experimental' SELECT ...;
SELECT total FROM sales FOR BRANCH 'experimental';

and switch entire sessions to a branch via a config setting (useful for staging / CI environments).

Does this PR introduce any user-facing change?

Yes:

  • New temporal-clause variants: FOR BRANCH 'name' and VERSION AS OF BRANCH 'name' (and the SYSTEM_VERSION synonym).
  • INSERT statements accept an optional temporalClause between the table identifier and the rest of the statement; only the branch variant is allowed (others raise a parse-time error).
  • New session config spark.sql.defaultBranch (default empty string — no change in behavior unless set).

Data sources that do not implement SupportsBranching:

  • Silently ignore spark.sql.defaultBranch.
  • Reject an explicit FOR BRANCH clause with AnalysisException.
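
The two behaviors above differ only in whether the branch came from an explicit clause, which is presumably why AsOfBranch carries an isExplicit flag. A minimal sketch of that decision, assuming hypothetical names (`branch_to_load`, `AnalysisError`) and a made-up error message:

```python
from typing import Optional


class AnalysisError(Exception):
    """Hypothetical stand-in for Spark's AnalysisException."""


def branch_to_load(supports_branching: bool, branch: Optional[str],
                   is_explicit: bool) -> Optional[str]:
    """Decide which branch, if any, to load for a resolved table."""
    if branch is None:
        return None
    if supports_branching:
        return branch
    if is_explicit:
        # The user asked for a branch the table cannot provide: hard error.
        raise AnalysisError("table does not support branching")
    # Branch came from spark.sql.defaultBranch: silently ignored.
    return None
```

This keeps a session-wide default safe to set even when the session also touches tables from sources without branching support.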

How was this patch tested?

  • PlanParserSuite: extended the as of syntax test with four new cases covering FOR BRANCH, VERSION AS OF BRANCH, SYSTEM_VERSION AS OF BRANCH, and FOR VERSION AS OF BRANCH.
  • SupportsBranchingSuite: 8 new end-to-end tests covering SELECT FOR BRANCH, INSERT FOR BRANCH, INSERT OVERWRITE FOR BRANCH, the equivalence of FOR BRANCH and VERSION AS OF BRANCH, spark.sql.defaultBranch precedence, the silent-ignore behavior on non-branching tables, and the explicit-clause hard-error behavior.
  • Pre-existing time-travel tests in DataSourceV2SQLSuiteV1Filter continue to pass — no regression in the VERSION AS OF / TIMESTAMP AS OF paths.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.7)

@viirya viirya force-pushed the SPARK-57057 branch 3 times, most recently from bad8a24 to f5e3d5a Compare May 26, 2026 03:50
viirya added 2 commits May 25, 2026 21:28
… DDL

Add a SupportsBranching mix-in interface for DSv2 tables so data sources
can expose table branching through standard Spark SQL DDL:

  ALTER TABLE t CREATE [OR REPLACE] BRANCH [IF NOT EXISTS] name
      [AS OF VERSION <id>]
  ALTER TABLE t DROP BRANCH [IF EXISTS] name
  ALTER TABLE t FAST FORWARD branch TO target
  SHOW BRANCHES (FROM|IN) t

The interface defines createBranch / dropBranch / fastForward /
listBranches operations and a TableBranch value type. Reads and writes
against a specific branch are not part of this change.

Includes parser grammar, logical plans, analyzer dispatch through
ResolvedTable, exec nodes, an in-memory implementation on InMemoryTable
for testing, and unit + integration tests.

Co-authored-by: Claude Code
…mporal clause and session config

Extend the temporal clause so reads and writes can target a named branch
on a SupportsBranching data source:

  SELECT * FROM t FOR BRANCH 'dev'
  SELECT * FROM t VERSION AS OF BRANCH 'dev'
  INSERT INTO t FOR BRANCH 'dev' SELECT ...
  INSERT OVERWRITE t FOR BRANCH 'dev' SELECT ...

Branch is the only temporal variant allowed on writes; VERSION / TIMESTAMP
writes remain rejected (existing constraint, now caught at parse time with
a clear message).

Also add `spark.sql.defaultBranch`, a session config that routes reads
and writes to the given branch when no explicit FOR BRANCH clause is
present. An explicit clause always overrides the config. Tables that do
not implement SupportsBranching ignore the config silently; an explicit
FOR BRANCH on such a table is a hard error.

Implementation:
  * SupportsBranching gains loadBranch(name): Table.
  * TimeTravelSpec gains AsOfBranch(branch, isExplicit).
  * RelationTimeTravel carries an optional branch.
  * UnresolvedRelation carries the branch on writes via a reserved
    internal option (mirrors the REQUIRED_WRITE_PRIVILEGES pattern), so
    the NamedRelation slot in InsertIntoStatement / OverwriteByExpression
    remains intact.
  * CatalogV2Util.getTable composes loadTable + loadBranch, lifting the
    "no time travel on writes" assertion only for the branch case.
  * RelationResolution applies the default branch only on the persistent
    relation path (temp views unaffected).
  * InMemoryTable stores per-branch data in independent InMemoryTable
    instances so reads and writes through loadBranch are isolated.

Co-authored-by: Claude Code