[SPARK-56619][SQL][TESTS] Add DSv2 repeated table access tests with internal/external changes #55462

Open
longvu-db wants to merge 14 commits into apache:master from longvu-db:dsv2-pr2-repeated-sql


@longvu-db longvu-db commented Apr 21, 2026

What changes were proposed in this pull request?

Add 6 tests to DataSourceV2DataFrameSuite that verify DSv2 tables reflect the latest state when accessed repeatedly via sql() (without CACHE TABLE). Each sql("SELECT * FROM t") call creates a fresh QueryExecution, so it always sees the most recent data, schema, and table identity.

The tests cover three scenarios, each with a session-write and an external-write variant:

  • Scenario 1 (data writes): After a writer adds rows (via session SQL or the catalog API), a subsequent sql() query sees the new data.
  • Scenario 2 (schema changes): After a writer adds a column and inserts data with the new schema (via session SQL or the catalog API), a subsequent sql() query reflects the updated schema.
  • Scenario 3 (drop/recreate): After a writer drops and recreates the table (via session SQL or the catalog API), a subsequent sql() query sees the empty recreated table.

External writes use direct catalog API calls (loadTable with write privileges, alterTable, dropTable/createTable), matching the pattern used by the existing CACHE TABLE tests in the same suite.
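
The three scenarios can be sketched with a toy model. This is a minimal illustration, not the code in this PR: `ToyTable`, `ToyCatalog`, and `RepeatedAccessDemo.selectAll` are hypothetical stand-ins that use no Spark classes, and `selectAll` only models the idea that each sql() call resolves the table from the catalog again.

```scala
import scala.collection.mutable

// Toy stand-ins, NOT Spark classes: a table is an id, a schema, and its rows.
final case class ToyTable(id: Long, columns: Vector[String], rows: Vector[Vector[Any]])

// Hypothetical in-memory catalog; every load returns the current table state.
final class ToyCatalog {
  private val tables = mutable.Map.empty[String, ToyTable]
  private var nextId = 0L

  // Each created table gets a new id, so a recreated table is distinguishable.
  def createTable(name: String, columns: Vector[String]): ToyTable = {
    nextId += 1
    val table = ToyTable(nextId, columns, Vector.empty)
    tables.update(name, table)
    table
  }

  def loadTable(name: String): ToyTable = tables(name)

  def append(name: String, row: Vector[Any]): Unit = {
    val table = tables(name)
    require(row.length == table.columns.length, "row width must match the schema")
    tables.update(name, table.copy(rows = table.rows :+ row))
  }

  // Existing rows are padded with null for the new column.
  def addColumn(name: String, column: String): Unit = {
    val table = tables(name)
    tables.update(name, table.copy(
      columns = table.columns :+ column,
      rows = table.rows.map(_ :+ null)))
  }

  def dropTable(name: String): Boolean = tables.remove(name).isDefined
}

object RepeatedAccessDemo {
  // Models one sql("SELECT * FROM t") call: nothing is reused between calls,
  // so the table is looked up in the catalog every time.
  def selectAll(catalog: ToyCatalog, name: String): ToyTable = catalog.loadTable(name)
}
```

In this model, calling `selectAll` after `append`, after `addColumn`, or after `dropTable` followed by `createTable` returns the new rows, the widened schema, or an empty table with a different `id`, which mirrors what the six tests assert against the real suite.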

Why are the changes needed?

These tests document and lock down the expected behavior: repeated sql() access without CACHE TABLE always sees the latest table state. This prevents regressions if internal resolution or caching logic changes.

Does this PR introduce any user-facing change?

No. This PR is test-only.

How was this patch tested?

6 new tests in DataSourceV2DataFrameSuite, all passing:

build/sbt 'sql/testOnly *DataSourceV2DataFrameSuite -- -z "repeated sql()"'

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-opus-4-6)

@longvu-db longvu-db marked this pull request as draft April 21, 2026 20:48
@longvu-db longvu-db force-pushed the dsv2-pr2-repeated-sql branch from 77c6144 to 242bdaa Compare April 24, 2026 14:42
@longvu-db longvu-db changed the title [SPARK-XXXXX][SQL][TESTS] Add DSv2 repeated SQL access refresh tests [SPARK-XXXXX][SQL][TESTS] Add DSv2 repeated table access tests with external changes Apr 24, 2026
@longvu-db longvu-db changed the title [SPARK-XXXXX][SQL][TESTS] Add DSv2 repeated table access tests with external changes [SPARK-XXXXX][SQL][TESTS] Add DSv2 repeated table access tests with internal/external changes Apr 24, 2026
@longvu-db longvu-db changed the title [SPARK-XXXXX][SQL][TESTS] Add DSv2 repeated table access tests with internal/external changes [SPARK-56619][SQL][TESTS] Add DSv2 repeated table access tests with internal/external changes Apr 24, 2026
@longvu-db longvu-db marked this pull request as ready for review April 24, 2026 15:15
… external changes

Remove production code changes (DataFrameWriter, DataFrameWriterV2),
InMemoryBaseTable column-drop handling, CachingInMemoryTableCatalog,
NullIdInMemoryTableCatalog, Edge.7 test, and caching tests.

Keep 6 tests: session/external variants of write, schema change,
and drop/recreate scenarios.

Co-authored-by: Isaac
…l writes

Replace separate suite and SharedInMemoryTableCatalog with direct
catalog API calls (loadTable with write privileges, alterTable,
dropTable/createTable) matching the pattern in the CACHE TABLE tests.

Co-authored-by: Isaac
…eCatalog

Adds 3 tests verifying that when a DSv2 connector caches table state,
external changes are invisible through repeated sql() calls.

Co-authored-by: Isaac
…ests

After asserting stale data with the caching connector, REFRESH TABLE
invalidates the connector cache and verifies external changes become
visible. Also adds invalidateTable override to CachingInMemoryTableCatalog.

Co-authored-by: Isaac
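
The stale-then-refresh behavior described in the two commits above can be sketched as follows. This is a toy model under stated assumptions, not the CachingInMemoryTableCatalog from this PR: `BackingStore`, `CachingToyCatalog`, and `StaleReadDemo` are hypothetical names, tables are reduced to a vector of integers, and calling `invalidateTable` directly stands in for what the commit message says REFRESH TABLE triggers.

```scala
import scala.collection.mutable

// Toy model, NOT Spark's TableCatalog: a table is just its rows, keyed by name.
final class BackingStore {
  private val data = mutable.Map.empty[String, Vector[Int]]
  def read(name: String): Vector[Int] = data.getOrElse(name, Vector.empty)
  def append(name: String, value: Int): Unit = data.update(name, read(name) :+ value)
}

// Hypothetical caching connector: keeps the first snapshot it loads per table.
final class CachingToyCatalog(store: BackingStore) {
  private val cache = mutable.Map.empty[String, Vector[Int]]

  def loadTable(name: String): Vector[Int] =
    cache.getOrElseUpdate(name, store.read(name))

  // Dropping the cached snapshot forces the next load to hit the store.
  def invalidateTable(name: String): Unit = {
    cache.remove(name)
    ()
  }
}

object StaleReadDemo {
  // Returns the first read, the read after an external write, and the read
  // after the cache entry is invalidated.
  def run(): (Vector[Int], Vector[Int], Vector[Int]) = {
    val store = new BackingStore
    val catalog = new CachingToyCatalog(store)

    store.append("t", 1)
    val first = catalog.loadTable("t") // snapshot is cached here

    store.append("t", 2)               // external write bypasses the catalog
    val stale = catalog.loadTable("t") // still the cached snapshot

    catalog.invalidateTable("t")       // stand-in for the effect of REFRESH TABLE
    val fresh = catalog.loadTable("t") // reloaded from the store

    (first, stale, fresh)
  }
}
```

`StaleReadDemo.run()` returns `(Vector(1), Vector(1), Vector(1, 2))`: the external write stays invisible until the cached entry is invalidated.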
…ng tests

Move each cache test next to its corresponding external test with
matching section numbers. Also adds invalidateTable override to
CachingInMemoryTableCatalog for REFRESH TABLE support.

Co-authored-by: Isaac
@longvu-db longvu-db force-pushed the dsv2-pr2-repeated-sql branch from 4b5c3a5 to 489e519 Compare April 30, 2026 16:52
…pr2-repeated-sql

# Conflicts:
#	sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2DataFrameSuite.scala