[SPARK-54216][SQL] Add regression tests for V2 table cache refresh with immutable Table instances #55416

Draft
yadavay-amzn wants to merge 1 commit into apache:master from yadavay-amzn:fix/SPARK-54216-v2-cache-stale

@yadavay-amzn
What changes were proposed in this pull request?

Add regression tests verifying that refreshTable() and recacheByPlan() return fresh data for DataSource V2 tables with immutable Table instances (copyOnLoad=true).

Why are the changes needed?

SPARK-54216 describes a bug where cached V2 table queries return stale data after refreshTable(). Investigation shows the underlying issue was already fixed by:

  • SPARK-54387: Fixed recaching of DSv2 tables by calling V2TableRefreshUtil.refresh() before re-executing cached plans
  • SPARK-54424: Refined the fix with tryRefreshPlan and tryRebuildCacheEntry for better error handling

However, no regression test explicitly covered the exact scenario described in SPARK-54216. This PR adds that coverage.
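
To make the scenario concrete, here is a minimal, self-contained toy model of it in Python. It is not Spark code and does not reflect how CacheManager or V2TableRefreshUtil are actually implemented: ToyCatalog, ToyCache, TableSnapshot, and run_flow are hypothetical names invented for illustration. The only assumption carried over from the description above is that a cached plan holds an immutable table instance, so re-executing it without first reloading the table from the catalog would reproduce the old data.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class TableSnapshot:
    """Immutable table view (stand-in for a Table loaded with copyOnLoad=true)."""
    name: str
    rows: Tuple[int, ...]


class ToyCatalog:
    """Hypothetical catalog: every load() returns a new immutable snapshot."""

    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[int, ...]] = {}

    def create(self, name: str) -> None:
        self._rows[name] = ()

    def insert(self, name: str, *values: int) -> None:
        # Writes go to the catalog's state; existing snapshots are not affected.
        self._rows[name] = self._rows[name] + values

    def load(self, name: str) -> TableSnapshot:
        return TableSnapshot(name, self._rows[name])


class ToyCache:
    """Hypothetical cache: keeps the snapshot seen at cache time plus its data."""

    def __init__(self, catalog: ToyCatalog) -> None:
        self._catalog = catalog
        self._plans: Dict[str, TableSnapshot] = {}
        self._data: Dict[str, Tuple[int, ...]] = {}

    def cache(self, name: str) -> None:
        snapshot = self._catalog.load(name)
        self._plans[name] = snapshot
        self._data[name] = snapshot.rows

    def read(self, name: str) -> Tuple[int, ...]:
        return self._data[name]

    def recache(self, name: str, refresh_first: bool) -> None:
        snapshot = self._plans[name]
        if refresh_first:
            # Modeled fix: reload the table from the catalog before re-executing.
            snapshot = self._catalog.load(name)
            self._plans[name] = snapshot
        self._data[name] = snapshot.rows


def run_flow(refresh_first: bool) -> Tuple[int, ...]:
    """Cache a table, insert a row, recache, and return what the cache serves."""
    catalog = ToyCatalog()
    catalog.create("t")
    catalog.insert("t", 1, 2)
    cache = ToyCache(catalog)
    cache.cache("t")
    catalog.insert("t", 3)
    cache.recache("t", refresh_first=refresh_first)
    return cache.read("t")


if __name__ == "__main__":
    print("without refresh:", run_flow(refresh_first=False))
    print("with refresh:   ", run_flow(refresh_first=True))
```

In this model, recaching without the reload keeps serving the rows captured at cache time, while reloading first picks up the inserted row. The regression tests in this PR assert the second behavior against the real Spark code paths.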

Does this PR introduce any user-facing change?

No (test-only).

How was this patch tested?

Two new tests in CachedTableSuite:

  1. refreshTable should return fresh data for V2 tables with immutable Table - tests CACHE TABLE + INSERT + refreshTable flow
  2. recacheByPlan should return fresh data for V2 tables - tests DataFrame.cache() + INSERT + recacheByPlan flow

Both tests pass on current master.

Was this patch authored or co-authored using generative AI tooling?

Yes

…th immutable Table instances

Add tests verifying that refreshTable() and recacheByPlan() return fresh data
for DataSource V2 tables with immutable Table instances (copyOnLoad=true).

The underlying issue was fixed by SPARK-54387 and SPARK-54424, which introduced
tryRefreshPlan to reload table metadata from the catalog before re-caching.
These tests serve as explicit regression coverage for the scenario described
in SPARK-54216.