Spark (3.4,3.5,4.0) : Include snapshotId and branch in SparkTable equals and hashCode #15840

Open
bharos wants to merge 1 commit into apache:main from bharos:fix/spark-table-equals-snapshot-caching

@bharos bharos commented Mar 31, 2026

SparkTable.equals() and hashCode() compared only the table name, so Spark treated tables loaded at different snapshots or branches as the same table and returned cached query results for time-travel and branch queries. For example, when a user read from branch 'main' and then from branch 'audit', Spark's cache considered the two tables equal and returned stale data from 'main'.

Include snapshotId and branch fields in equals() and hashCode() so that Spark correctly distinguishes tables loaded at different snapshots or branches. This matches the fix already applied in Spark 4.1 (current main).
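
To illustrate the idea, here is a minimal sketch of an identity that includes the snapshot ID and branch. `TableIdentitySketch` and its `demo()` helper are hypothetical stand-ins, not the actual `SparkTable` code in this PR, and the `HashMap` is only a loose analogy for a cache lookup keyed on table equality.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for SparkTable; not the actual Iceberg class.
final class TableIdentitySketch {
  private final String name;
  private final Long snapshotId; // null when no snapshot is pinned
  private final String branch;   // null when no branch is selected

  TableIdentitySketch(String name, Long snapshotId, String branch) {
    this.name = Objects.requireNonNull(name, "name");
    this.snapshotId = snapshotId;
    this.branch = branch;
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    if (other == null || getClass() != other.getClass()) {
      return false;
    }
    TableIdentitySketch that = (TableIdentitySketch) other;
    // Compare snapshotId and branch in addition to the name;
    // Objects.equals handles the null (unset) case.
    return name.equals(that.name)
        && Objects.equals(snapshotId, that.snapshotId)
        && Objects.equals(branch, that.branch);
  }

  @Override
  public int hashCode() {
    // Must cover the same fields as equals() to honor the equals/hashCode contract.
    return Objects.hash(name, snapshotId, branch);
  }

  // Simulates a cache keyed on table identity: a read of branch 'audit'
  // must not be served from the entry stored for branch 'main'.
  static String demo() {
    Map<TableIdentitySketch, String> cache = new HashMap<>();
    cache.put(new TableIdentitySketch("db.tbl", null, "main"), "rows from main");
    TableIdentitySketch audit = new TableIdentitySketch("db.tbl", null, "audit");
    return cache.getOrDefault(audit, "cache miss");
  }
}
```

With a name-only comparison, the lookup for 'audit' in this sketch would hit the 'main' entry; with the three-field comparison it misses, which is the behavior this change is after.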

Closes #15741

@github-actions github-actions bot added the spark label Mar 31, 2026

Successfully merging this pull request may close these issues.

Running 2 queries on the same table but different snapshot ID in Spark results in first snapshot's data returned for both queries
