Running 2 queries on the same table but different snapshot ID in Spark results in first snapshot's data returned for both queries #15741
Description
Apache Iceberg version
1.5.0
Query engine
Spark
Please describe the bug 🐞
When querying Iceberg through Spark in Scala, running 2 successive spark.sql queries against 2 different snapshot IDs of the same Iceberg table returns data from the first-queried snapshot for both queries.
Steps to reproduce
To reproduce, query an Iceberg table at 2 different resolved snapshot IDs and save the results into 2 DataFrame vals.
val df1 = spark.sql(
s"""
SELECT *
FROM $fullTableName
VERSION AS OF $resolvedVersion1
""")
val df2 = spark.sql(
s"""
SELECT *
FROM $fullTableName
VERSION AS OF $resolvedVersion2
  """)
df2 will contain the exact data that is returned in df1. It looks to me like either Spark or Iceberg is assuming that the 2 snapshots are identical and returning the cached data from df1 instead of correctly reading the 2nd snapshot ID.
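For what it's worth, the same reads can be expressed through the DataFrameReader API, which may help narrow down whether the SQL `VERSION AS OF` path is at fault. This is only a sketch, assuming Iceberg's documented `snapshot-id` read option and reusing the placeholder names from above:

```scala
// Sketch of a cross-check, assuming the Iceberg-documented DataFrameReader
// time-travel option "snapshot-id"; fullTableName, resolvedVersion1 and
// resolvedVersion2 are the same placeholders as in the repro above.
val df1 = spark.read
  .option("snapshot-id", resolvedVersion1) // pin the first snapshot explicitly
  .table(fullTableName)

val df2 = spark.read
  .option("snapshot-id", resolvedVersion2) // pin the second snapshot explicitly
  .table(fullTableName)

// If df2 still mirrors df1 here, the issue is not specific to the SQL
// VERSION AS OF path; if df2 differs, the SQL path is mis-resolving the version.
```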
Additional information
The values of resolvedVersion1 and resolvedVersion2 are returned from a helper method. They are the snapshot IDs that 2 different branches of the table point to. I don't believe it matters for reproducing this bug that the 2 snapshot IDs are the heads of 2 branches, but I'm adding it here for context.
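Since the snapshot IDs are branch heads, the branches can also be read directly, which sidesteps resolving snapshot IDs via the helper below. A sketch, assuming Iceberg's documented `branch` read option and string-literal branch names in `VERSION AS OF`; the branch names are placeholders:

```scala
// Sketch, assuming Iceberg's documented "branch" read option;
// "branch1" and fullTableName are placeholders.
val dfA = spark.read
  .option("branch", "branch1") // read the current head of branch1 directly
  .table(fullTableName)

// SQL equivalent: VERSION AS OF also accepts a branch name as a string
// literal, resolving to that branch's head snapshot.
val dfB = spark.sql(
  s"""
    SELECT *
    FROM $fullTableName
    VERSION AS OF 'branch2'
  """)
```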
def getTableBranchLatestVersion(spark: SparkSession, fullTableName: String, branchName: String): Long = {
val refs = spark.sql(
s"""
SELECT snapshot_id
FROM $fullTableName.refs
WHERE name = '$branchName'
""")
.collect()
if (refs.isEmpty) {
throw new NoSuchElementException(s"Branch name $branchName was not found in $fullTableName.refs")
}
refs(0).getAs[Long]("snapshot_id")
}

Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time