Skip to content

Conversation

@huaxingao
Copy link
Contributor

Fix flaky ordering in Spark TestSelect.testSelectWithSpecifiedTargetSplitSize()

TestRemoteScanPlanning > testSelectWithSpecifiedTargetSplitSize() > catalogName = testrest, implementation = org.apache.iceberg.spark.SparkCatalog, config = {type=rest, cache-enabled=false, uri=http://localhost:36105/, rest-scan-planning-enabled=true}, binaryTableName = testrest.default.binary_table FAILED
    org.opentest4j.AssertionFailedError: [Should return all expected rows: row 1 contents should match] 
    expected: 1L
     but was: 2L
        at app//org.apache.iceberg.spark.SparkTestHelperBase.assertEquals(SparkTestHelperBase.java:91)
        at app//org.apache.iceberg.spark.SparkTestHelperBase.assertEquals(SparkTestHelperBase.java:67)
        at app//org.apache.iceberg.spark.sql.TestSelect.testSelectWithSpecifiedTargetSplitSize(TestSelect.java:124)

Also fix a few other places that could cause ordering problem.

@github-actions github-actions bot added the spark label Jan 3, 2026
Copy link
Contributor

@singhpk234 singhpk234 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM, seems like we might have missed these in earlier fix

should we port this to 4.0 too ?

assertThat(sql("SELECT * FROM %s LIMIT 3", tableName)).containsExactly(first, second, third);
// Note: without ORDER BY, the specific rows returned are not deterministic, especially when
// remote scan planning or split planning changes the physical scan order. This test only
// asserts that the LIMIT is enforced.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would still recommend checking the records to make sure same record is not returned twice ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. I change to order by.

Copy link
Contributor

@singhpk234 singhpk234 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @huaxingao !

@singhpk234 singhpk234 merged commit e131329 into apache:main Jan 3, 2026
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants