Conversation
szehon-ho
left a comment
This works for me, and thanks for the unit test. I think it would be valuable in other Spark versions like 3.2, but that can be done later.
singhpk234
left a comment
LGTM as well, thanks @huaxingao!
try {
  expr = SparkFilters.convert(filter);
} catch (IllegalArgumentException e) {
  // converting to Iceberg Expression failed, so this expression cannot be pushed down
[minor] should we log the filters that Spark pushed down but were ignored?
nvm, I see we already log the pushed filters in Spark here
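For context, a minimal sketch of the conversion-with-fallback pattern discussed in this thread, with a debug log for filters that could not be converted. The class and method names here are illustrative assumptions, not the project's actual code:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.iceberg.expressions.Expression;
import org.apache.iceberg.spark.SparkFilters;
import org.apache.spark.sql.sources.Filter;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class FilterPushdownSketch {
  private static final Logger LOG = LoggerFactory.getLogger(FilterPushdownSketch.class);

  // Filters that convert cleanly are pushed to Iceberg; the rest stay with Spark
  // as post-scan (residual) filters that Spark evaluates after reading the data.
  Filter[] pushFilters(Filter[] filters) {
    List<Expression> pushed = new ArrayList<>();
    List<Filter> postScan = new ArrayList<>();

    for (Filter filter : filters) {
      Expression expr = null;
      try {
        expr = SparkFilters.convert(filter);
      } catch (IllegalArgumentException e) {
        // converting to an Iceberg Expression failed (e.g. a complex type literal),
        // so this filter cannot be pushed down
        LOG.debug("Skipping filter that could not be converted: {}", filter, e);
      }

      if (expr != null) {
        // pushed expressions would be applied to the Iceberg scan
        pushed.add(expr);
      } else {
        postScan.add(filter);
      }
    }

    return postScan.toArray(new Filter[0]);
  }
}
```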
Thanks @huaxingao for the change, and @singhpk234 for the additional review!
Thank you very much, @szehon-ho! Also thank you @singhpk234 for reviewing!
@huaxingao do you want to make the change for the other Spark branches (i.e., 3.2)? Otherwise I or someone else can look at it.
@szehon-ho I will do this later today or tomorrow. Do we need this in all the Spark branches (3.2, 3.1, 3.0, 2.4)?
BTW, I saw the CI failed (https://github.com/apache/iceberg/runs/7223837382?check_suite_focus=true) because of this test. The failure is not related to the changes in this PR. I think it is because bloom filters can produce false positives. I will fix the bloom filter test.
## Summary

Prevents pushdown of filters containing complex types (Array, Struct, Map) to the Iceberg REST API. Iceberg does not support complex type literals in filter expressions.

### Background

Iceberg's expression converter (`Literals.java`) explicitly does not support complex types:

- Reference: [Iceberg Literals.java](https://github.com/apache/iceberg/blob/15485f5523d08aae2a503c143c51b6df2debb655/api/src/main/java/org/apache/iceberg/expressions/Literals.java#L90)
- Also blocked in: [Apache Iceberg PR #5204](apache/iceberg#5204)

There is no way to convert:

- Spark Array → Iceberg Array literal
- Spark Struct → Iceberg Struct literal
- Spark Map → Iceberg Map literal

**Solution**: Don't push down complex type filters. Keep them as residuals and let Spark evaluate them after reading the data.

## Changes

### Type Blocking (Original Implementation)

- Reject complex types (arrays, structs, maps, rows) in filter values
- Update exception handling in `convert()` to handle complex type rejection gracefully
- Keep complex type filters as residuals for Spark evaluation
- Update documentation to mention complex type limitations

### Code Simplification (Review Feedback)

- Removed `isSupportedType()` method that duplicated pattern matching
- Removed `UnsupportedOperationException` catch block
- Simplified `toIcebergValue()` to validate via pattern matching only
- Throw `IllegalArgumentException` in the default case for unsupported types
- Net reduction: 33 lines of code while maintaining identical functionality

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Currently, Iceberg throws an exception when using a filter on a complex type. This PR fixes the problem by not pushing down filters for complex types.
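For illustration, a minimal sketch of the value-validation idea described above, under assumed names (this is not the project's actual `toIcebergValue()` code): complex-typed literal values are rejected with `IllegalArgumentException`, so the caller's catch block keeps the corresponding filter as a residual for Spark to evaluate after the scan.

```java
import java.util.Map;
import org.apache.spark.sql.Row;

class FilterValueSketch {
  // Validate a Spark filter literal before converting it into a value Iceberg can bind.
  // Arrays, maps, and structs (Rows) have no Iceberg literal form, so they are rejected
  // and the enclosing filter stays behind as a residual.
  static Object toIcebergValue(Object value) {
    if (value == null
        || value instanceof Boolean
        || value instanceof Number
        || value instanceof CharSequence
        || value instanceof byte[]) {
      return value;
    }

    if (value.getClass().isArray() || value instanceof Map || value instanceof Row) {
      throw new IllegalArgumentException(
          "Complex type literals are not supported in pushed filters: " + value.getClass());
    }

    // default case: any other unsupported type is also rejected
    throw new IllegalArgumentException("Unsupported filter value type: " + value.getClass());
  }
}
```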