Not push down complex filter #5204

Merged
szehon-ho merged 1 commit into apache:master from huaxingao:complex_filter on Jul 6, 2022
Conversation

@huaxingao
Contributor

Currently, Iceberg throws an exception when using a filter on a complex type:

org.apache.iceberg.spark.sql.TestSelect > testComplexTypeFilter[catalogName = testhadoop, implementation = org.apache.iceberg.spark.SparkCatalog, config = {type=hadoop}] FAILED
    java.lang.IllegalArgumentException: Cannot create expression literal from org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema: [3,v1]
        at org.apache.iceberg.expressions.Literals.from(Literals.java:87)
        at org.apache.iceberg.expressions.UnboundPredicate.<init>(UnboundPredicate.java:40)
        at org.apache.iceberg.expressions.Expressions.equal(Expressions.java:175)
        at org.apache.iceberg.spark.SparkFilters.handleEqual(SparkFilters.java:239)
        at org.apache.iceberg.spark.SparkFilters.convert(SparkFilters.java:152)
        at org.apache.iceberg.spark.source.SparkScanBuilder.pushFilters(SparkScanBuilder.java:101)
        at org.apache.spark.sql.execution.datasources.v2.PushDownUtils$.pushFilters(PushDownUtils.scala:69)
        at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$pushDownFilters$1.applyOrElse(V2ScanRelationPushDown.scala:60)
        at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$pushDownFilters$1.applyOrElse(V2ScanRelationPushDown.scala:47)
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)

This PR fixes the problem by not pushing down filters on complex types.
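To make the failure mode concrete, here is a minimal, self-contained sketch of the kind of check involved. The class and method names are hypothetical (this is not the actual Iceberg code): `Literals.from` only accepts primitive values, so a guard like this decides whether a Spark filter value can become an Iceberg literal at all.

```java
import java.util.List;
import java.util.Map;

public class ComplexFilterGuard {
  // Hypothetical guard: true only for values Iceberg can turn into a literal.
  // Arrays, Lists, Maps, and Row-like structs fall through to false.
  static boolean isConvertibleLiteral(Object value) {
    return value instanceof Number
        || value instanceof CharSequence
        || value instanceof Boolean;
  }

  public static void main(String[] args) {
    System.out.println(isConvertibleLiteral(42));               // true
    System.out.println(isConvertibleLiteral("v1"));             // true
    System.out.println(isConvertibleLiteral(List.of(3, "v1"))); // false: struct-like value
    System.out.println(isConvertibleLiteral(Map.of("k", 1)));   // false
  }
}
```

A value like `[3,v1]` from the stack trace above (a `GenericRowWithSchema`) would fail such a check, which is exactly why the conversion threw `IllegalArgumentException`.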

Member

@szehon-ho left a comment


This works for me, and thanks for the unit test. I think this would be valuable in other Spark versions like 3.2, but that can be done later.

Contributor

@singhpk234 left a comment


LGTM as well, Thanks @huaxingao !

    try {
      expr = SparkFilters.convert(filter);
    } catch (IllegalArgumentException e) {
      // converting to Iceberg Expression failed, so this expression cannot be pushed down
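For context, a self-contained sketch of the pattern the snippet above follows (hypothetical stand-in names; the real code lives in `SparkScanBuilder.pushFilters` and `SparkFilters.convert`): filters that convert cleanly are pushed down, while any filter whose conversion throws is silently skipped and left for Spark to evaluate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PushFilterSketch {
  // Stand-in for SparkFilters.convert: rejects complex-typed values.
  static String convert(Object filterValue) {
    if (filterValue instanceof List || filterValue instanceof Map) {
      throw new IllegalArgumentException(
          "Cannot create expression literal from " + filterValue);
    }
    return "expr(" + filterValue + ")";
  }

  // Filters that convert are pushed; the rest stay behind for Spark.
  static List<String> pushFilters(List<Object> filterValues) {
    List<String> pushed = new ArrayList<>();
    for (Object value : filterValues) {
      try {
        pushed.add(convert(value));
      } catch (IllegalArgumentException e) {
        // conversion failed, so this filter cannot be pushed down
      }
    }
    return pushed;
  }

  public static void main(String[] args) {
    // The struct-like List.of(3, "v1") is dropped; the primitives are pushed.
    List<Object> filters = List.of(1, "v1", List.of(3, "v1"));
    System.out.println(pushFilters(filters)); // [expr(1), expr(v1)]
  }
}
```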
Contributor


[minor] should we log the filters that Spark pushed but were ignored?

Contributor


nvm, i see we log the pushed filters in spark already here

@szehon-ho szehon-ho merged commit d8d212f into apache:master Jul 6, 2022
@szehon-ho
Member

Thanks @huaxingao for the change, and @singhpk234 for the additional review!

@huaxingao
Contributor Author

Thank you very much, @szehon-ho! Also thank you @singhpk234 for reviewing!

@huaxingao huaxingao deleted the complex_filter branch July 6, 2022 22:35
@szehon-ho
Member

@huaxingao do you want to make a change for other Spark branches (i.e., 3.2)? Otherwise I or someone can look at it too.

@huaxingao
Contributor Author

@szehon-ho I will do this later today or tomorrow. Do we need this in all the Spark branches (3.2, 3.1, 3.0, 2.4)?

@huaxingao
Contributor Author

BTW, I saw the CI failed (https://github.com/apache/iceberg/runs/7223837382?check_suite_focus=true) because of this test

org.apache.iceberg.parquet.TestBloomRowGroupFilter > testBytesEq FAILED
    java.lang.AssertionError: Should not read: cannot match a new generated binary
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.assertTrue(Assert.java:42)
        at org.junit.Assert.assertFalse(Assert.java:65)
        at org.apache.iceberg.parquet.TestBloomRowGroupFilter.testBytesEq(TestBloomRowGroupFilter.java:587)

The failure is not related to the changes in this PR. I think it happens because bloom filters can produce false positives. I will fix the bloom filter test.
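To illustrate the false-positive behavior mentioned above: a bloom filter can answer "possibly present" for a value that was never added, because different values may set the same bits. This is a toy sketch with deliberately weak hash functions to force a collision, not Iceberg's Parquet bloom filter implementation.

```java
import java.util.BitSet;

public class BloomSketch {
  static final int BITS = 8;
  static final BitSet bits = new BitSet(BITS);

  // Deliberately weak hashes (length-based) so a collision is easy to show.
  static int h1(String s) { return s.length() % BITS; }
  static int h2(String s) { return (s.length() * 3 + 1) % BITS; }

  static void add(String s) { bits.set(h1(s)); bits.set(h2(s)); }

  // "true" means "possibly present"; "false" means "definitely absent".
  static boolean mightContain(String s) { return bits.get(h1(s)) && bits.get(h2(s)); }

  public static void main(String[] args) {
    add("ab");                               // only "ab" was ever added
    System.out.println(mightContain("ab"));  // true: genuinely present
    System.out.println(mightContain("xy"));  // true: FALSE POSITIVE (hashes collide with "ab")
    System.out.println(mightContain("xyz")); // false: definitely absent
  }
}
```

This is why a test asserting "should not read" on a randomly generated value can flake: the filter is allowed to say "possibly present" for a value it never saw.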

huaxingao added a commit to huaxingao/iceberg that referenced this pull request Jul 8, 2022
tdas pushed a commit to delta-io/delta that referenced this pull request Jan 23, 2026
## Summary

Prevents pushdown of filters containing complex types (Array, Struct,
Map) to Iceberg REST API. Iceberg does not support complex type literals
in filter expressions.

### Background

Iceberg's expression converter (`Literals.java`) explicitly does not
support complex types:
- Reference: [Iceberg
Literals.java](https://github.com/apache/iceberg/blob/15485f5523d08aae2a503c143c51b6df2debb655/api/src/main/java/org/apache/iceberg/expressions/Literals.java#L90)
- Also blocked in: [Apache Iceberg PR
#5204](apache/iceberg#5204)

There is no way to convert:
- Spark Array → Iceberg Array literal
- Spark Struct → Iceberg Struct literal  
- Spark Map → Iceberg Map literal

**Solution**: Don't push down complex type filters. Keep them as
residuals and let Spark evaluate them after reading data.
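The residual idea described above can be sketched as follows (a hypothetical illustration, not the delta-io code): filters the source cannot evaluate are simply applied to the rows after the scan returns them.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class ResidualSketch {
  public static void main(String[] args) {
    // Rows modeled as maps; "s" holds a struct-like value the source
    // cannot filter on, so the scan returns all rows unfiltered.
    List<Map<String, Object>> scanned = List.of(
        Map.of("id", 1, "s", List.of(3, "v1")),
        Map.of("id", 2, "s", List.of(4, "v2")));

    // The residual predicate the engine evaluates after reading the data.
    Predicate<Map<String, Object>> residual =
        row -> row.get("s").equals(List.of(3, "v1"));

    List<Map<String, Object>> result =
        scanned.stream().filter(residual).collect(Collectors.toList());
    System.out.println(result.size()); // 1: only the matching row survives
  }
}
```

The trade-off is purely about where work happens: correctness is preserved either way, but un-pushed filters mean more data is read before being discarded.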

## Changes

### Type Blocking (Original Implementation)
- Reject complex types (arrays, structs, maps, rows) in filter values
- Update exception handling in `convert()` to handle complex type
rejection gracefully
- Keep complex type filters as residuals for Spark evaluation
- Update documentation to mention complex type limitations

### Code Simplification (Review Feedback)
- Removed `isSupportedType()` method that duplicated pattern matching
- Removed `UnsupportedOperationException` catch block
- Simplified `toIcebergValue()` to validate via pattern matching only
- Throw `IllegalArgumentException` in default case for unsupported types
- Net reduction: 33 lines of code while maintaining identical
functionality
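The simplification described above — validating via pattern matching alone and throwing `IllegalArgumentException` in the default case — can be sketched like this. The name `toIcebergValue` comes from the summary; the body here is a hypothetical illustration, not the actual code.

```java
import java.util.List;

public class ToIcebergValueSketch {
  // Single validation path: primitives pass through, everything else
  // (arrays, structs, maps, rows) hits the default case and throws.
  static Object toIcebergValue(Object v) {
    if (v instanceof Integer || v instanceof Long || v instanceof Double
        || v instanceof String || v instanceof Boolean) {
      return v;
    }
    throw new IllegalArgumentException(
        "Unsupported filter literal type: " + v.getClass().getSimpleName());
  }

  public static void main(String[] args) {
    System.out.println(toIcebergValue("v1"));     // primitive: passes through
    try {
      toIcebergValue(List.of(1, 2));              // complex: rejected
    } catch (IllegalArgumentException e) {
      System.out.println("rejected");
    }
  }
}
```

Collapsing the separate `isSupportedType()` check into the conversion itself avoids duplicating the list of supported types in two places.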

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
