Not push down complex filter #5204

Merged
szehon-ho merged 1 commit into apache:master from huaxingao:complex_filter on Jul 6, 2022
Conversation

@huaxingao
Contributor

Currently, Iceberg throws an exception when using a filter on a complex type:

org.apache.iceberg.spark.sql.TestSelect > testComplexTypeFilter[catalogName = testhadoop, implementation = org.apache.iceberg.spark.SparkCatalog, config = {type=hadoop}] FAILED
    java.lang.IllegalArgumentException: Cannot create expression literal from org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema: [3,v1]
        at org.apache.iceberg.expressions.Literals.from(Literals.java:87)
        at org.apache.iceberg.expressions.UnboundPredicate.<init>(UnboundPredicate.java:40)
        at org.apache.iceberg.expressions.Expressions.equal(Expressions.java:175)
        at org.apache.iceberg.spark.SparkFilters.handleEqual(SparkFilters.java:239)
        at org.apache.iceberg.spark.SparkFilters.convert(SparkFilters.java:152)
        at org.apache.iceberg.spark.source.SparkScanBuilder.pushFilters(SparkScanBuilder.java:101)
        at org.apache.spark.sql.execution.datasources.v2.PushDownUtils$.pushFilters(PushDownUtils.scala:69)
        at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$pushDownFilters$1.applyOrElse(V2ScanRelationPushDown.scala:60)
        at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$pushDownFilters$1.applyOrElse(V2ScanRelationPushDown.scala:47)
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)

This PR fixes the problem by not pushing down filters on complex types.
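To make the failure mode concrete, here is a minimal, self-contained sketch of the kind of check involved. The class and method names are hypothetical (this is not the actual Iceberg code): `Literals.from` only accepts primitive values, so a guard like this decides whether a Spark filter value can become an Iceberg literal at all.

```java
import java.util.List;
import java.util.Map;

public class ComplexFilterGuard {
  // Hypothetical guard: true only for values Iceberg can turn into a literal.
  // Arrays, Lists, Maps, and Row-like structs fall through to false.
  static boolean isConvertibleLiteral(Object value) {
    return value instanceof Number
        || value instanceof CharSequence
        || value instanceof Boolean;
  }

  public static void main(String[] args) {
    System.out.println(isConvertibleLiteral(42));               // true
    System.out.println(isConvertibleLiteral("v1"));             // true
    System.out.println(isConvertibleLiteral(List.of(3, "v1"))); // false: struct-like value
    System.out.println(isConvertibleLiteral(Map.of("k", 1)));   // false
  }
}
```

A value like `[3,v1]` from the stack trace above (a `GenericRowWithSchema`) would fail such a check, which is exactly why the conversion threw `IllegalArgumentException`.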

Member

@szehon-ho left a comment


This works for me, and thanks for the unit test. I think this would be valuable in other Spark versions like 3.2, but that can be done later.

Contributor

@singhpk234 left a comment


LGTM as well, Thanks @huaxingao !

    try {
      expr = SparkFilters.convert(filter);
    } catch (IllegalArgumentException e) {
      // converting to Iceberg Expression failed, so this expression cannot be pushed down
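For context, a self-contained sketch of the pattern the snippet above follows (hypothetical stand-in names; the real code lives in `SparkScanBuilder.pushFilters` and `SparkFilters.convert`): filters that convert cleanly are pushed down, while any filter whose conversion throws is silently skipped and left for Spark to evaluate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PushFilterSketch {
  // Stand-in for SparkFilters.convert: rejects complex-typed values.
  static String convert(Object filterValue) {
    if (filterValue instanceof List || filterValue instanceof Map) {
      throw new IllegalArgumentException(
          "Cannot create expression literal from " + filterValue);
    }
    return "expr(" + filterValue + ")";
  }

  // Filters that convert are pushed; the rest stay behind for Spark.
  static List<String> pushFilters(List<Object> filterValues) {
    List<String> pushed = new ArrayList<>();
    for (Object value : filterValues) {
      try {
        pushed.add(convert(value));
      } catch (IllegalArgumentException e) {
        // conversion failed, so this filter cannot be pushed down
      }
    }
    return pushed;
  }

  public static void main(String[] args) {
    // The struct-like List.of(3, "v1") is dropped; the primitives are pushed.
    List<Object> filters = List.of(1, "v1", List.of(3, "v1"));
    System.out.println(pushFilters(filters)); // [expr(1), expr(v1)]
  }
}
```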
Contributor


[minor] should we log the filters that Spark pushed but were ignored?

Contributor


nvm, i see we log the pushed filters in spark already here

@szehon-ho szehon-ho merged commit d8d212f into apache:master Jul 6, 2022
@szehon-ho
Member

Thanks @huaxingao for the change, and @singhpk234 for the additional review!

@huaxingao
Contributor Author

Thank you very much, @szehon-ho! Also thank you @singhpk234 for reviewing!

@huaxingao huaxingao deleted the complex_filter branch July 6, 2022 22:35
@szehon-ho
Member

@huaxingao do you want to make a change for other Spark branches (i.e., 3.2)? Otherwise I or someone can look at it too.

@huaxingao
Contributor Author

@szehon-ho I will do this later today or tomorrow. Do we need this in all the Spark branches (3.2, 3.1, 3.0, 2.4)?

@huaxingao
Contributor Author

BTW, I saw the CI failed (https://github.com/apache/iceberg/runs/7223837382?check_suite_focus=true) because of this test

org.apache.iceberg.parquet.TestBloomRowGroupFilter > testBytesEq FAILED
    java.lang.AssertionError: Should not read: cannot match a new generated binary
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.assertTrue(Assert.java:42)
        at org.junit.Assert.assertFalse(Assert.java:65)
        at org.apache.iceberg.parquet.TestBloomRowGroupFilter.testBytesEq(TestBloomRowGroupFilter.java:587)

The failure is not related to the changes in this PR. I think it happens because bloom filters can produce false positives. I will fix the bloom filter test.
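To illustrate the false-positive behavior mentioned above: a bloom filter can answer "possibly present" for a value that was never added, because different values may set the same bits. This is a toy sketch with deliberately weak hash functions to force a collision, not Iceberg's Parquet bloom filter implementation.

```java
import java.util.BitSet;

public class BloomSketch {
  static final int BITS = 8;
  static final BitSet bits = new BitSet(BITS);

  // Deliberately weak hashes (length-based) so a collision is easy to show.
  static int h1(String s) { return s.length() % BITS; }
  static int h2(String s) { return (s.length() * 3 + 1) % BITS; }

  static void add(String s) { bits.set(h1(s)); bits.set(h2(s)); }

  // "true" means "possibly present"; "false" means "definitely absent".
  static boolean mightContain(String s) { return bits.get(h1(s)) && bits.get(h2(s)); }

  public static void main(String[] args) {
    add("ab");                               // only "ab" was ever added
    System.out.println(mightContain("ab"));  // true: genuinely present
    System.out.println(mightContain("xy"));  // true: FALSE POSITIVE (hashes collide with "ab")
    System.out.println(mightContain("xyz")); // false: definitely absent
  }
}
```

This is why a test asserting "should not read" on a randomly generated value can flake: the filter is allowed to say "possibly present" for a value it never saw.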

huaxingao added a commit to huaxingao/iceberg that referenced this pull request Jul 8, 2022
tdas pushed a commit to delta-io/delta that referenced this pull request Jan 23, 2026
## Summary

Prevents pushdown of filters containing complex types (Array, Struct,
Map) to Iceberg REST API. Iceberg does not support complex type literals
in filter expressions.

### Background

Iceberg's expression converter (`Literals.java`) explicitly does not
support complex types:
- Reference: [Iceberg
Literals.java](https://github.com/apache/iceberg/blob/15485f5523d08aae2a503c143c51b6df2debb655/api/src/main/java/org/apache/iceberg/expressions/Literals.java#L90)
- Also blocked in: [Apache Iceberg PR
#5204](apache/iceberg#5204)

There is no way to convert:
- Spark Array → Iceberg Array literal
- Spark Struct → Iceberg Struct literal  
- Spark Map → Iceberg Map literal

**Solution**: Don't push down complex type filters. Keep them as
residuals and let Spark evaluate them after reading data.
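The residual idea described above can be sketched as follows (a hypothetical illustration, not the delta-io code): filters the source cannot evaluate are simply applied to the rows after the scan returns them.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class ResidualSketch {
  public static void main(String[] args) {
    // Rows modeled as maps; "s" holds a struct-like value the source
    // cannot filter on, so the scan returns all rows unfiltered.
    List<Map<String, Object>> scanned = List.of(
        Map.of("id", 1, "s", List.of(3, "v1")),
        Map.of("id", 2, "s", List.of(4, "v2")));

    // The residual predicate the engine evaluates after reading the data.
    Predicate<Map<String, Object>> residual =
        row -> row.get("s").equals(List.of(3, "v1"));

    List<Map<String, Object>> result =
        scanned.stream().filter(residual).collect(Collectors.toList());
    System.out.println(result.size()); // 1: only the matching row survives
  }
}
```

The trade-off is purely about where work happens: correctness is preserved either way, but un-pushed filters mean more data is read before being discarded.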

## Changes

### Type Blocking (Original Implementation)
- Reject complex types (arrays, structs, maps, rows) in filter values
- Update exception handling in `convert()` to handle complex type
rejection gracefully
- Keep complex type filters as residuals for Spark evaluation
- Update documentation to mention complex type limitations

### Code Simplification (Review Feedback)
- Removed `isSupportedType()` method that duplicated pattern matching
- Removed `UnsupportedOperationException` catch block
- Simplified `toIcebergValue()` to validate via pattern matching only
- Throw `IllegalArgumentException` in default case for unsupported types
- Net reduction: 33 lines of code while maintaining identical
functionality
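The simplification described above — validating via pattern matching alone and throwing `IllegalArgumentException` in the default case — can be sketched like this. The name `toIcebergValue` comes from the summary; the body here is a hypothetical illustration, not the actual code.

```java
import java.util.List;

public class ToIcebergValueSketch {
  // Single validation path: primitives pass through, everything else
  // (arrays, structs, maps, rows) hits the default case and throws.
  static Object toIcebergValue(Object v) {
    if (v instanceof Integer || v instanceof Long || v instanceof Double
        || v instanceof String || v instanceof Boolean) {
      return v;
    }
    throw new IllegalArgumentException(
        "Unsupported filter literal type: " + v.getClass().getSimpleName());
  }

  public static void main(String[] args) {
    System.out.println(toIcebergValue("v1"));     // primitive: passes through
    try {
      toIcebergValue(List.of(1, 2));              // complex: rejected
    } catch (IllegalArgumentException e) {
      System.out.println("rejected");
    }
  }
}
```

Collapsing the separate `isSupportedType()` check into the conversion itself avoids duplicating the list of supported types in two places.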

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
