Parquet: Should not throw exception for filter contains non-reference term #8243

ConeyLiu · 2023-08-07T02:47:25Z

The Parquet filter evaluators should not throw exceptions for filters that contain non-reference terms. For example, applying a filter evaluator to year(ts) = 20 currently throws a ValidationException.

Fokko · 2023-08-07T12:27:36Z

parquet/src/main/java/org/apache/iceberg/parquet/ParquetBloomRowGroupFilter.java

+
+    @Override
+    public <T> Boolean handleNonReference(Bound<T> term) {
+      return ROWS_MIGHT_MATCH;


When the column is not referenced, then the column is missing, right? How could it then match?

I think NonReference here means that the term is not a column reference. It can be a transform term, for example equal(year(ts), 10).

@Fokko, I think at this point the columns are bound and transforms have already been removed if they can be (e.g. if they can be translated to a normal predicate on a partition field). This is the case where the transform is left over because there is no corresponding partition field. We could implement future optimizations to run the filter, but returning all rows is good enough for now.
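
To illustrate the conservative behavior discussed in this thread, here is a minimal, self-contained sketch. It is not the Iceberg implementation: `Term`, `Reference`, `Transform`, and `RowGroupEvaluator` are hypothetical stand-ins for the bound-term classes and the row-group filter visitor, and the min/max map is an assumed simplification of Parquet statistics. Only the names `handleNonReference` and `ROWS_MIGHT_MATCH` are taken from the diff.

```java
import java.util.HashMap;
import java.util.Map;

public class NonReferenceSketch {
  public static final boolean ROWS_MIGHT_MATCH = true;
  public static final boolean ROWS_CANNOT_MATCH = false;

  // Hypothetical stand-ins for bound terms (not Iceberg classes).
  public interface Term {}

  public static final class Reference implements Term {
    final String column;
    public Reference(String column) { this.column = column; }
  }

  public static final class Transform implements Term {
    final String function;
    final Reference ref;
    public Transform(String function, Reference ref) {
      this.function = function;
      this.ref = ref;
    }
  }

  // Hypothetical evaluator: tests an equality predicate against per-column min/max stats.
  public static final class RowGroupEvaluator {
    private final Map<String, long[]> minMax;

    public RowGroupEvaluator(Map<String, long[]> minMax) { this.minMax = minMax; }

    public boolean eq(Term term, long value) {
      if (!(term instanceof Reference)) {
        // Column stats say nothing about year(ts) and similar terms,
        // so fall back to the conservative answer instead of throwing.
        return handleNonReference(term);
      }
      long[] bounds = minMax.get(((Reference) term).column);
      if (bounds == null) {
        return ROWS_MIGHT_MATCH; // no stats available: cannot rule the row group out
      }
      return value >= bounds[0] && value <= bounds[1] ? ROWS_MIGHT_MATCH : ROWS_CANNOT_MATCH;
    }

    public boolean handleNonReference(Term term) {
      return ROWS_MIGHT_MATCH;
    }
  }

  public static void main(String[] args) {
    Map<String, long[]> stats = new HashMap<>();
    stats.put("id", new long[] {10L, 20L});
    RowGroupEvaluator evaluator = new RowGroupEvaluator(stats);

    System.out.println(evaluator.eq(new Reference("id"), 99L)); // false: outside [10, 20]
    System.out.println(evaluator.eq(new Transform("year", new Reference("id")), 99L)); // true
  }
}
```

Returning "might match" means the row group is still read and the residual predicate is evaluated on the rows themselves, so correctness is preserved and only a pruning opportunity is lost.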

rdblue · 2023-08-07T22:47:52Z

data/src/test/java/org/apache/iceberg/data/TestMetricsRowGroupFilter.java

+    boolean shouldRead =
+        new ParquetMetricsRowGroupFilter(SCHEMA, equal(truncate("required", 2), "some_value"), true)
+            .shouldRead(parquetSchema, rowGroupMetadata);
+    Assumptions.assumeThat(shouldRead)


I think you want to use an assertion rather than an assumption? With an assumption, the test is skipped when the condition does not hold, instead of failing.

Indeed, fixed.
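
The difference between an assumption and an assertion raised in this thread can be modeled in plain Java. This is a sketch under stated assumptions, not JUnit or AssertJ code: `TestSkipped`, `assumeTrue`, `assertTrue`, and `run` are hypothetical names that imitate how a test framework typically reports a failed assumption as a skip and a failed assertion as a failure.

```java
public class AssumeVersusAssert {
  // Hypothetical stand-in for the exception a framework uses to signal "skip this test".
  public static final class TestSkipped extends RuntimeException {
    TestSkipped(String message) { super(message); }
  }

  // Hypothetical assumption: an unmet condition aborts the test without failing it.
  public static void assumeTrue(boolean condition) {
    if (!condition) {
      throw new TestSkipped("assumption not met");
    }
  }

  // Hypothetical assertion: an unmet condition fails the test.
  public static void assertTrue(boolean condition) {
    if (!condition) {
      throw new AssertionError("assertion failed");
    }
  }

  // Hypothetical miniature runner that classifies the outcome of one test body.
  public static String run(Runnable test) {
    try {
      test.run();
      return "PASSED";
    } catch (TestSkipped e) {
      return "SKIPPED";
    } catch (AssertionError e) {
      return "FAILED";
    }
  }

  public static void main(String[] args) {
    boolean shouldRead = false; // pretend the filter regressed and returned the wrong answer

    System.out.println(run(() -> assumeTrue(shouldRead))); // prints "SKIPPED"
    System.out.println(run(() -> assertTrue(shouldRead))); // prints "FAILED"
  }
}
```

With the assumption, a regression in `shouldRead` would be reported as a skipped test and could go unnoticed; with the assertion it is reported as a failure.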

rdblue

This looks good other than the assumption that I think should be an assertion. I think @Fokko also has some questions that I don't fully understand so I'll defer to him on merging this.

rdblue · 2023-08-10T15:48:40Z

Thanks, @ConeyLiu!

ConeyLiu · 2023-08-11T03:07:29Z

Thanks @rdblue @Fokko

handle non-references

091d612

github-actions bot added parquet data labels Aug 7, 2023

ConeyLiu mentioned this pull request Aug 7, 2023

Spark: Rule for converting StaticInvoke to ApplyFunctionExpression for V2 filter push down #8088

Merged

Fokko reviewed Aug 7, 2023


rdblue reviewed Aug 7, 2023


rdblue approved these changes Aug 7, 2023


fixes

6f85b80

rdblue merged commit 84d00b8 into apache:master Aug 10, 2023
41 checks passed

ConeyLiu deleted the parquet-transform-filter branch August 11, 2023 03:07
