Data: Handle null values properly in IN predicate filtering#16697

Open: hantangwangd wants to merge 1 commit into apache:main from hantangwangd:fix_in_predicate
Conversation

@hantangwangd
Contributor

When table records are scanned with IcebergGenerics.read(table) and filtered with where(filter), a filter that contains an IN predicate can fail if the target column contains null values. The query aborts with the following error:

java.lang.NullPointerException: Invalid object: null

Root cause: FilterIterator.advance() calls the shouldKeep(item) closure of CloseableIterable to decide whether to keep each item it reads, and that closure evaluates the predicate through the in(...) method of EvalVisitor. The original logic asserts that the target column value is non-null and throws as soon as it sees a null.

This is easy to hit in practice, as the newly added test case shows: when a data file holds both values that may match the predicate and nulls in the target column, the records with nulls are still read and passed to in(...) for evaluation, which then throws.

This PR fixes the issue by properly handling null values.
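
To illustrate the difference between the failing check and a null-tolerant one, here is a minimal, self-contained sketch. It does not use Iceberg classes and is not the code from this PR: `NullSafeInSketch`, `strictIn`, `nullSafeIn`, and `filter` are hypothetical names, and the sketch assumes that a null column value should simply not match any literal in the IN set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for the row-level IN evaluation; not Iceberg's EvalVisitor.
public class NullSafeInSketch {

  // Mimics the reported behavior: a null column value is rejected up front
  // with a NullPointerException instead of being evaluated.
  public static boolean strictIn(Object value, Set<?> literals) {
    Objects.requireNonNull(value, "Invalid object: null");
    return literals.contains(value);
  }

  // Assumed fix: a null value never matches, so the row is filtered out
  // rather than aborting the whole scan.
  public static boolean nullSafeIn(Object value, Set<?> literals) {
    return value != null && literals.contains(value);
  }

  // Keeps only the column values accepted by the null-tolerant check.
  public static List<Integer> filter(List<Integer> column, Set<Integer> literals) {
    List<Integer> kept = new ArrayList<>();
    for (Integer value : column) {
      if (nullSafeIn(value, literals)) {
        kept.add(value);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // A column that mixes matching values, non-matching values, and a null.
    List<Integer> column = Arrays.asList(1, null, 3, 5);
    Set<Integer> literals = Set.of(1, 5);
    System.out.println(filter(column, literals)); // prints [1, 5]
  }
}
```

With `strictIn` in place of `nullSafeIn`, the same input would throw on the second element, which mirrors the failure described above.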

@hantangwangd hantangwangd marked this pull request as ready for review June 6, 2026 15:34
