Data: Handle null values properly in `IN` predicate filtering by hantangwangd · Pull Request #16697 · apache/iceberg

hantangwangd · 2026-06-06T14:43:52Z

When table records are scanned with IcebergGenerics.read(table) and filtered with where(filter), the query can fail if the filter contains an IN predicate and the target column contains null values. The failure surfaces as:

java.lang.NullPointerException: Invalid object: null

Root cause: FilterIterator.advance() calls the shouldKeep(item) closure method of CloseableIterable to decide whether to keep each item it reads, and that call runs the in(...) method of EvalVisitor. The original logic asserts that the target column value is not null and throws immediately when it is null.
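The call chain above can be imitated without any Iceberg dependency. The following is a minimal sketch, not the Iceberg source: `StrictInFilterSketch`, `strictIn`, and `filter` are hypothetical stand-ins for the `in(...)` evaluation and the filtering loop, and `Objects.requireNonNull` is assumed here only to mimic the strict null check described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.function.Predicate;

public class StrictInFilterSketch {
  // Hypothetical stand-in for the pre-fix in(...) evaluation: rejects a null value outright.
  public static <T> boolean strictIn(T value, Set<T> literals) {
    Objects.requireNonNull(value, "Invalid object: null");
    return literals.contains(value);
  }

  // Hypothetical stand-in for the filtering loop: keeps only items accepted by shouldKeep.
  public static <T> List<T> filter(List<T> items, Predicate<T> shouldKeep) {
    List<T> kept = new ArrayList<>();
    for (T item : items) {
      if (shouldKeep.test(item)) {
        kept.add(item);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // A column that mixes matching values with a null, as in the reported scenario.
    List<String> column = Arrays.asList("a", null, "b");
    try {
      filter(column, v -> strictIn(v, Set.of("a", "b")));
    } catch (NullPointerException e) {
      System.out.println("scan failed: " + e.getMessage());
    }
  }
}
```

In this sketch a single null row aborts the whole scan, even though the rows before and after it would have matched.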

However, in many scenarios (such as the one constructed in the newly added test case), a data file contains both valid values and null values in the target column. Records with null values are then read and passed to this method for evaluation, and the method throws.

This PR fixes the issue by properly handling null values.
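One way to picture null-tolerant evaluation is the sketch below. It is an assumption about the general shape of such a fix, not the code in this PR: `NullSafeInSketch`, `nullSafeIn`, and `scan` are hypothetical names, and treating a null value as "no match" is the behavior assumed for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class NullSafeInSketch {
  // Assumed behavior: a null column value never matches an IN literal,
  // so report "no match" instead of throwing.
  public static <T> boolean nullSafeIn(T value, Set<T> literals) {
    return value != null && literals.contains(value);
  }

  // Hypothetical scan over one column: keeps only the values accepted by the predicate.
  public static List<String> scan(List<String> column, Set<String> literals) {
    return column.stream()
        .filter(v -> nullSafeIn(v, literals))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> column = Arrays.asList("a", null, "c", "b");
    System.out.println(scan(column, Set.of("a", "b"))); // prints [a, b]
  }
}
```

With this shape, rows holding null are skipped and the scan continues over the remaining rows.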

Data: Handle null values properly in IN predicate filtering

e801599

github-actions bot added the API and data labels on Jun 6, 2026

hantangwangd marked this pull request as ready for review June 6, 2026 15:34

hantangwangd wants to merge 1 commit into apache:main from hantangwangd:fix_in_predicate
