API: Implement notStartsWith bounds check in StrictMetricsEvaluator by bharos · Pull Request #15883 · apache/iceberg

bharos · 2026-04-03T23:08:04Z

What

Implements bounds-based evaluation for notStartsWith in
StrictMetricsEvaluator, replacing the existing TODO with actual logic.

Previously, notStartsWith always returned ROWS_MIGHT_NOT_MATCH,
which prevented the engine from eliminating the residual predicate even
when file-level column bounds made it provable that no value could start
with the given prefix.

Changes

- StrictMetricsEvaluator.notStartsWith: Added checks for nested
  columns, all-nulls columns, and lower/upper bound comparisons against
  the prefix. Returns ROWS_MUST_MATCH when bounds prove the prefix is
  entirely outside the value range.
- TestStrictMetricsEvaluator: Added 8 test methods covering
  all-nulls, bounds above/below/overlapping the prefix, wider ranges,
  missing stats, some-nulls with bounds outside prefix, and prefix
  longer than bounds.

How it works

For NOT STARTS WITH <prefix>:

- If the lower bound (truncated to min(prefixLen, boundLen)) is
  strictly greater than the prefix, all values are above the prefix
  range → ROWS_MUST_MATCH
- If the upper bound (truncated to min(prefixLen, boundLen)) is
  strictly less than the prefix, all values are below the prefix range
  → ROWS_MUST_MATCH
- Otherwise, fall through to ROWS_MIGHT_NOT_MATCH (conservative)

This follows the same pattern used by notEq and notIn in this
class, including the null-handling convention.
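
The bounds rules above can be sketched in isolation. This is a minimal illustration, not the code from this PR: the class name, the boolean constants, and the `truncate` helper are hypothetical, and it compares Java strings with `String.compareTo` rather than the binary bound values an evaluator would actually see. It also assumes that the prefix is truncated to the same length as the bound before comparing, and that a missing bound simply skips the corresponding check.

```java
// Hypothetical, simplified sketch of the bounds check described above.
final class NotStartsWithSketch {
  static final boolean ROWS_MUST_MATCH = true;
  static final boolean ROWS_MIGHT_NOT_MATCH = false;

  // Keep at most `length` leading chars of `s`.
  static String truncate(String s, int length) {
    return s.substring(0, Math.min(length, s.length()));
  }

  // lower/upper are the column bounds for one file; null means the stat is missing.
  static boolean notStartsWith(String lower, String upper, String prefix) {
    if (lower != null) {
      int len = Math.min(prefix.length(), lower.length());
      // Lower bound strictly above the prefix: every value sorts after the prefix range.
      if (truncate(lower, len).compareTo(truncate(prefix, len)) > 0) {
        return ROWS_MUST_MATCH;
      }
    }
    if (upper != null) {
      int len = Math.min(prefix.length(), upper.length());
      // Upper bound strictly below the prefix: every value sorts before the prefix range.
      if (truncate(upper, len).compareTo(truncate(prefix, len)) < 0) {
        return ROWS_MUST_MATCH;
      }
    }
    // Bounds overlap the prefix range or are missing: stay conservative.
    return ROWS_MIGHT_NOT_MATCH;
  }

  public static void main(String[] args) {
    System.out.println(notStartsWith("a", "b", "x"));   // true: bounds below the prefix
    System.out.println(notStartsWith("y", "z", "x"));   // true: bounds above the prefix
    System.out.println(notStartsWith("w", "y", "x"));   // false: bounds overlap the prefix
    System.out.println(notStartsWith("b", "c", "abc")); // true: prefix longer than bounds
  }
}
```

The first call mirrors the example from the removed TODO: with bounds ["a", "b"], no value can start with "x", so every row satisfies the predicate.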

Closes #15882

When column bounds are entirely outside the prefix range, all rows must satisfy notStartsWith. Previously this always returned ROWS_MIGHT_NOT_MATCH regardless of bounds, missing an optimization opportunity for file-level pruning.

Now returns ROWS_MUST_MATCH when:
- Lower bound truncated to prefix length > prefix (all values above)
- Upper bound truncated to prefix length < prefix (all values below)
- Column contains only null values (nulls satisfy NOT predicates)

Follows the same truncation pattern used in InclusiveMetricsEvaluator.startsWith and the null-handling pattern from StrictMetricsEvaluator.notEq.
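
For the "only null values" condition in the commit message, one way to picture the check is a comparison of per-column value and null counts. The sketch below is an assumption about the shape of such a check, not the evaluator's actual code: the class name, method name, and map-based signature are hypothetical.

```java
import java.util.Map;

// Hypothetical sketch: decide from file-level counts whether a column holds only nulls.
final class NullsOnlySketch {
  // valueCounts and nullCounts map a field id to a count; either map may be missing (null).
  static boolean containsNullsOnly(
      Map<Integer, Long> valueCounts, Map<Integer, Long> nullCounts, int fieldId) {
    if (valueCounts == null || nullCounts == null) {
      return false; // without both stats nothing can be proven
    }
    Long values = valueCounts.get(fieldId);
    Long nulls = nullCounts.get(fieldId);
    // Every counted value is null when the two counts are equal.
    return values != null && nulls != null && values - nulls == 0;
  }

  public static void main(String[] args) {
    System.out.println(containsNullsOnly(Map.of(1, 10L), Map.of(1, 10L), 1)); // true
    System.out.println(containsNullsOnly(Map.of(1, 10L), Map.of(1, 3L), 1));  // false
  }
}
```

When such a check succeeds, the commit message's rule applies: the nulls are treated as satisfying the negated predicate, so the file can be reported as ROWS_MUST_MATCH without looking at bounds.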

anoopj · 2026-04-07T00:30:00Z

api/src/main/java/org/apache/iceberg/expressions/StrictMetricsEvaluator.java

-      // TODO: Handle cases that definitely cannot match, such as notStartsWith("x") when the bounds
-      // are ["a", "b"].
+      int id = ref.fieldId();
+      if (isNestedColumn(id)) {


Consider adding a test for nested column?

Thanks @anoopj, I added a nested string field "nested_string_col" in the test, along with a test case.
PTAL

anoopj · 2026-04-07T02:47:44Z

The change looks reasonable to me. The only callout is the null handling: if there are null values, the implementation will return ROWS_MUST_MATCH for them as well. This doesn't follow SQL's 3-valued semantics, but is consistent with the current implementation of notEq, so I think the change is reasonable.

Please get this reviewed by a committer.
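
To make the null-handling callout concrete, the two conventions can be compared at the level of a single value. This is an illustrative sketch only: both method names are hypothetical, and SQL's UNKNOWN is modeled here as a null `Boolean`.

```java
// Hypothetical sketch contrasting two ways to treat null under NOT STARTS WITH.
final class NullSemanticsSketch {
  // SQL-style three-valued logic: a null input yields UNKNOWN (modeled as null).
  static Boolean sqlNotStartsWith(String value, String prefix) {
    if (value == null) {
      return null;
    }
    return !value.startsWith(prefix);
  }

  // Convention described in the review: a null value counts as satisfying the negated predicate.
  static boolean lenientNotStartsWith(String value, String prefix) {
    return value == null || !value.startsWith(prefix);
  }

  public static void main(String[] args) {
    System.out.println(sqlNotStartsWith(null, "x"));     // null (UNKNOWN)
    System.out.println(lenientNotStartsWith(null, "x")); // true
    System.out.println(lenientNotStartsWith("xyz", "x")); // false
  }
}
```

Under the second convention, a file whose non-null values all fall outside the prefix range can be reported as ROWS_MUST_MATCH even if it also contains nulls, which is the behavior the comment above describes as consistent with notEq.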

github-actions bot added the API label Apr 3, 2026

API: Fix comments and add test for prefix longer than bounds

bbb3deb

bharos force-pushed the perf/strict-metrics-not-starts-with-bounds branch from 432bb3e to bbb3deb Compare April 3, 2026 23:17

bharos mentioned this pull request Apr 6, 2026

API: Implement startsWith bounds check in StrictMetricsEvaluator #15902

Open

anoopj approved these changes Apr 7, 2026

API: Add nested column test for notStartsWith in StrictMetricsEvaluator

7ca819c

bharos wants to merge 3 commits into apache:main from bharos:perf/strict-metrics-not-starts-with-bounds
