API: Fix Identity projection for mismatched transform types#16074
API: Fix Identity projection for mismatched transform types#16074yadavay-amzn wants to merge 1 commit into
Conversation
| @Test | ||
| public void testIdentityProjectionWithTransformPredicate() { | ||
| // Regression test for https://github.com/apache/iceberg/issues/15502 | ||
| // Identity-partitioned timestamptz field filtered with hours() should not throw |
There was a problem hiding this comment.
This sentence looks incomplete. "... should not throw ValidationException"?
|
|
||
| // The identity transform cannot project a hours-transform predicate, so it should | ||
| // be replaced with alwaysTrue (inclusive) rather than throwing ValidationException | ||
| assertThat(projected).isEqualTo(Expressions.alwaysTrue()); |
There was a problem hiding this comment.
ValidationException isn't thrown in this test even if I revert a change of Identity.java.
There was a problem hiding this comment.
The exception is caught internally by the projection framework and converted to alwaysTrue/alwaysFalse. Without the fix, projectStrict() creates an UnboundPredicate with an integer literal for a timestamptz field, which fails during binding — the test assertion fails because the projection result is neither alwaysTrue nor alwaysFalse. Updated the comments to explain this more clearly.
There was a problem hiding this comment.
@ebyhr I was able to confirm that the test fails without the fix and passes with it.
Without the fix, Projections.inclusive(spec).project(hourFilter) returns ref(name="ts") == 490674; an invalid predicate that binds an integer literal to a timestamptz field. The ValidationException from the original issue occurs downstream when this invalid predicate is evaluated against manifest data.
Our test catches the bug at the projection level (wrong predicate returned instead of alwaysTrue), which is the root cause.
72461b2 to
a078b0f
Compare
|
@ebyhr Replied to your comment about ValidationException not being thrown. Please let me know if that makes sense to you or if you'd like to test this differently? Thanks! |
…5502) Identity.project() and projectStrict() delegate to projectStrict() which creates an unbound predicate with the literal from the input predicate. When the predicate term is a transform (e.g. hours(ts) = 490674), the literal type (integer) does not match the partition field type (timestamptz), causing a ValidationException when the unbound predicate is later bound to the partition schema. Fix: Return null from projectStrict() when the predicate term is not a BoundReference, indicating the identity transform cannot project transform-based predicates. This causes the projection to fall back to alwaysTrue (inclusive) or alwaysFalse (strict), which is correct.
a078b0f to
6958a88
Compare
|
@ebyhr Updated the test comments to clarify the failure mode. Without the fix, the test fails because the projection returns |
Fixes #15502
Problem
When an Iceberg table has manifests written using a partition spec with
identity-transformed timestamp field, queries that filter on that field using a temporal transform likehours()fail with:Identity.projectStrict()creates an unbound predicate with the literal from the input predicate. When the predicate term is a transform (e.g.hours(ts) = 490674), the literal type (integer) does not match the partition field type (timestamptz), causing aValidationExceptionwhen the unbound predicate is later bound to the partition schema.Fix
Return
nullfromprojectStrict()when the predicate term is not aBoundReference, indicating the identity transform cannot project transform-based predicates. This causes the projection to fall back toalwaysTrue(inclusive) oralwaysFalse(strict), which is the correct behavior.The fix is a 4-line guard clause at the top of
projectStrict(). Sinceproject()delegates toprojectStrict(), both paths are covered.Testing
Added a regression test in
TestProjectionthat reproduces the exact scenario from the issue: identity-partitionedtimestamptzfield filtered withhours()transform, verifying both inclusive and strict projections.