fix: InList Dictionary filter pushdown type mismatch#20962
fix: InList Dictionary filter pushdown type mismatch#20962alamb merged 5 commits intoapache:mainfrom
Conversation
11eae6b to
f0f2e90
Compare
Guard the dictionary unwrap in ArrayStaticFilter::contains() to only fire when the dictionary value type matches in_array's type. This prevents a make_comparator type mismatch when both the needle and in_array are Dictionary-encoded, which occurs in HashJoin dynamic filter pushdown with pushdown_filters enabled. Closes apache#20937
f0f2e90 to
91e1e59
Compare
alamb
left a comment
There was a problem hiding this comment.
Thank you @erratic-pattern -- I am pretty convinced this PR fixes the bug
I have a few follow-up suggestions around comments and tests. Let me know if they make sense
I also filed a ticket for a more complete follow up here
| LOCATION 'test_files/scratch/parquet_filter_pushdown/dict_filter_bug.parquet'; | ||
|
|
||
| query error Can't compare arrays of different types | ||
| query TR |
| // Unwrap dictionary-encoded needles when the value type matches | ||
| // in_array, evaluating against distinct values and mapping back | ||
| // via keys. |
There was a problem hiding this comment.
| // Unwrap dictionary-encoded needles when the value type matches | |
| // in_array, evaluating against distinct values and mapping back | |
| // via keys. | |
| // Unwrap dictionary-encoded needles when the value type matches | |
| // in_array, evaluating against the dictionary values and mapping back | |
| // via keys. |
| // try_new (used by SQL IN expressions) always produces a non-Dictionary | ||
| // in_array because evaluate_list() flattens Dictionary scalars to their | ||
| // value type. try_new_from_array passes the array directly, so it is | ||
| // the only path that can produce a Dictionary in_array. |
There was a problem hiding this comment.
I am a little confused about this comment -- it implies that the Dictionary is on the in_list side, but the code fix above is handling a Dictionary needle (v) and a non Dictionary Haystack 🤔
I think it may be worthwhile to consider changing the code so that it doesn't support a Dictionary in in_array -- but rather normalizes the haystack (we can do this as a follow on PR)
| } | ||
|
|
||
| fn eval_in_list_from_array( | ||
| needle_type: DataType, |
There was a problem hiding this comment.
if you need the type of needle can't w just call needle.data_type and that way be sure it is consistent with the needle array? That would make the tests less fragile, likely shorter, and easier to validate they are doing the right thing
alamb
left a comment
There was a problem hiding this comment.
looks good to me -- thank you @erratic-pattern
|
I am now making backports to 52 and 53 |
Closes apache#20937 `ArrayStaticFilter::contains()` unconditionally unwraps dictionary-encoded needles to evaluate against distinct values. This only works when `in_array` has the same type as the dictionary's value type. When `in_array` is also Dictionary, the unwrap creates a type mismatch in `make_comparator`. This became reachable after: - apache#20505 which removed dictionary flattening in the InList builder, allowing Dictionary arrays to reach `ArrayStaticFilter` via HashJoin dynamic filter pushdown with `pushdown_filters` enabled. - Guard the dictionary unwrap in `ArrayStaticFilter::contains()` to only fire when the dictionary value type matches `in_array`'s type. When both sides are Dictionary, fall through to `make_comparator(Dict, Dict)` which arrow-ord handles natively. - Update the sqllogictest from apache#20960 to expect success. - Add unit tests covering all `try_new_from_array` type combinations (primitive specializations, Utf8, Dictionary, Struct). Yes — unit tests and sqllogictest.
Closes apache#20937 `ArrayStaticFilter::contains()` unconditionally unwraps dictionary-encoded needles to evaluate against distinct values. This only works when `in_array` has the same type as the dictionary's value type. When `in_array` is also Dictionary, the unwrap creates a type mismatch in `make_comparator`. This became reachable after: - apache#20505 which removed dictionary flattening in the InList builder, allowing Dictionary arrays to reach `ArrayStaticFilter` via HashJoin dynamic filter pushdown with `pushdown_filters` enabled. - Guard the dictionary unwrap in `ArrayStaticFilter::contains()` to only fire when the dictionary value type matches `in_array`'s type. When both sides are Dictionary, fall through to `make_comparator(Dict, Dict)` which arrow-ord handles natively. - Update the sqllogictest from apache#20960 to expect success. - Add unit tests covering all `try_new_from_array` type combinations (primitive specializations, Utf8, Dictionary, Struct). Yes — unit tests and sqllogictest.
Which issue does this PR close?
Closes #20937
Rationale for this change
ArrayStaticFilter::contains()unconditionally unwraps dictionary-encoded needles to evaluate against distinct values. This only works whenin_arrayhas the same type as the dictionary's value type. Whenin_arrayis also Dictionary, the unwrap creates a type mismatch inmake_comparator. This became reachable after:HashJoinpanic with String dictionary keys (don't flatten keys) #20505which removed dictionary flattening in the InList builder, allowing Dictionary arrays to reach
ArrayStaticFiltervia HashJoin dynamic filter pushdown withpushdown_filtersenabled.What changes are included in this PR?
ArrayStaticFilter::contains()to only fire when the dictionary value type matchesin_array's type. When both sides are Dictionary, fall through tomake_comparator(Dict, Dict)which arrow-ord handles natively.try_new_from_arraytype combinations (primitive specializations, Utf8, Dictionary, Struct).Are these changes tested?
Yes — unit tests and sqllogictest.