-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Add additional tests for InListExpr #19050
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I expect tests to fail. We can fix them once we confirm. |
| downcast_dictionary_array! { | ||
| v => { | ||
| let values_contains = self.contains(v.values().as_ref(), negated)?; | ||
| let result = take(&values_contains, v.keys(), None)?; | ||
| return Ok(downcast_array(result.as_ref())) | ||
| } | ||
| _ => {} | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lack of this was a bug
| let result = match (v.null_count() > 0, negated) { | ||
| (true, false) => { | ||
| // has nulls, not negated" | ||
| BooleanArray::from_iter( | ||
| v.iter().map(|value| Some(self.values.contains(&value?))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were not handling nulls properly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i double checked with postgres and the new code looks right
postgres=# select NULL IN (1);
?column?
----------
(1 row)
postgres=# select 1 IN (1, NULL);
?column?
----------
t
(1 row)
postgres=# select 1 IN (2, NULL);
?column?
----------
(1 row)
postgres=# select 1 NOT IN (2, NULL);
?column?
----------
(1 row)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep I checked postgres and duckdb to get the right behavior
76fd945 to
f2eb513
Compare
|
@alamb I've added test harnesses to remove duplication as you requested 😄 |
772bca6 to
b3fca6a
Compare
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @adriangb -- I went through the code fixes and tests and this PR looks like a definitely improvement to me
I left some small suggestions
I also ran code coverage on this PR to double check and it did find one more case that is not covered
cargo llvm-cov test --html -p datafusion-physical-expr
| /// This validates data types, evaluates the list as constants, and uses specialized | ||
| /// StaticFilter implementations for better performance (e.g., Int32StaticFilter for Int32). | ||
| /// | ||
| /// Returns an error if data types don't match or if the list contains non-constant expressions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this comment is correct -- the code appears to handle the case of non-constant expressions as well
I think this might also be clearer if it were simply named try_new() rather than try_from_static_filter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks updated
| BooleanArray::from_iter( | ||
| v.iter().map(|value| Some(self.values.contains(&value?))), | ||
| ) | ||
| // Either needle or haystack has nulls, not negated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Either needle or haystack has nulls, not negated | |
| // needle has nulls, not negated |
| BooleanArray::from_iter( | ||
| v.iter().map(|value| Some(!self.values.contains(&value?))), | ||
| ) | ||
| // Either needle or haystack has nulls, negated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Either needle or haystack has nulls, negated | |
| // needle has nulls, negated |
| negated, | ||
| Some(instantiate_static_filter(in_array)?), | ||
| )), | ||
| Err(_) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit -- I think it would be clearer if this try_evaluate_constant_list evaluated to Result<Option<..>> rather than Result -- that way we avoid the string allocation on error, and could pass real errors .
This code responds to all errors, even if the issue is something different than the types are not supported
If you went this route, I suspect you wouldn't need the second error check for type matches (it would already be handled)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed and refactored to Result<Option<>> as you suggested
| name: &'static str, | ||
| value_in: ScalarValue, | ||
| value_not_in: ScalarValue, | ||
| value_in_list: ScalarValue, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my reading of these tests is that they always test a one element list -- as in the list does not have multiple values
Do you think we need coverage for multi-value lists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous tests all had mutli-value lists, such as
// expression: "a not in ("a", "b")"Maybe we could make this something like values_in: Vec<ScalarValue>
However, I see there are multi-value tests below too, so maybe this is ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated to accept other_list_values: Vec<ScalarValue> and have the tests throw in some extra values to give coverage
| let result = match (v.null_count() > 0, negated) { | ||
| (true, false) => { | ||
| // has nulls, not negated" | ||
| BooleanArray::from_iter( | ||
| v.iter().map(|value| Some(self.values.contains(&value?))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i double checked with postgres and the new code looks right
postgres=# select NULL IN (1);
?column?
----------
(1 row)
postgres=# select 1 IN (1, NULL);
?column?
----------
t
(1 row)
postgres=# select 1 IN (2, NULL);
?column?
----------
(1 row)
postgres=# select 1 NOT IN (2, NULL);
?column?
----------
(1 row)| // Create dictionary-encoded batch with values [1, 2, 5] | ||
| // Dictionary: keys [0, 1, 2] -> values [1, 2, 5] | ||
| let keys = Int8Array::from(vec![0, 1, 2]); | ||
| let values = Int32Array::from(vec![1, 2, 5]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally recommend using values that are clearly not the keys - for example
| let values = Int32Array::from(vec![1, 2, 5]); | |
| let values = Int32Array::from(vec![100, 200, 500]); |
( you would also have to change the literals above)
| } | ||
| (false, true) => { | ||
| // no null, negated | ||
| // No nulls anywhere, negated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
according to code coverage, this branch is not covered by tests (see PR comments)
| RecordBatch::try_new(Arc::new(schema.clone()), vec![Arc::clone(&array)])?; | ||
|
|
||
| // Helper to format SQL-like representation for error messages | ||
| let _format_sql = |negated: bool, with_null: bool| -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this helper seems unused (why is it named starting with _ 🤔 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was a pipe dream to have nice error messages. I updated it and implemented it. now if a test fails you get something like:
assertion `left == right` failed: Failed for: a IN (5, 3, 7)
a: PrimitiveArray<Int32>
[
1,
3,
4,
]
left: BooleanArray
[
true,
true,
true,
]
right: BooleanArray
[
false,
true,
false,
]
b3fca6a to
89234b1
Compare
This adds tests that prove some bugs that are fixed in #18832 so we can level the playing field on benchmarks.