-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Add proper support for null literal by introducing ScalarValue::Null
#2364
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
For someone reviews this pr, I convert this pr to draft for code review. It's ok to go if new |
|
cc @alamb @andygrove @yjshen @xudong963 Please have a review, thank you |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cargo.toml
Outdated
| codegen-units = 1 | ||
| lto = true | ||
|
|
||
| [patch.crates-io] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW arrow-rs 13.0.0 with these changes should be released sometime early next wee
| #[derive(Clone)] | ||
| pub enum ScalarValue { | ||
| /// represents null | ||
| Null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
datafusion/common/src/scalar.rs
Outdated
| eq_array_primitive!(array, index, IntervalMonthDayNanoArray, val) | ||
| } | ||
| ScalarValue::Struct(_, _) => unimplemented!(), | ||
| ScalarValue::Null => false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔
| SQLExpr::Value(Value::SingleQuotedString(s)) => Ok(lit(s)), | ||
| SQLExpr::Value(Value::Null) => { | ||
| Ok(Expr::Literal(ScalarValue::Utf8(None))) | ||
| Ok(Expr::Literal(ScalarValue::Null)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well this explains a lot of odd type coercion errors I have been seeing
|
|
||
| /// coercion rules from NULL type. Since NULL can be casted to most of types in arrow, | ||
| /// either lhs or rhs is NULL, if NULL can be casted to type of the other side, the coecion is valid. | ||
| fn null_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very cool
ScalarValue::Null to dfnull literal by introducing ScalarValue::Null
|
cc @alamb PTAL, thank you ❤️ |
501add5 to
0100277
Compare
|
Since #2382 is merged, I switch this pr to |
|
cc @alamb @andygrove @yjshen @xudong963 |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking very good @WinkerDu -- thank you!
I think there is a small issue with null = null but it should be fairly easy to solve.
Thanks again!
| "| 999 |", | ||
| "+----------------------------------------------------------------------------------------------+", | ||
| "+----------------------------------------------------------------------------------------+", | ||
| "| CASE WHEN #t1.c1 = Utf8(\"a\") THEN Int64(1) WHEN NULL THEN Int64(2) ELSE Int64(999) END |", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the key difference here. Very nice 👍
| INNER JOIN (SELECT null AS id2) t2 ON id1 = id2"; | ||
|
|
||
| let expected = vec!["++", "++"]; | ||
| let expected = vec![ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This answer is not correct -- there should be no rows that match.
This is because the join should produce rows where id1 = id2 evaluates to true
However, null = null evaluates to null 🤯
Here is the query in postgres:
alamb=# SELECT * FROM (SELECT null AS id1) t1
INNER JOIN (SELECT null AS id2) t2 ON id1 = id2
alamb-# ;
id1 | id2
-----+-----
(0 rows)
| pub fn eq_null(left: &NullArray, _right: &NullArray) -> Result<BooleanArray> { | ||
| let length = left.len(); | ||
| make_boolean_array(length, false) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct -- specifically I think the resulting boolean array should be full of nulls not false
Perhaps something like:
std::iter::repeat(left.len(), None).collect()0100277 to
13a1601
Compare
|
cc @alamb PTAL, thank you |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome -- thank you so much @WinkerDu -- this is great
| INNER JOIN (SELECT null AS id2) t2 ON id1 = id2"; | ||
|
|
||
| let expected = vec!["++", "++"]; | ||
| #[rustfmt::skip] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
🎉 we are going to have real null support 😍 |
|
Thank you all @alamb @andygrove |
…ll` (apache#2364) * introduce null * fix fmt
…ll` (apache#2364) * introduce null * fix fmt
…ll` (apache#2364) * introduce null * fix fmt
…ll` (apache#2364) * introduce null * fix fmt
…ll` (apache#2364) * introduce null * fix fmt
…ll` (apache#2364) * introduce null * fix fmt
…ll` (apache#2364) * introduce null * fix fmt
…ll` (apache#2364) * introduce null * fix fmt
…ll` (apache#2364) * introduce null * fix fmt
Which issue does this PR close?
Closes #2363 .
Rationale for this change
To solve Null constants issues listed in #1184 , and since /apache/arrow-rs#1572 Null casted from and to most of types in arrow-rs kernel, it's reasonable that introduce Null type to df for type coercion.
What changes are included in this PR?
Introduce
ScalarValue::Nulltype to dfAre there any user-facing changes?
No.