-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Add/Remove Division Rules #3824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| left: _, | ||
| op: Divide, | ||
| right, | ||
| } if is_zero(&right) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be !info.nullable(&left)? && is_zero(&right) like the modulo case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right! I should have checked if it is nullable or not just like the modulo case. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❯ create table foo as select * from (values (1), (2), (NULL)) as sql;
❯ select * from foo;
+---------+
| column1 |
+---------+
| 1 |
| 2 |
| |
+---------+
3 rows in set. Query took 0.002 seconds.
❯ select column1 / 0 from foo;
ArrowError(DivideByZero)👍
I agree on the should be checking for nullability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I will add !info.nullable(&left)? && is_zero(&right) check.
| right, | ||
| } if is_null(&right) => *right, | ||
| // A / A --> 1 (if a is not nullable) | ||
| // 0 / 0 -> null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are returning an error for the A / 0 case, shouldn't we also return an error in this case? It seems like they are both divide-by-zero instances. If you remove this block entirely, the next block will handle both cases in the said way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I approached it a little bit complex than I should have. A / 0 is undefined however 0/0 is indeterminate so I represent undefined as error and indeterminate as null. Also, expected behavior in the #3615 pointed that col / col should return null when col = 0. I checked other languages and see that python implemented both (A/0 and 0/0) to return an error. Moreover, I like the offered solution. If it is OK for others, I am ready to change necessary parts. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the expected behavior of a rewrite rule is the same behavior as the current expression evaluator.
One way to check the existing behavior is to use datafusion-cli. In this case the result of 0/0 is null:
$ datafusion-cli
DataFusion CLI v13.0.0
❯ select 0 / 0
;
+---------------------+
| Int64(0) / Int64(0) |
+---------------------+
| |
+---------------------+There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW if the behavior (error or NULL) is inconsistent in DataFusion I think we should file that as a separate ticket / PR rather than try to make it consistent in this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the expected behavior of a rewrite rule is the same behavior as the current expression evaluator.
One way to check the existing behavior is to use
datafusion-cli. In this case the result of0/0is null:$ datafusion-cli DataFusion CLI v13.0.0 ❯ select 0 / 0 ; +---------------------+ | Int64(0) / Int64(0) | +---------------------+ | | +---------------------+
Thank you for your valuable feedback. If my understanding is wrong please correct me, we should return null for 0 / 0 to provide the same behavior with the current expression evaluator. So no changes are needed for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW if the behavior (error or NULL) is inconsistent in DataFusion I think we should file that as a separate ticket / PR rather than try to make it consistent in this one
I am a newbie rust developer and it is my first try to contribute to an open-source project. I am trying to understand the code base so I can't make appropriate comments on this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great if you can open an issue stating this discrepancy (A/0 -> Error vs. 0/0 -> NULL) so that the community can discuss/keep track of/fix it.
My understanding is that if you fix the lint issues and make the null-check (as in modulo), this could be good to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can follow-up it on #3849.
I did the necessary changes and control it with cargo fmt. Thanks for the help.
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great -- thanks @retikulum and @ozankabak (we really appreciate the help reviewing!)
Please let me know what you plan to do re checking for nullability prior to rewriting
| right, | ||
| } if is_null(&right) => *right, | ||
| // A / A --> 1 (if a is not nullable) | ||
| // 0 / 0 -> null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the expected behavior of a rewrite rule is the same behavior as the current expression evaluator.
One way to check the existing behavior is to use datafusion-cli. In this case the result of 0/0 is null:
$ datafusion-cli
DataFusion CLI v13.0.0
❯ select 0 / 0
;
+---------------------+
| Int64(0) / Int64(0) |
+---------------------+
| |
+---------------------+| left: _, | ||
| op: Divide, | ||
| right, | ||
| } if is_zero(&right) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❯ create table foo as select * from (values (1), (2), (NULL)) as sql;
❯ select * from foo;
+---------+
| column1 |
+---------+
| 1 |
| 2 |
| |
+---------+
3 rows in set. Query took 0.002 seconds.
❯ select column1 / 0 from foo;
ArrowError(DivideByZero)👍
I agree on the should be checking for nullability
| right, | ||
| } if is_null(&right) => *right, | ||
| // A / A --> 1 (if a is not nullable) | ||
| // 0 / 0 -> null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW if the behavior (error or NULL) is inconsistent in DataFusion I think we should file that as a separate ticket / PR rather than try to make it consistent in this one
|
Thanks again @retikulum |

Which issue does this PR close?
Closes #3615.
Rationale for this change
As it is discussed in issue, I believe that it will improve simplification rules
What changes are included in this PR?
Added A / 0 division rule and corresponding test
Added 0 / 0 division rule and corresponding test
Removed A / A division rule and corresponding test
Are there any user-facing changes?
I don't think so