Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
32 changes: 27 additions & 5 deletions datafusion/optimizer/src/simplify_expressions.rs
Original file line number Diff line number Diff line change
Expand Up @@ -788,12 +788,22 @@ impl<'a, S: SimplifyInfo> ExprRewriter for Simplifier<'a, S> {
op: Divide,
right,
}) if is_null(&right) => *right,
// A / A --> 1 (if a is not nullable)
// 0 / 0 -> null
Copy link
Contributor

@ozankabak ozankabak Oct 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are returning an error for the A / 0 case, shouldn't we also return an error in this case? It seems like they are both divide-by-zero instances. If you remove this block entirely, the next block will handle both cases in the said way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approached it a little bit complex than I should have. A / 0 is undefined however 0/0 is indeterminate so I represent undefined as error and indeterminate as null. Also, expected behavior in the #3615 pointed that col / col should return null when col = 0. I checked other languages and see that python implemented both (A/0 and 0/0) to return an error. Moreover, I like the offered solution. If it is OK for others, I am ready to change necessary parts. Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the expected behavior of a rewrite rule is the same behavior as the current expression evaluator.

One way to check the existing behavior is to use datafusion-cli. In this case the result of 0/0 is null:

$ datafusion-cli 
DataFusion CLI v13.0.0
❯ select 0 / 0
;
+---------------------+
| Int64(0) / Int64(0) |
+---------------------+
|                     |
+---------------------+

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW if the behavior (error or NULL) is inconsistent in DataFusion I think we should file that as a separate ticket / PR rather than try to make it consistent in this one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the expected behavior of a rewrite rule is the same behavior as the current expression evaluator.

One way to check the existing behavior is to use datafusion-cli. In this case the result of 0/0 is null:

$ datafusion-cli 
DataFusion CLI v13.0.0
❯ select 0 / 0
;
+---------------------+
| Int64(0) / Int64(0) |
+---------------------+
|                     |
+---------------------+

Thank you for your valuable feedback. If my understanding is wrong please correct me, we should return null for 0 / 0 to provide the same behavior with the current expression evaluator. So no changes are needed for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW if the behavior (error or NULL) is inconsistent in DataFusion I think we should file that as a separate ticket / PR rather than try to make it consistent in this one

I am a newbie rust developer and it is my first try to contribute to an open-source project. I am trying to understand the code base so I can't make appropriate comments on this one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if you can open an issue stating this discrepancy (A/0 -> Error vs. 0/0 -> NULL) so that the community can discuss/keep track of/fix it.

My understanding is that if you fix the lint issues and make the null-check (as in modulo), this could be good to go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can follow-up it on #3849.

I did the necessary changes and control it with cargo fmt. Thanks for the help.

Expr::BinaryExpr(BinaryExpr {
left,
op: Divide,
right,
}) if !info.nullable(&left)? && left == right => lit(1),
}) if is_zero(&left) && is_zero(&right) => {
Expr::Literal(ScalarValue::Int32(None))
}
// A / 0 -> DivideByZero Error
Expr::BinaryExpr(BinaryExpr {
left,
op: Divide,
right,
}) if !info.nullable(&left)? && is_zero(&right) => {
return Err(DataFusionError::ArrowError(ArrowError::DivideByZero))
}

//
// Rules for Modulo
Expand Down Expand Up @@ -1179,13 +1189,25 @@ mod tests {
}

#[test]
fn test_simplify_divide_by_same_non_null() {
let expr = binary_expr(col("c2_non_null"), Operator::Divide, col("c2_non_null"));
let expected = lit(1);
fn test_simplify_divide_zero_by_zero() {
// 0 / 0 -> null
let expr = binary_expr(lit(0), Operator::Divide, lit(0));
let expected = Expr::Literal(ScalarValue::Int32(None));

assert_eq!(simplify(expr), expected);
}

#[test]
#[should_panic(
expected = "called `Result::unwrap()` on an `Err` value: ArrowError(DivideByZero)"
)]
fn test_simplify_divide_by_zero() {
// A / 0 -> DivideByZeroError
let expr = binary_expr(col("c2_non_null"), Operator::Divide, lit(0));

simplify(expr);
}

#[test]
fn test_simplify_modulo_by_null() {
let null = Expr::Literal(ScalarValue::Null);
Expand Down