Skip to content

Conversation

@retikulum
Copy link
Contributor

Which issue does this PR close?

Closes #3615.

Rationale for this change

As it is discussed in issue, I believe that it will improve simplification rules

What changes are included in this PR?

Added A / 0 division rule and corresponding test
Added 0 / 0 division rule and corresponding test
Removed A / A division rule and corresponding test

Are there any user-facing changes?

I don't think so

@github-actions github-actions bot added the optimizer Optimizer rules label Oct 13, 2022
left: _,
op: Divide,
right,
} if is_zero(&right) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be !info.nullable(&left)? && is_zero(&right) like the modulo case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right! I should have checked if it is nullable or not just like the modulo case. Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❯ create table foo as select * from (values (1), (2), (NULL)) as sql;
❯ select * from foo;
+---------+
| column1 |
+---------+
| 1       |
| 2       |
|         |
+---------+
3 rows in set. Query took 0.002 seconds.
❯ select column1 / 0 from foo;
ArrowError(DivideByZero)

👍

I agree on the should be checking for nullability

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I will add !info.nullable(&left)? && is_zero(&right) check.

right,
} if is_null(&right) => *right,
// A / A --> 1 (if a is not nullable)
// 0 / 0 -> null
Copy link
Contributor

@ozankabak ozankabak Oct 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are returning an error for the A / 0 case, shouldn't we also return an error in this case? It seems like they are both divide-by-zero instances. If you remove this block entirely, the next block will handle both cases in the said way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approached it a little bit complex than I should have. A / 0 is undefined however 0/0 is indeterminate so I represent undefined as error and indeterminate as null. Also, expected behavior in the #3615 pointed that col / col should return null when col = 0. I checked other languages and see that python implemented both (A/0 and 0/0) to return an error. Moreover, I like the offered solution. If it is OK for others, I am ready to change necessary parts. Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the expected behavior of a rewrite rule is the same behavior as the current expression evaluator.

One way to check the existing behavior is to use datafusion-cli. In this case the result of 0/0 is null:

$ datafusion-cli 
DataFusion CLI v13.0.0
❯ select 0 / 0
;
+---------------------+
| Int64(0) / Int64(0) |
+---------------------+
|                     |
+---------------------+

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW if the behavior (error or NULL) is inconsistent in DataFusion I think we should file that as a separate ticket / PR rather than try to make it consistent in this one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the expected behavior of a rewrite rule is the same behavior as the current expression evaluator.

One way to check the existing behavior is to use datafusion-cli. In this case the result of 0/0 is null:

$ datafusion-cli 
DataFusion CLI v13.0.0
❯ select 0 / 0
;
+---------------------+
| Int64(0) / Int64(0) |
+---------------------+
|                     |
+---------------------+

Thank you for your valuable feedback. If my understanding is wrong please correct me, we should return null for 0 / 0 to provide the same behavior with the current expression evaluator. So no changes are needed for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW if the behavior (error or NULL) is inconsistent in DataFusion I think we should file that as a separate ticket / PR rather than try to make it consistent in this one

I am a newbie rust developer and it is my first try to contribute to an open-source project. I am trying to understand the code base so I can't make appropriate comments on this one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if you can open an issue stating this discrepancy (A/0 -> Error vs. 0/0 -> NULL) so that the community can discuss/keep track of/fix it.

My understanding is that if you fix the lint issues and make the null-check (as in modulo), this could be good to go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can follow-up it on #3849.

I did the necessary changes and control it with cargo fmt. Thanks for the help.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great -- thanks @retikulum and @ozankabak (we really appreciate the help reviewing!)

Please let me know what you plan to do re checking for nullability prior to rewriting

right,
} if is_null(&right) => *right,
// A / A --> 1 (if a is not nullable)
// 0 / 0 -> null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the expected behavior of a rewrite rule is the same behavior as the current expression evaluator.

One way to check the existing behavior is to use datafusion-cli. In this case the result of 0/0 is null:

$ datafusion-cli 
DataFusion CLI v13.0.0
❯ select 0 / 0
;
+---------------------+
| Int64(0) / Int64(0) |
+---------------------+
|                     |
+---------------------+

left: _,
op: Divide,
right,
} if is_zero(&right) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❯ create table foo as select * from (values (1), (2), (NULL)) as sql;
❯ select * from foo;
+---------+
| column1 |
+---------+
| 1       |
| 2       |
|         |
+---------+
3 rows in set. Query took 0.002 seconds.
❯ select column1 / 0 from foo;
ArrowError(DivideByZero)

👍

I agree on the should be checking for nullability

right,
} if is_null(&right) => *right,
// A / A --> 1 (if a is not nullable)
// 0 / 0 -> null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW if the behavior (error or NULL) is inconsistent in DataFusion I think we should file that as a separate ticket / PR rather than try to make it consistent in this one

@alamb
Copy link
Contributor

alamb commented Oct 17, 2022

It looks like this PR has accumulated some conflicts -- can you please merge up from master to resolve them and then I will merge

Screen Shot 2022-10-17 at 5 52 39 PM

@alamb alamb merged commit 62aeb75 into apache:master Oct 18, 2022
@alamb
Copy link
Contributor

alamb commented Oct 18, 2022

Thanks again @retikulum

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

optimizer Optimizer rules

Projects

None yet

Development

Successfully merging this pull request may close these issues.

col / col returns 1 when col = 0

3 participants