Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

throw Divide by zero error when use case when #8814

Closed
liukun4515 opened this issue Jan 10, 2024 · 9 comments · Fixed by #8833
Closed

throw Divide by zero error when use case when #8814

liukun4515 opened this issue Jan 10, 2024 · 9 comments · Fixed by #8833
Labels
bug Something isn't working

Comments

@liukun4515
Copy link
Contributor

Describe the bug

When i use the

case value1 > 0 then value/value1 else 0 end as result

and got the divide by zero

To Reproduce

create table users as values (1,1),(2,2),(3,3),(0,0),(4,0);

When run

select 
case when B.column1 > 0 and B.column2 > 0 then (A.column1/B.column1 - A.column2/B.column2) else 0 end as value3
from (select column1, column2 from users) as A ,  (select column1, column2 from users where column1 = 0) as B;

there is no issue.

But when run

select 
case when B.column1 > 0 then A.column1/B.column1 else 0 end as value1,
case when B.column1 > 0 and B.column2 > 0 then A.column1/B.column1 else 0 end as value3
from (select column1, column2 from users) as A ,  (select column1, column2 from users where column1 = 0) as B;

there is the issue: divide by zero

Expected behavior

no exception and get the result

Additional context

I am try to find the root cause, maybe it is caused by optimizer or the execution of the physical expr.

@liukun4515 liukun4515 added the bug Something isn't working label Jan 10, 2024
@liukun4515
Copy link
Contributor Author

get the plan

❯ explain select
case when B.column1 > 0 then A.column1/B.column1 else 0 end as value1,
case when B.column1 > 0 and B.column2 > 0 then A.column1/B.column1 else 0 end as value3
from (select column1, column2 from users) as A ,  (select column1, column2 from users where column1 = 0) as B;
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                                                                                                                                                                                                                                                      |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Projection: CASE WHEN b.column1 > Int64(0)Int64(0)b.column1 AS b.column1 > Int64(0) THEN a.column1 / b.column1b.column1a.column1 AS a.column1 / b.column1 ELSE Int64(0) END AS value1, CASE WHEN b.column1 > Int64(0)Int64(0)b.column1 AS b.column1 > Int64(0) AND b.column2 > Int64(0) THEN a.column1 / b.column1b.column1a.column1 AS a.column1 / b.column1 ELSE Int64(0) END AS value3 |
|               |   Projection: a.column1 / b.column1 AS a.column1 / b.column1b.column1a.column1, b.column1 > Int64(0) AS b.column1 > Int64(0)Int64(0)b.column1, b.column2                                                                                                                                                                                                                                  |
|               |     CrossJoin:                                                                                                                                                                                                                                                                                                                                                                            |
|               |       SubqueryAlias: a                                                                                                                                                                                                                                                                                                                                                                    |
|               |         TableScan: users projection=[column1]                                                                                                                                                                                                                                                                                                                                             |
|               |       SubqueryAlias: b                                                                                                                                                                                                                                                                                                                                                                    |
|               |         Filter: users.column1 = Int64(0)                                                                                                                                                                                                                                                                                                                                                  |
|               |           TableScan: users projection=[column1, column2]                                                                                                                                                                                                                                                                                                                                  |
| physical_plan | ProjectionExec: expr=[CASE WHEN b.column1 > Int64(0)Int64(0)b.column1@1 THEN a.column1 / b.column1b.column1a.column1@0 ELSE 0 END as value1, CASE WHEN b.column1 > Int64(0)Int64(0)b.column1@1 AND column2@2 > 0 THEN a.column1 / b.column1b.column1a.column1@0 ELSE 0 END as value3]                                                                                                     |
|               |   ProjectionExec: expr=[column1@0 / column1@1 as a.column1 / b.column1b.column1a.column1, column1@1 > 0 as b.column1 > Int64(0)Int64(0)b.column1, column2@2 as column2]                                                                                                                                                                                                                   |
|               |     CrossJoinExec                                                                                                                                                                                                                                                                                                                                                                         |
|               |       MemoryExec: partitions=1, partition_sizes=[1]                                                                                                                                                                                                                                                                                                                                       |
|               |       RepartitionExec: partitioning=RoundRobinBatch(12), input_partitions=1                                                                                                                                                                                                                                                                                                               |
|               |         CoalesceBatchesExec: target_batch_size=8192                                                                                                                                                                                                                                                                                                                                       |
|               |           FilterExec: column1@0 = 0                                                                                                                                                                                                                                                                                                                                                       |
|               |             MemoryExec: partitions=1, partition_sizes=[1]                                                                                                                                                                                                                                                                                                                                 |
|               |                                                                                                                                                                                                                                                                                                                                                                                           |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

@haohuaijin
Copy link
Contributor

I use explain verbose find that common_sub_expression_eliminate rule extract a.column1 / b.column1 as a common sub expr , result in divide by zero.

| initial_logical_plan                                       | Projection: CASE WHEN b.column1 > Int64(0) THEN a.column1 / b.column1 ELSE Int64(0) END AS value1, CASE WHEN b.column2 > Int64(0) THEN a.column1 / b.column1 ELSE Int64(0) END AS value3                                                                                                                                                        |
|                                                            |   CrossJoin:                                                                                                                                                                                                                                                                                                                                    |
|                                                            |     SubqueryAlias: a                                                                                                                                                                                                                                                                                                                            |
|                                                            |       Projection: users.column1, users.column2                                                                                                                                                                                                                                                                                                  |
|                                                            |         TableScan: users                                                                                                                                                                                                                                                                                                                        |
|                                                            |     SubqueryAlias: b                                                                                                                                                                                                                                                                                                                            |
|                                                            |       Projection: users.column1, users.column2                                                                                                                                                                                                                                                                                                  |
|                                                            |         Filter: users.column1 = Int64(0)                                                                                                                                                                                                                                                                                                        |
|                                                            |           TableScan: users                                                                                                                                                                                                                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| logical_plan after common_sub_expression_eliminate         | Projection: CASE WHEN b.column1 > Int64(0) THEN a.column1 / b.column1b.column1a.column1 AS a.column1 / b.column1 ELSE Int64(0) END AS value1, CASE WHEN b.column2 > Int64(0) THEN a.column1 / b.column1b.column1a.column1 AS a.column1 / b.column1 ELSE Int64(0) END AS value3                                                                  |
|                                                            |   Projection: a.column1 / b.column1 AS a.column1 / b.column1b.column1a.column1, a.column1, a.column2, b.column1, b.column2                                                                                                                                                                                                                      |
|                                                            |     CrossJoin:                                                                                                                                                                                                                                                                                                                                  |
|                                                            |       SubqueryAlias: a                                                                                                                                                                                                                                                                                                                          |
|                                                            |         Projection: users.column1, users.column2                                                                                                                                                                                                                                                                                                |
|                                                            |           TableScan: users                                                                                                                                                                                                                                                                                                                      |
|                                                            |       SubqueryAlias: b                                                                                                                                                                                                                                                                                                                          |
|                                                            |         Projection: users.column1, users.column2                                                                                                                                                                                                                                                                                                |
|                                                            |           Filter: users.column1 = Int64(0)                                                                                                                                                                                                                                                                                                      |
|                                                            |             TableScan: users 

@dhamotharan-ps
Copy link

dhamotharan-ps commented Jan 10, 2024

I have tried to reproduce the problem with latest code datafusion 34.0.0. But it is working fine. What is the version of your datafusion?

DataFusion CLI v34.0.0
❯ create table users as values (1,1),(2,2),(3,3),(0,0),(4,0);
0 rows in set. Query took 0.016 seconds.

❯ select 
case when B.column1 > 0 and B.column2 > 0 then (A.column1/B.column1 - A.column2/B.column2) else 0 end as value3
from (select column1, column2 from users) as A ,  (select column1, column2 from users where column1 = 0) as B;
+--------+
| value3 |
+--------+
| 0      |
| 0      |
| 0      |
| 0      |
| 0      |
+--------+
5 rows in set. Query took 0.040 seconds.
❯ explain select 
case when B.column1 > 0 and B.column2 > 0 then (A.column1/B.column1 - A.column2/B.column2) else 0 end as value3
from (select column1, column2 from users) as A ,  (select column1, column2 from users where column1 = 0) as B;
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                               |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Projection: CASE WHEN b.column1 > Int64(0) AND b.column2 > Int64(0) THEN a.column1 / b.column1 - a.column2 / b.column2 ELSE Int64(0) END AS value3 |
|               |   CrossJoin:                                                                                                                                       |
|               |     SubqueryAlias: a                                                                                                                               |
|               |       TableScan: users projection=[column1, column2]                                                                                               |
|               |     SubqueryAlias: b                                                                                                                               |
|               |       Filter: users.column1 = Int64(0)                                                                                                             |
|               |         TableScan: users projection=[column1, column2]                                                                                             |
| physical_plan | ProjectionExec: expr=[CASE WHEN column1@2 > 0 AND column2@3 > 0 THEN column1@0 / column1@2 - column2@1 / column2@3 ELSE 0 END as value3]           |
|               |   CrossJoinExec                                                                                                                                    |
|               |     MemoryExec: partitions=1, partition_sizes=[1]                                                                                                  |
|               |     RepartitionExec: partitioning=RoundRobinBatch(12), input_partitions=1                                                                          |
|               |       CoalesceBatchesExec: target_batch_size=8192                                                                                                  |
|               |         FilterExec: column1@0 = 0                                                                                                                  |
|               |           MemoryExec: partitions=1, partition_sizes=[1]                                                                                            |
|               |                                                                                                                                                    |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.029 seconds.

@liukun4515
Copy link
Contributor Author

I have tried to reproduce the problem with latest code datafusion 34.0.0. But it is working fine. What is the version of your datafusion?

DataFusion CLI v34.0.0
❯ create table users as values (1,1),(2,2),(3,3),(0,0),(4,0);
0 rows in set. Query took 0.016 seconds.

❯ select 
case when B.column1 > 0 and B.column2 > 0 then (A.column1/B.column1 - A.column2/B.column2) else 0 end as value3
from (select column1, column2 from users) as A ,  (select column1, column2 from users where column1 = 0) as B;
+--------+
| value3 |
+--------+
| 0      |
| 0      |
| 0      |
| 0      |
| 0      |
+--------+
5 rows in set. Query took 0.040 seconds.
❯ explain select 
case when B.column1 > 0 and B.column2 > 0 then (A.column1/B.column1 - A.column2/B.column2) else 0 end as value3
from (select column1, column2 from users) as A ,  (select column1, column2 from users where column1 = 0) as B;
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                               |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Projection: CASE WHEN b.column1 > Int64(0) AND b.column2 > Int64(0) THEN a.column1 / b.column1 - a.column2 / b.column2 ELSE Int64(0) END AS value3 |
|               |   CrossJoin:                                                                                                                                       |
|               |     SubqueryAlias: a                                                                                                                               |
|               |       TableScan: users projection=[column1, column2]                                                                                               |
|               |     SubqueryAlias: b                                                                                                                               |
|               |       Filter: users.column1 = Int64(0)                                                                                                             |
|               |         TableScan: users projection=[column1, column2]                                                                                             |
| physical_plan | ProjectionExec: expr=[CASE WHEN column1@2 > 0 AND column2@3 > 0 THEN column1@0 / column1@2 - column2@1 / column2@3 ELSE 0 END as value3]           |
|               |   CrossJoinExec                                                                                                                                    |
|               |     MemoryExec: partitions=1, partition_sizes=[1]                                                                                                  |
|               |     RepartitionExec: partitioning=RoundRobinBatch(12), input_partitions=1                                                                          |
|               |       CoalesceBatchesExec: target_batch_size=8192                                                                                                  |
|               |         FilterExec: column1@0 = 0                                                                                                                  |
|               |           MemoryExec: partitions=1, partition_sizes=[1]                                                                                            |
|               |                                                                                                                                                    |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.029 seconds.

Can I get your commit id?

I use

commit f8d860365ef17d05cb033c1f39928cac6620e2f2 (HEAD -> upstream_main, upstream/main)
Author: Randall Spears <randall@rspears.me>
Date:   Tue Jan 9 13:35:38 2024 -0700

@liukun4515
Copy link
Contributor Author

I use explain verbose find that common_sub_expression_eliminate rule extract a.column1 / b.column1 as a common sub expr , result in divide by zero.

| initial_logical_plan                                       | Projection: CASE WHEN b.column1 > Int64(0) THEN a.column1 / b.column1 ELSE Int64(0) END AS value1, CASE WHEN b.column2 > Int64(0) THEN a.column1 / b.column1 ELSE Int64(0) END AS value3                                                                                                                                                        |
|                                                            |   CrossJoin:                                                                                                                                                                                                                                                                                                                                    |
|                                                            |     SubqueryAlias: a                                                                                                                                                                                                                                                                                                                            |
|                                                            |       Projection: users.column1, users.column2                                                                                                                                                                                                                                                                                                  |
|                                                            |         TableScan: users                                                                                                                                                                                                                                                                                                                        |
|                                                            |     SubqueryAlias: b                                                                                                                                                                                                                                                                                                                            |
|                                                            |       Projection: users.column1, users.column2                                                                                                                                                                                                                                                                                                  |
|                                                            |         Filter: users.column1 = Int64(0)                                                                                                                                                                                                                                                                                                        |
|                                                            |           TableScan: users                                                                                                                                                                                                                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| logical_plan after common_sub_expression_eliminate         | Projection: CASE WHEN b.column1 > Int64(0) THEN a.column1 / b.column1b.column1a.column1 AS a.column1 / b.column1 ELSE Int64(0) END AS value1, CASE WHEN b.column2 > Int64(0) THEN a.column1 / b.column1b.column1a.column1 AS a.column1 / b.column1 ELSE Int64(0) END AS value3                                                                  |
|                                                            |   Projection: a.column1 / b.column1 AS a.column1 / b.column1b.column1a.column1, a.column1, a.column2, b.column1, b.column2                                                                                                                                                                                                                      |
|                                                            |     CrossJoin:                                                                                                                                                                                                                                                                                                                                  |
|                                                            |       SubqueryAlias: a                                                                                                                                                                                                                                                                                                                          |
|                                                            |         Projection: users.column1, users.column2                                                                                                                                                                                                                                                                                                |
|                                                            |           TableScan: users                                                                                                                                                                                                                                                                                                                      |
|                                                            |       SubqueryAlias: b                                                                                                                                                                                                                                                                                                                          |
|                                                            |         Projection: users.column1, users.column2                                                                                                                                                                                                                                                                                                |
|                                                            |           Filter: users.column1 = Int64(0)                                                                                                                                                                                                                                                                                                      |
|                                                            |             TableScan: users 

I also find this reason, do you have the time to fix this issue?

I think this issue may be cause to other issues.

I want to find which commit caused this issue.

@haohuaijin

@haohuaijin
Copy link
Contributor

haohuaijin commented Jan 11, 2024

@liukun4515

I also find this reason, do you have the time to fix this

yes, I would like to fix this.

I think this is related to #8340 and #8454(and the discussion in #8296), that before above pr the projection pushdown undo the common_subexpr_eliminate , and after #8340 and #8454, the projection pushdown don't undo this. So result in the common sub expr a.column1 / b.column1 evaluate.

seem like we should not extract common sub_expr in CASE WHEN THEN(which some expr never run)? what do you think?

the below is the plan in datafusion-cli v33.0.0,

| logical_plan after common_sub_expression_eliminate         | Projection: CASE WHEN b.column1 > Int64(0)Int64(0)b.column1 AS b.column1 > Int64(0) AS b.column1 > Int64(0) THEN a.column1 / b.column1b.column1a.column1 AS a.column1 / b.column1 AS a.column1 / b.column1 ELSE Int64(0) END AS value1, CASE WHEN b.column1 > Int64(0)Int64(0)b.column1 AS b.column1 > Int64(0) AS b.column1 > Int64(0) AND b.column2 > Int64(0) THEN a.column1 / b.column1b.column1a.column1 AS a.column1 / b.column1 AS a.column1 / b.column1 ELSE Int64(0) END AS value3                                                                                                   |
|                                                            |   Projection: a.column1 / b.column1 AS a.column1 / b.column1b.column1a.column1, b.column1 > Int64(0) AS b.column1 > Int64(0)Int64(0)b.column1, a.column1, b.column1, b.column2                                                                                                                                                                                                                                                                                                                                                                                                                |
|                                                            |     CrossJoin:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
|                                                            |       SubqueryAlias: a                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|                                                            |         TableScan: users projection=[column1]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
|                                                            |       SubqueryAlias: b                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|                                                            |         Filter: users.column1 = Int64(0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
|                                                            |           TableScan: users projection=[column1, column2]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| logical_plan after push_down_projection                    | Projection: CASE WHEN b.column1 > Int64(0) AS b.column1 > Int64(0) AS b.column1 > Int64(0) THEN a.column1 / b.column1 AS a.column1 / b.column1 AS a.column1 / b.column1 ELSE Int64(0) END AS value1, CASE WHEN b.column1 > Int64(0) AS b.column1 > Int64(0) AS b.column1 > Int64(0) AND b.column2 > Int64(0) THEN a.column1 / b.column1 AS a.column1 / b.column1 AS a.column1 / b.column1 ELSE Int64(0) END AS value3                                                                                                                                                                         |
|                                                            |   CrossJoin:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
|                                                            |     Projection: a.column1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
|                                                            |       SubqueryAlias: a                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|                                                            |         Projection: users.column1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
|                                                            |           TableScan: users projection=[column1]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
|                                                            |     Projection: b.column1, b.column2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
|                                                            |       SubqueryAlias: b                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|                                                            |         Filter: users.column1 = Int64(0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
|                                                            |           Projection: users.column1, users.column2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
|                                                            |             TableScan: users projection=[column1, column2]  

@dhamotharan-ps
Copy link

@liukun4515 my codebase is upto below commit.

dhamo@prannav-pc:~/repos/arrow-datafusion$ git log
commit 0e53c6d816f3a9d3d27c6ebb6d25b1699e5553e7 (HEAD -> main, origin/main, origin/HEAD)
Author: comphead <comphead@users.noreply.github.com>
Date:   Mon Jan 8 15:15:55 2024 -0800

    Move tests from `expr.rs` to sqllogictests. Part1 (#8773)
    
    * move tests from  to sqllogictests part1
    
    * Update datafusion/sqllogictest/test_files/expr.slt
    
    Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
    
    * Update datafusion/sqllogictest/test_files/expr.slt
    
    Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
    
    ---------
    
    Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

@dhamotharan-ps
Copy link

@liukun4515 When I update my codebase to latest, I am also getting 'Divide by zero' error. So this is the regression which is caused by changes after

commit 0e53c6d816f3a9d3d27c6ebb6d25b1699e5553e7 (HEAD -> main, origin/main, origin/HEAD)
Author: comphead <comphead@users.noreply.github.com>
Date:   Mon Jan 8 15:15:55 2024 -0800

@liukun4515
Copy link
Contributor Author

thanks for @dhamotharan-ps @haohuaijin I will help to review your pr this week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants