optimize cross join performance when where clause is or predicate and has common equal predicate exprs 

Queries like below cannot finish in a acceptable time, `store_sales` has 2800w rows, `customer_address` has 5w rows, for now Doris will create only one cross join node to execute this sql, 
the time of eval the where clause is about 200-300 ns, the total count of eval will be  2800w * 5w, this is extremely large, and this will cost 2800w * 5w * 250 ns = 4 billion seconds；

```
select avg(ss_quantity)
       ,avg(ss_ext_sales_price)
       ,avg(ss_ext_wholesale_cost)
       ,sum(ss_ext_wholesale_cost)
 from store_sales, customer_address 
 where  ((ss_addr_sk = ca_address_sk
  and ca_country = 'United States'
  and ca_state in ('CO', 'IL', 'MN')
  and ss_net_profit between 100 and 200  
     ) or
     (ss_addr_sk = ca_address_sk
  and ca_country = 'United States'
  and ca_state in ('OH', 'MT', 'NM')
  and ss_net_profit between 150 and 300  
     ) or
     (ss_addr_sk = ca_address_sk
  and ca_country = 'United States'
  and ca_state in ('TX', 'MO', 'MI')
  and ss_net_profit between 50 and 250  
     ))
```

but this  sql can be rewrite to 
```
select avg(ss_quantity)
       ,avg(ss_ext_sales_price)
       ,avg(ss_ext_wholesale_cost)
       ,sum(ss_ext_wholesale_cost)
 from store_sales, customer_address 
 where ss_addr_sk = ca_address_sk
  and ca_country = 'United States' and (((ca_state in ('CO', 'IL', 'MN')
  and ss_net_profit between 100 and 200  
     ) or
     (ca_state in ('OH', 'MT', 'NM')
  and ss_net_profit between 150 and 300  
     ) or
     (ca_state in ('TX', 'MO', 'MI')
  and ss_net_profit between 50 and 250  
     ))
 )
```
there for  we can do a hash join first and then use 
```
(((ca_state in ('CO', 'IL', 'MN')
  and ss_net_profit between 100 and 200  
     ) or
     (ca_state in ('OH', 'MT', 'NM')
  and ss_net_profit between 150 and 300  
     ) or
     (ca_state in ('TX', 'MO', 'MI')
  and ss_net_profit between 50 and 250  
     ))
 )
```
to filter the value,

in TPCDS 10g dataset,  the rewritten sql only cost about 1 seconds.
![image](https://user-images.githubusercontent.com/9098473/78207796-443e2180-74d5-11ea-8d01-a6598bfe59f9.png)

so we should implements this optimize

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

optimize cross join performance when where clause is or predicate and has common equal predicate exprs #3245

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

optimize cross join performance when where clause is or predicate and has common equal predicate exprs #3245

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions