-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Open
Labels
enhancementNew feature or requestNew feature or requestperformanceMake DataFusion fasterMake DataFusion faster
Description
Is your feature request related to a problem or challenge?
I'm testing TPCDS Q69 correctness issues with Comet(apache/datafusion-comet#2667) and found the Q69 which contains multiple anti joins performs way worse comparing to hash join
Modified query
select
cd_gender,
cd_marital_status,
cd_education_status,
count(*) cnt1,
cd_purchase_estimate,
count(*) cnt2,
cd_credit_rating,
count(*) cnt3
from
customer c,customer_address ca,customer_demographics
where
c.c_current_addr_sk = ca.ca_address_sk and
ca_state in ('IN','VA','MS') and
cd_demo_sk = c.c_current_cdemo_sk and
exists (select *
from store_sales,date_dim
where c.c_customer_sk = ss_customer_sk and
ss_sold_date_sk = d_date_sk and
d_year = 2002 and
d_moy between 2 and 2+2) and (not exists (select *
from web_sales,date_dim
where c.c_customer_sk = ws_bill_customer_sk and
ws_sold_date_sk = d_date_sk and
d_year = 2002 and
d_moy between 2 and 2+2))
group by cd_gender,
cd_marital_status,
cd_education_status,
cd_purchase_estimate, cd_credit_rating order by 1, 2, 3, 4, 5, 6, 7, 8;
HJ Elapsed 9.879 seconds.
SMJ Elapsed 59.865 seconds.
Also it is looks like to be a reason for apache/datafusion-comet#901
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or requestperformanceMake DataFusion fasterMake DataFusion faster