Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[multistage][opt] optimizer not correctly prune out columns #9944

Open
walterddr opened this issue Dec 8, 2022 · 0 comments
Open

[multistage][opt] optimizer not correctly prune out columns #9944

walterddr opened this issue Dec 8, 2022 · 0 comments
Labels
multi-stage Related to the multi-stage query engine

Comments

@walterddr
Copy link
Contributor

see:

SELECT 
  count(LO_ORDERKEY)
FROM 
  lineorder JOIN customer ON LO_CUSTKEY = C_CUSTKEY
WHERE 
  LO_QUANTITY < 10 AND C_REGION = 'AFRICA'

LogicalAggregate(group=[{}], EXPR$0=[$SUM0($0)]): rowcount = 1.0, cumulative cost = 17819.155154440043, id = 44195 
  LogicalExchange(distribution=[hash]): rowcount = 1.0, cumulative cost = 17818.030154440043, id = 44193 
    LogicalAggregate(group=[{}], EXPR$0=[COUNT()]): rowcount = 1.0, cumulative cost = 17814.030154440043, id = 44188 
      LogicalJoin(condition=[=($7, $30)], joinType=[inner]): rowcount = 112.5, cumulative cost = 17812.905154440043, id = 44186  
        LogicalExchange(distribution=[hash[7]]): rowcount = 50.0, cumulative cost = 15798.092021712584, id = 44181 
          LogicalFilter(condition=[<($2, 10)]): rowcount = 50.0, cumulative cost = 150.0, id = 44173 
            LogicalTableScan(table=[[lineorder]]): rowcount = 100.0, cumulative cost = 100.0, id = 44157 
        LogicalExchange(distribution=[hash[10]]): rowcount = 15.0, cumulative cost = 1902.3131327274587, id = 44182 
          LogicalFilter(condition=[=($7, 'AFRICA')]): rowcount = 15.0, cumulative cost = 115.0, id = 44174 
            LogicalTableScan(table=[[customer]]): rowcount = 100.0, cumulative cost = 100.0, id = 44159

The explained plan requires leaf stages to return the entire collection of columns

where as:

SELECT count(LO_ORDERKEY)
FROM (SELECT LO_ORDERKEY, LO_CUSTKEY FROM lineorder WHERE LO_QUANTITY < 10) AS l JOIN 
     (SELECT C_CUSTKEY FROM customer WHERE  C_REGION = 'AFRICA') AS c ON l.LO_CUSTKEY = c.C_CUSTKEY


LogicalAggregate(group=[{}], EXPR$0=[$SUM0($0)]): rowcount = 1.0, cumulative cost = 2176.042214237391, id = 44268 
  LogicalExchange(distribution=[hash]): rowcount = 1.0, cumulative cost = 2174.917214237391, id = 44266 
    LogicalAggregate(group=[{}], EXPR$0=[COUNT()]): rowcount = 1.0, cumulative cost = 2170.917214237391, id = 44261 
      LogicalJoin(condition=[=($1, $2)], joinType=[inner]): rowcount = 112.5, cumulative cost = 2169.792214237391, id = 44259  <-- NOTE: this is condition =($1, $2)
        LogicalExchange(distribution=[hash[1]]): rowcount = 50.0, cumulative cost = 1764.8092021712584, id = 44254 
          LogicalProject(LO_ORDERKEY=[$0], LO_CUSTKEY=[$7]): rowcount = 50.0, cumulative cost = 200.0, id = 44249 
            LogicalFilter(condition=[<($2, 10)]): rowcount = 50.0, cumulative cost = 150.0, id = 44239 
              LogicalTableScan(table=[[lineorder]]): rowcount = 100.0, cumulative cost = 100.0, id = 44198 
      	LogicalExchange(distribution=[hash[0]]): rowcount = 15.0, cumulative cost = 292.4830120661326, id = 44255 
      	  LogicalProject(C_CUSTKEY=[$10]): rowcount = 15.0, cumulative cost = 130.0, id = 44251 
      	    LogicalFilter(condition=[=($7, 'AFRICA')]): rowcount = 15.0, cumulative cost = 115.0, id = 44244 
      	      LogicalTableScan(table=[[customer]]): rowcount = 100.0, cumulative cost = 100.0, id = 44202

if we wrapped in sub-query, it properly put the LogicalProject in-place to only return the needed columns.

This should be done automatically - if a count is only operating on one of the columns, no others should be returned via the exchange.

@walterddr walterddr added the multi-stage Related to the multi-stage query engine label Jan 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multi-stage Related to the multi-stage query engine
Projects
None yet
Development

No branches or pull requests

1 participant