Skip to content

Comments

[fix](planner)fix bug for missing slot#16601

Merged
morrySnow merged 3 commits intoapache:masterfrom
englefly:1504-prune-slot
Feb 13, 2023
Merged

[fix](planner)fix bug for missing slot#16601
morrySnow merged 3 commits intoapache:masterfrom
englefly:1504-prune-slot

Conversation

@englefly
Copy link
Contributor

@englefly englefly commented Feb 10, 2023

Proposed changes

In previous version, if the output slot of analyticExpr is not materialized, the analyticExpr is pruned.
But there are some cases that it cannot be pruned.
For example:

               SELECT
                    count(*)
                FROM T1,
                    (SELECT dd
                    FROM (
                        SELECT
                            1.1 as cc,
                            ROW_NUMBER() OVER() as dd
                        FROM T2
                        ) V1
                    ORDER BY cc DESC
                    limit 1
                    ) V2;

analyticExpr(ROW_NUMBER() OVER() as dd) is not materialized, but we have to generate
WindowGroup for it.
tmp.dd is used by upper count(*), we have to generate data for tmp.dd

In this fix, if an inline view only output one column(in this example, the 'dd'), we materialize this column.

TODO:
In order to prune 'ROW_NUMBER() OVER() as dd', we need to rethink the rule of choosing a column
for count(*). (refer to SingleNodePlanner.materializeTableResultForCrossJoinOrCountStar)
V2 can be transformed to

   SELECT cc
    FROM (
        SELECT
            1.1 as cc,
            ROW_NUMBER() OVER() as dd
        FROM T2
        ) V1
    ORDER BY cc DESC
    limit 1
    ) V2;

Except the byte size of cc and dd, we need to consider the cost to generate cc and dd.

Issue Number: close #xxx

Problem summary

Describe your changes.

Checklist(Required)

  1. Does it affect the original behavior:
    • Yes
    • No
    • I don't know
  2. Has unit tests been added:
    • Yes
    • No
    • No Need
  3. Has document been added or modified:
    • Yes
    • No
    • No Need
  4. Does it need to update dependencies:
    • Yes
    • No
  5. Are there any changes that cannot be rolled back:
    • Yes (If Yes, please explain WHY)
    • No

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@github-actions github-actions bot added the area/planner Issues or PRs related to the query planner label Feb 10, 2023
morrySnow
morrySnow previously approved these changes Feb 10, 2023
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Feb 10, 2023
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@nextdreamblue
Copy link
Contributor

#16579

may be this pr can fix it

@hello-stephen
Copy link
Contributor

hello-stephen commented Feb 10, 2023

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 34.55 seconds
stream load tsv: 467 seconds loaded 74807831229 Bytes, about 152 MB/s
stream load json: 35 seconds loaded 2358488459 Bytes, about 64 MB/s
stream load orc: 68 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 28 seconds loaded 861443392 Bytes, about 29 MB/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230212042950_clickbench_pr_94627.html

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Feb 10, 2023
@morningman morningman added dev/1.2.2 usercase Important user case type label labels Feb 13, 2023
@englefly
Copy link
Contributor Author

#16579

may be this pr can fix it
hi @nextdreamblue
#16579 is about const expression, but this bug is not relative to const. The original sql is from customer, there is no const column. I use const column to make the sql simple.

@morrySnow morrySnow merged commit ded6981 into apache:master Feb 13, 2023
BiteTheDDDDt added a commit that referenced this pull request Feb 15, 2023
…ottom && Compatible with old … (#16750)

1.change mv rewrite from bottom up to up bottom
2.compatible with old version mv
3.restore some ut codes (but disable)
4. fix some ut introduced by [fix](planner)fix bug for missing slot #16601 and [Feature](Materialized-View) support multiple slot on one column in materialized view #16378
YangShaw pushed a commit to YangShaw/doris that referenced this pull request Feb 17, 2023
In previous version, if the output slot of analyticExpr is not materialized, the analyticExpr is pruned.
But there are some cases that it cannot be pruned.
For example:

                   SELECT
                        count(*)
                    FROM T1,
                        (SELECT dd
                        FROM (
                            SELECT
                                1.1 as cc,
                                ROW_NUMBER() OVER() as dd
                            FROM T2
                            ) V1
                        ORDER BY cc DESC
                        limit 1
                        ) V2;

 analyticExpr(ROW_NUMBER() OVER() as dd) is not materialized, but we have to generate
 WindowGroup for it.
 tmp.dd is used by upper count(*), we have to generate data for tmp.dd

In this fix, if an inline view only output one column(in this example, the 'dd'), we materialize this column.

TODO:
 In order to prune 'ROW_NUMBER() OVER() as dd', we need to rethink the rule of choosing a column
 for count(*). (refer to SingleNodePlanner.materializeTableResultForCrossJoinOrCountStar)
 V2 can be transformed to
                        
       SELECT cc
        FROM (
            SELECT
                1.1 as cc,
                ROW_NUMBER() OVER() as dd
            FROM T2
            ) V1
        ORDER BY cc DESC
        limit 1
        ) V2;

Except the byte size of cc and dd, we need to consider the cost to generate cc and dd.
morningman pushed a commit that referenced this pull request Feb 19, 2023
In previous version, if the output slot of analyticExpr is not materialized, the analyticExpr is pruned.
But there are some cases that it cannot be pruned.
For example:

                   SELECT
                        count(*)
                    FROM T1,
                        (SELECT dd
                        FROM (
                            SELECT
                                1.1 as cc,
                                ROW_NUMBER() OVER() as dd
                            FROM T2
                            ) V1
                        ORDER BY cc DESC
                        limit 1
                        ) V2;

 analyticExpr(ROW_NUMBER() OVER() as dd) is not materialized, but we have to generate
 WindowGroup for it.
 tmp.dd is used by upper count(*), we have to generate data for tmp.dd

In this fix, if an inline view only output one column(in this example, the 'dd'), we materialize this column.

TODO:
 In order to prune 'ROW_NUMBER() OVER() as dd', we need to rethink the rule of choosing a column
 for count(*). (refer to SingleNodePlanner.materializeTableResultForCrossJoinOrCountStar)
 V2 can be transformed to
                        
       SELECT cc
        FROM (
            SELECT
                1.1 as cc,
                ROW_NUMBER() OVER() as dd
            FROM T2
            ) V1
        ORDER BY cc DESC
        limit 1
        ) V2;

Except the byte size of cc and dd, we need to consider the cost to generate cc and dd.
yagagagaga pushed a commit to yagagagaga/doris that referenced this pull request Mar 9, 2023
…ottom && Compatible with old … (apache#16750)

1.change mv rewrite from bottom up to up bottom
2.compatible with old version mv
3.restore some ut codes (but disable)
4. fix some ut introduced by [fix](planner)fix bug for missing slot apache#16601 and [Feature](Materialized-View) support multiple slot on one column in materialized view apache#16378
@englefly englefly deleted the 1504-prune-slot branch June 16, 2023 02:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/planner Issues or PRs related to the query planner dev/1.2.3-merged reviewed usercase Important user case type label

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants