Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

planner: add aggregation hints TIDB_HASHAGG and TIDB_STREAMAGG #11364

Merged
merged 33 commits into from Aug 7, 2019

Conversation

foreyes
Copy link
Contributor

@foreyes foreyes commented Jul 22, 2019

What problem does this PR solve?

Add Optimizer Hints TIDB_HASHAGG and TIDB_STREAMAGG.

What is changed and how it works?

Handle the hint from parser, and enforce planner to choose the aggregation type.
Related parser PR: pingcap/parser#394

mysql> explain select count(*) from t t1, t t2 where t1.a = t2.b;
+--------------------------+----------+------+--------------------------------------------------------------------+
| id                       | count    | task | operator info                                                      |
+--------------------------+----------+------+--------------------------------------------------------------------+
| StreamAgg_13             | 1.00     | root | funcs:count(1)                                                     |
| └─HashLeftJoin_26        | 12500.00 | root | inner join, inner:TableReader_20, equal:[eq(test.t1.a, test.t2.b)] |
|   ├─TableReader_22       | 10000.00 | root | data:TableScan_21                                                  |
|   │ └─TableScan_21       | 10000.00 | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo        |
|   └─TableReader_20       | 10000.00 | root | data:TableScan_19                                                  |
|     └─TableScan_19       | 10000.00 | cop  | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo        |
+--------------------------+----------+------+--------------------------------------------------------------------+
6 rows in set (0.00 sec)

mysql> explain select /*+ TIDB_HASHAGG() */ count(*) from t t1, t t2 where t1.a = t2.b;
+--------------------------+----------+------+--------------------------------------------------------------------+
| id                       | count    | task | operator info                                                      |
+--------------------------+----------+------+--------------------------------------------------------------------+
| HashAgg_11               | 1.00     | root | funcs:count(1)                                                     |
| └─HashLeftJoin_15        | 12500.00 | root | inner join, inner:TableReader_18, equal:[eq(test.t1.a, test.t2.b)] |
|   ├─TableReader_20       | 10000.00 | root | data:TableScan_19                                                  |
|   │ └─TableScan_19       | 10000.00 | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo        |
|   └─TableReader_18       | 10000.00 | root | data:TableScan_17                                                  |
|     └─TableScan_17       | 10000.00 | cop  | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo        |
+--------------------------+----------+------+--------------------------------------------------------------------+
6 rows in set (0.01 sec)
mysql> explain select count(t1.a) from t t1, t t2 where t1.a = t2.a*2 group by t1.a;
+--------------------------+----------+------+---------------------------------------------------------------------------+
| id                       | count    | task | operator info                                                             |
+--------------------------+----------+------+---------------------------------------------------------------------------+
| HashAgg_13               | 8000.00  | root | group by:test.t1.a, funcs:count(test.t1.a)                                |
| └─HashLeftJoin_16        | 12500.00 | root | inner join, inner:Projection_21, equal:[eq(test.t1.a, mul(test.t2.a, 2))] |
|   ├─TableReader_20       | 10000.00 | root | data:TableScan_19                                                         |
|   │ └─TableScan_19       | 10000.00 | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo               |
|   └─Projection_21        | 10000.00 | root | test.t2.a, mul(test.t2.a, 2)                                              |
|     └─TableReader_23     | 10000.00 | root | data:TableScan_22                                                         |
|       └─TableScan_22     | 10000.00 | cop  | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo               |
+--------------------------+----------+------+---------------------------------------------------------------------------+
7 rows in set (0.00 sec)

mysql> explain select /*+ TIDB_STREAMAGG() */ count(t1.a) from t t1, t t2 where t1.a = t2.a*2 group by t1.a;
+----------------------------+----------+------+---------------------------------------------------------------------------+
| id                         | count    | task | operator info                                                             |
+----------------------------+----------+------+---------------------------------------------------------------------------+
| StreamAgg_15               | 8000.00  | root | group by:test.t1.a, funcs:count(test.t1.a)                                |
| └─Sort_24                  | 12500.00 | root | test.t1.a:asc                                                             |
|   └─HashLeftJoin_16        | 12500.00 | root | inner join, inner:Projection_21, equal:[eq(test.t1.a, mul(test.t2.a, 2))] |
|     ├─TableReader_20       | 10000.00 | root | data:TableScan_19                                                         |
|     │ └─TableScan_19       | 10000.00 | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo               |
|     └─Projection_21        | 10000.00 | root | test.t2.a, mul(test.t2.a, 2)                                              |
|       └─TableReader_23     | 10000.00 | root | data:TableScan_22                                                         |
|         └─TableScan_22     | 10000.00 | cop  | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo               |
+----------------------------+----------+------+---------------------------------------------------------------------------+
8 rows in set (0.00 sec)

Check List

Tests

  • Unit test

Code changes

  • Change plan builder to handling aggregation hints.
  • Change exhaust physical plan to apply aggregation hints.

Side effects

  • Change optimizer behaviors.

Related changes

  • Add new rule in parser

@codecov
Copy link

codecov bot commented Jul 22, 2019

Codecov Report

Merging #11364 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master     #11364   +/-   ##
===========================================
  Coverage   81.6243%   81.6243%           
===========================================
  Files           426        426           
  Lines         93640      93640           
===========================================
  Hits          76433      76433           
  Misses        11807      11807           
  Partials       5400       5400

@foreyes foreyes force-pushed the dev/add_agg_hints branch 2 times, most recently from ecd3e00 to ff1f239 Compare July 23, 2019 07:05
@foreyes
Copy link
Contributor Author

foreyes commented Jul 24, 2019

/run-all-tests

@foreyes foreyes requested review from alivxxx and zz-jason July 24, 2019 06:19
@foreyes
Copy link
Contributor Author

foreyes commented Jul 24, 2019

PTAL. @zz-jason @lamxTyler

@foreyes foreyes changed the title [WIP] planner: add aggregation hints TIDB_HASHAGG and TIDB_STREAMAGG planner: add aggregation hints TIDB_HASHAGG and TIDB_STREAMAGG Jul 24, 2019
planner/core/exhaust_physical_plans.go Outdated Show resolved Hide resolved
planner/core/logical_plans.go Outdated Show resolved Hide resolved
@foreyes
Copy link
Contributor Author

foreyes commented Jul 24, 2019

/run-all-tests

planner/core/logical_plans.go Outdated Show resolved Hide resolved
planner/core/logical_plan_builder.go Outdated Show resolved Hide resolved
planner/core/exhaust_physical_plans.go Outdated Show resolved Hide resolved
@zz-jason zz-jason removed their request for review July 24, 2019 08:39
@foreyes
Copy link
Contributor Author

foreyes commented Jul 24, 2019

Code improved, PTAL. @zz-jason @XuHuaiyu

Copy link
Contributor

@XuHuaiyu XuHuaiyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@XuHuaiyu
Copy link
Contributor

We should update the version of parser in go.mod before merging this commit.

@XuHuaiyu XuHuaiyu added the status/LGT1 Indicates that a PR has LGTM 1. label Jul 25, 2019
@foreyes foreyes force-pushed the dev/add_agg_hints branch 3 times, most recently from 80af97d to 74c1bf3 Compare July 26, 2019 09:52
executor/join_test.go Outdated Show resolved Hide resolved
@foreyes
Copy link
Contributor Author

foreyes commented Aug 6, 2019

PTAL. @XuHuaiyu @eurekaka

all, desc := prop.AllSameOrder()
if len(la.possibleProperties) == 0 || !all {
if !all {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not resolved?

@@ -1570,3 +1570,71 @@ func (s *testPlanSuite) TestIndexJoinHint(c *C) {
}
}
}

func (s *testPlanSuite) TestAggregationHints(c *C) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a test case which contains subquery?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the first one, because possibleChildProperties are only possible... We'd better not rely too much on it, I handle this in line 1272 - 1275, you can take a look.

For the test case, I will add them soon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When adding test case, find another bug. Fix it soon...

@foreyes
Copy link
Contributor Author

foreyes commented Aug 6, 2019

Add tests and fix a Merge Join bug, PTAL. @eurekaka @XuHuaiyu

Copy link
Contributor

@eurekaka eurekaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eurekaka eurekaka added the status/LGT1 Indicates that a PR has LGTM 1. label Aug 6, 2019
Copy link
Contributor

@XuHuaiyu XuHuaiyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@foreyes
Copy link
Contributor Author

foreyes commented Aug 7, 2019

/run-all-tests

@XuHuaiyu
Copy link
Contributor

XuHuaiyu commented Aug 7, 2019

I'm still curious that, is this expectable?
Or, this will be fixed in another PR?

CREATE TABLE `t` (
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  KEY `a` (`a`)
);
tidb> desc select /*+ TIDB_HJ(t)  */ a, count(b) from t group by a order by a;
+--------------------------+----------+------+--------------------------------------------------------------------+
| id                       | count    | task | operator info                                                      |
+--------------------------+----------+------+--------------------------------------------------------------------+
| Projection_22            | 8000.00  | root | test.t.a, 2_col_0                                                  |
| └─StreamAgg_24           | 8000.00  | root | group by:test.t.a, funcs:count(test.t.b), firstrow(test.t.a)       |
|   └─Projection_21        | 10000.00 | root | test.t.a, test.t.b                                                 |
|     └─IndexLookUp_20     | 10000.00 | root |                                                                    |
|       ├─IndexScan_18     | 10000.00 | cop  | table:t, index:a, range:[NULL,+inf], keep order:true, stats:pseudo |
|       └─TableScan_19     | 10000.00 | cop  | table:t, keep order:false, stats:pseudo                            |
+--------------------------+----------+------+--------------------------------------------------------------------+
6 rows in set, 1 warning (0.00 sec)

tidb> show warnings;
+---------+------+-----------------------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                               |
+---------+------+-----------------------------------------------------------------------------------------------------------------------+
| Warning | 1815 | There are no matching table names for (t) in optimizer hint /*+ TIDB_HJ(t) */. Maybe you can use the table alias name |
+---------+------+-----------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

@XuHuaiyu XuHuaiyu added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Aug 7, 2019
@foreyes foreyes merged commit a530f87 into pingcap:master Aug 7, 2019
@foreyes foreyes deleted the dev/add_agg_hints branch August 7, 2019 02:51
@foreyes
Copy link
Contributor Author

foreyes commented Aug 7, 2019

I'm still curious that, is this expectable?
Or, this will be fixed in another PR?

CREATE TABLE `t` (
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  KEY `a` (`a`)
);
tidb> desc select /*+ TIDB_HJ(t)  */ a, count(b) from t group by a order by a;
+--------------------------+----------+------+--------------------------------------------------------------------+
| id                       | count    | task | operator info                                                      |
+--------------------------+----------+------+--------------------------------------------------------------------+
| Projection_22            | 8000.00  | root | test.t.a, 2_col_0                                                  |
| └─StreamAgg_24           | 8000.00  | root | group by:test.t.a, funcs:count(test.t.b), firstrow(test.t.a)       |
|   └─Projection_21        | 10000.00 | root | test.t.a, test.t.b                                                 |
|     └─IndexLookUp_20     | 10000.00 | root |                                                                    |
|       ├─IndexScan_18     | 10000.00 | cop  | table:t, index:a, range:[NULL,+inf], keep order:true, stats:pseudo |
|       └─TableScan_19     | 10000.00 | cop  | table:t, keep order:false, stats:pseudo                            |
+--------------------------+----------+------+--------------------------------------------------------------------+
6 rows in set, 1 warning (0.00 sec)

tidb> show warnings;
+---------+------+-----------------------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                               |
+---------+------+-----------------------------------------------------------------------------------------------------------------------+
| Warning | 1815 | There are no matching table names for (t) in optimizer hint /*+ TIDB_HJ(t) */. Maybe you can use the table alias name |
+---------+------+-----------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

I know this case, it's expected, but looks weird, I will fix it in another PR. @XuHuaiyu

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/planner SIG: Planner status/LGT2 Indicates that a PR has LGTM 2. type/new-feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants