[CALCITE-3963] Maintains logical properties at RelSet (equivalent gro…#1992
xndai wants to merge 4 commits into apache:main
| + " EnumerableTableScan(table=[[hr, emps]])\n" | ||
| + " EnumerableProject(deptno=[$0], name=[$1], employees=[$2], x=[$3.x], y=[$3.y])\n" | ||
| + " EnumerableTableScan(table=[[hr, depts]])"; | ||
| + "EnumerableProject(empid=[$0], deptno=[$1], name=[$2], salary=[$3], commission=[$4], deptno0=[$5], name0=[$6], employees=[$7], location=[ROW($8, $9)], empid0=[$10], name1=[$11])\n" |
LoptOptimizeRule always swaps the join inputs so that the smaller input ends up on the right side, but doing so increases the cost of the hash join, so merge join ends up being picked as the best plan. Previously, the row count of MultiJoin was always 1 (it used the default estimateRowCount() implementation, which is wrong), so it was incorrectly treated as the smaller input and a hash join plan was generated. With this change the row count is corrected, and given the rule's behavior and the cost model, the best plan is now the merge join plan. If hash join is the expected plan, then LoptOptimizeRule needs to be fixed.
The other two plan changes are due to the same issue.
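To illustrate the effect described above, here is a minimal, self-contained sketch of how a placeholder row count of 1 can mislead a "smaller input on the right" heuristic. It does not use Calcite; the class name, the method `shouldSwap`, and the row counts are all hypothetical and chosen only for illustration.

```java
// Hypothetical sketch; none of these names are Calcite APIs.
final class JoinInputSwapSketch {
  /** Placeholder row count used when a node supplies no estimate of its own. */
  static final double DEFAULT_ROW_COUNT = 1.0;

  /**
   * Returns true when the left input looks smaller than the right one, so a
   * heuristic that wants the smaller input on the right would swap them.
   */
  static boolean shouldSwap(double leftRows, double rightRows) {
    return leftRows < rightRows;
  }

  public static void main(String[] args) {
    double scanRows = 100.0;         // assumed estimate for a table scan
    double multiJoinRows = 5_000.0;  // assumed corrected estimate for the multi-join input

    // With the placeholder estimate, the multi-join input looks like one row
    // and is moved to the right side.
    System.out.println(shouldSwap(DEFAULT_ROW_COUNT, scanRows)); // prints true

    // With the corrected estimate, the same input is no longer the smaller one.
    System.out.println(shouldSwap(multiJoinRows, scanRows));     // prints false
  }
}
```

The only point of the sketch is that the swap decision flips once the estimate changes, which is why the chosen plan can change even though the rule itself is untouched.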
| * The logical properties of the RelSet, including row count, uniqueness, etc, | ||
| * are determined by this RelNode. | ||
| */ | ||
| RelNode originalRel; |
The name is a little misleading. Before this patch it was indeed the original rel, but after this patch it no longer is. We could just name it for what it is.
Yep, I find it awkward too. Do you have any suggestions?
| assert planner != null; | ||
| for (RelNode rel : getParentRels()) { | ||
| RelSet set = planner.getSet(rel); |
If it is already pruned, can we skip it?
core/src/main/java/org/apache/calcite/plan/volcano/RelSubset.java
…up) instead of RelNode
1. Add a new LogicalNode interface that supports reporting stats estimation confidence.
2. Re-purpose set.rel and rename it to set.originalRel to report the logical properties of the RelSet.
3. When a new RelNode is added to the set, check the stats confidence of the new node and update set.originalRel if the new node has a higher confidence level.
4. The metadata handler will always report logical properties from set.originalRel for a RelSubset.
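The update rule in the commit message can be sketched roughly as follows. This is a standalone illustration, not the patch itself: `RelSetSketch`, `Node`, and the three `Confidence` levels are hypothetical stand-ins, and the real enum elements are not shown in this conversation.

```java
// Hypothetical sketch of "keep the most confident node as the stats source".
final class RelSetSketch {
  /** Assumed confidence levels, declared from least to most confident. */
  enum Confidence { LOW, MEDIUM, HIGH }

  /** Minimal stand-in for a logical node that reports a row count estimate. */
  static final class Node {
    final String name;
    final Confidence confidence;
    final double rowCount;

    Node(String name, Confidence confidence, double rowCount) {
      this.name = name;
      this.confidence = confidence;
      this.rowCount = rowCount;
    }
  }

  /** The node whose estimates represent the whole equivalence set. */
  private Node originalRel;

  /** Adds a node and promotes it if it is strictly more confident. */
  void add(Node node) {
    if (originalRel == null
        || node.confidence.compareTo(originalRel.confidence) > 0) {
      originalRel = node;
    }
  }

  /** Every member of the set reports the representative's row count. */
  double rowCount() {
    return originalRel.rowCount;
  }

  String representativeName() {
    return originalRel.name;
  }
}
```

In this sketch, adding a low-confidence node with a placeholder row count and then a high-confidence node makes the set report the latter's estimate, regardless of which member is asked.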
| EnumerableCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{7}]) | ||
| EnumerableSort(sort0=[$1], dir0=[ASC]) | ||
| EnumerableSort(sort0=[$1], dir0=[ASC]) | ||
| EnumerableCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{7}]) |
The Correlate node doesn't implement a row count estimate, so it always returns 1 from the default implementation, which made it the best plan with minimal cost. After this change, since stats are reported from the RelSet using the Join row count, we get the truly best plan according to the current cost model.
| final RelSubset subset = getOrCreateSubset( | ||
| rel.getCluster(), traitSet, rel.isEnforcer()); | ||
| subset.add(rel); | ||
| checkAndUpdateOriginalRel(rel); |
This call seems to duplicate the one in addInternal, since subset.add(rel) will call addInternal?
| /** | ||
| * Confidence levels of statistics estimation | ||
| */ | ||
| enum StatsEstimateConfidenceLevel { |
It seems the elements form a partial order?
If so, is it appropriate to compare them using compareTo?
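To make the concern concrete: `Enum.compareTo` orders constants by declaration position, so it always yields a total order, even when two levels are conceptually incomparable. The sketch below uses hypothetical levels (the real elements of StatsEstimateConfidenceLevel are not shown here) and assumes, purely for illustration, that `SAMPLED` and `HISTOGRAM` are incomparable.

```java
import java.util.Optional;

// Hypothetical levels; only the ordering behavior is the point.
enum ConfidenceSketch {
  UNKNOWN, SAMPLED, HISTOGRAM, EXACT;

  /** Rank within the assumed partial order; the two middle levels share a rank. */
  private int rank() {
    switch (this) {
      case UNKNOWN:
        return 0;
      case EXACT:
        return 2;
      default:
        return 1;
    }
  }

  /**
   * Partial-order comparison: returns an empty Optional when the two levels
   * are distinct but share a rank, i.e. when they are treated as incomparable.
   */
  Optional<Integer> partialCompare(ConfidenceSketch other) {
    if (this == other) {
      return Optional.of(0);
    }
    if (rank() == other.rank()) {
      return Optional.empty();
    }
    return Optional.of(Integer.compare(rank(), other.rank()));
  }
}
```

Here `SAMPLED.compareTo(HISTOGRAM)` is negative simply because `SAMPLED` is declared first, whereas `partialCompare` reports that no ordering applies. Whether this matters depends on whether the actual levels are meant to be totally ordered.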
8a5cf83 to cf7f71b
…up) instead of RelNode
CALCITE-3963