Skip to content

[CALCITE-3963] Maintains logical properties at RelSet (equivalent gro…#1992

Open
xndai wants to merge 4 commits intoapache:mainfrom
xndai:calcite-3963
Open

[CALCITE-3963] Maintains logical properties at RelSet (equivalent gro…#1992
xndai wants to merge 4 commits intoapache:mainfrom
xndai:calcite-3963

Conversation

@xndai
Copy link
Contributor

@xndai xndai commented May 29, 2020

…up) instead of RelNode

  1. Add new LogicalNode interface that supports reporting stats estimation confidence.
  2. Re-purpose set.rel and rename it into set.originalRel to report logical properties of RelSet.
  3. When a new RelNode is added to the set, we check the stats confidence of the new node, and update set.originalRel if it has a higher confidence level.
  4. Meta data handler will always report logical properties from set.originalRel for RelSubset.

CALCITE-3963

+ " EnumerableTableScan(table=[[hr, emps]])\n"
+ " EnumerableProject(deptno=[$0], name=[$1], employees=[$2], x=[$3.x], y=[$3.y])\n"
+ " EnumerableTableScan(table=[[hr, depts]])";
+ "EnumerableProject(empid=[$0], deptno=[$1], name=[$2], salary=[$3], commission=[$4], deptno0=[$5], name0=[$6], employees=[$7], location=[ROW($8, $9)], empid0=[$10], name1=[$11])\n"
Copy link
Contributor Author

@xndai xndai May 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In LoptOptimizeRule, it always swap inputs to make sure smaller input is on the right side, but this would cause the cost of hash join to increase, so we end up picking merge join as the best plan. Previously since the row count of MultiJoin always returns 1 (using the default estimateRowCount() implementation which is wrong), it was incorrectly treated as smaller input, and thus generated hash join plan. With this change, the row count is corrected, but based on the rule behavior and cost model, the best plan now is merge join plan. If hash join is expected, then the LoptOptimizeRule needs to be fixed.

The other two plan changes are due to the same issue.

* The logical properties of the RelSet, including row count, uniqueness, etc,
* are determined by this RelNode.
*/
RelNode originalRel;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is a little bit misleading. Before this patch, it is indeed original rel, but after this patch, it isn't original rel anymore. We could just call it as it is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I find it awkward too. Do you have any suggestions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about continue using rel?

assert planner != null;

for (RelNode rel : getParentRels()) {
RelSet set = planner.getSet(rel);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is already pruned, can we skip?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@hsyuan hsyuan changed the title [CACLITE-3963] Maintains logical properties at RelSet (equivalent gro… [CALCITE-3963] Maintains logical properties at RelSet (equivalent gro… May 31, 2020
xndai added 4 commits June 15, 2020 11:15
…up) instead of RelNode

1. Add new LogicalNode interface that supports reporting stats estimation confidence.
2. Re-purpose set.rel and rename it into set.originalRel to report logical properties of RelSet.
3. When a new RelNode is added to the set, we check the stats confidence of the new node, and update set.originalRel if it has a higher confidence level.
4. Meta data handler will always report logical properties from set.originalRel for RelSubset.
EnumerableCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{7}])
EnumerableSort(sort0=[$1], dir0=[ASC])
EnumerableSort(sort0=[$1], dir0=[ASC])
EnumerableCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{7}])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correlate node doesn't implement row count estimate so it always returns 1 as the default implementation, which makes it the best plan with minimal cost. After this change, since we report stats from RelSet using Join row count, we are able to get the truly best plan according to current cost model.

final RelSubset subset = getOrCreateSubset(
rel.getCluster(), traitSet, rel.isEnforcer());
subset.add(rel);
checkAndUpdateOriginalRel(rel);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call seems duplicate with the call in addInternal, as subset.add(rel) will call addInternal?

/**
* Confidence levels of statistics estimation
*/
enum StatsEstimateConfidenceLevel {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the elements form a partial order?
If so, is it approriate to compare them using compareTo?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants