Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-482] Implement sql and planner hints #1354

Open
wants to merge 3 commits into
base: master
from

Conversation

@danny0405
Copy link
Contributor

commented Aug 4, 2019

Copy link

left a comment

Glad to see your effort driving this long-expected feature! Here're my questions for the overall implementation(Please correct me if I understand incorrectly):

  1. The HintStrategy & HintStrategies abstraction: I understand this try to make the hint strategy some how pluggable but limited to pre-defined types (QUERY, JOIN, TABLE_SCAN and PROJECT) that different engines can not customize it since they may have various requirements. I think we should figure out a more flexible way to target this.

  2. I like the idea of the two kinds of hints: top-level (query) hint & table hint that most RDBMS vendors support, I'm not sure if necessary to add the hint types for current SqlHint?

  3. The inheritPath in RelHint looks designed to represent the hints inheritance, but in current implementation the hints propagation is incomplete that only Join, TableScan and Project node will store the corresponding hints info, so the inheritPath is broken, is it as your expects?

* The hint is used for the whole query, kind of like a query config.
* This kind of hints would never be propagated.
*/
QUERY(RelNode.class),

This comment has been minimized.

Copy link
@lincoln-lil

lincoln-lil Aug 6, 2019

Here OTHER or UNDEFINED maybe a more proper one?

This comment has been minimized.

Copy link
@danny0405

danny0405 Aug 6, 2019

Author Contributor

OTHER and UNDEDINED are both adjective and ambiguous. What does it mean for OTHER, i have no idea when i first saw this word, instead QUERY tell us that this hint is used for the whole sql query statement.

This comment has been minimized.

Copy link
@hsyuan

hsyuan Aug 16, 2019

Member

It is more like SET_VAR.

This comment has been minimized.

Copy link
@danny0405

danny0405 Oct 16, 2019

Author Contributor

I agree, thanks.

*
* <p>It can be attached with a sql hint statement, see {@link SqlHint} for details.
*/
public class SqlTableRef extends SqlCall {

This comment has been minimized.

Copy link
@lincoln-lil

lincoln-lil Aug 6, 2019

tableRef can be seen as a table reference but not related to the hints, use a more explicitly one: SqlTableWithHint ?


@Override public ImmutableList<RelHint> getHints() {
return hints;
}

This comment has been minimized.

Copy link
@chunweilei

chunweilei Aug 6, 2019

Contributor

What is the behavior after a rule fires? Is the RelNode still keep the hints?

This comment has been minimized.

Copy link
@danny0405

danny0405 Aug 6, 2019

Author Contributor

If this RelNode is copied, new copied node would keep the hints, but if you create a new node, you should pass the hints explicitly.

This comment has been minimized.

Copy link
@danny0405

danny0405 Aug 6, 2019

Author Contributor

Maybe i should write a design doc, this seems a little complicated.

final String sql = "select "
+ "/*+ properties(k1='v1', k2='v2'), no_hash_join, Index(idx1, idx2) */ "
+ "empno, ename, deptno from emps";
final String expected = "SELECT\n"

This comment has been minimized.

Copy link
@chunweilei

chunweilei Aug 6, 2019

Contributor

What does properties(k1='v1', k2='v2') mean?

This comment has been minimized.

Copy link
@danny0405

danny0405 Aug 6, 2019

Author Contributor

It is a fake hint now, i just add it to have a test. For this demo hint, it means a hint with name properties and options k1=v1, k2=v2

@danny0405

This comment has been minimized.

Copy link
Contributor Author

commented Aug 6, 2019

Glad to see your effort driving this long-expected feature! Here're my questions for the overall implementation(Please correct me if I understand incorrectly):

  1. The HintStrategy & HintStrategies abstraction: I understand this try to make the hint strategy some how pluggable but limited to pre-defined types (QUERY, JOIN, TABLE_SCAN and PROJECT) that different engines can not customize it since they may have various requirements. I think we should figure out a more flexible way to target this.
  2. I like the idea of the two kinds of hints: top-level (query) hint & table hint that most RDBMS vendors support, I'm not sure if necessary to add the hint types for current SqlHint?
  3. The inheritPath in RelHint looks designed to represent the hints inheritance, but in current implementation the hints propagation is incomplete that only Join, TableScan and Project node will store the corresponding hints info, so the inheritPath is broken, is it as your expects?
  1. HintStrategy is not designed to be pluggable, i think no one knows how to make it to be, if you want to support a new node, there needs some modifications(the node and the logic to propagate), then just add a new enum val.
  2. The HintStrategy is already kind of hint types, is can be used to distinguish what kind of node this hint expects to apply to, what the usage of what you proposed ?
  3. Yes, it is expected, AFAIK, the only usage of inheritPath is to resolve conflict when there are multiple same name hints for a node. And what do you mean by broken, the inherit path is complete (from the root node).
@lincoln-lil

This comment has been minimized.

Copy link

commented Aug 6, 2019

  1. HintStrategy is not designed to be pluggable, i think no one knows how to make it to be, if you want to support a new node, there needs some modifications(the node and the logic to propagate), then just add a new enum val.
  2. The HintStrategy is already kind of hint types, is can be used to distinguish what kind of node this hint expects to apply to, what the usage of what you proposed ?
  3. Yes, it is expected, AFAIK, the only usage of inheritPath is to resolve conflict when there are multiple same name hints for a node. And what do you mean by broken, the inherit path is complete (from the root node).
  1. How about using the existing SqlKind enum for the HintStrategy? Then different engines can register their own kind of node for hints propagation strategy. Thus HintStrategy can be an abstract class, what do you think?
  2. I do not have a strong opinion on the type flag directly in SqlHint, using HintStrategy to identify the type is ok for me.
  3. Yes, the 'inheritPath' for the whole RelNode tree can be complete by simple calculating since few RelNodes stored the hints.
/**
* The hint would be propagated to the Project nodes.
*/
PROJECT(Project.class);

This comment has been minimized.

Copy link
@hsyuan

hsyuan Aug 16, 2019

Member

What is this for? Any example?

This comment has been minimized.

Copy link
@danny0405

danny0405 Oct 18, 2019

Author Contributor

In our production, we need to config special resource for some UDF, say, the GPU. It would be very convenient if we have such hint.


private Class<?> relClass;

HintStrategy(Class relClass) {

This comment has been minimized.

Copy link
@hsyuan

hsyuan Aug 16, 2019

Member

HintStrategy is confusing, HintType is more appropriate.

This comment has been minimized.

Copy link
@danny0405

danny0405 Oct 16, 2019

Author Contributor

Okey, thanks, I still think HintStrategy is better name, because they can be used orthogonally.

* The hint is used for the whole query, kind of like a query config.
* This kind of hints would never be propagated.
*/
QUERY(RelNode.class),

This comment has been minimized.

Copy link
@hsyuan

hsyuan Aug 16, 2019

Member

It is more like SET_VAR.

@danny0405

This comment has been minimized.

Copy link
Contributor Author

commented Aug 16, 2019

Thanks so much for the review @hsyuan , i'm preparing for the design doc, which should be in front of the PR first, sorry for that, the doc is ready soon.

@danny0405 danny0405 force-pushed the danny0405:CALCITE-482 branch 2 times, most recently from 8be8521 to be21835 Sep 7, 2019
@julianhyde julianhyde force-pushed the apache:master branch from 2c19098 to f3758a6 Sep 16, 2019
@danny0405 danny0405 force-pushed the danny0405:CALCITE-482 branch 2 times, most recently from d09f92c to ef10a3c Oct 11, 2019
import org.apache.calcite.rel.core.TableScan;

/**
* A hint strategy that specifies which kind of relational

This comment has been minimized.

Copy link
@xndai

xndai Oct 15, 2019

Contributor

Actually I am not very sure how useful to apply hints on a relational node without considering its inputs. Based on my experiences, almost all relational hints would be specific to a particular node (such as a join node with specific inputs).

This comment has been minimized.

Copy link
@hsyuan

hsyuan Oct 15, 2019

Member

Some may not attach to a specific node, like parallelism, memory

This comment has been minimized.

Copy link
@danny0405

danny0405 Oct 16, 2019

Author Contributor

For parallelism and memory, we have a node type NodeType.QUERY which indicates that the hint would not propagate.

This comment has been minimized.

Copy link
@danny0405

danny0405 Oct 16, 2019

Author Contributor

Oops, maybe with the name SET_VAR you suggested.

* expression with special match conditions(not only the relational expression type).
* i.e. "hash_join(r, st)", this hint can only be applied to JOIN expression that
* has "r" and "st" as the input table names. To implement this, you may need to customize an
* {@link ExplicitHintStrategy} with an {@link ExplicitHintMatcher}.

This comment has been minimized.

Copy link
@xndai

xndai Oct 15, 2019

Contributor

In your example hash_join(r, st), I think a better solution will be attaching this hint to the particular join node during parsing phase. With current design, it's still too much burden on the user to build a custom matcher for handling cases like this.

This comment has been minimized.

Copy link
@danny0405

danny0405 Oct 16, 2019

Author Contributor

There is no way to figurate out the hint strategy during sql node parsing. During sql-to-rel conversion, we have information like:

  1. What the hint arguments is, for hash_join(r, st), the arguments are "r" and "st"
  2. What kind of node this hint want to apply for, for hash_join(r, st), it is a JOIN node

So i passed all these info to use and let them customize the strategies. Of cause we can have some default ExplicitHintMatcher implementations, but i don't think we can cover all the impls by default, the join node maybe very complex, such as multi-join, join with subQuery.

This comment has been minimized.

Copy link
@xndai

xndai Oct 16, 2019

Contributor

I think fundamentally the question is whether we want to treat some certain hints as part of calcite language standard. If hash_join() is one of that supported by calcite, we then don't need user to supply an ExplicitHintMatcher and calcite defines the hinting syntax for hash_join(). To me this is preferable as it standardize some basic and useful hints. Otherwise one implementation can choose to use hash_join(r, st), and the other can use hashjoin(r st). That would create a lot of confusion.

This comment has been minimized.

Copy link
@danny0405

danny0405 Oct 17, 2019

Author Contributor

Thanks, I agree we should add some default hints in the very near future, for this patch, what i want to make sure is that the "framework" works correctly, after we finished this, i think it is not big work to add some very common builtin hints.

@@ -52,11 +55,12 @@
* The set of output rows is a subset of the cartesian product of the two
* inputs; precisely which subset depends on the join condition.
*/
public abstract class Join extends BiRel {
public abstract class Join extends BiRel implements RelWithHint {

This comment has been minimized.

Copy link
@xndai

xndai Oct 15, 2019

Contributor

Will hint be part of relnode digest?

This comment has been minimized.

Copy link
@danny0405

danny0405 Oct 16, 2019

Author Contributor

For current implementation it is not, so there may be some problem when there are two same digest nodes that just differ with hints during planner planning, i know it is a problem, i'm just hesitant to add hints to digest because it would be very redundant and plan would looks too complex.(Not every hints on the node would meaningful, say, it's just a hint and not forced attribute).

I'm considering if i can give an internal implementation for digest + hints node reuse without modify the node digests.

This comment has been minimized.

Copy link
@xndai

xndai Oct 16, 2019

Contributor

It also hurts debugability when hints are not included in the digest - there's no way to tell if a relnode has hint associated by just looking at the generated plan. There are both pros and cons. I agree we need more thinkings on this one.

@@ -88,6 +89,7 @@ protected VolcanoRuleCall(

// implement RelOptRuleCall
public void transformTo(RelNode rel, Map<RelNode, RelNode> equiv) {
rel = RelOptUtil.copyRelHints(rels[0], rel);

This comment has been minimized.

Copy link
@xndai

xndai Oct 15, 2019

Contributor

Is it safe to assume the original hint alway coming from rels[0]?

This comment has been minimized.

Copy link
@xndai

xndai Oct 16, 2019

Contributor

Also is it possible to customize hint propagation? Looks not possible for a rule since it happens within transformTo().

This comment has been minimized.

Copy link
@danny0405

danny0405 Oct 16, 2019

Author Contributor

I only consider to copy hints of the rule matched root node(here it is the rels[0]), because it is hard to figure out what transformations user would do during the rule plan re-write, so just let use manage the hints by-themselves.

What we supply is a default behavior that copy and propagate the hints in the root node.

This comment has been minimized.

Copy link
@xndai

xndai Oct 16, 2019

Contributor

I understand you try to provide a reasonable default behavior. All I am saying is when this behavior is not desired, the calcite user should have a way to override it. But it looks like not possible with this implementation.

This comment has been minimized.

Copy link
@hsyuan

hsyuan Oct 17, 2019

Member

In FilterJoinRule, if we have a hint called selectivity, after the filter is pushed into join inputs, you will copy selectivity hint to Join node, which is undesired.

This comment has been minimized.

Copy link
@danny0405

danny0405 Oct 17, 2019

Author Contributor

@hsyuan, yes, i agree we should give some default strategies for copying here, not just simplify copy without any filtering.

BTW, does selectivity suitable as a hint ? It is a metadata which is varying during planning, it is non-stable, so how can it be a valid value when we already do many re-write. Or do you mean the filter condition's selectivity ? Then user can copy them by themselves in the FilterJoinRule.

This comment has been minimized.

Copy link
@danny0405

danny0405 Oct 17, 2019

Author Contributor

@xndai Thanks for pointing out, i think we should give some default strategies for copying and also make the strategies can be overridden.

This comment has been minimized.

Copy link
@hsyuan

hsyuan Oct 17, 2019

Member

I mean the filter condition's selectivity. like https://www.ibm.com/developerworks/data/library/tips/dm-0312yip/

This comment has been minimized.

Copy link
@danny0405

danny0405 Oct 18, 2019

Author Contributor

@hsyuan @xndai I have made the hints propagation in planning rule totally customizable, see method RelOptRuleCall.transformTo(RelNode, Map<RelNode, RelNode>, BiFunction<RelNode, RelNode, RelNode>)

I don't know that the selectivity is after applying FilterJoinRule, but if anyone knows that , he can customize through a given hander function to #transformTo.

Copy link
Member

left a comment

Any use cases for HintStrategy. cascade()? Any example for a hint on Project? A hint on Project node doesn't make sense to me.

@danny0405 danny0405 force-pushed the danny0405:CALCITE-482 branch from ef10a3c to 3849f88 Oct 18, 2019
@danny0405

This comment has been minimized.

Copy link
Contributor Author

commented Oct 18, 2019

For use case HintStrategy. cascade see SqlHintsConverterTest.createHintStrategies.
For Project hint, in our production, we have some special UDF which needs custom resource, i.e. the GPU, so we need special hint for this project.

@danny0405 danny0405 force-pushed the danny0405:CALCITE-482 branch 4 times, most recently from 2fbb190 to dccc428 Oct 18, 2019
danny0405 added 3 commits Aug 4, 2019
* Add a new interface RelOptRuleCall.transformTo(RelNode, Map, BiFunction) to
make the hints copy strategy overridable
* Cache the hint strategies into RelOptCluster so that user can query the
strategies during rule planning
* Rename HintStrategies.QUERY to HintStrategies.SET_VAR
@danny0405 danny0405 force-pushed the danny0405:CALCITE-482 branch from dccc428 to 581c3be Oct 21, 2019
@hsyuan

This comment has been minimized.

Copy link
Member

commented Oct 22, 2019

  • I saw you are supporting WITH hint. I am against on supporting it. 1 single style /*+ */ would suffice.
  • RelWithHint gives me impression that it is still a RelNode class. Hintable or IHint or other name may be better? Similarly, RelHint is a relational node? But it is not. HintNode might be better?
@danny0405

This comment has been minimized.

Copy link
Contributor Author

commented Oct 22, 2019

  • I saw you are supporting WITH hint. I am against on supporting it. 1 single style /*+ */ would suffice.
  • RelWithHint gives me impression that it is still a RelNode class. Hintable or IHint or other name may be better? Similarly, RelHint is a relational node? But it is not. HintNode might be better?
  • The WITH has already been removed based on Julian's comments.
  • RelWithHint means a relational node with hint, the sub-class of it are exactly all relational nodes; for RelHint it has a sql node counter-part named SqlHint, RelHint means hint of a RelNode, i do not think the name is ambiguous
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.