Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRILL-4531: Add a Drill customized rule for pushing filter past aggre… #444

Merged
merged 1 commit into from
Mar 25, 2016

Conversation

jinfengni
Copy link
Contributor

…gate


import org.apache.calcite.rel.rules.FilterAggregateTransposeRule;

public class DrillFilterAggregateTransposeRule{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this rule does not have an onMatch(), can you add comments about the justification ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the only thing we are doing is creating a FilterAggregateTransposeRule with a custom filter factory, does it make sense to simply create this in planner phase rather than having a custom rule class? It seems like the rule class isn't actually a rule.

Also, can you put a little information next to the rule definition of what happens if we don't use a custom factory? It would seem like the existing transpose rule would have worked.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The customized rule is not only adding a customer filter factory, it also specifies the filter / aggregate has to be in Drill Logical. The default rule patten will match Filter and Aggregate [1]. In debug, we saw that the default one will match with a LogicalFilter on top of DrillAggregate, which I feel is not right. We do not want to match Rels with mixed conventions.

@sudheeshkatkam helped track to the following error, which seems indicate the rule matching a LogicalFilter /DrillAggregate. Then, exception handling seems hit a cyclic somehow, and hang there forever.

The intention of this customized rule is to ensure that we do not match rel nodes with mixed convention.

Internal error: Error while applying rule FilterAggregateTransposeRule, args [rel#217:LogicalFilter.NONE.ANY([]).[[]](input=rel#126:Subset#5.LOGICAL.ANY%28[]%29.[],condition=IS NOT NULL%28$1%29), rel#6972:DrillAggregateRel.LOGICAL.ANY([]).[](input=rel#6971:Subset#1714.NONE.ANY%28[]%29.[],group={0, 1})]

[1] https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/FilterAggregateTransposeRule.java#L52-L53

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that restricting the rule patten to Drill logical only would have side effect, since there is some filter pushdown rule works under calcite logical world.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So probably two instances, one that is concrete Calcite logical and one that is concrete Drill logical?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right. that's something I'm thinking of.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jacques-n @amansinha100 , could you please take another look at this PR? I modified the patch based on the discussion we had.

In stead of creating two instances (which causes Calcite complain duplicate rule descriptions error), I override match() method to check the rels have the same convention trait.

As a separate work, I'm working on DRILL-3855, to re-enable two rules (FilterSetOpTranspose, ProjectSetOpTranspose). These rules were disabled in DRILL-3257, since it caused TPC-DS Q74 to hang in planning, very similar to what was reported in this JIRA.

@amansinha100
Copy link

+1. Are there other rules that also may be getting applied with mixed convention (Calcite logical and Drill logical) ? I am surprised that we don't encountered more such cases..

@jinfengni
Copy link
Contributor Author

We did have other rules getting applied mixed convention. For DRILL-3855, I'll fix push filter/project past setop rules. I'll open another JIRA to go through all the other rules.

Even with the mixed convention, the issue reported in this jira did not always happen for every query. That's probably why we did not encountered more such cases before.

@jacques-n
Copy link
Contributor

I think there is another bug here. DrillAggregateRel should assert when a none convention is passed in. If you add that assertion where do we fail? We should fix that

@jacques-n
Copy link
Contributor

In general, this pattern should fail earlier. There should be an AbstractConverter between these rels. We should probably make sure that a rels assert their convention to avoid this type of bug earlier.

@jinfengni
Copy link
Contributor Author

Adding assertion to DrillAggregateRel did not show new failure when using the default FilterAggregateTransposeRule. In the rule, copy() method is used to create new instance of aggregate. So, the aggregate created by this rule should not violate the convention check.

The patten of mixed rels does happen and it's the expected thing, since Drill's logical planning is dealing with two convention: NONE and LOGICAL. So, at any time, the RelSet would have RelSubSets for both conventions. It's normal to see a LogicalFilter whose input is a RelSubset with a DrillAggregate as a member.

The problem is not we should not have such mixed patten. In stead, the problem is that the Rule should not match such pattern. If the rule matches such mixed Rel, then looks like it could cause problem. With the default rule, the trace of running 50 seconds shows FilterAggregateTransposeRule is fired abnormally high (it could be even high if it does not timeout)
1055 [FilterAggregateTransposeRule]
45 [DrillPushProjectPastFilterRule]
38 [ProjectMergeRule:force_mode]
32 [DrillReduceExpressionsRule(Filter)]
30 [ReduceExpressionsRule_Project]
30 [DrillProjectRule]
26 [DrillAggregateRule]

If we add the restriction, then the new rule was just fired coupled of times.

@jacques-n
Copy link
Contributor

Your expectation about mixed conventions is inconsistent with my own. My expectation: a RelSubset should only have a single convention (in fact it should have only a single trait definition and all children should have satisfying trait definitions). I would expect the following:

(1) Prels should only have PHYSICAL
(2) Drels should only have LOGICAL
(3) Calcite Logical Rels should only have NONE

My previous statement mistook the convention of the DrillAggregateRel in the below. I thought it said NONE.

[rel#217:LogicalFilter.NONE.ANY([]).[], rel#6972:DrillAggregateRel.LOGICAL.ANY([]).]

It seems like the actual problem is that LogicalFilter is allowing an input of LOGICAL. I don't believe that LOGICAL should satisfy NONE. It seems that LogicalFilter should assert that its child is of Convention NONE. (In other words, a LogicalFilter shouldn't have Convention transformation properties.) Otherwise, we'll see a large number of invalid (but infinite) plans.

@jinfengni
Copy link
Contributor Author

I agree that your expectation for RelSubset makes sense to me. However, for now it does not happen that way. The following is the trace for the query which went through planning with this customized rule (I removed some rels).

Set#1, type: RecordType(BIGINT custkey, ANY custAddress)
...
Set#2, type: (DrillRecordRow[, l_orderkey, l_partkey, l_linenumber])
rel#273:Subset#2.LOGICAL.ANY([]).[], best=rel#442, importance=0.31381059609000006
rel#275:AbstractConverter.LOGICAL.ANY([])., rowcount=100.0, cumulative cost={inf}
rel#442:DrillScanRel.LOGICAL.ANY([]).[](table=[cp, tpch/lineitem.parquet],groupscan=ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]], selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, usedMetadataFile=false, columns=[*]]), rowcount=60175.0, cumulative cost={60175.0 rows, 6.0175E8 cpu, 0.0 io, 0.0 network, 0.0 memory}
Set#3, type: (DrillRecordRow[
, l_orderkey, l_partkey, l_linenumber])
rel#82:Subset#3.NONE.ANY([]).[], best=null, importance=0.3874204890000001
rel#345:LogicalFilter.NONE.ANY([]).[[]](input=rel#273:Subset#2.LOGICAL.ANY%28[]%29.[],condition=AND%28>=%28$1, 20160101%29, <=%28$2, 20160301%29, OR%28=%28$2, 1%29, =%28$2, 2%29, =%28$2, 5%29, =%28$2, 6%29%29%29), rowcount=6.25, cumulative cost={inf}

rel#345:LogicalFilter has a child rel#273 with LOGICAL convention.

As another example, for the following query:

Select n_name, n_nationkey from cp.tpch/nation.parquet where n_nationkey > 5

The trace:
Set#0, type: (DrillRecordRow[, n_nationkey, n_name])
...
Set#1, type: (DrillRecordRow[
, n_nationkey, n_name])
rel#21:Subset#1.NONE.ANY([]).[], best=null, importance=0.81
rel#20:LogicalFilter.NONE.ANY([]).[](input=rel#19:Subset#0.ENUMERABLE.ANY%28[]%29.[],condition=>%28$1, 5%29), rowcount=50.0, cumulative cost={inf}
rel#37:LogicalFilter.NONE.ANY([]).[[]](input=rel#19:Subset#0.ENUMERABLE.ANY%28[]%29.[],condition=>%28$1, 5%29), rowcount=50.0, cumulative cost={inf}
rel#35:Subset#1.LOGICAL.ANY([]).[], best=rel#60, importance=0.81
rel#60:DrillFilterRel.LOGICAL.ANY([]).[](input=rel#57:Subset#0.LOGICAL.ANY%28[]%29.[],condition=>%28$1, 5%29), rowcount=50.0, cumulative cost={125.0 rows, 250600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}
Set#2, type: RecordType(ANY n_name, ANY n_nationkey)
rel#23:Subset#2.NONE.ANY([]).[], best=null, importance=0.9
rel#53:LogicalProject.NONE.ANY([])., rowcount=50.0, cumulative cost={inf}

Again, rel#53:LogicalProject has a child rel#35 whose convention is LOGICAL.

I think the reason that we have such mixed rels is we have different kinds of rules, used in a single Volcano planning phase.

  1. Rule matchs base class Filter/Project, etc only.
  2. Rule matches LogicalFilter/LogicalProject, etc
  3. Rule uses copy() method to generate a new Rel
  4. Rule uses RelFactory to generate a new Rel.
  5. convent rule, which convert from Calcite logical (NONE/Enumerable) to Drill logical (LOGICAL)

For instance, ProjectMergeRule, which matches base Project, yet uses default RelFactory, will match both LogicalProject and DrillProject, but produce LogicalProject as outcome. That will cause the mixed rels.

2 things we may consider to fix this:

  1. Separate the convent rules from the other transformation rules. Apply convert rule first, then transformation rule match DrillLogical only. That's similar to what other system (hive) is doing.
  2. go through every rule we use, and we need make sure the convention of input and ouptput of a transformation rule should be same, except for the convert rule.

The above 2 things would take some considerably effort, though.

@jacques-n
Copy link
Contributor

Yeah, you're right and thanks for reviewing fully. We can shelve for now but this definitely seems like something we should get to.

+1 on the patch.

@jinfengni
Copy link
Contributor Author

Thanks for the discussion, @jacques-n . I opened DRILL-4541 to track the issues we discussed. I'll work on it and see if I can find a solution.

@asfgit asfgit merged commit 7e6de2b into apache:master Mar 25, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants