Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRILL-4347: Propagate distinct row count for joins from logical plann… #671

Closed
wants to merge 1 commit into from

Conversation

Projects
None yet
5 participants
@amansinha100
Copy link
Contributor

commented Nov 29, 2016

…ing to physical planning phase.

@jacques-n

This comment has been minimized.

Copy link
Contributor

commented Nov 29, 2016

Random question: won't a caching metadata provider achieve the same thing without having to do per rel node changes? At least that is what I thought it was for. Maybe @julianhyde can provide additional feedback?

@amansinha100

This comment has been minimized.

Copy link
Contributor Author

commented Nov 29, 2016

It is quite likely that the CachingRelMetadataProvider is meant for this. Based on the stack trace, there are multiple instances of "at org.apache.calcite.rel.metadata.CachingRelMetadataProvider$CachingInvocationHandler.invoke(CachingRelMetadataProvider.java:132)" and that line # indicates that there was either a cache miss or the entry was stale. So, the caching provider does in fact get used but then subsequently gets stuck in the apply() method of the ReflectiveRelMetadataProvider. I did not attempt to debug why it got stuck there...partly because I am not very familiar with the way reflection is used in this provider. Hence, my fix is an attempt to circumvent the issue.

@julianhyde

This comment has been minimized.

Copy link
Contributor

commented Nov 29, 2016

Yes, CachingRelMetadataProvider is meant for this.

After https://issues.apache.org/jira/browse/CALCITE-604, providers are considerably more efficient (not called via reflection), and RelMetadataQuery contains a per-request cache (because some metadata does change, but slowly).

@amansinha100

This comment has been minimized.

Copy link
Contributor Author

commented Nov 29, 2016

Thanks @julianhyde... CALCITE-604 should potentially help with this. Drill's calcite version has not caught up to this yet. Let me confer with @jinfengni sometime next week (he is on vacation until then) and get back on what can be done to get this into Drill. In the meantime, even though my patch addresses the hang issue, I will hold it for now.

@amansinha100 amansinha100 force-pushed the amansinha100:DRILL-4347-1 branch from f203603 to b9d74bb Nov 29, 2016

@jinfengni

This comment has been minimized.

Copy link
Contributor

commented Dec 7, 2016

@jacques-n , CachingRelMetadataProvider provides caching capability per meta-method / rel node [1]. Since Drill logical rel (DrillJoinRel) and Drill physical rel (JoinPrel) are different rels, CachingRelMetadataProvider probably would not help avoiding the first meta data call for the physical rel nodes, even the meta data for logical rels are in the cache.

@julianhyde , I probably once tried to cherry-pick CALCITE-604 to Drill's calcite fork, and I aborted that effort after seeing many merging conflicts (If I remember correctly). Since there has been ongoing effort to rebase Drill onto latest Calcite, it might make sense to see if the rebase work could be done shortly. At that time, Drill will benefit from CALCITE-604.

  1. https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/metadata/CachingRelMetadataProvider.java#L113-L120
count = super.getDistinctRowCount(rel, groupKey, predicate);
}
} else {
count = super.getDistinctRowCount(rel, groupKey, predicate);

This comment has been minimized.

Copy link
@jinfengni

jinfengni Dec 7, 2016

Contributor

The API of RelMdDistinctRowCount seems to indicate the distinct rowcount depends on input of groupKey and predicate. However, the cached value in DrillJoinRel does not differentiate based on groupKey / predicate. Will it cause issue in the cases this getDistinctRowCount() is called multiple times with different groupKey / predicate?

@julianhyde

This comment has been minimized.

Copy link
Contributor

commented Dec 7, 2016

Rebasing onto Calcite is like running after a train: you can't just stop and take a rest. :)

And by the way, the state of Drill-Arrow integration makes me very sad. Now Drill has fallen behind there, I doubt whether it will ever catch up.

@jinfengni

This comment has been minimized.

Copy link
Contributor

commented Dec 7, 2016

Agreed with your comment about the importance of doing calcite rebasing. AFAIK, someone(s) else in Ukraine have been working on calcite rebasing for a while. Last time I heard is they managed to get a rebased calcite branch and are dealing with regressions on Drill side.

@KulykRoman , if possible, please provide a brief update on the current status of Calcite rebasing work.

@vvysotskyi

This comment has been minimized.

Copy link
Member

commented Apr 11, 2019

@amansinha100, since Drill has rebased onto the newer Calcite version, can this PR be moved forward?

@amansinha100

This comment has been minimized.

Copy link
Contributor Author

commented Apr 11, 2019

@vvysotskyi thanks for re-surfacing this. From the JIRA comments, this JIRA was fixed by DRILL-4678. I think we can close this PR. Let me know what you think.

@vvysotskyi

This comment has been minimized.

Copy link
Member

commented Apr 11, 2019

Ok, if this issue was already fixed, I agree that we can close this PR. Thanks for clarifying.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.