Add join-related DataSource types and analysis functionality. #9234

Closed
wants to merge 4 commits into from


gianm
Contributor

gianm commented Jan 21, 2020

Builds on #9111 and implements the datasource analysis mentioned in #8728. Still can't
handle join datasources, but we're a step closer.

Join-related DataSource types:

  1. Add "join", "lookup", and "inline" datasources.
  2. Add "getChildren" and "withChildren" methods to DataSource, which will be used
    in the future for query rewriting (e.g. inlining of subqueries).
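A rough sketch of how "getChildren" and "withChildren" could support rewriting, under heavy assumptions: `ChildrenSketch`, its nested `DataSource`, `Table`, and `Join` types, and the `rewrite` helper are hypothetical, simplified stand-ins for illustration, not the classes in this patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical, simplified stand-ins; not Druid's actual classes.
final class ChildrenSketch {
  interface DataSource {
    List<DataSource> getChildren();
    DataSource withChildren(List<DataSource> children);
  }

  // Leaf datasource: no children.
  static final class Table implements DataSource {
    final String name;
    Table(String name) { this.name = name; }
    @Override public List<DataSource> getChildren() { return Collections.emptyList(); }
    @Override public DataSource withChildren(List<DataSource> children) {
      if (!children.isEmpty()) {
        throw new IllegalArgumentException("Table takes no children");
      }
      return this;
    }
    @Override public String toString() { return name; }
  }

  // Binary join: exactly two children (left, right).
  static final class Join implements DataSource {
    final DataSource left;
    final DataSource right;
    Join(DataSource left, DataSource right) { this.left = left; this.right = right; }
    @Override public List<DataSource> getChildren() { return Arrays.asList(left, right); }
    @Override public DataSource withChildren(List<DataSource> children) {
      if (children.size() != 2) {
        throw new IllegalArgumentException("Join takes exactly two children");
      }
      return new Join(children.get(0), children.get(1));
    }
    @Override public String toString() { return "join(" + left + ", " + right + ")"; }
  }

  // Bottom-up rewrite: rebuild each node from its rewritten children, then apply fn to it.
  static DataSource rewrite(DataSource ds, UnaryOperator<DataSource> fn) {
    List<DataSource> newChildren = new ArrayList<>();
    for (DataSource child : ds.getChildren()) {
      newChildren.add(rewrite(child, fn));
    }
    return fn.apply(ds.withChildren(newChildren));
  }
}
```

The point of the pair is that a rewriter can walk and rebuild a datasource tree without knowing every concrete type; only the replacement rule passed as `fn` needs to be type-aware.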

DataSource analysis functionality:

  1. Add DataSourceAnalysis class, which breaks down datasources into three components:
    outer queries, a base datasource (left-most of the highest level left-leaning join
    tree), and other joined-in leaf datasources (the right-hand branches of the
    left-leaning join tree).
  2. Add "isConcrete", "isGlobal", and "isCacheable" methods to DataSource in order to
    support analysis.
  3. Use the DataSourceAnalysis methods throughout the query handling stack, replacing
    various ad-hoc approaches. Most of the interesting changes are in
    ClientQuerySegmentWalker (brokers), ServerManager (historicals), and
    SinkQuerySegmentWalker (indexing tasks).
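To make the three-way breakdown concrete, here is a minimal sketch under stated assumptions. `AnalysisSketch` and its nested `Table`, `QuerySource`, `Join`, and `Analysis` types are hypothetical simplifications, and `analyze` only illustrates the decomposition described above; it is not the DataSourceAnalysis code in this patch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified stand-ins; not Druid's actual classes.
final class AnalysisSketch {
  interface DataSource {}

  static final class Table implements DataSource {
    final String name;
    Table(String name) { this.name = name; }
  }

  // Stands in for a datasource that wraps a query over an inner datasource.
  static final class QuerySource implements DataSource {
    final DataSource inner;
    QuerySource(DataSource inner) { this.inner = inner; }
  }

  static final class Join implements DataSource {
    final DataSource left;
    final DataSource right;
    Join(DataSource left, DataSource right) { this.left = left; this.right = right; }
  }

  static final class Analysis {
    final List<QuerySource> outerQueries;  // outermost first
    final DataSource base;                 // left-most leaf of the top-level join tree
    final List<DataSource> joinedLeaves;   // right-hand branches, in join order
    Analysis(List<QuerySource> outerQueries, DataSource base, List<DataSource> joinedLeaves) {
      this.outerQueries = outerQueries;
      this.base = base;
      this.joinedLeaves = joinedLeaves;
    }
  }

  static Analysis analyze(DataSource ds) {
    // Step 1: peel off outer queries until something that is not a query is reached.
    List<QuerySource> outer = new ArrayList<>();
    DataSource current = ds;
    while (current instanceof QuerySource) {
      outer.add((QuerySource) current);
      current = ((QuerySource) current).inner;
    }
    // Step 2: walk down the left spine of the join tree, collecting right-hand branches.
    // addFirst keeps them in join order, since the outermost join is visited first.
    Deque<DataSource> rights = new ArrayDeque<>();
    while (current instanceof Join) {
      rights.addFirst(((Join) current).right);
      current = ((Join) current).left;
    }
    // Step 3: whatever remains on the left is the base datasource.
    return new Analysis(outer, current, new ArrayList<>(rights));
  }
}
```

Note that in this sketch a query sitting on the left side of a join stops the descent and becomes the base itself, which is one reading of "left-most of the highest level left-leaning join tree".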

Other notes:

  1. Changed TimelineServerView to return an Optional timeline, which I thought made
    the analysis changes cleaner to implement.
  2. Renamed DataSource#getNames to DataSource#getTableNames, which I think is clearer.
    Also, made it a Set, so implementations don't need to worry about duplicates.
  3. Added QueryToolChest#canPerformSubquery, which is now used by query entry points to
    determine whether it is safe to pass a subquery dataSource to the query toolchest.
    Fixes an issue introduced in #5471 ("Implement force push down for nested group by query"), where subqueries under non-groupBy-typed queries
    were silently ignored, since neither the query entry point nor the toolchest did
    anything special with them.
  4. The addition of "isCacheable" should work around #8713 ("Improper result-level cache ETag handling for union datasources"), since UnionDataSource now
    returns false for cacheability.
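As an illustration of the canPerformSubquery check in note 3, a query entry point might gate subqueries roughly like this. Everything here (`SubqueryGateSketch`, its `Query`, `SimpleQuery`, and `ToolChest` types, and the `run` method) is a hypothetical simplification; the real QueryToolChest and entry points carry far more than this.

```java
// Hypothetical, simplified stand-ins; not Druid's actual classes.
final class SubqueryGateSketch {
  interface Query {
    String type();
    Query subquery();  // null when the query reads from a non-query datasource
  }

  static final class SimpleQuery implements Query {
    private final String type;
    private final Query subquery;
    SimpleQuery(String type, Query subquery) {
      this.type = type;
      this.subquery = subquery;
    }
    @Override public String type() { return type; }
    @Override public Query subquery() { return subquery; }
  }

  interface ToolChest {
    boolean canPerformSubquery(Query subquery);
  }

  // Entry point: fail loudly instead of silently ignoring an unsupported subquery.
  static String run(Query query, ToolChest toolChest) {
    Query sub = query.subquery();
    if (sub != null && !toolChest.canPerformSubquery(sub)) {
      throw new UnsupportedOperationException(
          "Query type [" + query.type() + "] cannot handle subqueries");
    }
    // Placeholder for actually executing the query.
    return "ran " + query.type();
  }
}
```

The design choice being sketched is that the entry point asks the toolchest up front, so an unsupported subquery produces an error rather than being dropped.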


gianm commented Jan 21, 2020

Closing in favor of #9235, which is slightly scoped down (it doesn't hook into the query layer).

@gianm gianm closed this Jan 21, 2020