Use DataSourceAnalysis throughout the query stack. #9239

Merged
gianm merged 1 commit into apache:master from gianm:joins-two on Jan 23, 2020

gianm (Contributor) commented Jan 22, 2020

Builds on #9235, using the datasource analysis functionality to replace various ad-hoc
approaches. The most interesting changes are in ClientQuerySegmentWalker (brokers),
ServerManager (historicals), and SinkQuerySegmentWalker (indexing tasks).

Other changes related to improving how we analyze queries:

  1. Changes TimelineServerView to return an Optional timeline, which I thought made
    the analysis changes cleaner to implement.
  2. Adds QueryToolChest#canPerformSubquery, which is now used by query entry points to
    determine whether it is safe to pass a subquery dataSource to the query toolchest.
    Fixes an issue introduced in #5471 (Implement force push down for nested group by
    query), where subqueries under non-groupBy-typed queries were silently ignored,
    since neither the query entry point nor the toolchest did anything special with them.
  3. Removes the QueryPlus.withQuerySegmentSpec method, which was mostly being used in
    error-prone ways (ignoring any potential subqueries, and not verifying that the
    underlying data source is actually a table). Replaces it with a new function,
    Queries.withSpecificSegments, that includes sanity checks.
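
For readers unfamiliar with the subquery issue in the second item, here is a minimal sketch of the guard a query entry point might apply. All types below (DataSource, Query, ToolChest, plan) are hypothetical stand-ins written for illustration, not Druid's real classes, and the single-argument canPerformSubquery signature is an assumption.

```java
public class SubqueryGuardSketch {
    // Hypothetical stand-ins for illustration only; not Druid's actual types.
    interface DataSource {}

    static final class TableDataSource implements DataSource {
        final String name;
        TableDataSource(String name) { this.name = name; }
    }

    static final class QueryDataSource implements DataSource {
        final Query subquery;
        QueryDataSource(Query subquery) { this.subquery = subquery; }
    }

    static final class Query {
        final String type;
        final DataSource dataSource;
        Query(String type, DataSource dataSource) {
            this.type = type;
            this.dataSource = dataSource;
        }
    }

    // Assumed shape of the check; the real QueryToolChest method may differ.
    interface ToolChest {
        boolean canPerformSubquery(Query subquery);
    }

    // Entry-point logic: fail loudly rather than silently ignore a subquery
    // that the toolchest does not know how to handle.
    static String plan(Query query, ToolChest toolChest) {
        if (query.dataSource instanceof QueryDataSource) {
            Query subquery = ((QueryDataSource) query.dataSource).subquery;
            if (!toolChest.canPerformSubquery(subquery)) {
                throw new IllegalStateException(
                    "Cannot handle subquery for query type: " + query.type);
            }
            return "toolchest handles subquery";
        }
        return "run directly on table";
    }

    public static void main(String[] args) {
        Query inner = new Query("groupBy", new TableDataSource("events"));
        Query outer = new Query("timeseries", new QueryDataSource(inner));

        System.out.println(plan(inner, subquery -> false));
        System.out.println(plan(outer, subquery -> true));
        try {
            plan(outer, subquery -> false);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only that the unsupported case raises an error instead of being silently dropped, which is the behavior the second item describes.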

jihoonson (Contributor) commented

LGTM

gianm merged commit f0f6857 into apache:master on Jan 23, 2020
gianm deleted the joins-two branch on January 24, 2020 22:12
jihoonson added this to the 0.18.0 milestone on Mar 26, 2020