Move scan-query from a contrib extension into core. #4751

gianm · 2017-09-05T01:23:51Z

Based on a proposal at: https://groups.google.com/d/topic/druid-development/ME_OatUDnbk/discussion

This patch also adds support for virtual columns to the Scan query,
and updates Druid SQL to use Scan instead of Select.

This patch also makes some behavioral changes to handling of the __time
column. In particular, it is now returned as "__time" rather than
"timestamp"; it is no longer included if you do not specifically ask for
it in your "columns"; and it is returned as a long rather than a string.

Users can revert time handling to the legacy extension behavior by
setting "legacy" : true in their queries, or setting the property
druid.query.scan.legacy = true. This is meant to provide a migration
path for users that were formerly using the contrib extension.
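
For readers comparing the two modes, here is a minimal sketch of the time handling described above. It is not Druid's implementation: the class name `ScanTimeHandlingSketch`, the `project` helper, and the row-as-`Map` representation are hypothetical, and the ISO-8601 string format used for the legacy "timestamp" value is an assumption, since the description only says it was a string.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only; hypothetical names, not code from this patch.
class ScanTimeHandlingSketch {
    static final String TIME_COLUMN = "__time";

    // Projects one row onto the requested columns under either time-handling mode.
    static Map<String, Object> project(Map<String, Object> row, List<String> columns, boolean legacy) {
        Map<String, Object> out = new LinkedHashMap<>();
        long timeMillis = (Long) row.get(TIME_COLUMN);
        if (legacy) {
            // Legacy mode: time is always included, named "timestamp", and is a string.
            // ASSUMPTION: ISO-8601 formatting; the PR description does not name the format.
            out.put("timestamp", Instant.ofEpochMilli(timeMillis).toString());
        }
        for (String column : columns) {
            if (TIME_COLUMN.equals(column)) {
                if (!legacy) {
                    // New mode: returned as "__time", as a long, only when asked for.
                    out.put(TIME_COLUMN, timeMillis);
                }
            } else {
                out.put(column, row.get(column));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put(TIME_COLUMN, 0L);
        row.put("page", "Druid");
        System.out.println(project(row, Arrays.asList("page"), false));
        System.out.println(project(row, Arrays.asList("__time", "page"), false));
        System.out.println(project(row, Arrays.asList("page"), true));
    }
}
```

The three calls in `main` cover the cases the description lists: time omitted when not requested, time returned as a long under "__time" when requested, and the legacy string under "timestamp".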


gianm · 2017-09-05T01:24:33Z

@kaijianding, as the original author of #3307, would you like to review this?

kaijianding · 2017-09-05T17:56:17Z

Will review tomorrow.

kaijianding

I've gone through the PR. Everything looks fine except the removal of the Select query from the Calcite rules. I think we should still keep the Select query for some specific cases, such as order by __time desc, or pagination with limit 10 offset 20.

kaijianding · 2017-09-06T15:04:45Z

docs/content/querying/select-query.md

@@ -19,6 +20,12 @@ Select queries return raw Druid rows and support pagination.
 }
 ```

+<div class="note info">
+Consider using the [Scan query](scan-query.html) instead of the Select query if you don't need the strict time-ascending


How about mentioning that the Scan query doesn't support pagination?

I'll mention it.

kaijianding · 2017-09-06T16:30:50Z

sql/src/main/java/io/druid/sql/calcite/rel/DruidQueryBuilder.java

+        selectProjection != null ? VirtualColumns.create(selectProjection.getVirtualColumns()) : VirtualColumns.EMPTY,
+        ScanQuery.RESULT_FORMAT_COMPACTED_LIST,
+        0,
+        limitSpec == null || limitSpec.getLimit() == Integer.MAX_VALUE ? 0 : limitSpec.getLimit(),


The 'limit' in ScanQuery is a long. Do we also need to change limitSpec.getLimit() to return a long?

I think we shouldn't make this change in this patch, since there's code elsewhere that assumes Integer.MAX_VALUE means "no limit". Also Calcite treats limit as int in a few places anyway. So some more care would be needed to migrate that. I'll add a comment though.
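
The conversion in the hunk above can be read as the following minimal sketch. The class and method names are hypothetical, and a `null` argument stands in for a missing limit spec; the reading that a Scan limit of 0 means "unlimited" is inferred from the diff passing 0 in the no-limit case, not confirmed elsewhere in this thread.

```java
// Illustrative only; hypothetical names, not code from this patch.
class ScanLimitSketch {
    // Maps an int-typed SQL limit onto the long-typed Scan limit.
    // null models "no limit spec"; Integer.MAX_VALUE is the "no limit" sentinel
    // that the reply above says other code relies on.
    static long toScanLimit(Integer sqlLimit) {
        if (sqlLimit == null || sqlLimit == Integer.MAX_VALUE) {
            return 0L; // ASSUMPTION: 0 means "no limit" for Scan, as the diff suggests.
        }
        return sqlLimit.longValue(); // Widening int to long is lossless.
    }

    public static void main(String[] args) {
        System.out.println(toScanLimit(null));
        System.out.println(toScanLimit(Integer.MAX_VALUE));
        System.out.println(toScanLimit(10));
    }
}
```

Keeping the sentinel check on the int side means the SQL layer can stay int-based while only the final value handed to the Scan query is widened.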

kaijianding · 2017-09-06T16:53:12Z

sql/src/main/java/io/druid/sql/calcite/rule/SelectRules.java

-      if (orderBys.isEmpty() ||
-          (orderBys.size() == 1 && orderBys.get(0).getDimension().equals(Column.TIME_COLUMN_NAME))) {
+      // Scan query can handle limiting but not sorting, so avoid applying this rule if there is a sort.
+      if (limitSpec.getColumns().isEmpty()) {


How would a SQL query that retrieves the latest 10 records be handled once the Druid Select query is completely removed from the rules?

select a,b,c from datasource order by __time desc limit 10

This SQL can be translated to a Druid Select query with descending=true, and it should be fast if the limit is a small number.

If we don't handle this kind of SQL here, what kind of Druid query is used once this rule is bypassed?

This patch would have made order by time impossible. I guess I could add the "select" query back, just so this kind of query is possible. It will take some more time.
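
One way to picture the outcome discussed here is the sketch below: unsorted queries go to Scan, queries ordered only by __time go to Select, and anything else is left to other rules. The enum, class, and method names are hypothetical, and the rules actually merged may differ in detail; the conditions mirror only what the removed and added lines in the hunk above check.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only; hypothetical names, not code from this patch.
class NativeQueryChoiceSketch {
    enum QueryType { SCAN, SELECT, NOT_APPLICABLE }

    static final String TIME_COLUMN = "__time";

    // Chooses a native query type from the ORDER BY columns of a non-aggregating SQL query.
    static QueryType choose(List<String> orderByColumns) {
        if (orderByColumns.isEmpty()) {
            // Scan can limit but not sort, so it only fits unsorted queries.
            return QueryType.SCAN;
        }
        if (orderByColumns.size() == 1 && TIME_COLUMN.equals(orderByColumns.get(0))) {
            // Ordering by time alone (ascending or descending) is what Select handles.
            return QueryType.SELECT;
        }
        return QueryType.NOT_APPLICABLE;
    }

    public static void main(String[] args) {
        System.out.println(choose(Collections.<String>emptyList()));
        System.out.println(choose(Arrays.asList("__time")));
        System.out.println(choose(Arrays.asList("a", "__time")));
    }
}
```

Under this sketch, the example query `select a,b,c from datasource order by __time desc limit 10` falls into the Select branch.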

gianm · 2017-09-12T17:13:58Z

Updated the patch based on your comments @kaijianding. Thank you for reviewing.

jihoonson · 2017-09-12T20:32:18Z

This patch looks good to me. Please fix the conflicts.

gianm · 2017-09-12T21:41:42Z

@jihoonson I fixed the conflicts.

kaijianding · 2017-09-13T00:05:28Z

sql/src/main/java/io/druid/sql/calcite/rel/DruidQueryBuilder.java

@@ -341,7 +344,7 @@ public RelDataType getRowType()

  /**
   * Return this query as some kind of Druid query. The returned query will either be {@link TopNQuery},
-   * {@link TimeseriesQuery}, {@link GroupByQuery}, or {@link SelectQuery}.
+   * {@link TimeseriesQuery}, {@link GroupByQuery}, or {@link ScanQuery}.


Please keep the link to SelectQuery and add ScanQuery alongside it.

kaijianding

This PR looks good to me now.

gianm added the Release Notes label Sep 5, 2017

gianm added this to the 0.11.0 milestone Sep 5, 2017

kaijianding reviewed Sep 6, 2017


gianm added 4 commits September 11, 2017 17:42

Merge branch 'master' into scan-query-core

9de076a

Adjustments from review.

1e0c696

Merge branch 'master' into scan-query-core

12f4a7d

Add back Select query.

8abf0f9

gianm added 2 commits September 12, 2017 10:15

Adjust SQL docs.

bbb6eff

Merge branch 'master' into scan-query-core

4173f67

jihoonson approved these changes Sep 12, 2017


Merge branch 'master' into scan-query-core

f8db3f1

kaijianding reviewed Sep 13, 2017


Restore SelectQuery link.

d11b47a

kaijianding approved these changes Sep 13, 2017


Merge branch 'master' into scan-query-core

2f0b7ad

jihoonson merged commit 2ce8123 into apache:master Sep 13, 2017

gianm deleted the scan-query-core branch September 13, 2017 17:25

jon-wei mentioned this pull request Sep 28, 2017

Druid 0.11.0 release notes #4876

Closed

gianm mentioned this pull request Jan 9, 2018

Error in coordinator/overlord when loading scan-query #4835

Closed

kgyrtkirk mentioned this pull request Feb 16, 2024

Remove ScanQuery legacy support #15918

Open

clintropolis mentioned this pull request Jun 26, 2024

remove native scan query legacy mode #16659

Merged

4 tasks
