Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move scan-query from a contrib extension into core. #4751

Merged
merged 10 commits into from
Sep 13, 2017

Conversation

gianm
Copy link
Contributor

@gianm gianm commented Sep 5, 2017

Based on a proposal at: https://groups.google.com/d/topic/druid-development/ME_OatUDnbk/discussion

This patch also adds support for virtual columns to the Scan query,
and updates Druid SQL to use Scan instead of Select.

This patch also makes some behavioral changes to handling of the __time
column. In particular, it is now is returned as "__time" rather than
"timestamp"; it is no longer included if you do not specifically ask for
it in your "columns"; and it is returned as a long rather than a string.

Users can revert time handling to the legacy extension behavior by
setting "legacy" : true in their queries, or setting the property
druid.query.scan.legacy = true. This is meant to provide a migration
path for users that were formerly using the contrib extension.

Based on a proposal at: https://groups.google.com/d/topic/druid-development/ME_OatUDnbk/discussion

This patch also adds support for virtual columns to the Scan query,
and updates Druid SQL to use Scan instead of Select.

This patch also makes some behavioral changes to handling of the __time
column. In particular, it is now is returned as "__time" rather than
"timestamp"; it is no longer included if you do not specifically ask for
it in your "columns"; and it is returned as a long rather than a string.

Users can revert time handling to the legacy extension behavior by
setting "legacy" : true in their queries, or setting the property
druid.query.scan.legacy = true. This is meant to provide a migration
path for users that were formerly using the contrib extension.
@gianm gianm added this to the 0.11.0 milestone Sep 5, 2017
@gianm
Copy link
Contributor Author

gianm commented Sep 5, 2017

@kaijianding, as the original author of #3307 would you like to review this?

@kaijianding
Copy link
Contributor

will review tommorrow

Copy link
Contributor

@kaijianding kaijianding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gone through the PR, everything fine except the removal of select query from calcite rule part. I think we should still keep select query for some specific case, like order by __time desc and limit 10 offset 20 pagination

@@ -19,6 +20,12 @@ Select queries return raw Druid rows and support pagination.
}
```

<div class="note info">
Consider using the [Scan query](scan-query.html) instead of the Select query if you don't need the strict time-ascending
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about mention scan-query doesn't support pagination?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll mention it.

selectProjection != null ? VirtualColumns.create(selectProjection.getVirtualColumns()) : VirtualColumns.EMPTY,
ScanQuery.RESULT_FORMAT_COMPACTED_LIST,
0,
limitSpec == null || limitSpec.getLimit() == Integer.MAX_VALUE ? 0 : limitSpec.getLimit(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this 'limit' in ScanQuery is long type, do we also need change limitSpec.getLimit() to long?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't make this change in this patch, since there's code elsewhere that assumes Integer.MAX_VALUE means "no limit". Also Calcite treats limit as int in a few places anyway. So some more care would be needed to migrate that. I'll add a comment though.

if (orderBys.isEmpty() ||
(orderBys.size() == 1 && orderBys.get(0).getDimension().equals(Column.TIME_COLUMN_NAME))) {
// Scan query can handle limiting but not sorting, so avoid applying this rule if there is a sort.
if (limitSpec.getColumns().isEmpty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how a sql to retrieve the latest 10 records is handled after druid select query is totally removed from rules?

select a,b,c from datasource order by __time desc limit 10

This sql can be translated to druid select query with descending=true and should be fast if limit is a small number.

If we don't handle this kind of sql here, what kind of druid query is used after we bypass this rule?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch would have made order by time impossible. I guess I could add the "select" query back, just so this kind of query is possible. It will take some more time.

@gianm
Copy link
Contributor Author

gianm commented Sep 12, 2017

Updated the patch based on your comments @kaijianding. Thank you for reviewing.

@jihoonson
Copy link
Contributor

This patch looks good to me. Please fix the conflicts.

@gianm
Copy link
Contributor Author

gianm commented Sep 12, 2017

@jihoonson I fixed the conflicts.

@@ -341,7 +344,7 @@ public RelDataType getRowType()

/**
* Return this query as some kind of Druid query. The returned query will either be {@link TopNQuery},
* {@link TimeseriesQuery}, {@link GroupByQuery}, or {@link SelectQuery}.
* {@link TimeseriesQuery}, {@link GroupByQuery}, or {@link ScanQuery}.
Copy link
Contributor

@kaijianding kaijianding Sep 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still keep link to SelectQuery and add ScanQuery

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Contributor

@kaijianding kaijianding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is good to me now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants