Add SQL support for time-ordered scans #7370

justinborromeo · 2019-03-28T20:05:48Z

Motivation

See #6088 for the original idea. PR #7133 is close to completion, and the next step is to add SQL support for time-ordered scans. This would eliminate the need for select queries in SQL planning, since the only thing select is still needed for is time-ordering results. Switching to scan would reduce memory usage.

Proposed changes

The SQL planning in DruidQuery will be changed so that Scan is used if ordering by __time is specified. After that, Select will be essentially obsolete and will be removed from SQL planning altogether.

The user interface won't change.
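To make the planning change concrete, here is a minimal sketch of the decision described above. It is not the actual DruidQuery code: the function name `choose_query_type` and the `use_proposed_planning` flag are hypothetical, and only the two cases discussed in this issue (no ordering, and ordering by `__time`) are covered.

```python
from typing import Optional

TIME_COLUMN = "__time"


def choose_query_type(order_by: Optional[str], use_proposed_planning: bool = True) -> str:
    """Return the native query type for a non-aggregating SQL query.

    Hypothetical helper for illustration only; the real planner logic
    lives in DruidQuery and handles many more cases.
    """
    if order_by is None:
        # Unordered queries are already planned as scan.
        return "scan"
    if order_by == TIME_COLUMN:
        # Before this proposal, ordering by __time required select.
        return "scan" if use_proposed_planning else "select"
    # Other orderings are outside the scope of this sketch.
    raise NotImplementedError("only unordered and __time-ordered queries are sketched")


if __name__ == "__main__":
    print(choose_query_type("__time", use_proposed_planning=False))
    print(choose_query_type("__time"))
```

With the proposed planning enabled, neither branch returns select, which is why select can be dropped from SQL planning.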

Rationale

I think removing select queries from the SQL planner completely is the best choice, since the select query's design isn't great memory-wise. This does mean that time-ordered SELECT queries that fall outside the configurable scan time-ordering limits (default 100K rows or 30 segments per time chunk) will fail, but these limits can be tuned based on machine specs to a point where the query succeeds. Furthermore, if a query is big enough to cause memory issues with scan, using select would be even worse.
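The limit check can be sketched as follows. This is only an illustration: the function and parameter names are hypothetical, the defaults are the ones quoted above, and it assumes a query can be time-ordered when either limit is satisfied and that the bounds are inclusive.

```python
from typing import Optional

# Defaults as quoted in this issue; both are configurable.
DEFAULT_MAX_ROWS = 100_000
DEFAULT_MAX_SEGMENTS_PER_TIME_CHUNK = 30


def can_time_order(
    row_limit: Optional[int],
    segments_per_time_chunk: int,
    max_rows: int = DEFAULT_MAX_ROWS,
    max_segments: int = DEFAULT_MAX_SEGMENTS_PER_TIME_CHUNK,
) -> bool:
    """Return True if a time-ordered scan fits within the configured limits.

    row_limit is the query's LIMIT, or None when the query has no LIMIT.
    Assumption: satisfying either limit is enough for the query to run.
    """
    within_row_limit = row_limit is not None and row_limit <= max_rows
    within_segment_limit = segments_per_time_chunk <= max_segments
    return within_row_limit or within_segment_limit


if __name__ == "__main__":
    # A large query over many segments fails with the defaults...
    print(can_time_order(5_000_000, 200))
    # ...but can succeed once the row limit is raised for a bigger machine.
    print(can_time_order(5_000_000, 200, max_rows=10_000_000))
```

Raising either limit trades memory for the ability to time-order larger result sets, which is the tuning mentioned above.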

Operational impact

No impact on overall cluster operation. Existing SQL queries that were planned as select queries might start failing if they exceed the configurable limits on rows or segments per time chunk.


gianm · 2019-03-28T20:43:12Z

> Scan is used if ordering by __time is specified

Scan is already used if ordering by __time is not specified, so I suppose this proposed change means that Select will never be used by Druid SQL. That sounds ok to me, because in the situations you describe (lots of segments or high thresholds), Select queries have resource usage issues anyway, as described in #6088. Make sure to update the Druid SQL documentation to describe how queries are planned now.

👍

justinborromeo added Design Review Proposal labels Mar 28, 2019

justinborromeo mentioned this issue Mar 30, 2019

SQL support for time-ordered scan #7373

Merged

jon-wei closed this as completed in #7373 Apr 2, 2019
