Add SQL support for time-ordered scans #7370

Closed
justinborromeo opened this issue Mar 28, 2019 · 1 comment
Comments

@justinborromeo
Contributor

Motivation

See #6088 for the original idea. PR #7133 is close to completion, and the next step is to add SQL support for time-ordered scans. This would eliminate the need for select queries in SQL planning, since the only thing select is good for is time-ordering results. Switching to scan would reduce memory usage.

Proposed changes

The SQL planning in DruidQuery will be changed so that Scan is used if ordering by __time is specified. After that, Select will be essentially obsolete and will be removed from SQL planning altogether.

The user interface won't change.
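
To make the proposed planning rule concrete, here is a minimal Python sketch. It is not the actual DruidQuery code (which is Java). The helper name `plan_non_aggregate_query` and the shape of the returned dict are hypothetical, and the `order` field is assumed to be how the native scan query from PR #7133 expresses time ordering. Ordering by columns other than `__time` is outside what this issue covers, so the sketch simply rejects it.

```python
from typing import Optional

TIME_COLUMN = "__time"


def plan_non_aggregate_query(order_by: Optional[str], descending: bool = False) -> dict:
    """Pick a native query shape for a non-aggregating SQL SELECT.

    Hypothetical helper: under the proposal, Scan is chosen whether or not
    the query orders by __time, so Select is never produced.
    """
    if order_by is None:
        # No ordering requested: plain scan.
        return {"queryType": "scan", "order": "none"}
    if order_by == TIME_COLUMN:
        # Time ordering requested: scan with time ordering instead of select.
        direction = "descending" if descending else "ascending"
        return {"queryType": "scan", "order": direction}
    # Assumption: other orderings are not handled by this sketch.
    raise ValueError(f"ordering by {order_by!r} is not covered by this sketch")
```

For example, `plan_non_aggregate_query("__time", descending=True)` yields a scan with `order` set to `"descending"`, and no input produces a select query.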

Rationale

I think removing select queries from the SQL planner completely is the best choice, since the select query's design isn't great memory-wise. Although this means that time-ordered SELECT queries that fall outside the configurable scan time-ordering limits (default 100K rows or 30 segments per time chunk) will fail, these limits can be tuned based on machine specs to a point where the query will succeed. Furthermore, if a query is big enough to cause memory issues with scan, using select would be even worse.
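
The limit check can be sketched roughly as follows. The function and parameter names are hypothetical, and only the defaults (100K rows, 30 segments per time chunk) come from the text above. The sketch assumes a time-ordered scan is allowed when the query stays within either limit; the issue does not spell out how the two limits combine.

```python
from typing import Optional

# Defaults quoted in this issue; both are described as configurable.
DEFAULT_MAX_ROWS = 100_000
DEFAULT_MAX_SEGMENTS_PER_TIME_CHUNK = 30


def can_time_order(
    row_limit: Optional[int],
    segments_per_time_chunk: int,
    max_rows: int = DEFAULT_MAX_ROWS,
    max_segments: int = DEFAULT_MAX_SEGMENTS_PER_TIME_CHUNK,
) -> bool:
    """Return True if a time-ordered scan fits within the configured limits.

    Assumption: staying within EITHER limit is enough for the query to run.
    A row_limit of None means the query has no LIMIT clause.
    """
    within_row_limit = row_limit is not None and row_limit <= max_rows
    within_segment_limit = segments_per_time_chunk <= max_segments
    return within_row_limit or within_segment_limit
```

Under these assumptions, a query with `LIMIT 500000` over 40 segments per time chunk fails with the defaults, but passes once `max_rows` is raised to 1,000,000, which is the kind of tuning described above.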

Operational impact

No impact on overall cluster operation. Existing time-ordered SQL queries that are currently planned as select queries might start failing if they exceed the configurable row or segments-per-time-chunk limits.

@gianm
Contributor

gianm commented Mar 28, 2019

Scan is used if ordering by __time is specified

Scan is already used if ordering by __time is not specified, so I suppose this proposed change means that Select will never be used by Druid SQL. That sounds OK to me, because in the situations you describe (lots of segments or high thresholds), Select queries have resource usage issues anyway, as described in #6088. Make sure to update the Druid SQL documentation to explain how queries are planned now.

👍
