Motivation
See #6088 for the original idea. PR #7133 is close to completion, and the next step is to add SQL support for time-ordered scans. This would eliminate the need for Select queries in SQL planning, since the only thing Select is good for is time-ordering results. Switching to Scan would also improve memory usage.
Proposed changes
The SQL planning in DruidQuery will be changed so that Scan is used if ordering by __time is specified. After that, Select will be essentially obsolete and will be removed from SQL planning altogether.
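To make the proposed planning change concrete, here is a minimal sketch in Python rather than the planner's actual Java. The names `NativeQuery`, `plan_current`, and `plan_proposed` are hypothetical and do not correspond to anything in `DruidQuery`. The sketch only covers the choice between Scan and Select for non-aggregating queries.

```python
from enum import Enum


class NativeQuery(Enum):
    """Hypothetical stand-in for the native query types discussed here."""
    SCAN = "scan"
    SELECT = "select"


def plan_current(orders_by_time: bool) -> NativeQuery:
    # Behaviour as described in this issue: Scan is used unless the
    # SQL query orders by __time, in which case Select is used.
    return NativeQuery.SELECT if orders_by_time else NativeQuery.SCAN


def plan_proposed(orders_by_time: bool) -> NativeQuery:
    # Proposed behaviour: Scan handles both cases, so Select is never planned.
    return NativeQuery.SCAN
```

Under this sketch, `plan_proposed` returns `NativeQuery.SCAN` for both inputs, which is what makes Select removable from SQL planning.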
The user interface won't change.
Rationale
I think removing Select queries from the SQL planner completely is the best choice, since their design isn't great memory-wise. This does mean that time-ordered SELECT queries that fall outside of the configurable scan time-ordering limits (default 100K rows or 30 segments per time chunk) will fail, but these limits can be tuned based on machine specs to a point where the query will succeed. Furthermore, if a query is big enough to cause memory issues with Scan, using Select will be even worse.
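As a rough illustration of how those limits could gate a time-ordered query, here is a small Python sketch. The function and parameter names are hypothetical. The defaults are the ones quoted above. Treating the two limits as alternatives (the query passes if it fits either one) and treating the bounds as inclusive are assumptions based on the "100K rows or 30 segments" wording, not confirmed behaviour.

```python
from typing import Optional

# Defaults quoted in this issue; both are described as configurable.
DEFAULT_MAX_ROWS = 100_000
DEFAULT_MAX_SEGMENTS_PER_TIME_CHUNK = 30


def within_time_ordering_limits(
    row_limit: Optional[int],
    segments_per_time_chunk: int,
    max_rows: int = DEFAULT_MAX_ROWS,
    max_segments: int = DEFAULT_MAX_SEGMENTS_PER_TIME_CHUNK,
) -> bool:
    """Return True if a time-ordered scan fits at least one limit.

    row_limit is the query's LIMIT, or None when the query has no LIMIT.
    """
    fits_rows = row_limit is not None and row_limit <= max_rows
    fits_segments = segments_per_time_chunk <= max_segments
    # Assumption: fitting either limit is enough for the query to run.
    return fits_rows or fits_segments
```

For example, an unlimited query over 40 segments per time chunk fails the check with the defaults but passes if `max_segments` is raised to 50, which is the kind of tuning the rationale refers to.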
Operational impact
No impact on overall cluster operation. Existing time-ordered SQL queries that are currently planned as Select might start failing if they exceed the configurable row limit or segments-per-time-chunk limit.
Scan is already used if ordering by __time is not specified, so I suppose this proposed change means that Select will never be used by Druid SQL. That sounds OK to me, because in the situations you describe (lots of segments or high thresholds), Select queries have resource usage issues anyway, as described in #6088. Make sure to update the Druid SQL documentation to describe how queries are planned now.