
No protections for select query #5006

Closed
niketh opened this issue Oct 25, 2017 · 14 comments

@niketh
Contributor

niketh commented Oct 25, 2017

Select query (unlike groupBy) has no protection against bad queries. I propose adding a limit of 50K like in the case of groupBy so that historicals/brokers don't get overwhelmed. An unchecked select query causes historicals to OOM while finalizing results to be sent to the broker.
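For reference, the groupBy engine of that era enforced its guardrails through runtime properties; a rough sketch of the relevant settings (names and defaults are from my recollection of the Druid configuration docs at the time, so treat them as approximate):

```properties
# Cap on intermediate rows buffered per groupBy query (default on the order of 50K)
druid.query.groupBy.maxIntermediateRows=50000
# Cap on total result rows a groupBy query may return (default on the order of 500K)
druid.query.groupBy.maxResults=500000
```

The proposal is to give Select an analogous cap so an unbounded query fails fast instead of exhausting historical heap.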

@b-slim
Contributor

b-slim commented Oct 25, 2017

My 2 cents: unlike groupBy, a select query is not supposed to produce an unexpectedly large number of rows due to grouping on high-cardinality dimensions. In fact, the select query has pagination that limits the number of rows pulled by a given query; I think that is sort of enough to limit the number of rows pulled in a single query.
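To illustrate the pagination being described, here is a minimal Select query with a `pagingSpec` (the datasource name and interval are placeholders invented for the example):

```json
{
  "queryType": "select",
  "dataSource": "wikipedia",
  "intervals": ["2017-01-01/2017-01-02"],
  "granularity": "all",
  "pagingSpec": {
    "pagingIdentifiers": {},
    "threshold": 100
  }
}
```

Here `threshold` caps the number of rows returned per page, and subsequent pages are fetched by passing back the `pagingIdentifiers` from the previous response.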

@RestfulBlue

Any updates? Druid can still be killed by OOM using a simple select query. Of course I can use restrictions, but not all users understand what they are doing, and the fact that users can so easily take down the entire cluster does not please me. Is it really hard to implement a restriction for the select query in configuration?

@gianm
Contributor

gianm commented Aug 1, 2018

@RestfulBlue Have you considered the Scan query (http://druid.io/docs/latest/querying/scan-query.html)? It is meant to be a more memory-friendly replacement for the Select query, and it won't run your historicals out of memory. The only issue it really has is #4933, and that's a broker thing, not a historical one, and it can only happen with very large or unlimited scan limits.
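For comparison with the Select example, a minimal Scan query with an explicit limit looks like this (datasource and interval are placeholders):

```json
{
  "queryType": "scan",
  "dataSource": "wikipedia",
  "intervals": ["2017-01-01/2017-01-02"],
  "resultFormat": "compactedList",
  "limit": 1000
}
```

Scan streams results in batches rather than materializing a threshold's worth of rows per segment up front, which is why it behaves better on historicals.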

You could also consider adding a property for maxResults to SelectQueryConfig, and having that be a cap on the number of results a Select query is allowed to return. With the way Select is written, though, it might be challenging to implement. Select's real problem is that it selects up to the paging threshold of rows from each segment in parallel and only applies the limit afterward. This makes it use far more memory than you would expect, and also makes a maxResults limit tough to implement.
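A sketch of what the proposed cap might look like; note this is hypothetical and was not an actual Druid property at the time of this thread:

```properties
# HYPOTHETICAL property, sketching the SelectQueryConfig cap proposed above.
# A hard limit on rows a Select query may return, analogous to groupBy's limits.
druid.query.select.maxResults=50000
```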

@RestfulBlue

RestfulBlue commented Aug 7, 2018

@gianm Yes, I have considered it, but currently the Druid API can only be used safely through certain UI apps like Grafana/Metabase/etc. If we gave every analyst/data scientist in our company access to the API, the entire Druid cluster would crash really soon :( The only way to protect it now is a timeout with a low value (about 10s). In 12.2, Druid ingestion via Kafka indexing tasks looks stable. We have a separate cluster for stress testing, with throughput of about 500,000 rows/sec on hardware with 8 cores, 64 GB memory, and 6 servers, and now there are no failed tasks (in 12.1 a third of the tasks failed), but currently the query API looks really unstable and too many things can go wrong.

@gianm
Contributor

gianm commented Aug 7, 2018

@RestfulBlue I'd really try to move to Scan queries if you can… ultimately I believe we are going to be deprecating and removing Select queries. They are broken beyond repair, imo, and Scan is a good replacement (especially after #6088).

@KenjiTakahashi
Contributor

I'm with @gianm on this. I examined select in quite some detail in the past (before scan came to be; it can be found somewhere in the issues/mailing list) and, sorry to say, IMHO the whole design is just wrong; there's no point in trying to improve it, especially now that we have scan.

FYI: we have since moved all but one of our queries to scan (and that one, which occurs very rarely, actually needs #6088) and never looked back.

@asdf2014
Member

asdf2014 commented Aug 8, 2018

> some ui apps, like grafana/metabase/etc. if we let every analytic/data scientist in our company access to api , then entire druid cluster will crash really soon

Hi, @RestfulBlue. Maybe you can try Apache Superset, which supports permission control to prevent ordinary users from sending queries that consume a lot of server resources. In addition, Superset can also predefine dashboards instead of directly exposing the query API.

Tip: there is an article about Superset on my personal blog that might help you.

@RestfulBlue

@asdf2014 We use Grafana as our main GUI; we have a self-developed Druid datasource for it with good restrictions, so through Grafana users simply can't build a bad query, and that is not a problem. Problems come when analysts/devs/others start using the direct Druid API, for example via JDBC.

@asdf2014
Member

Hi, @RestfulBlue. Indeed, we should analyze the query SQL based on cost; then we can add some limitations on expensive queries to protect the Druid cluster.

@RestfulBlue

@asdf2014 JDBC is just an example; in some places native queries are preferred, as SQL has some problems. GroupBy looks like a really good query: disk spilling, and the limit applies to the total row count (not per granularity), while timeseries and topN don't have these restrictions. A timeseries query over a large interval with "none" granularity causes an OOM, and the same goes for topN. I think every query type should have limitations like groupBy.
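To illustrate the timeseries case described above, a query like the following (datasource and interval invented for the example) produces one result row per distinct timestamp when granularity is "none", so over a long interval the result set can grow large enough to exhaust memory:

```json
{
  "queryType": "timeseries",
  "dataSource": "wikipedia",
  "granularity": "none",
  "intervals": ["2016-01-01/2018-01-01"],
  "aggregations": [{ "type": "count", "name": "rows" }]
}
```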

@asdf2014
Member

@RestfulBlue Good point. I agree with you.

@stale

stale bot commented Jun 20, 2019

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Jun 20, 2019
@stale

stale bot commented Jul 4, 2019

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.

@stale stale bot closed this as completed Jul 4, 2019
@gianm
Contributor

gianm commented Jul 4, 2019

Since #7133 is done, even more workloads can be moved to Scan now. That is what I'd recommend.
