Guard against exponential increase of filters during CNF conversion #12314
Conversation
@@ -178,6 +179,11 @@ private static void generateAllCombinations(
a.add(child);
// Result must receive an actual OrFilter.
result.add(idempotentOr(Filters.or(a)));
if (result.size() >= 10_000) {
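For context, a minimal self-contained sketch of the guard this diff adds - the class and constant names (`CnfGuardSketch`, `CNF_MAX_FILTER_COUNT`) are hypothetical, while the 10,000 limit and the `CNFFilterExplosionException` come from the PR itself:

```java
import java.util.List;

// Sketch only - the real check lives inside HiveCnfHelper's
// combination generator; names here are illustrative.
public class CnfGuardSketch
{
  // Hard-coded ceiling from the PR; not configurable as of this change.
  private static final int CNF_MAX_FILTER_COUNT = 10_000;

  static void checkCnfClauseCount(List<?> result) throws CNFFilterExplosionException
  {
    if (result.size() >= CNF_MAX_FILTER_COUNT) {
      // Stop expanding instead of letting the clause list grow without bound.
      throw new CNFFilterExplosionException(
          "Exceeded maximum allowed filters for CNF conversion (%d)",
          CNF_MAX_FILTER_COUNT
      );
    }
  }
}
```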
any reason for this limit to be configurable?
This could be configurable - I didn't make it so for now, to avoid adding more configs.
But if it feels disruptive (because it could lead to failures in existing queries that create > 10k filters and still run), I can add a config for this.
I've added a change so that exceeding this limit does not fail queries for now. When the limit is exceeded, the join flow will disable the pushdown optimization and the normal filter flow will return the non-CNF filter. Does that seem OK without adding a config to toggle the limit?
Yeah, I think that is OK - the transformation shouldn't change results AFAIK, though per my other comment there should be some way to know that this has happened.
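To make the described fallback concrete, here is a sketch of the join-flow behavior: when CNF conversion explodes, the pushdown optimization is skipped rather than the query failing. The wrapper class and method are hypothetical; `Filters.toCnf` and `CNFFilterExplosionException` are from the PR, with import paths inferred from the file paths shown in this review:

```java
import java.util.Optional;

import org.apache.druid.query.filter.Filter;
import org.apache.druid.segment.filter.Filters;
import org.apache.druid.segment.filter.cnf.CNFFilterExplosionException;

// Hypothetical wrapper illustrating the join-flow behavior described above.
public class JoinPushDownSketch
{
  static Optional<Filter> toCnfForPushDown(Filter filter)
  {
    try {
      return Optional.of(Filters.toCnf(filter));
    }
    catch (CNFFilterExplosionException e) {
      return Optional.empty(); // caller disables the pushdown optimization
    }
  }
}
```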
import org.apache.druid.java.util.common.StringUtils;

public class CNFFilterExplosionException extends Exception
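A plausible completion of this class, given the `StringUtils` import shown above - the constructor body is not visible in this excerpt, so treat it as a sketch:

```java
import org.apache.druid.java.util.common.StringUtils;

// Checked exception, so callers of the CNF conversion are forced
// to decide how to handle an aborted conversion.
public class CNFFilterExplosionException extends Exception
{
  public CNFFilterExplosionException(String formatText, Object... arguments)
  {
    // Assumed: message formatting via StringUtils, matching the import above.
    super(StringUtils.nonStrictFormat(formatText, arguments));
  }
}
```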
Should this be a BadQueryException? Since I think this is user-controllable by setting a context flag or not, doing this would result in it being a user error when it eventually surfaces.
Yes, BadQueryException makes sense, but one problem is that it is an unchecked exception. I was trying to make this a checked exception so that callers must decide how to handle it. WDYT?
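To illustrate the checked-vs-unchecked point: if `Filters.toCnf` declares `throws CNFFilterExplosionException`, the compiler forces every call site to make an explicit choice. A sketch, assuming that signature:

```java
// Given some Filter filter already in scope, this does not compile
// without the try/catch (or a matching throws clause) - exactly the
// "callers must decide" property being argued for.
Filter maybeCnf;
try {
  maybeCnf = Filters.toCnf(filter);
}
catch (CNFFilterExplosionException e) {
  maybeCnf = filter; // explicit, compiler-enforced fallback decision
}
```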
I think that makes sense. Looking closer, I see now that these exceptions are always eaten and don't make it to the user - I admittedly just did a drive-by scan of this PR earlier and thought they might make it to the surface 😅
Though thinking about it, maybe it is worth a log of some sort, so that there is some indicator of why the expected query behavior isn't happening?
I guess a log might be kind of noisy if this was coming from the SQL planner, and would need to be collected and printed at the end instead. We have a mechanism for collecting errors this way, but not for informative messages; this could be done at a later time, I think (if it is actually useful to have a way to know why the planner did or didn't do a thing that was asked for).
Yes, I agree with that - it would make more sense to have something in a per-query structure that is shown to the user.
processing/src/main/java/org/apache/druid/segment/filter/cnf/HiveCnfHelper.java
LGTM
return useCNF ? Filters.toCnf(filter) : filter;
}
catch (CNFFilterExplosionException cnfFilterExplosionException) {
return filter; // cannot convert to CNF, return the filter as is
I would just put a log saying that we are not converting to CNF due to `CNFFilterExplosionException`, so that people debugging the query know what's happening.
For clusters where this happens regularly, that can generate a lot of log spew. You could argue that they should respond to the log spew by not running those queries, but it would still be a sudden chunk of log spew. In general, CNF is a potential optimization done by the planner, and we don't log the other optimization choices that we make - we just make them. In this case, the planner is deeming that converting to CNF is more expensive than just leaving the filter as is, which is yet another optimization decision. I don't think it needs a log, or if it does, maybe at debug level.
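If a log were kept at all, the debug-level version might look like the following - a minimal sketch assuming Druid's Logger and the fallback shape shown earlier; the wrapper class is hypothetical and this log is not part of the PR:

```java
import org.apache.druid.java.util.common.logger.Logger;
import org.apache.druid.query.filter.Filter;
import org.apache.druid.segment.filter.Filters;
import org.apache.druid.segment.filter.cnf.CNFFilterExplosionException;

// Illustrative only: the PR does not add this log.
public class CnfLoggingSketch
{
  private static final Logger log = new Logger(CnfLoggingSketch.class);

  static Filter toCnfOrOriginal(Filter filter)
  {
    try {
      return Filters.toCnf(filter);
    }
    catch (CNFFilterExplosionException e) {
      // Debug level keeps a routine planner decision out of normal log output.
      log.debug("Not converting filter[%s] to CNF: %s", filter, e.getMessage());
      return filter;
    }
  }
}
```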
I understand where you're coming from. In that case, I would tend towards not throwing an exception and instead refactoring `Filters.toCnf(filter)` to return an appropriate state that tells us whether the conversion was successful. Maybe return an Optional, so that we avoid the performance overhead of exceptions.
We can take this up as future work, maybe when we make this limit configurable.
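A sketch of the API shape being suggested here. In the full refactor, the limit check inside the converter would return `Optional.empty()` directly instead of constructing and throwing an exception; the wrapper below (a hypothetical class, not part of this PR) only demonstrates the caller-facing contract:

```java
import java.util.Optional;

import org.apache.druid.query.filter.Filter;
import org.apache.druid.segment.filter.Filters;
import org.apache.druid.segment.filter.cnf.CNFFilterExplosionException;

// Hypothetical future API: no checked exception, callers branch on the result.
public class OptionalCnfSketch
{
  public static Optional<Filter> toCnf(Filter filter)
  {
    try {
      return Optional.of(Filters.toCnf(filter));
    }
    catch (CNFFilterExplosionException e) {
      return Optional.empty();
    }
  }

  // Caller ergonomics: state is inspected, not caught.
  public static Filter toCnfOrOriginal(Filter filter)
  {
    return toCnf(filter).orElse(filter);
  }
}
```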
Merged since failures on jdk15 tests are unrelated.
Fixes #12311
Currently, the CNF conversion of a filter is unbounded, which means that it can create an arbitrarily large number of filters, potentially leading to OOMs in the historical heap. We should throw an error or disable CNF conversion if the filter count starts getting out of hand. There are ways to do CNF conversion with only a linear increase in filter count, but those have been left out of the scope of this change, since such algorithms introduce new variables into the predicate - which can be contentious. Any improved conversion algorithm can be taken up later upon discussion.
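As a concrete illustration of the blowup: converting (a1 AND b1) OR (a2 AND b2) OR ... OR (an AND bn) to CNF by distributing OR over AND produces one clause for every way of choosing ai or bi from each of the n disjuncts, i.e. 2^n clauses - at n = 20 that is already over a million filters. Linear-size alternatives such as the Tseytin transformation avoid this, but only by introducing the auxiliary variables mentioned above.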