Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
You can write a query like
select ... from foo where id in (4)
SQL like this is often generated by tools that handle a variable number of ids.
We have a specialized InList implementation (e.g. see #4057), but for single values it is still faster to use a standard equality predicate.
Describe the solution you'd like
As mentioned by @jackwener and @Dandandan in #4057 (comment), we should rewrite InList expressions that have only a few elements.
We should definitely simplify <left> IN (<expr>) to <left> = <expr> as that will be better in all cases.
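To make the proposed rewrite concrete, here is a minimal sketch on a toy expression type. The `Expr` enum and the `simplify_single_value_in_list` function below are hypothetical stand-ins written for illustration; they are not DataFusion's actual types or simplifier API. The sketch also assumes that a negated single-value list (`NOT IN`) would map to `!=`, which this issue does not discuss.

```rust
// Toy expression tree, assumed for illustration only (not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Eq(Box<Expr>, Box<Expr>),
    NotEq(Box<Expr>, Box<Expr>),
    InList {
        expr: Box<Expr>,
        list: Vec<Expr>,
        negated: bool,
    },
}

/// Rewrites `<left> IN (<expr>)` to `<left> = <expr>`.
/// As an assumption of this sketch, `<left> NOT IN (<expr>)` becomes `<left> != <expr>`.
/// Lists with zero or several elements, and all other expressions, are returned unchanged.
fn simplify_single_value_in_list(expr: Expr) -> Expr {
    match expr {
        Expr::InList {
            expr: left,
            mut list,
            negated,
        } if list.len() == 1 => {
            // The guard guarantees exactly one element, so `pop` cannot fail.
            let right = Box::new(list.pop().expect("length checked by guard"));
            if negated {
                Expr::NotEq(left, right)
            } else {
                Expr::Eq(left, right)
            }
        }
        other => other,
    }
}
```

With this sketch, the toy equivalent of `id IN (4)` comes back as `id = 4`, while `id IN (4, 5)` is left as an `InList`.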
Describe alternatives you've considered
We could potentially also rewrite <left> IN (<expr>, <expr2>, .. <exprN>) to <left> = <expr> OR <left> = <expr2> OR .. <left> = <exprN>.
However, at some point the InList expression is faster to evaluate, and that break-even point depends on the cost of evaluating <left>. Thus I suggest we only rewrite single-value IN lists.
Additional context
This is a good first issue because there are several examples of the code and tests to follow
You can find simplify rules here: https://github.com/apache/arrow-datafusion/blob/10e64dc013ba210ab1f6c2a3c02c66aef4a0e802/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs#L329-L339