Simplify small InList expressions #4089

@alamb

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
You can write a query like:

select ... from foo where id in (4)

SQL like this is often generated by tools that handle a variable number of ids, including the case of a single id.

We have a specialized InList implementation (e.g. see #4057), but for single values it is still faster to use a standard equality predicate.

Describe the solution you'd like

As mentioned by @jackwener and @Dandandan in #4057 (comment), we should rewrite InList expressions that have only a few elements.

We should definitely simplify <left> IN (<expr>) to <left> = <expr> as that will be better in all cases.
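To make the proposed rewrite concrete, here is a minimal sketch on a toy expression type. The `Expr` enum and the `simplify_in_list` function are hypothetical names made up for this illustration; they are not DataFusion's actual `Expr` or simplifier code. The `negated` flag, and the matching rewrite of `NOT IN` to `!=`, are an assumption that goes beyond what this issue asks for.

```rust
// Toy expression type for illustration only; this is NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    // `<left> = <right>`
    Eq(Box<Expr>, Box<Expr>),
    // `<left> != <right>`
    NotEq(Box<Expr>, Box<Expr>),
    // `<expr> [NOT] IN (<list>)`
    InList {
        expr: Box<Expr>,
        list: Vec<Expr>,
        negated: bool,
    },
}

/// Rewrites `<left> IN (<expr>)` to `<left> = <expr>`.
/// Assumption: `<left> NOT IN (<expr>)` is likewise rewritten to `<left> != <expr>`.
/// Lists with zero or several elements are returned unchanged.
fn simplify_in_list(e: Expr) -> Expr {
    match e {
        Expr::InList { expr, mut list, negated } if list.len() == 1 => {
            // The guard guarantees exactly one element, so `pop` cannot fail.
            let rhs = Box::new(list.pop().unwrap());
            if negated {
                Expr::NotEq(expr, rhs)
            } else {
                Expr::Eq(expr, rhs)
            }
        }
        other => other,
    }
}
```

With this sketch, the toy equivalent of `id IN (4)` becomes `id = 4`, while `id IN (1, 2)` is left as an InList.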

Describe alternatives you've considered

We could potentially also rewrite <left> IN (<expr>, <expr2>, .. <exprN>) to <left> = <expr> OR <left> = <expr2> OR .. <left> = <exprN>.

However, at some point the InList expression is faster to evaluate, and that break-even point depends on the cost to evaluate <left>. Thus I suggest we only rewrite single-value IN lists.
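For comparison, the OR-chain alternative could look roughly like the sketch below. It again uses a hypothetical toy `Expr` type and a made-up helper name, `in_list_to_or_chain`, rather than any real DataFusion API. Note that `left` is cloned once per list element, so the rewritten expression contains N copies of `<left>`; that is one way to see why the break-even point would depend on how expensive `<left>` is to evaluate.

```rust
// Toy expression type for illustration only; this is NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    // `<left> = <right>`
    Eq(Box<Expr>, Box<Expr>),
    // `<left> OR <right>`
    Or(Box<Expr>, Box<Expr>),
}

/// Expands `left IN (e1, .., eN)` into `left = e1 OR .. OR left = eN`.
/// Returns `None` for an empty list, since there is nothing to chain.
fn in_list_to_or_chain(left: &Expr, list: &[Expr]) -> Option<Expr> {
    list.iter()
        // Build one equality per list element; `left` is duplicated each time.
        .map(|item| Expr::Eq(Box::new(left.clone()), Box::new(item.clone())))
        // Fold the equalities into a left-deep chain of ORs.
        .reduce(|acc, eq| Expr::Or(Box::new(acc), Box::new(eq)))
}
```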

Additional context
This is a good first issue because there are several existing examples of code and tests to follow.

You can find simplify rules here: https://github.com/apache/arrow-datafusion/blob/10e64dc013ba210ab1f6c2a3c02c66aef4a0e802/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs#L329-L339
