Skip to content

Optimize array_has_any() for scalar arg #20384

@neilconway

Description

@neilconway

Is your feature request related to a problem or challenge?

When array_has_any is passed a scalar for either of its argument, we can use a much faster algorithm: rather than doing O(N*M) comparisons for each row of the columnar arg, we can build a hash table on the scalar array and probe it instead.

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

#18181 discusses a user-reported query where array_has_any is slow. In that scenario, array_has_any is called on a table column and an uncorrelated subquery that returns a 1245-element array. This results in calling array_has_any with a pair of columnar arguments (i.e., we don't take advantage of the fact that the subquery argument is effectively fixed and pass a scalar instead). Optimizing that query involves two steps:

  1. Optimize array_has_any for a scalar arg, which is this ticket. This has value as a standalone optimization.
  2. Query optimization improvement to handle this general class of queries better; I'll do some more digging here and file another ticket shortly.

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions