relax native expression multi-value string usage validation for conditional and null coalescing functions#13885
Conversation
…tional and null coalescing functions
|
Could you describe the motivation and risk a little more? As to motivation— are there some queries that can plan after this change, that couldn't plan before? As to risk— are there some cases where the validation was important, and now wouldn't be happening anymore? |
Right now there are certain queries that are basically impossible to write, anything involving null value coercion or conditionals combined with explicit multi-value string functions, such as all of the where clauses in the query added in This same information is currently used to determine if an expression should be automatically 'applied' because the string is being used as a scalar value but is in fact multi-valued, so the risk of these changes is that if something was relying on NVL to be mapped across each element of the multi-value string instead of replace the value with some array then that behavior will no longer happen in this PR. This is why I am working on a follow-up that is what i consider the real fix, which is to switch to using the type inference system so that way an expression could look at its inputs inferred output types to determine if a multi-value string is being used as a scalar or as an array, and enforce consistency (and determine if translation is required) that way, but I'm not completely sure how complicated this fix is so put this up as a interim fix because i feel like it improves the behavior, though its certainly worth a discussion. The more I think about it though, i'm feeling a bit less good about this interim fix so I've been continuing on the more correct fix of consolidating the stuff on type inference. |
Hmm does that mean you'd like to close the PR? Or would you be updating it? |
I'm not entirely certain yet so will leave this open for now; I'm considering splitting out the changes to |
|
I opened #13890 which has just the cast operator changes as a partial alternative to this PR, where the other part would be covered by an in progress change to consolidate expression analysis stuff into a single pass based on the expression type inference stuff. |
|
closing in favor of #13890 |
Description
This PR relaxes multi-value string usage consistency validation to allow expressions like
nvl,isnull,notnull, case and if statements, etc, to no longer automatically assume that their inputs must be scalar arguments.This validation was added way back in #7588 as a means to ensure that a multi-value string column was being used consistently within an expression so that it cannot be considered both as an array and as a scalar value (which must be mapped across the multiple values). This system basically divided expressions up into expressions that definitely take array inputs and output arrays, and everything else. This worked well enough for lots of expressions, but a small handful don't fit this model, which is what this PR aims to help fix.
This system predates the native expression layer having full type inference, and I think it could be done a lot better, so I am also going to be trying to merge this much older analysis system into the type inference system in a follow-up PR, since all of the analysis of contextual usage and validation can be done at the same places where we doing the newer stuff (expression planner, etc).
While here I also consolidated some calcite -> native druid type stuff for the cast operator, which allowed some additional expressions to plan correctly.
This PR has: