Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coalesce with string and int failed #11860

Open
2 tasks done
jayzhan211 opened this issue Apr 28, 2024 · 2 comments
Open
2 tasks done

coalesce with string and int failed #11860

jayzhan211 opened this issue Apr 28, 2024 · 2 comments

Comments

@jayzhan211
Copy link

What happens?

I'm not sure if the result is expected, but it seems weird to me

D select coalesce('a', 1);
Error: Conversion Error: Could not convert string 'a' to INT32
D select coalesce(1, 'a');
┌──────────────────┐
│ COALESCE(1, 'a') │
│      int32       │
├──────────────────┤
│                1 │
└──────────────────┘

I expect either they have errors or expected results, 'a' and 1

To Reproduce

D select coalesce('a', 1);
Error: Conversion Error: Could not convert string 'a' to INT32
D select coalesce(1, 'a');
┌──────────────────┐
│ COALESCE(1, 'a') │
│      int32       │
├──────────────────┤
│                1 │
└──────────────────┘

OS:

Macos M2

DuckDB Version:

v0.10.0 20b1486

DuckDB Client:

cli

Full Name:

Jay Zhan

Affiliation:

None

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have
@ewencp
Copy link
Contributor

ewencp commented May 6, 2024

The difference in behavior here is due to a combination of type resolution and short circuiting in execution of the COALESCE such that after finding a non-null value, no other values are converted. In the type resolution, the string literal is just being treated as an unknown. DuckDB is lax when resolving types for string literals with a second operand and seems to just assume the string can convert to the normalized type of the second operand.

Then the execution of the COALESCE works from left to right on the arguments and once it finds a non-NULL, short-circuits and doesn't evaluate the rest of them, including converting the literal to an integer. This explains why the first example works, but in the second example the first check on an argument to COALESCE needs to convert the literal and fails.

The "right" fix for this kinda depends.

In, e.g., Postgres, the implicit conversion for string literals to non-string-like types isn't permitted -- it'd only map to string-like types implicitly and require an explicit cast. Arguably even if you wanted to be more lax in DuckDB, this seems like something that should be caught during constant folding in the optimizer. Alternatively, the type resolution code could be adjusted to check that the literal is actually convertible to the type it's assuming. Or, despite being a bit redundant, possibly just doing both (the latter happens earlier, only once per query/planning, and could potentially give more useful error messages, the former just seems worthwhile to avoid redundant work (which I think is being performed, but haven't verified).

This could also be considered "correct" behavior. But doesn't seem great since it's confusing and if the string literal argument was a column reference to a textual type it generates an error re: types and requiring an explicit cast.

On test coverage, from a quick review it seems like the relevant tests only use explicit casts and probably aren't covering quite enough ordering cases as the error tests are primarily focused (for good reason) on NULL tests.

@ewencp
Copy link
Contributor

ewencp commented May 6, 2024

Actually it looks like the constant folding rule in the optimizer is trying to evaluate the cast to simplify, but since the same execution code is used, the optimization is also short circuited. It might be possible to catch/rethrow ConversionException specifically in ExpressionExecutor to make the error in the first case be caught earlier (currently it just fails to get optimized, then fails again in query evaluation). But to force different behavior in the optimizer to catch the type error, the ExpressionExecutor would need more context about the phase it's being executed in (optimization or execution).

Arguably this simply isn't a bug since the lax implicit conversion rules (which are specified in the SQL docs) combined with an expectation of lazy evaluation would mean the current behavior makes sense and is expected. But I'm not one of the duckdb devs, would be good to have one of them weigh in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants