-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Description
Is your feature request related to a problem or challenge?
What does Coalesce do
The coalesce function implicitly coerces types with Signature::VariadicEqual which has comparison_coercion internally, and the coercion is taken considered for all the columns, not only those we need, which gives us back unexpected casting results.
Coerce types after first non-null values are known
We can see the following example,
statement ok
create table test1 (a int, b int) as values (null, 3), (2, null);
# test coercion string
query TT
select
coalesce(a, b, 'none_set'),
arrow_typeof(coalesce(a, b, 'none_set'))
from test1;
----
3 Utf8
2 Utf8
Since they are coerced to Utf8, so we get 3 and 2 with Utf8.
Ideally, If we take the first non-null value, we should expect to get Int, not Utf8.
another example, dict is cast to Int64
query IT
select
coalesce(arrow_cast(123, 'Dictionary(Int32, Int8)'), 34),
arrow_typeof(coalesce(arrow_cast(123, 'Dictionary(Int32, Int8)'), 34))
----
123 Int64
The reason we need coercion is that it is possible that we have different types for different columns. Coerce them can help we get the final single type.
statement ok
create table test1 as values
(arrow_cast(1, 'Int8'), arrow_cast(2, 'Int32')),
(null, arrow_cast(3, 'Int32'))
query IT
select
coalesce(column1, column2),
arrow_typeof(coalesce(column1, column2))
from test1;
----
1 Int32
3 Int32
We get (1, Int8) and (3, Int32) for respective row, and finally cast them to I32.
I suggest that we apply coercion after we collect those first non-null values for each row.
Use non-comparison coercion
Comparision coercion (fn comparison_coercion) vs non-comparison coercion (fn coerced_from)
Those two logic are quite different, comparison coercion is for comparision. For example, compare dict with dict returns value type, since dict key is not important. Compare i64 with u64 we fallback to i64 because we don't have i128, even there is possible of lossy if you have large U64 value, but most of the cases like U64(1) and I64(1) is comparable, we will not block for those edge cases in comparison. And, there might be more.
Given the difference between these two coercion, I think non-comparison coercion is more suitable for Coalesce function. Btw, I think make_array should switch to non-comparison coercion too. make_array should do comparison coercion, following what Duckdb do
I suggest we switch VariadicEqual to non-comparison coercion or introduce another signature VariadicEqualNonCompare if comparison coercion is needed somewhere.
I think Int64 is a big surprise that we should avoid.
query IT
select
coalesce(arrow_cast(3, 'Dictionary(Int32, Int32)'), arrow_cast(3, 'Dictionary(Int32, Int64)')),
arrow_typeof(coalesce(arrow_cast(3, 'Dictionary(Int32, Int32)'), arrow_cast(3, 'Dictionary(Int32, Int64)')));
----
3 Int64
Maybe disable implicit coercion for Coalesce function?
I'm not sure why is coalesce function introduce implicit coercion, but I found that Postgres and Duckdb does not do implicit casting for Coalesce, maybe we should follow them?
DuckDB errors
D create table t1 (a integer, b varchar);
D insert into t1 values (1, 'a');
D insert into t1 values (null, 'b');
D select coalesce(a, b) from t1;
Error: Binder Error: Cannot mix values of type INTEGER and VARCHAR in COALESCE operator - an explicit cast is required
Postgres Error
postgres=# create table t2(a int, b varchar);
CREATE TABLE
postgres=# insert into t2 values (1, 'a');
INSERT 0 1
postgres=# insert into t2 values (null, 'b');
INSERT 0 1
postgres=# select coalesce(a, b) from t2;
ERROR: COALESCE types integer and character varying cannot be matched
LINE 1: select coalesce(a, b) from t2;
Describe the solution you'd like
- Avoid casting for coalesce at all
- Consider coercion after collecting known results if we need casting.
- Switch VariadicEqual to non-comparison coercion.
Describe alternatives you've considered
No response
Additional context
No response
Related issue that has coercion issue from Coalesce #10221
Part of the idea #10241