-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Closed
Labels
enhancementNew feature or requestNew feature or request
Description
Is your feature request related to a problem or challenge?
DataFusion supports this type of syntax, such as count(a, b) and count(distinct a, b).
However, its behavior may not be well-defined. A counterintuitive behavior is that count(distinct a, b) returns more rows than count(a, b).
DataFusion CLI v39.0.0
> create table t(a int, b int) as values(1, NULL), (NULL, 10), (NULL, NULL);
> select count(distinct a,b) from t;
+-------------------------+
| count(DISTINCT t.a,t.b) |
+-------------------------+
| 1 |
+-------------------------+
> select count(a,b) from t;
+----------------+
| count(t.a,t.b) |
+----------------+
| 0 |
+----------------+Additionally, PostgreSQL does not support this type of syntax. MySQL only supports count(distinct a, b), but its result is different from DataFusion.
Server version: 8.3.0 Homebrew
mysql> create table t(a int, b int);
Query OK, 0 rows affected (0.01 sec)
mysql> insert into t values(1, NULL), (NULL, 10), (NULL, NULL);
Query OK, 3 rows affected (0.01 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> select count(distinct a,b) from t;
+---------------------+
| count(distinct a,b) |
+---------------------+
| 0 |
+---------------------+
1 row in set (0.00 sec)Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response
findepi
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or request