[SPARK-27883][SQL] Port AGGREGATES.sql [Part 2]#24743
[SPARK-27883][SQL] Port AGGREGATES.sql [Part 2]#24743wangyum wants to merge 4 commits intoapache:masterfrom wangyum:SPARK-27883
Conversation
sql/core/src/test/resources/sql-tests/results/pgSQL/aggregates_part2.sql.out
Outdated
Show resolved
Hide resolved
sql/core/src/test/resources/sql-tests/results/pgSQL/aggregates_part2.sql.out
Outdated
Show resolved
Hide resolved
sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part2.sql
Outdated
Show resolved
Hide resolved
| b2 BOOL, | ||
| b3 BOOL, | ||
| b4 BOOL); | ||
| -- SELECT |
There was a problem hiding this comment.
Do we need to deal with booland_statefunc and boolor_statefunc? These 2 functions are not exist in doc: https://www.postgresql.org/docs/12/functions-aggregate.html
There was a problem hiding this comment.
288 * Function for standard EVERY aggregate conforming to SQL 2003.
289 * The aggregate is also named bool_and for consistency.
290 *
291 * Note: this is only used in plain aggregate mode, not moving-aggregate mode.
There was a problem hiding this comment.
BooleanType and BooleanType is equivalent to bool bool_and bool?
There was a problem hiding this comment.
Yes. it's equivalent. BOOL_AND and booland_statefunc, BOOL_OR and boolor_statefunc.
There was a problem hiding this comment.
Can we rewrite the queries by using AND on the columns with boolean data types?
There was a problem hiding this comment.
Do you mean:
select NULL and NULL is null AS "t"?
There was a problem hiding this comment.
The behaviour different:
- Spark SQL:
SELECT
-- boolean and transitions
-- null because strict
NULL and NULL IS NULL AS `t`,
TRUE and NULL IS NULL AS `t`,
FALSE and NULL IS NULL AS `t`,
NULL and TRUE IS NULL AS `t`,
NULL and FALSE IS NULL AS `t`,
-- and actual computations
TRUE and TRUE AS `t`,
NOT TRUE and FALSE AS `t`,
NOT FALSE and TRUE AS `t`,
NOT FALSE and FALSE AS `t`
-- !query 1 schema
struct<t:boolean,t:boolean,t:boolean,t:boolean,t:boolean,t:boolean,t:boolean,t:boolean,t:boolean>
-- !query 1 output
NULL true false false false true false true falsePostgreSQL:
SELECT
-- boolean and transitions
-- null because strict
booland_statefunc(NULL, NULL) IS NULL AS "t",
booland_statefunc(TRUE, NULL) IS NULL AS "t",
booland_statefunc(FALSE, NULL) IS NULL AS "t",
booland_statefunc(NULL, TRUE) IS NULL AS "t",
booland_statefunc(NULL, FALSE) IS NULL AS "t",
-- and actual computations
booland_statefunc(TRUE, TRUE) AS "t",
NOT booland_statefunc(TRUE, FALSE) AS "t",
NOT booland_statefunc(FALSE, TRUE) AS "t",
NOT booland_statefunc(FALSE, FALSE) AS "t";
t | t | t | t | t | t | t | t | t
---+---+---+---+---+---+---+---+---
t | t | t | t | t | t | t | t | t
(1 row)There was a problem hiding this comment.
You need to add parenthesis.
%sql
SELECT
-- boolean and transitions
-- null because strict
(NULL and NULL) IS NULL AS `t`,
(TRUE and NULL) IS NULL AS `t`,
(FALSE and NULL) IS NULL AS `t`,
(NULL and TRUE) IS NULL AS `t`,
(NULL and FALSE) IS NULL AS `t`,
-- and actual computations
TRUE and TRUE AS `t`,
NOT (TRUE and FALSE) AS `t`,
NOT (FALSE and TRUE) AS `t`,
NOT (FALSE and FALSE) AS `t`
You will still see the difference. That is because of the strict mode.
Based on the doc of postgreSQL https://www.postgresql.org/docs/9.5/sql-createfunction.html
, for strict mode,
STRICT indicates that the function always returns null whenever any of its arguments are null. If this parameter is specified, the function is not executed when there are null arguments; instead a null result is assumed automatically.
Our AND does not have a strict mode. It is fine to have a behavior difference, but you need to update the comment to explain it.
sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part2.sql
Outdated
Show resolved
Hide resolved
|
Test build #105964 has finished for PR 24743 at commit
|
|
Retest this please. |
|
Test build #105979 has finished for PR 24743 at commit
|
|
@dongjoon-hyun The test will not pass because of #24743 (comment) |
|
Got it, @wangyum ! |
sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part2.sql
Outdated
Show resolved
Hide resolved
|
Test build #106052 has finished for PR 24743 at commit
|
|
Test build #106231 has finished for PR 24743 at commit
|
|
LGTM Thanks! Merged to master |
## What changes were proposed in this pull request? This PR is to port AGGREGATES.sql from PostgreSQL regression tests. https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/aggregates.sql#L145-L350 The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/aggregates.out#L499-L984 When porting the test cases, found four PostgreSQL specific features that do not exist in Spark SQL: - [SPARK-27877](https://issues.apache.org/jira/browse/SPARK-27877): Implement SQL-standard LATERAL subqueries - [SPARK-27878](https://issues.apache.org/jira/browse/SPARK-27878): Support ARRAY(sub-SELECT) expressions - [SPARK-27879](https://issues.apache.org/jira/browse/SPARK-27879): Implement bitwise integer aggregates(BIT_AND and BIT_OR) - [SPARK-27880](https://issues.apache.org/jira/browse/SPARK-27880): Implement boolean aggregates(BOOL_AND, BOOL_OR and EVERY) ## How was this patch tested? N/A Closes apache#24743 from wangyum/SPARK-27883. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com>
What changes were proposed in this pull request?
This PR is to port AGGREGATES.sql from PostgreSQL regression tests. https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/aggregates.sql#L145-L350
The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/aggregates.out#L499-L984
When porting the test cases, found four PostgreSQL specific features that do not exist in Spark SQL:
How was this patch tested?
N/A