Skip to content

[SPARK-27883][SQL] Port AGGREGATES.sql [Part 2]#24743

Closed
wangyum wants to merge 4 commits intoapache:masterfrom
wangyum:SPARK-27883
Closed

[SPARK-27883][SQL] Port AGGREGATES.sql [Part 2]#24743
wangyum wants to merge 4 commits intoapache:masterfrom
wangyum:SPARK-27883

Conversation

@wangyum
Copy link
Member

@wangyum wangyum commented May 30, 2019

What changes were proposed in this pull request?

This PR is to port AGGREGATES.sql from PostgreSQL regression tests. https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/aggregates.sql#L145-L350

The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/aggregates.out#L499-L984

When porting the test cases, found four PostgreSQL specific features that do not exist in Spark SQL:

  • SPARK-27877: Implement SQL-standard LATERAL subqueries
  • SPARK-27878: Support ARRAY(sub-SELECT) expressions
  • SPARK-27879: Implement bitwise integer aggregates(BIT_AND and BIT_OR)
  • SPARK-27880: Implement boolean aggregates(BOOL_AND, BOOL_OR and EVERY)

How was this patch tested?

N/A

b2 BOOL,
b3 BOOL,
b4 BOOL);
-- SELECT
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to deal with booland_statefunc and boolor_statefunc? These 2 functions are not exist in doc: https://www.postgresql.org/docs/12/functions-aggregate.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  288  * Function for standard EVERY aggregate conforming to SQL 2003.
  289  * The aggregate is also named bool_and for consistency.
  290  *
  291  * Note: this is only used in plain aggregate mode, not moving-aggregate mode.

See: https://doxygen.postgresql.org/bool_8c_source.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BooleanType and BooleanType is equivalent to bool bool_and bool?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. it's equivalent. BOOL_AND and booland_statefunc, BOOL_OR and boolor_statefunc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rewrite the queries by using AND on the columns with boolean data types?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean:

select NULL and NULL is null AS "t"

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behaviour different:

  1. Spark SQL:
SELECT
  -- boolean and transitions
  -- null because strict
  NULL and NULL IS NULL AS `t`,
  TRUE and NULL IS NULL AS `t`,
  FALSE and NULL IS NULL AS `t`,
  NULL and TRUE IS NULL AS `t`,
  NULL and FALSE IS NULL AS `t`,
  -- and actual computations
  TRUE and TRUE AS `t`,
  NOT TRUE and FALSE AS `t`,
  NOT FALSE and TRUE AS `t`,
  NOT FALSE and FALSE AS `t`
-- !query 1 schema
struct<t:boolean,t:boolean,t:boolean,t:boolean,t:boolean,t:boolean,t:boolean,t:boolean,t:boolean>
-- !query 1 output
NULL	true	false	false	false	true	false	true	false

PostgreSQL:

SELECT
  -- boolean and transitions
  -- null because strict
  booland_statefunc(NULL, NULL)  IS NULL AS "t",
  booland_statefunc(TRUE, NULL)  IS NULL AS "t",
  booland_statefunc(FALSE, NULL) IS NULL AS "t",
  booland_statefunc(NULL, TRUE)  IS NULL AS "t",
  booland_statefunc(NULL, FALSE) IS NULL AS "t",
  -- and actual computations
  booland_statefunc(TRUE, TRUE) AS "t",
  NOT booland_statefunc(TRUE, FALSE) AS "t",
  NOT booland_statefunc(FALSE, TRUE) AS "t",
  NOT booland_statefunc(FALSE, FALSE) AS "t";
 t | t | t | t | t | t | t | t | t 
---+---+---+---+---+---+---+---+---
 t | t | t | t | t | t | t | t | t
(1 row)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add parenthesis.

%sql
SELECT
  -- boolean and transitions
  -- null because strict
  (NULL and NULL) IS NULL AS `t`,
  (TRUE and NULL) IS NULL AS `t`,
  (FALSE and NULL) IS NULL AS `t`,
  (NULL and TRUE) IS NULL AS `t`,
  (NULL and FALSE) IS NULL AS `t`,
  -- and actual computations
  TRUE and TRUE AS `t`,
  NOT (TRUE and FALSE) AS `t`,
  NOT (FALSE and TRUE) AS `t`,
  NOT (FALSE and FALSE) AS `t`

You will still see the difference. That is because of the strict mode.

Based on the doc of postgreSQL https://www.postgresql.org/docs/9.5/sql-createfunction.html
, for strict mode,

STRICT indicates that the function always returns null whenever any of its arguments are null. If this parameter is specified, the function is not executed when there are null arguments; instead a null result is assumed automatically.

Our AND does not have a strict mode. It is fine to have a behavior difference, but you need to update the comment to explain it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I see.

@SparkQA
Copy link

SparkQA commented May 30, 2019

Test build #105964 has finished for PR 24743 at commit b585463.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Retest this please.

@SparkQA
Copy link

SparkQA commented May 30, 2019

Test build #105979 has finished for PR 24743 at commit b585463.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Copy link
Member Author

wangyum commented May 30, 2019

@dongjoon-hyun The test will not pass because of #24743 (comment)

@dongjoon-hyun
Copy link
Member

Got it, @wangyum !

@SparkQA
Copy link

SparkQA commented Jun 1, 2019

Test build #106052 has finished for PR 24743 at commit ee63c37.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum wangyum changed the title [WIP][SPARK-27883][SQL] Port AGGREGATES.sql [Part 2] [SPARK-27883][SQL] Port AGGREGATES.sql [Part 2] Jun 1, 2019
@SparkQA
Copy link

SparkQA commented Jun 6, 2019

Test build #106231 has finished for PR 24743 at commit 41d4e0a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

LGTM

Thanks! Merged to master

@gatorsmile gatorsmile closed this in 4de9649 Jun 6, 2019
@wangyum wangyum deleted the SPARK-27883 branch June 6, 2019 23:30
emanuelebardelli pushed a commit to emanuelebardelli/spark that referenced this pull request Jun 15, 2019
## What changes were proposed in this pull request?

This PR is to port AGGREGATES.sql from PostgreSQL regression tests. https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/aggregates.sql#L145-L350

The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/aggregates.out#L499-L984

When porting the test cases, found four PostgreSQL specific features that do not exist in Spark SQL:

- [SPARK-27877](https://issues.apache.org/jira/browse/SPARK-27877): Implement SQL-standard LATERAL subqueries
- [SPARK-27878](https://issues.apache.org/jira/browse/SPARK-27878): Support ARRAY(sub-SELECT) expressions
- [SPARK-27879](https://issues.apache.org/jira/browse/SPARK-27879): Implement bitwise integer aggregates(BIT_AND and BIT_OR)
- [SPARK-27880](https://issues.apache.org/jira/browse/SPARK-27880): Implement boolean aggregates(BOOL_AND, BOOL_OR and EVERY)

## How was this patch tested?

N/A

Closes apache#24743 from wangyum/SPARK-27883.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants