[SPARK-27988][SQL][TEST] Port AGGREGATES.sql [Part 3] #24829
Conversation
Test build #106340 has finished for PR 24829 at commit
Test build #106402 has finished for PR 24829 at commit
retest this please
Test build #106407 has finished for PR 24829 at commit
Test build #106409 has finished for PR 24829 at commit
Test build #108203 has finished for PR 24829 at commit
Test build #108520 has finished for PR 24829 at commit
retest this please
Test build #108855 has finished for PR 24829 at commit
Test build #108862 has finished for PR 24829 at commit
retest this please
Test build #108870 has finished for PR 24829 at commit
Test build #108909 has finished for PR 24829 at commit
retest this please
Test build #108911 has finished for PR 24829 at commit
-- AGGREGATES [Part 3]
-- https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/aggregates.sql#L352-L605
REL_12_BETA1 -> REL_12_BETA2?
Please check the line numbers too if the file is changed. I hope it's unchanged.
AGGREGATES [Part 3] unchanged, AGGREGATES [Part 4] changed:
https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/aggregates.sql#L992
We have already added it:
spark/sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part4.sql
Lines 414 to 419 in 2a2b202
-- Make sure that generation of HashAggregate for uniqification purposes
-- does not lead to array overflow due to unexpected duplicate hash keys
-- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
-- explain (costs off)
-- select 1 from tenk1
-- where (hundred, thousand) in (select twothousand, twothousand from onek);
-- drop table minmaxtest cascade;
-- [SPARK-9830] It is not allowed to use an aggregate function in the argument of another aggregate function
-- check for correct detection of nested-aggregate errors
Instead of adding SPARK-9830, can we check the actual error? The original PostgreSQL test also exists to ensure that the error is raised.
It's correct behaviour: aggregate function calls cannot be nested.
https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/aggregates.out#L1078-L1085
Yeah, I know. My suggestion is to check the error message like PostgreSQL does.
If it stops throwing an exception later, we can detect the regression at that time.
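For reference, a minimal sketch of what keeping the query (rather than skipping it) could look like; the query is adapted from the PostgreSQL suite and assumes the tenk1 test table that this file already uses, so SQLQueryTestSuite would simply record the nested-aggregate error message in the .out file:

-- nested aggregate: rejected by both PostgreSQL and Spark, so the recorded
-- output is the error message itself
select max(min(unique1)) from tenk1;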
-- select array_agg(distinct a order by a desc nulls last)
-- from (values (1),(2),(1),(3),(null),(2)) v(a);

-- Skip the test below because it requires 4 UDFs: aggf_trans, aggfns_trans, aggfstr, and aggfns
Sorry, but this looks like an insufficient reason to skip the following tests.
We can register these functions like we did for the tenk1 table, can't we?
A UDF is easy, but a UDAF seems to need an implementation of UserDefinedAggregateFunction.
Got it. Never mind.
-- from (values (1,3,'foo'),(0,null,null),(2,2,'bar'),(3,1,'baz')) v(a,b,c),
-- generate_series(1,3) i;

-- select aggfstr(distinct a,b,c order by b)
Did you mean that Spark doesn't support UDF invocation syntax like this udfName(distinct a,b,c order by b)?
Do we have a JIRA for that? If so, please add the ID as a comment here.
postgres=# select max(distinct a) from (values('a'), ('b')) v(a);
 max
-----
 b
(1 row)

spark-sql> select max(distinct a) from (values('a'), ('b')) v(a);
b

postgres=# select upper(distinct a) from (values('a'), ('b')) v(a);
ERROR: DISTINCT specified, but upper is not an aggregate function
LINE 1: select upper(distinct a) from (values('a'), ('b')) v(a);

spark-sql> select upper(distinct a) from (values('a'), ('b')) v(a);
Error in query: upper does not support the modifier DISTINCT; line 1 pos 7

Do we need to add an ID? It seems that only the error message is different.
-- test specific code paths

-- select aggfns(distinct a,a,c order by c using ~<~,a)
It seems that we need to mention the existing JIRA issue for distinct a,a,c order by c using ~<~,a?
SPARK-28010 or a new one for USING syntax.
@maropu These are operators?
postgres=# \do ~*~
List of operators
Schema | Name | Left arg type | Right arg type | Result type | Description
------------+------+---------------+----------------+-------------+-------------------------
pg_catalog | ~<=~ | character | character | boolean | less than or equal
pg_catalog | ~<=~ | text | text | boolean | less than or equal
pg_catalog | ~<~ | character | character | boolean | less than
pg_catalog | ~<~ | text | text | boolean | less than
pg_catalog | ~>=~ | character | character | boolean | greater than or equal
pg_catalog | ~>=~ | text | text | boolean | greater than or equal
pg_catalog | ~>~ | character | character | boolean | greater than
pg_catalog | ~>~ | text | text | boolean | greater than
pg_catalog | ~~ | bytea | bytea | boolean | matches LIKE expression
pg_catalog | ~~ | character | text | boolean | matches LIKE expression
pg_catalog | ~~ | name | text | boolean | matches LIKE expression
pg_catalog | ~~ | text | text | boolean | matches LIKE expression
(12 rows)
-- Skip these tests because we do not have a bytea type
-- string_agg bytea tests
-- create table bytea_test_table(v bytea);
Shall we use BINARY as we did in strings.sql?
create table bytea_test_table(v BINARY);
After that, we can remove line 212 and focus on our next steps.
Ur, right. I missed it. bytea is binary.
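A minimal sketch of what the ported snippet might then look like (the USING parquet clause is my assumption, following the convention in the other ported pgSQL tests; the string_agg queries would stay commented out until SPARK-27978):

create table bytea_test_table(v binary) using parquet;
-- string_agg queries still skipped, pending SPARK-27978
drop table bytea_test_table;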
-- select string_agg(v, '') from bytea_test_table;

-- insert into bytea_test_table values(decode('ff','hex'));
Maybe something like the following? And do we have a JIRA for decode?
insert into bytea_test_table values(decode('aa', 'utf-8'));
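As a side note, one possible analogue of PostgreSQL's decode('ff','hex') is Spark's built-in unhex, which returns BINARY; using it here is only a suggestion:

-- hypothetical replacement for the skipped insert above
insert into bytea_test_table values (unhex('ff'));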
select (select count(*)
        from (values (1)) t0(inner_c))
from (values (2),(3)) t1(outer_c);
-- Rewriting to CASE WHEN will hit: Expressions referencing the outer query are not supported outside of WHERE/HAVING clauses
File a JIRA issue?
I think we should revert it to the original query. We cannot support Aggregate Expressions with filter.
-- select (select count(*) filter (where outer_c <> 0)
-- from (values (1)) t0(inner_c))
-- from (values (2),(3)) t1(outer_c);
-- Rewriting to CASE WHEN will hit: Found an aggregate expression in a correlated predicate that has both outer and local references
File a JIRA issue?
I think we should revert it to the original query. We cannot support Aggregate Expressions with filter.
-- subquery in FILTER clause (PostgreSQL extension)
-- Rewriting to CASE WHEN will hit: IN/EXISTS predicate sub-queries can only be used in a Filter
-- select sum(unique1) FILTER (WHERE
-- unique1 IN (SELECT unique1 FROM onek where unique1 < 100)) FROM tenk1;
From lines 252 ~ 268, I understand why you added comments like Rewriting to CASE WHEN will hit: Expressions referencing the outer query are not supported outside of WHERE/HAVING clauses. But let's have an explicit JIRA ID; otherwise, let's not add the comment here.
Yes. We should revert it to the original query.
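For reference, the CASE WHEN rewrite being discussed would look roughly like the sketch below (illustrative only; as the quoted comments note, Spark currently rejects this correlated form, which is why the conclusion is to keep the original FILTER query):

select (select count(case when t1.outer_c <> 0 then 1 end)
        from (values (1)) t0(inner_c))
from (values (2),(3)) t1(outer_c);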
Test build #109258 has finished for PR 24829 at commit
Test build #109293 has finished for PR 24829 at commit
retest this please
-- AGGREGATES [Part 3]
-- https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/aggregates.sql#L352-L605

-- We do not support inheritance tree, skip related tests.
Out of curiosity, why didn't we file a JIRA here? Even if we explicitly don't support it, it's better to file a JIRA and resolve it as Won't Fix.
Done
-- select array_agg(distinct a order by a desc nulls last)
-- from (values (1),(2),(1),(3),(null),(2)) v(a);

-- Skip the test below because it requires 4 UDFs: aggf_trans, aggfns_trans, aggfstr, and aggfns
Then should it be "Skip the test below because it requires 4 UDAFs:"?
Done
Test build #109684 has finished for PR 24829 at commit
Test build #109696 has finished for PR 24829 at commit
Merged to master. Thanks all.

What changes were proposed in this pull request?
This PR ports AGGREGATES.sql (part 3) from the PostgreSQL regression tests: https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/aggregates.sql#L352-L605
The expected results can be found at: https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/aggregates.out#L986-L1613
When porting the test cases, I found seven PostgreSQL-specific features that do not exist in Spark SQL:
SPARK-27974: Add built-in Aggregate Function: array_agg
SPARK-27978: Add built-in Aggregate Functions: string_agg
SPARK-27986: Support Aggregate Expressions with filter
SPARK-27987: Support POSIX Regular Expressions
SPARK-28682: ANSI SQL: Collation Support
SPARK-28768: Implement more text pattern operators
SPARK-28865: Table inheritance
How was this patch tested?
N/A