Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-29107][SQL][TESTS] Port window.sql (Part 1) #25816

Closed
wants to merge 8 commits into from

Conversation

DylanGuedes
Copy link
Contributor

What changes were proposed in this pull request?

This PR ports window.sql from PostgreSQL regression tests https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/window.sql from lines 1~319

The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/window.out

How was this patch tested?

Pass the Jenkins.

Why are the changes needed?

To ensure compatibility with PGSQL

Does this PR introduce any user-facing change?

No

How was this patch tested?

Comparison with PgSQL results.

@wangyum
Copy link
Member

wangyum commented Sep 17, 2019

ok to test

@SparkQA
Copy link

SparkQA commented Sep 17, 2019

Test build #110785 has started for PR 25816 at commit a8a3695.

@shaneknapp
Copy link
Contributor

test this please

@SparkQA
Copy link

SparkQA commented Sep 17, 2019

Test build #110801 has finished for PR 25816 at commit a8a3695.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for some minor comments. @wangyum @HyukjinKwon

-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
--
-- Window Functions Testing
-- https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add line numbers?:

-- https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/aggregates.sql#L1-L143

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can update to match Beta4, but:

  • If the PgSQL version is that relevant, then we will add a new JIRA for every new PgSQL release for every .sql migration (i.e: strings.sql, union.sql, date.sql, aggregates.sql, etc). But we didn't.
  • If the PgSQL is not that relevant, then there's no need to change PRs to match the latest release candidate version (although it matches a stable version).

What you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you checked the last discussion? #24850 (comment)
I think we don't need to stick to the single snapshot, so its ok to update it to REL_12_BETA4 in this port. Actually, no difference between BETA2 and BETA4 in window.sql?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I didn't get noticed that you guys were already discussing about updating other tests.
I checked last week and there are some major differences between BETA2 and BETA4 for window.sql, I just can't remember if lines L1-L319 are impacted by them. So, it's ok to keep at REL2 or should I updated to REL3?

SELECT sum(salary) OVER w, rank() OVER w FROM empsalary WINDOW w AS (PARTITION BY depname ORDER BY salary DESC);

-- strict aggs
-- Temporarily turns off the ANSI mode because of compatibility issues between keywords
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you describe which keyword?

FROM tenk1 WHERE unique1 < 10;

-- [SPARK-28428] Spark `exclude` always expecting `()`
SELECT sum(unique1) over (w range between current row and unbounded following),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why didn't you comment out this statement? You did the other statements, though?

@HyukjinKwon
Copy link
Member

ping @DylanGuedes to update

@DylanGuedes
Copy link
Contributor Author

@HyukjinKwon I updated with the changes suggested, but still not clear: should I update to REL4?

@SparkQA
Copy link

SparkQA commented Sep 29, 2019

Test build #111564 has finished for PR 25816 at commit 6f08f75.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Yea, we should update it.

@SparkQA
Copy link

SparkQA commented Sep 29, 2019

Test build #111567 has finished for PR 25816 at commit ad93404.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

3 9 25 9


-- !query 43
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test seems failing when it's ran via the thrift server. Can you check the cause and exclude this in the test? cc @wangyum

@SparkQA
Copy link

SparkQA commented Sep 29, 2019

Test build #111571 has finished for PR 25816 at commit 0783234.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

-- SELECT four, ten/4 as two,
-- sum(ten/4) over (partition by four order by ten/4 range between unbounded preceding and current row),
-- last(ten/4) over (partition by four order by ten/4 range between unbounded preceding and current row)
-- FROM (select distinct ten, four from tenk1) ss;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ur, did you regenerate the golden files after this removal?

@SparkQA
Copy link

SparkQA commented Sep 29, 2019

Test build #111575 has finished for PR 25816 at commit b6fe45c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 2, 2019

Test build #111661 has finished for PR 25816 at commit 178bbfe.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Hmm. Still Result did not match for query for the following.

SELECT four, ten/4 as two,
sum(ten/4) over (partition by four order by ten/4 range between unbounded preceding and current row),
last(ten/4) over (partition by four order by ten/4 range between unbounded preceding and current row)
FROM (select distinct ten, four from tenk1) ss

@DylanGuedes . Could you post the query result in both spark-shell and spark-sql?
Previously, there was a bug which the result is different in both shells and spark-sql is similar with Spark Thrift Server.

Signed-off-by: DylanGuedes <djmgguedes@gmail.com>
Signed-off-by: DylanGuedes <djmgguedes@gmail.com>
Signed-off-by: DylanGuedes <djmgguedes@gmail.com>
Signed-off-by: DylanGuedes <djmgguedes@gmail.com>
Signed-off-by: DylanGuedes <djmgguedes@gmail.com>
Signed-off-by: DylanGuedes <djmgguedes@gmail.com>
@DylanGuedes
Copy link
Contributor Author

It works fine locally. And, to be fair, I have no idea why this is not working: in my older PR It also always passed. I'll remote both queries that are failing.

Signed-off-by: DylanGuedes <djmgguedes@gmail.com>
@SparkQA
Copy link

SparkQA commented Oct 8, 2019

Test build #111862 has finished for PR 25816 at commit 3a5825c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Retest this please.

last(ten) over (partition by four order by ten range between unbounded preceding and unbounded following)
FROM (select distinct ten, four from tenk1) ss;

-- Failing on thrift server
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a JIRA issue if we are not going to cover this query.

-- last(ten/4) over (partition by four order by ten/4 range between unbounded preceding and current row)
-- FROM (select distinct ten, four from tenk1) ss;

-- Failing on thrift server
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a JIRA issue if we are not going to cover this query, too.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this consumed your time your time, @DylanGuedes .

I believe the followings are the final pieces.

cc @maropu , too.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Oct 13, 2019

I checked the diff. There is additional addition from line 1257, but that is irrelevant to this PR because this PR is only for line 1 to line 319.

~/A/postgres:REL_12_STABLE$ git diff REL_12_BETA4 src/test/regress/sql/window.sql
diff --git a/src/test/regress/sql/window.sql b/src/test/regress/sql/window.sql
index fc6d4cc903..fe273aa31e 100644
--- a/src/test/regress/sql/window.sql
+++ b/src/test/regress/sql/window.sql
@@ -1257,3 +1257,22 @@ SELECT to_char(SUM(n::float8) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND 1 FO
 SELECT i, b, bool_and(b) OVER w, bool_or(b) OVER w
   FROM (VALUES (1,true), (2,true), (3,false), (4,false), (5,true)) v(i,b)
   WINDOW w AS (ORDER BY i ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING);
+
+-- Tests for problems with failure to walk or mutate expressions
+-- within window frame clauses.
+
+-- test walker (fails with collation error if expressions are not walked)
+SELECT array_agg(i) OVER w
+  FROM generate_series(1,5) i
+WINDOW w AS (ORDER BY i ROWS BETWEEN (('foo' < 'foobar')::integer) PRECEDING AND CURRENT ROW);
+
+-- test mutator (fails when inlined if expressions are not mutated)
+CREATE FUNCTION pg_temp.f(group_size BIGINT) RETURNS SETOF integer[]
+AS $$
+    SELECT array_agg(s) OVER w
+      FROM generate_series(1,5) s
+    WINDOW w AS (ORDER BY s ROWS BETWEEN CURRENT ROW AND GROUP_SIZE FOLLOWING)
+$$ LANGUAGE SQL STABLE;
+
+EXPLAIN (costs off) SELECT * FROM pg_temp.f(2);
+SELECT * FROM pg_temp.f(2);

Please simply update the link with REL_12_STABLE in the PR description and code.

@SparkQA
Copy link

SparkQA commented Oct 13, 2019

Test build #111988 has finished for PR 25816 at commit 3a5825c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Signed-off-by: DylanGuedes <djmgguedes@gmail.com>
@SparkQA
Copy link

SparkQA commented Oct 13, 2019

Test build #111995 has finished for PR 25816 at commit 1ac60f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @DylanGuedes , @maropu , @HyukjinKwon !
Merged to master for this.

@gengliangwang
Copy link
Member

I have created a followup to correct the file location: #26116

@dongjoon-hyun
Copy link
Member

Oops. Yes. We moved the location and this PR is very old.
For now, #26116 is failing. I'll revert this PR. Sorry guys.

@dongjoon-hyun
Copy link
Member

I missed the location, @DylanGuedes .
Sorry, but could you make a PR after putting the file into the correct location sql/core/src/test/resources/sql-tests/inputs/postgreSQL instead of pgSQL?
As @gengliangwang pointed out, this PR may fail at that new location. We can decide to file a JIRA or fix it at that time.

And, thank you again, @gengliangwang !

@DylanGuedes
Copy link
Contributor Author

@dongjoon-hyun I made a new PR with the new location.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants