Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-27934][SQL][TEST] Port case.sql #24782

Closed
wants to merge 5 commits into from
Closed

[SPARK-27934][SQL][TEST] Port case.sql #24782

wants to merge 5 commits into from

Conversation

wangyum
Copy link
Member

@wangyum wangyum commented Jun 3, 2019

What changes were proposed in this pull request?

This PR is to port case.sql from PostgreSQL regression tests. https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/case.sql

The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/case.out

When porting the test cases, found one PostgreSQL specific features that do not exist in Spark SQL:

How was this patch tested?

N/A

-- Test the case statement
--
-- There are 2 join condition is missing in this test case. we set spark.sql.crossJoin.enabled=true.
set spark.sql.crossJoin.enabled=true;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to set spark.sql.crossJoin.enabled=true. otherwise:

-- !query 30
SELECT '' AS Five, NULLIF(a.i,b.i) AS `NULLIF(a.i,b.i)`,
  NULLIF(b.i, 4) AS `NULLIF(b.i,4)`
  FROM CASE_TBL a, CASE2_TBL b
-- !query 30 schema
struct<>
-- !query 30 output
org.apache.spark.sql.AnalysisException
Detected implicit cartesian product for INNER join between logical plans
Project [i#x]
+- Relation[i#x,f#x] parquet
and
Project [i#x]
+- Relation[i#x,j#x] parquet
Join condition is missing or trivial.
Either: use the CROSS JOIN syntax to allow cartesian products between these
relations, or: enable implicit cartesian products by setting the configuration
variable spark.sql.crossJoin.enabled=true;

SELECT CASE WHEN i > 100 THEN 1/0 ELSE 0 END FROM case_tbl
-- !query 20 schema
struct<CASE WHEN (i > 100) THEN (CAST(1 AS DOUBLE) / CAST(0 AS DOUBLE)) ELSE CAST(0 AS DOUBLE) END:double>
-- !query 20 output
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FROM CASE_TBL a, CASE2_TBL b
WHERE COALESCE(f,b.i) = 2;

-- We don't support update now.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skip UPDATE cases? I add a comment here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @gatorsmile and @wangyum . The half of the file is comment which is irrelevant to Apache Spark. Do we need to keep all the invalid comments? In fact, the original will be changed time to time, too. For the simply invalid one (which has no SPARK JIRA), shall we skip adding comments?


-- SELECT * FROM CASE_TBL;

-- We don't support the features below:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skip these cases? I add a comment here.

@SparkQA
Copy link

SparkQA commented Jun 3, 2019

Test build #106111 has finished for PR 24782 at commit 58e9c2d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Could you fix the following PR description? case.sql instead of AGGREGATES.sql?

This PR is to port AGGREGATES.sql from PostgreSQL regression tests.

@wangyum
Copy link
Member Author

wangyum commented Jun 4, 2019

Thank you @dongjoon-hyun

-- https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/case.sql
-- Test the case statement
--
-- There are 2 join condition is missing in this test case. we set spark.sql.crossJoin.enabled=true.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • is missing -> which is missing or missed?
  • we set -> We set.

BTW, it's not clear about the relation between the missed one and crossJoin.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about rewriting it to the following text?

This test suite contains two Cartesian products without using explicit CROSS JOIN syntax. Thus, we set spark.sql.crossJoin.enabled to true.

--
-- CASE
-- https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/case.sql
-- Test the case statement
Copy link
Member

@dongjoon-hyun dongjoon-hyun Jun 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is copied from the original, but for readability, the case statement -> the CASE statement

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-27934][SQL] Port case.sql [SPARK-27934][SQL][TEST] Port case.sql Jun 9, 2019

-- [SPARK-27930] Add built-in Math Function: RANDOM
-- SELECT '7' AS `None`,
-- CASE WHEN random() < 0 THEN 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we first use rand()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Rewrite it to rand().

-- 'begin return $1; end' language plpgsql volatile;

-- SELECT CASE
-- (CASE vol('bar')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case is pretty useful. Could we use a udf here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Add vol to SQLQueryTestSuite.

@SparkQA
Copy link

SparkQA commented Jun 10, 2019

Test build #106350 has finished for PR 24782 at commit 89e3a9e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.



-- !query 21
SELECT CASE WHEN i > 100 THEN 1/0 ELSE 0 END FROM case_tbl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your comment https://github.com/apache/spark/pull/24782/files#r289867687 should be moved to here.

@gatorsmile
Copy link
Member

LGTM

Merged to master.

emanuelebardelli pushed a commit to emanuelebardelli/spark that referenced this pull request Jun 15, 2019
## What changes were proposed in this pull request?

This PR is to port case.sql from PostgreSQL regression tests. https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/case.sql

The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/case.out

When porting the test cases, found one PostgreSQL specific features that do not exist in Spark SQL:

- [SPARK-27930](https://issues.apache.org/jira/browse/SPARK-27930): Add built-in Math Function: RANDOM

## How was this patch tested?

N/A

Closes apache#24782 from wangyum/SPARK-27934.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants