-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-27934][SQL][TEST] Port case.sql #24782
Conversation
-- Test the case statement | ||
-- | ||
-- There are 2 join condition is missing in this test case. we set spark.sql.crossJoin.enabled=true. | ||
set spark.sql.crossJoin.enabled=true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to set spark.sql.crossJoin.enabled=true
. otherwise:
-- !query 30
SELECT '' AS Five, NULLIF(a.i,b.i) AS `NULLIF(a.i,b.i)`,
NULLIF(b.i, 4) AS `NULLIF(b.i,4)`
FROM CASE_TBL a, CASE2_TBL b
-- !query 30 schema
struct<>
-- !query 30 output
org.apache.spark.sql.AnalysisException
Detected implicit cartesian product for INNER join between logical plans
Project [i#x]
+- Relation[i#x,f#x] parquet
and
Project [i#x]
+- Relation[i#x,j#x] parquet
Join condition is missing or trivial.
Either: use the CROSS JOIN syntax to allow cartesian products between these
relations, or: enable implicit cartesian products by setting the configuration
variable spark.sql.crossJoin.enabled=true;
SELECT CASE WHEN i > 100 THEN 1/0 ELSE 0 END FROM case_tbl | ||
-- !query 20 schema | ||
struct<CASE WHEN (i > 100) THEN (CAST(1 AS DOUBLE) / CAST(0 AS DOUBLE)) ELSE CAST(0 AS DOUBLE) END:double> | ||
-- !query 20 output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PostgresSQL throws ERROR: division by zero
: https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/case.out#L98-L99
I create SPARK-27923 to track this: https://issues.apache.org/jira/browse/SPARK-27923
FROM CASE_TBL a, CASE2_TBL b | ||
WHERE COALESCE(f,b.i) = 2; | ||
|
||
-- We don't support update now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skip UPDATE
cases? I add a comment here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @gatorsmile and @wangyum . The half of the file is comment
which is irrelevant to Apache Spark. Do we need to keep all the invalid comments? In fact, the original will be changed time to time, too. For the simply invalid one (which has no SPARK JIRA), shall we skip adding comments?
|
||
-- SELECT * FROM CASE_TBL; | ||
|
||
-- We don't support the features below: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skip these cases? I add a comment here.
Test build #106111 has finished for PR 24782 at commit
|
Could you fix the following PR description?
|
Thank you @dongjoon-hyun |
-- https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/case.sql | ||
-- Test the case statement | ||
-- | ||
-- There are 2 join condition is missing in this test case. we set spark.sql.crossJoin.enabled=true. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is missing
->which is missing
ormissed
?we set
->We set
.
BTW, it's not clear about the relation between the missed one and crossJoin
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about rewriting it to the following text?
This test suite contains two Cartesian products without using explicit CROSS JOIN syntax. Thus, we set spark.sql.crossJoin.enabled to true.
-- | ||
-- CASE | ||
-- https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/case.sql | ||
-- Test the case statement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is copied from the original, but for readability, the case statement
-> the CASE statement
|
||
-- [SPARK-27930] Add built-in Math Function: RANDOM | ||
-- SELECT '7' AS `None`, | ||
-- CASE WHEN random() < 0 THEN 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we first use rand()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Rewrite it to rand()
.
-- 'begin return $1; end' language plpgsql volatile; | ||
|
||
-- SELECT CASE | ||
-- (CASE vol('bar') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case is pretty useful. Could we use a udf here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Add vol
to SQLQueryTestSuite
.
Test build #106350 has finished for PR 24782 at commit
|
|
||
|
||
-- !query 21 | ||
SELECT CASE WHEN i > 100 THEN 1/0 ELSE 0 END FROM case_tbl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your comment https://github.com/apache/spark/pull/24782/files#r289867687 should be moved to here.
LGTM Merged to master. |
## What changes were proposed in this pull request? This PR is to port case.sql from PostgreSQL regression tests. https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/case.sql The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/case.out When porting the test cases, found one PostgreSQL specific features that do not exist in Spark SQL: - [SPARK-27930](https://issues.apache.org/jira/browse/SPARK-27930): Add built-in Math Function: RANDOM ## How was this patch tested? N/A Closes apache#24782 from wangyum/SPARK-27934. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com>
What changes were proposed in this pull request?
This PR is to port case.sql from PostgreSQL regression tests. https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/case.sql
The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/case.out
When porting the test cases, found one PostgreSQL specific features that do not exist in Spark SQL:
How was this patch tested?
N/A