New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-28288][SQL][PYTHON][TESTS] Convert and port 'window.sql' into UDF test base #25195
Conversation
ok to test |
add to whitelist |
@younggyuchun, do you know why it works with UDFs? -- !query 19
-SELECT val, cate, row_number() OVER(PARTITION BY cate) FROM testData ORDER BY cate, val
+SELECT udf(val), cate, row_number() OVER(PARTITION BY cate ORDER BY val) FROM testData ORDER BY cate, udf(val)
-- !query 19 schema
-struct<>
+struct<CAST(udf(cast(val as string)) AS INT):int,cate:string,row_number() OVER (PARTITION BY cate ORDER BY val ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW):int>
-- !query 19 output
-org.apache.spark.sql.AnalysisException
-Window function row_number() requires window to be ordered, please add ORDER BY clause. For example SELECT row_number()(value_expr) OVER (PARTITION BY window_partition ORDER BY window_ordering) from table;
+NULL NULL 1
+3 NULL 2
+NULL a 1
+1 a 2
+1 a 3
+2 a 4
+1 b 1
+2 b 2
+3 b 3 |
@HyukjinKwon The reason is that I specified "ORDER BY val"?: Before: After: I guess we shouldn't specify "ORDER BY val" here right? |
Can you elaborate why and confirm if Scala UDF works or not as well? We should better match the test cases with the original file if there isn't a specific reason to differentiate. The point of why we're doing it is to detect non-working cases and file the issue (not making the test cases working with UDFs). |
Test build #107874 has finished for PR 25195 at commit
|
I thought we need to make the test cases work with UDFs as the original SQL spits out the following errors so I added 'ORDER BY' clause on it. Window function row_number() requires window to be ordered, please add ORDER BY clause. For example SELECT row_number()(value_expr) OVER (PARTITION BY window_partition ORDER BY window_ordering) from table; But I now understand the purpose of this work and think we should put it back to the original one. |
Yup, let's put it back. |
Done. Thank you @HyukjinKwon |
Test build #108015 has finished for PR 25195 at commit
|
sql/core/src/test/resources/sql-tests/inputs/udf/udf-window.sql
Outdated
Show resolved
Hide resolved
sql/core/src/test/resources/sql-tests/inputs/udf/udf-window.sql
Outdated
Show resolved
Hide resolved
sql/core/src/test/resources/sql-tests/inputs/udf/udf-window.sql
Outdated
Show resolved
Hide resolved
Test build #108126 has finished for PR 25195 at commit
|
Thanks, @younggyuchun. Do you mind updating PR description (the diff) too? |
Sure done |
Merged to master. Thanks @younggyuchun. Wellcome to Spark contributors :D. |
My pleasure and thank you for reviewing my work. Have a great night! |
What changes were proposed in this pull request?
This PR adds some tests converted from window.sql to test UDFs. Please see the contribution guide of this umbrella ticket - SPARK-27921.
Diff comparing to 'xxx.sql'
How was this patch tested?
Tested as guided in SPARK-27921.