Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-22362][SQL] Add unit test for Window Aggregate Functions #20046

Closed
wants to merge 11 commits into from

Conversation

attilapiros
Copy link
Contributor

What changes were proposed in this pull request?

Improving the test coverage of window functions focusing on missing test for window aggregate functions. No new UDAF test is added as it has been tested already.

How was this patch tested?

Only new tests were added, automated tests were executed.

val e = intercept[AnalysisException](
df.select(
$"key",
count("invalid").over(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

over() would be enough, as partition and orderBy is ignored anyway

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I will remove the unnecessary parts.

Copy link
Contributor

@smurakozi smurakozi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could have tests attempting to feed aggregation functions with columns containing wrong datatypes (e.g. avg on String)

@attilapiros attilapiros changed the title Adding unit tests for missing aggregate functions [SPARK-22362][SQL] Add unit test for Window Aggregate Functions Dec 21, 2017
@squito
Copy link
Contributor

squito commented Dec 21, 2017

Jenkins, ok to test

@SparkQA
Copy link

SparkQA commented Dec 21, 2017

Test build #85276 has finished for PR 20046 at commit 7b73f57.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 22, 2017

Test build #85310 has finished for PR 20046 at commit 00f50c0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 22, 2017

Test build #85311 has finished for PR 20046 at commit 7290ec4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 2, 2018

Test build #85599 has finished for PR 20046 at commit 4dd08dc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 2, 2018

Test build #85601 has finished for PR 20046 at commit 03d6159.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 2, 2018

Test build #85610 has finished for PR 20046 at commit caafba7.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@attilapiros
Copy link
Contributor Author

@SparkQA
Copy link

SparkQA commented Jan 18, 2018

Test build #86352 has finished for PR 20046 at commit adab4da.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Jan 19, 2018

Test build #86362 has finished for PR 20046 at commit adab4da.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -86,6 +93,429 @@ class DataFrameWindowFunctionsSuite extends QueryTest with SharedSQLContext {
assert(e.message.contains("requires window to be ordered"))
}

test("aggregation and rows between") {
val df = Seq((1, "1"), (2, "1"), (2, "2"), (1, "1"), (2, "2")).toDF("key", "value")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shall also include null data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests was removed and re-added as result of merge conflict. Now I cleaned up.

@jiangxb1987
Copy link
Contributor

We shall also cover the sql interface, you can find some example in sql/core/src/test/resources/sql-tests/inputs/udaf.sql

@smurakozi
Copy link
Contributor

smurakozi commented Jan 22, 2018

@jiangxb1987 how does your request to cover the sql interface relates to SPARK-23160?
I assume it is to be covered in that issue, is that correct?

@attilapiros
Copy link
Contributor Author

I have already extended sql/core/src/test/resources/sql-tests/inputs/window.sql with the missing window aggregate functions but if you would like I can move it to a different PR too.

@SparkQA
Copy link

SparkQA commented Jan 22, 2018

Test build #86474 has finished for PR 20046 at commit 458a0cc.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@attilapiros
Copy link
Contributor Author

PySpark failure must be unrelated as only unit tests are added.

@SparkQA
Copy link

SparkQA commented Jan 22, 2018

Test build #86472 has finished for PR 20046 at commit 5c941c7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 22, 2018

Test build #86476 has finished for PR 20046 at commit a0e14cc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@attilapiros
Copy link
Contributor Author

cc @gatorsmile @hvanhovell

@jiangxb1987 your review comments are applied, is there something else I should work on regarding this PR?

@attilapiros
Copy link
Contributor Author

gentle ping @gatorsmile @hvanhovell

@SparkQA
Copy link

SparkQA commented Apr 9, 2018

Test build #89066 has finished for PR 20046 at commit 93aa930.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@attilapiros
Copy link
Contributor Author

ping @hvanhovell @cloud-fan

Copy link
Contributor

@hvanhovell hvanhovell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments. Looks good overall.

@@ -33,6 +35,11 @@ import org.apache.spark.unsafe.types.CalendarInterval
class DataFrameWindowFunctionsSuite extends QueryTest with SharedSQLContext {
import testImplicits._

private def sortWrappedArrayInRow(d: DataFrame) = d.map {
case Row(key: String, unsorted: mutable.WrappedArray[String]) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not pattern match against mutable.WrappedArray and use Seq instead. mutable.WrappedArray is pretty much an implementation detail, and pattern matching against it is brittle.

@@ -33,6 +35,11 @@ import org.apache.spark.unsafe.types.CalendarInterval
class DataFrameWindowFunctionsSuite extends QueryTest with SharedSQLContext {
import testImplicits._

private def sortWrappedArrayInRow(d: DataFrame) = d.map {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also just use the array_sort function. That is probably a lot cheaper.

@SparkQA
Copy link

SparkQA commented Apr 10, 2018

Test build #89100 has finished for PR 20046 at commit 9baef9c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@attilapiros
Copy link
Contributor Author

gentle reminder @hvanhovell

@attilapiros
Copy link
Contributor Author

@hvanhovell I would like to ask you to take another quick glance to these change

@hvanhovell
Copy link
Contributor

LGTM - merging to master. Thanks!

@asfgit asfgit closed this in 9ea8d3d Apr 19, 2018
@attilapiros attilapiros deleted the SPARK-22362 branch April 26, 2018 20:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants