Skip to content

Conversation

smurakozi
Copy link
Contributor

What changes were proposed in this pull request?

Improve the test coverage of window specifications.

New tests cover basic cases more systematically in DataFrameWindowFunctionsSuite:

  • different partition clauses (none, one, multiple)
  • different order clauses (none, one, multiple, asc/desc, nulls first/last)
  • defaults if clauses are missing

New tests were added to cover some more complex cases when partitionBy or orderBy uses expressions.

ExpressionParserSuite.'window function expressions' was also extended to check parsing of some additional window expressions.

How was this patch tested?

Only new tests were added, automated tests were executed.

'v,
lead("v", 1).over(Window.orderBy($"k1".desc, $"k2")),
lead("v", 1).over(Window.orderBy($"k1", $"k2".desc)),
lead("v", 1).over(Window.orderBy($"k1".desc, $"k2"))),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I see multiple orderBy but what is the benefit to see the exact same calculation in the 2nd and 4th column?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be desc, desc, thanks for catching it.

assertEqual("foo(*) over (order by a desc, b asc)", windowed(Seq.empty, Seq('a.desc, 'b.asc)))
assertEqual("foo(*) over (sort by a)", windowed(Seq.empty, Seq('a.asc)))
assertEqual("foo(*) over (sort by a desc, b asc)", windowed(Seq.empty, Seq('a.desc, 'b.asc)))
assertEqual("foo(*) over (partition by a, b order by c)", windowed(Seq('a, 'b), Seq('c.asc)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(partition by a order by c) is a possibility in order to be conform with the pattern you followed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix it, thx

assertEqual("foo(*) over (sort by a)", windowed(Seq.empty, Seq('a.asc)))
assertEqual("foo(*) over (sort by a desc, b asc)", windowed(Seq.empty, Seq('a.desc, 'b.asc)))
assertEqual("foo(*) over (partition by a, b order by c)", windowed(Seq('a, 'b), Seq('c.asc)))
assertEqual("foo(*) over (distribute by a, b sort by c)", windowed(Seq('a, 'b), Seq('c.asc)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(distribute by a order by c) is a possibility in order to be conform with the pattern you followed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix it, thx

@squito
Copy link
Contributor

squito commented Dec 21, 2017

Jenkins, ok to test

@SparkQA
Copy link

SparkQA commented Dec 21, 2017

Test build #85277 has finished for PR 20045 at commit 797c907.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@squito
Copy link
Contributor

squito commented Dec 21, 2017

failure seems to be unrelated ...

Jenkins, retest this please

}


test("Order by without frame defaults to range between unbounded_preceding - current_row") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would propose using an easier aggregate function to follow this behaviour: collect_list.
And using one more projection where "rangeBetween(Window.unboundedPreceding, Window.currentRow)" is explicitly given. Moreover desc direction can be tested too.

Like:

test("Order by without frame defaults to range between unbounded_preceding - current_row") {
    val df = Seq(
      ("a", "p1", "1"),
      ("b", "p1", "2"),
      ("c", "p1", "2")).toDF("key", "partition", "value")
    checkAnswer(
      df.select(
        $"key",
        collect_list("value").over(Window.partitionBy($"partition").orderBy($"value")),
        collect_list("value").over(Window.partitionBy($"partition").orderBy($"value")
          .rangeBetween(Window.unboundedPreceding, Window.currentRow)),
        collect_list("value").over(Window.partitionBy($"partition").orderBy($"value".desc)),
        collect_list("value").over(Window.partitionBy($"partition").orderBy($"value".desc)
          .rangeBetween(Window.unboundedPreceding, Window.currentRow))),
      Seq(
        Row("a", Array("1"), Array("1"), Array("2", "2", "1"), Array("2", "2", "1")),
        Row("b", Array("1", "2", "2"), Array("1", "2", "2"), Array("2", "2"), Array("2", "2")),
        Row("c", Array("1", "2", "2"), Array("1", "2", "2"), Array("2", "2"), Array("2", "2"))))
  }

@SparkQA
Copy link

SparkQA commented Dec 22, 2017

Test build #4023 has finished for PR 20045 at commit 797c907.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@smurakozi smurakozi changed the title [Spark-22360][SQL] Add unit tests for Window Specifications [Spark-22360][SQL][TEST] Add unit tests for Window Specifications Dec 23, 2017
@SparkQA
Copy link

SparkQA commented Jan 2, 2018

Test build #85612 has finished for PR 20045 at commit a0100d1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@smurakozi
Copy link
Contributor Author

@gatorsmile @hvanhovell @jiangxb1987, could you have a look, please?

@SparkQA
Copy link

SparkQA commented Jan 17, 2018

Test build #86256 has finished for PR 20045 at commit 28e1738.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 17, 2018

Test build #86258 has finished for PR 20045 at commit a3d3429.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 17, 2018

Test build #86267 has finished for PR 20045 at commit 04d9e0f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@smurakozi
Copy link
Contributor Author

@gatorsmile @hvanhovell @jiangxb1987, could you have a look, please?

// Basic window testing.
assertEqual("foo(*) over w1", UnresolvedWindowExpression(func, WindowSpecReference("w1")))
assertEqual("foo(*) over ()", windowed())
assertEqual("foo(*) over (partition by a)", windowed(Seq('a)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shall also cover the null cases, such as foo(*) over (partition by null)

@SparkQA
Copy link

SparkQA commented Jan 22, 2018

Test build #86475 has finished for PR 20045 at commit 5feb4f7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@smurakozi
Copy link
Contributor Author

Do you think I need to cover any other cases, @jiangxb1987 ?

@smurakozi
Copy link
Contributor Author

I think I've addressed all your points @jiangxb1987.
Do you think I need any more changes?

@jiangxb1987
Copy link
Contributor

Sorry for the delay, I'll check the results later this week.

@smurakozi
Copy link
Contributor Author

ping @jiangxb1987

1 similar comment
@smurakozi
Copy link
Contributor Author

ping @jiangxb1987

@rxin
Copy link
Contributor

rxin commented Apr 9, 2018

Can we add them to the file based test suites instead?

@HyukjinKwon
Copy link
Member

ok to test

@HyukjinKwon
Copy link
Member

gentle ping @smurakozi

@SparkQA
Copy link

SparkQA commented Jul 16, 2018

Test build #93049 has finished for PR 20045 at commit 5feb4f7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [Spark-22360][SQL][TEST] Add unit tests for Window Specifications [SPARK-22360][SQL][TEST] Add unit tests for Window Specifications Jun 15, 2019
@dongjoon-hyun
Copy link
Member

Do you still need this, @jiangxb1987 ? Also, gentle ping, @smurakozi .

@gaborgsomogyi
Copy link
Contributor

I'm pretty sure @smurakozi will not finish this PR.

@HyukjinKwon
Copy link
Member

Closing due to inactivity here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants