[SPARK-22360][SQL][TEST] Add unit tests for Window Specifications #20045
'v,
lead("v", 1).over(Window.orderBy($"k1".desc, $"k2")),
lead("v", 1).over(Window.orderBy($"k1", $"k2".desc)),
lead("v", 1).over(Window.orderBy($"k1".desc, $"k2"))),
I see several orderBy variants here, but what is the benefit of repeating the exact same calculation in the 2nd and 4th columns?
The last one should be desc, desc. Thanks for catching it.
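The point of the exchange above is that each of the four sort-direction combinations yields a different `lead` column. A minimal sketch of that, assuming a local `SparkSession` and hypothetical sample data (the object name `LeadOrderingSketch` and the column aliases are illustrative, not part of the PR):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.lead

object LeadOrderingSketch {
  // Hypothetical data; the column names k1, k2 and v follow the snippet under review.
  def run(spark: SparkSession): Seq[Row] = {
    import spark.implicits._
    val df = Seq((1, 1, "a"), (1, 2, "b"), (2, 1, "c"), (2, 2, "d")).toDF("k1", "k2", "v")
    df.select(
      $"k1", $"k2", $"v",
      // One column per combination of sort directions on (k1, k2).
      lead("v", 1).over(Window.orderBy($"k1", $"k2")).as("asc_asc"),
      lead("v", 1).over(Window.orderBy($"k1".desc, $"k2")).as("desc_asc"),
      lead("v", 1).over(Window.orderBy($"k1", $"k2".desc)).as("asc_desc"),
      lead("v", 1).over(Window.orderBy($"k1".desc, $"k2".desc)).as("desc_desc")
    ).orderBy($"k1", $"k2").collect().toSeq
  }
}
```

With this data, the row that sorts last under a given ordering gets a null in that ordering's column, so the null moves from row to row across the four columns, which makes a duplicated ordering easy to spot.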
assertEqual("foo(*) over (order by a desc, b asc)", windowed(Seq.empty, Seq('a.desc, 'b.asc)))
assertEqual("foo(*) over (sort by a)", windowed(Seq.empty, Seq('a.asc)))
assertEqual("foo(*) over (sort by a desc, b asc)", windowed(Seq.empty, Seq('a.desc, 'b.asc)))
assertEqual("foo(*) over (partition by a, b order by c)", windowed(Seq('a, 'b), Seq('c.asc)))
To conform with the pattern you followed, (partition by a order by c) could also be covered.
I will fix it, thx
assertEqual("foo(*) over (sort by a)", windowed(Seq.empty, Seq('a.asc)))
assertEqual("foo(*) over (sort by a desc, b asc)", windowed(Seq.empty, Seq('a.desc, 'b.asc)))
assertEqual("foo(*) over (partition by a, b order by c)", windowed(Seq('a, 'b), Seq('c.asc)))
assertEqual("foo(*) over (distribute by a, b sort by c)", windowed(Seq('a, 'b), Seq('c.asc)))
To conform with the pattern you followed, (distribute by a order by c) could also be covered.
I will fix it, thx
Jenkins, ok to test
Test build #85277 has finished for PR 20045 at commit
failure seems to be unrelated ... Jenkins, retest this please
}

test("Order by without frame defaults to range between unbounded_preceding - current_row") {
I would propose following this behaviour with a simpler aggregate function: collect_list.
I would also add one more projection where "rangeBetween(Window.unboundedPreceding, Window.currentRow)" is given explicitly, so that the default frame and the explicit frame can be compared side by side. The desc direction can be tested too.
For example:
test("Order by without frame defaults to range between unbounded_preceding - current_row") {
  val df = Seq(
    ("a", "p1", "1"),
    ("b", "p1", "2"),
    ("c", "p1", "2")).toDF("key", "partition", "value")
  checkAnswer(
    df.select(
      $"key",
      collect_list("value").over(Window.partitionBy($"partition").orderBy($"value")),
      collect_list("value").over(Window.partitionBy($"partition").orderBy($"value")
        .rangeBetween(Window.unboundedPreceding, Window.currentRow)),
      collect_list("value").over(Window.partitionBy($"partition").orderBy($"value".desc)),
      collect_list("value").over(Window.partitionBy($"partition").orderBy($"value".desc)
        .rangeBetween(Window.unboundedPreceding, Window.currentRow))),
    Seq(
      Row("a", Array("1"), Array("1"), Array("2", "2", "1"), Array("2", "2", "1")),
      Row("b", Array("1", "2", "2"), Array("1", "2", "2"), Array("2", "2"), Array("2", "2")),
      Row("c", Array("1", "2", "2"), Array("1", "2", "2"), Array("2", "2"), Array("2", "2"))))
}
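The expected rows in the suggestion above follow from how a range frame treats peers: rows with an equal ordering value all fall inside each other's frame. The following is a plain-Scala model of that rule for a single partition, written only to make the expected values easier to check by hand. It is not Spark's implementation, and the object and method names are hypothetical.

```scala
object DefaultRangeFrameModel {
  // Models RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW for one partition:
  // the frame of a row holds every value that sorts before the row's value,
  // plus all peers (values equal to it). Rows are (key, value) pairs.
  def collectList(
      rows: Seq[(String, String)],
      descending: Boolean): Seq[(String, Seq[String])] = {
    val ord = if (descending) Ordering[String].reverse else Ordering[String]
    val sortedValues = rows.map(_._2).sorted(ord)
    rows.map { case (key, value) =>
      // lteq under `ord` means "sorts before or is a peer of" the current value.
      key -> sortedValues.filter(v => ord.lteq(v, value))
    }
  }
}
```

Applied to the three rows of the suggested test, the model gives `Seq("1")` for key "a" and `Seq("1", "2", "2")` for keys "b" and "c" in ascending order, because "b" and "c" are peers and therefore see each other.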
Test build #4023 has finished for PR 20045 at commit
Test build #85612 has finished for PR 20045 at commit
@gatorsmile @hvanhovell @jiangxb1987, could you have a look, please?
Test build #86256 has finished for PR 20045 at commit
Test build #86258 has finished for PR 20045 at commit
Test build #86267 has finished for PR 20045 at commit
@gatorsmile @hvanhovell @jiangxb1987, could you have a look, please?
// Basic window testing.
assertEqual("foo(*) over w1", UnresolvedWindowExpression(func, WindowSpecReference("w1")))
assertEqual("foo(*) over ()", windowed())
assertEqual("foo(*) over (partition by a)", windowed(Seq('a)))
You should also cover the null cases, such as foo(*) over (partition by null).
Test build #86475 has finished for PR 20045 at commit
Do you think I need to cover any other cases, @jiangxb1987?
I think I've addressed all your points @jiangxb1987.
Sorry for the delay, I'll check the results later this week.
ping @jiangxb1987
ping @jiangxb1987
Can we add them to the file-based test suites instead?
ok to test
gentle ping @smurakozi
Test build #93049 has finished for PR 20045 at commit
Do you still need this, @jiangxb1987? Also, gentle ping, @smurakozi.
I'm pretty sure @smurakozi will not finish this PR.
Closing due to inactivity here.
What changes were proposed in this pull request?
Improve the test coverage of window specifications.
New tests cover basic cases more systematically in DataFrameWindowFunctionsSuite:
New tests were added to cover some more complex cases when partitionBy or orderBy uses expressions.
ExpressionParserSuite.'window function expressions' was also extended to check parsing of some additional window expressions.
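As an illustration of the "expressions in partitionBy or orderBy" case mentioned above, here is a minimal sketch assuming a local `SparkSession`. The data, the object name `ExpressionWindowSketch`, and the chosen expressions are hypothetical and are not taken from the PR's tests.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum

object ExpressionWindowSketch {
  // Hypothetical data: k is a key, v is the value being summed.
  def run(spark: SparkSession): Seq[Row] = {
    import spark.implicits._
    val df = Seq((1, 10), (2, 20), (3, 30), (4, 40)).toDF("k", "v")
    // Partition by an expression (parity of k) and order by an expression (-v),
    // which is equivalent to ordering by v descending.
    val w = Window.partitionBy($"k" % 2).orderBy(-$"v")
    df.select($"k", sum($"v").over(w).as("running"))
      .orderBy($"k")
      .collect()
      .toSeq
  }
}
```

Because an ordering is present and no frame is given, `sum` here is a running total within each parity group, from the largest `v` downwards.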
How was this patch tested?
Only new tests were added; the automated tests were executed.