[SPARK-32616][SQL] Window operators should be added determinedly #29432
good catch!
oh.. good catch! LGTM, pending Jenkins.
github action passes, merging to master!
thanks all!
We don't need to backport this into branch-3.0?
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala Lines 2686 to 2705 in 81d7747
It's not a bug. The query can still run and return correct result.
okay, I updated the type in the jira ( |
Test build #127453 has finished for PR 29432 at commit
|
I recently encountered a similar problem in a production environment: the order in which the windows are computed is random. Sometimes the query takes only 1-2 minutes to run, sometimes it takes 1-2 hours, and sometimes it fails with an OOM. With this patch, I can fix the order of the windows and keep the computation fast.
CREATE TABLE XXX (
`d` STRING ,
`indexs` STRING,
`vid` STRING );
select distinct indexs, d, dd, pd, diffday
, count(vid) over(PARTITION BY indexs, d, diffday) as c2
, size(collect_set(vid) OVER (PARTITION BY indexs, d)) AS c1
from
(
select indexs, vid, d, a.dd, p.pd, datediff(a.d, p.pd) as diffday
from
(
select indexs, vid, d
, collect_set(d) over(partition by indexs, vid order by d asc rows between unbounded preceding and 1 preceding) as dd
from XXX
) a
lateral view outer explode(dd) p as pd
) a;
What changes were proposed in this pull request?
Use LinkedHashMap instead of immutable.Map to hold the Window expressions in ExtractWindowExpressions.addWindow.
Why are the changes needed?
This is a bug fix for #29270. In that PR, the plans generated on Jenkins (especially for queries q47, q49, and q57) never match the golden plans generated on my laptop.
It happens because ExtractWindowExpressions.addWindow currently uses immutable.Map to hold the Window expressions, keyed by (spec.partitionSpec, spec.orderSpec, WindowFunctionType.functionType(expr)), and converts the map to a Seq at the end. The Seq is then used to add Window operators on top of the child plan. However, for the same query, the order of the Window expressions inside the Seq can be nondeterministic when the expression IDs change (which can affect the key). As a result, the same query can produce different plans because of the nondeterministic order of the Window operators.
Therefore, we use LinkedHashMap, which records the insertion order of entries, to make the order in which the operators are added deterministic.
Does this PR introduce any user-facing change?
Maybe yes: users now always see the same plan for the same query with multiple Window operators.
How was this patch tested?
It's really hard to build a reproducible demo. I just tested manually with #29270 and it looks good.
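As a rough illustration of the idea in the description, the sketch below groups items by a key with a mutable.LinkedHashMap so that the groups come back in first-seen order. It is not the code from Analyzer.scala: WindowExpr, its string spec field, and groupInInsertionOrder are hypothetical stand-ins for the real Window expressions and the (partitionSpec, orderSpec, functionType) key.

```scala
import scala.collection.mutable

object WindowOrderSketch {
  // Hypothetical stand-in for a window expression. `spec` plays the role of
  // the (partitionSpec, orderSpec, functionType) grouping key.
  final case class WindowExpr(name: String, spec: String)

  // Group expressions by spec. LinkedHashMap iterates in insertion order,
  // so the resulting Seq lists each spec in the order it was first seen.
  def groupInInsertionOrder(exprs: Seq[WindowExpr]): Seq[(String, Seq[WindowExpr])] = {
    val groups = mutable.LinkedHashMap.empty[String, mutable.ArrayBuffer[WindowExpr]]
    exprs.foreach { e =>
      groups.getOrElseUpdate(e.spec, mutable.ArrayBuffer.empty[WindowExpr]) += e
    }
    groups.toSeq.map { case (spec, buf) => (spec, buf.toSeq) }
  }

  def main(args: Array[String]): Unit = {
    // Names and specs loosely mirror the query quoted in the conversation.
    val exprs = Seq(
      WindowExpr("c2", "indexs,d,diffday"),
      WindowExpr("c1", "indexs,d"),
      WindowExpr("dd", "indexs,vid")
    )
    // One group per spec, printed in first-seen order.
    groupInInsertionOrder(exprs).foreach { case (spec, es) =>
      println(s"$spec -> ${es.map(_.name).mkString(", ")}")
    }
  }
}
```

With a hash-based map, the iteration order depends on the hash codes of the keys, so a key that embeds changing expression IDs can reorder the groups between runs; an insertion-ordered map removes that dependency.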