-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-12295] [SQL] external spilling for window functions #10605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #48787 has finished for PR 10605 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private[execution]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are skipping all the records we have already processed here; this shouldn't be to expensive; but it still adds some complexity (O(n * (n - 1) /2) towards O(n * n)). I'd like to see/do a few benchmarks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it doubles the time. If N is large, it's kind of like double of infinity is still infinity :-)
Actually I tried to add a skip(n) API for UnsafeSortIterator to improve this a little bit, but it seems not worth that complexity, so removed.
I think the next step could be have a fast path for those functions that has communitativity, we could reduce the complexity to O(N * 2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true that :)
This is fine for now; it is not used that much.
|
@davies this is pretty awesome! I have taken long look at the window code and it looks solid. I am less of an expert on the Memory management front, so maybe someone else should take a look at that. I do have one small concern: I am absolutely convinced that in the case of large partition sizes this will outperform the current implementation by a margin. However, I am wondering what happens if we consider smaller partition sizes (e.g. n 2-32). We might take a small hit in these cases, because of the added complexity. Have you done some benchmarking on this? If you haven't this is a link to benchmark I used for my initial window prototype: https://issues.apache.org/jira/secure/attachment/12745984/perf_test3.scala I'd like to finish with something we should not address in this PR (thinking out loud if you will). The child node of a |
|
Test build #48790 has finished for PR 10605 at commit
|
|
@hvanhovell Here is the result after having a fast path for small partition: |
Conflicts: sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala
Conflicts: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java project/MimaExcludes.scala
|
Test build #48812 has finished for PR 10605 at commit
|
|
Test build #48806 has finished for PR 10605 at commit
|
|
Test build #48817 has finished for PR 10605 at commit
|
|
Test build #48835 has finished for PR 10605 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make this configurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on making this configurable
|
@davies I left some final comments; feel free to use/ignore them. The benchmark looks good. The shrinking case was expected to perform horrible, and it does. The other figures are so close to the old figure that I don't think we can safely say that they are different (given the limited scope and duration of the test). I was wondering, how does this perform without the optimized (hybrid) approach? LGTM (pending review of the |
|
@hvanhovell Before the fast path, tiny partitions look horrible: I think the cost came from UnsafeExternalSorter, once the partition is larger than 4096, the overhead is under 10%. For larger partitions, this PR may be faster. it's also safe that 4k rows are not managed (4M memory if each row is 1k bytes), So I think we don't need a configuration for this, we already have too much configurations. |
|
@JoshRosen Could you help to review the ExternalSort* parts? |
|
I'm going to merge this to avoid conflict, any comments will be addressed by follow-up PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ExternalRowBuffer, can this copy() be saved ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tedyu both buffers implementations are statefull (they contain an iterator). Each frame traverses the partition in a different way, so we cannot share iterators between them.
This PR manage the memory used by window functions (buffered rows), also enable external spilling. After this PR, we can run window functions on a partition with hundreds of millions of rows with only 1G. Author: Davies Liu <davies@databricks.com> Closes apache#10605 from davies/unsafe_window. Conflicts: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala
This PR manage the memory used by window functions (buffered rows), also enable external spilling. After this PR, we can run window functions on a partition with hundreds of millions of rows with only 1G. Author: Davies Liu <davies@databricks.com> Closes apache#10605 from davies/unsafe_window.
|
|
||
| import java.util.Comparator; | ||
|
|
||
| import org.apache.avro.reflect.Nullable; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI this should probably have been import javax.annotation.Nullable.
This PR manage the memory used by window functions (buffered rows), also enable external spilling.
After this PR, we can run window functions on a partition with hundreds of millions of rows with only 1G.