[SPARK-10474] [SQL] Aggregation fails to allocate memory for pointer array (round 2) #8888
091298e to dfc73e8
dfc73e8 to a96b94e
b2708ca to c910d0b
LGTM |
Test build #42924 has finished for PR 8888 at commit
|
Test build #42925 has finished for PR 8888 at commit
|
Test build #1803 has started for PR 8888 at commit |
Test build #1804 has started for PR 8888 at commit |
Test build #1799 has finished for PR 8888 at commit
|
Test build #1806 has started for PR 8888 at commit |
Test build #1805 has started for PR 8888 at commit |
Test build #1800 has finished for PR 8888 at commit
|
Test build #1802 has finished for PR 8888 at commit
|
Test build #1801 has finished for PR 8888 at commit
|
By the way @JoshRosen this seems to run core tests like |
https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java#L138 This seems to reserve the data page only if the |
@andrewor14, it's running core tests because you changed |
@chenghao-intel SPARK-10474 is caused by an aggregate falling back to sort-based aggregation. In this case we don't acquire the page in the constructor, but we do acquire it when we insert into the sorter later. |
Ah yes never mind. I didn't realize it was in core. |
Test build #1807 has started for PR 8888 at commit |
@andrewor14 that's actually what I mean: if we don't reserve the memory when creating the downstream operator, we will probably never get the chance to acquire the page when inserting records to |
@chenghao-intel but we do reserve the page in advance. See spark/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala Line 77 in 084e4e1
|
Test build #42942 has finished for PR 8888 at commit
|
OK it passed tests. I'm merging this into master and 1.5. |
Test build #1808 has finished for PR 8888 at commit
|
…array (round 2) This patch reverts most of the changes in a previous fix #8827. The real cause of the issue is that in `TungstenAggregate`'s prepare method we only reserve 1 page, but later when we switch to sort-based aggregation we try to acquire 1 page AND a pointer array. The longer-term fix should be to reserve also the pointer array, but for now ***we will simply not track the pointer array***. (Note that elsewhere we already don't track the pointer array, e.g. [here](https://github.com/apache/spark/blob/a18208047f06a4244703c17023bb20cbe1f59d73/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java#L88)) Note: This patch reuses the unit test added in #8827 so it doesn't show up in the diff. Author: Andrew Or <andrew@databricks.com> Closes #8888 from andrewor14/dont-track-pointer-array. (cherry picked from commit 83f6f54) Signed-off-by: Andrew Or <andrew@databricks.com>
Test build #1811 has finished for PR 8888 at commit
|
Test build #1810 has finished for PR 8888 at commit
|
Test build #1809 has finished for PR 8888 at commit
|
Thank you @andrewor14 for the explanation. I believe you're talking about reserving the data page in advance via the |
Yes, it's not related. What you mention here is a bigger problem. The current solution only ensures that we don't starve any operators. In the future we can improve this by introducing a forced-spilling mechanism, but that's too big of a change to backport to 1.5. |
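To illustrate what "don't starve any operators" means here, below is a toy sketch. It does not use Spark's memory manager; `StarvationSketch`, `pagesAvailableToSecond`, and the page counts are all made up for illustration. It only shows that setting one page aside up front guarantees a later operator something to work with, even when an earlier operator asks for everything.

```java
// Toy model only: none of these names exist in Spark.
final class StarvationSketch {

    /**
     * Returns how many pages the second operator can obtain after the first
     * operator has asked for `firstWants` pages out of `totalPages`.
     * If `reserveUpFront` is true, one page is set aside for the second
     * operator before the first operator starts acquiring memory.
     */
    static int pagesAvailableToSecond(int totalPages, int firstWants, boolean reserveUpFront) {
        int free = totalPages;
        int reservedForSecond = 0;
        if (reserveUpFront && free > 0) {
            reservedForSecond = 1;   // hypothetical one-page reservation
            free -= 1;
        }
        // The first operator greedily takes as much as it is allowed to.
        int grantedToFirst = Math.min(firstWants, free);
        free -= grantedToFirst;
        return reservedForSecond + free;
    }

    public static void main(String[] args) {
        System.out.println("no reservation:   " + pagesAvailableToSecond(4, 10, false));
        System.out.println("with reservation: " + pagesAvailableToSecond(4, 10, true));
    }
}
```

The sketch says nothing about the bigger problem mentioned above: an operator that needs more than its reserved page still has no way to force others to spill.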
…array (round 2) This patch reverts most of the changes in a previous fix apache#8827. The real cause of the issue is that in `TungstenAggregate`'s prepare method we only reserve 1 page, but later when we switch to sort-based aggregation we try to acquire 1 page AND a pointer array. The longer-term fix should be to reserve also the pointer array, but for now ***we will simply not track the pointer array***. (Note that elsewhere we already don't track the pointer array, e.g. [here](https://github.com/apache/spark/blob/a18208047f06a4244703c17023bb20cbe1f59d73/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java#L88)) Note: This patch reuses the unit test added in apache#8827 so it doesn't show up in the diff. Author: Andrew Or <andrew@databricks.com> Closes apache#8888 from andrewor14/dont-track-pointer-array. (cherry picked from commit 83f6f54) Signed-off-by: Andrew Or <andrew@databricks.com> (cherry picked from commit 1f47e68)
This patch reverts most of the changes in a previous fix #8827.
The real cause of the issue is that in `TungstenAggregate`'s prepare method we only reserve 1 page, but later when we switch to sort-based aggregation we try to acquire 1 page AND a pointer array. The longer-term fix should be to reserve also the pointer array, but for now _we will simply not track the pointer array_. (Note that elsewhere we already don't track the pointer array, e.g. [here](https://github.com/apache/spark/blob/a18208047f06a4244703c17023bb20cbe1f59d73/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java#L88).)
Note: This patch reuses the unit test added in #8827 so it doesn't show up in the diff.
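The accounting mismatch described above can be sketched with a toy budget. This is not Spark's actual memory manager API: `ToyMemoryBudget`, `PointerArraySketch`, and both size constants are hypothetical, and the model assumes the sort-based fallback can draw only on what the prepare step reserved.

```java
// Toy model only: none of these names exist in Spark.
final class ToyMemoryBudget {
    private long free;

    ToyMemoryBudget(long totalBytes) {
        this.free = totalBytes;
    }

    /** Grants the request in full or not at all. */
    boolean tryAcquire(long bytes) {
        if (bytes > free) {
            return false;
        }
        free -= bytes;
        return true;
    }
}

final class PointerArraySketch {
    static final long PAGE_BYTES = 4L * 1024 * 1024;      // hypothetical page size
    static final long POINTER_ARRAY_BYTES = 64L * 1024;   // hypothetical array size

    /**
     * Simulates the fallback to sort-based aggregation when only one page
     * was reserved. If the pointer array is tracked, it must also fit in
     * the reservation; if it is untracked, only the page is accounted for.
     */
    static boolean fallbackSucceeds(boolean trackPointerArray) {
        ToyMemoryBudget reserved = new ToyMemoryBudget(PAGE_BYTES); // one page set aside in prepare
        boolean gotPage = reserved.tryAcquire(PAGE_BYTES);
        boolean gotArray = !trackPointerArray || reserved.tryAcquire(POINTER_ARRAY_BYTES);
        return gotPage && gotArray;
    }

    public static void main(String[] args) {
        System.out.println("pointer array tracked:   " + fallbackSucceeds(true));   // false
        System.out.println("pointer array untracked: " + fallbackSucceeds(false));  // true
    }
}
```

In this model the untracked case succeeds only because the array is left out of the accounting, which is the short-term trade-off the description makes; the longer-term fix it mentions would correspond to reserving `PAGE_BYTES + POINTER_ARRAY_BYTES` up front.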