
Conversation

andrewor14
Contributor

This patch reverts most of the changes in a previous fix #8827.

The real cause of the issue is that in `TungstenAggregate`'s prepare method we reserve only 1 page, but later, when we switch to sort-based aggregation, we try to acquire 1 page AND a pointer array. The longer-term fix should be to also reserve the pointer array, but for now _we will simply not track the pointer array_. (Note that elsewhere we already don't track the pointer array, e.g. [here](https://github.com/apache/spark/blob/a18208047f06a4244703c17023bb20cbe1f59d73/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java#L88).)

Note: This patch reuses the unit test added in #8827 so it doesn't show up in the diff.
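
To make the tracked-vs-untracked distinction concrete, here is a minimal, self-contained sketch. `ToyMemoryManager`, `tryToAcquire`, and both helper methods are made-up names for illustration only, not the actual Spark classes or APIs.

```scala
// Illustrative sketch only, not the actual Spark code: it contrasts a
// "tracked" pointer-array allocation (reserved through a memory manager and
// therefore able to fail when the budget is exhausted) with the "untracked"
// plain on-heap allocation this patch falls back to.
class ToyMemoryManager(private var available: Long) {
  // Try to reserve `bytes`; return how many bytes were actually granted.
  def tryToAcquire(bytes: Long): Long = {
    val granted = math.min(bytes, available)
    available -= granted
    granted
  }
}

object PointerArraySketch {
  def trackedAllocation(mm: ToyMemoryManager, numEntries: Int): Array[Long] = {
    val needed = numEntries * 8L
    if (mm.tryToAcquire(needed) < needed) {
      // The failure mode behind SPARK-10474: the single reserved page is
      // already spoken for, so this extra request cannot be satisfied.
      throw new OutOfMemoryError(s"could not reserve $needed bytes for the pointer array")
    }
    new Array[Long](numEntries)
  }

  def untrackedAllocation(numEntries: Int): Array[Long] = {
    // What this patch does instead: allocate the pointer array on the JVM
    // heap without charging it to the memory manager at all.
    new Array[Long](numEntries)
  }
}
```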

@andrewor14 andrewor14 force-pushed the dont-track-pointer-array branch from 091298e to dfc73e8 Compare September 23, 2015 21:26
@andrewor14 andrewor14 force-pushed the dont-track-pointer-array branch from dfc73e8 to a96b94e Compare September 23, 2015 21:26
@andrewor14 andrewor14 force-pushed the dont-track-pointer-array branch from b2708ca to c910d0b Compare September 23, 2015 21:28
@davies
Contributor

davies commented Sep 23, 2015

LGTM

@SparkQA

SparkQA commented Sep 23, 2015

Test build #42924 has finished for PR 8888 at commit c910d0b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 23, 2015

Test build #42925 has finished for PR 8888 at commit ed36351.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Looks like we were still tracking the pointer array memory when growing it. Don't do that.
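
As a rough illustration of what "don't track the pointer array when growing it" means (hypothetical code, not the actual UnsafeExternalSorter/UnsafeInMemorySorter change): growth becomes a plain on-heap reallocation that is never charged to the memory manager.

```scala
// Illustrative only; the real change lives in the unsafe sorter classes.
object PointerArrayGrowthSketch {
  // Before (conceptually): charge old.length * 8 * 2 bytes to the memory
  // manager first and fail if the grant falls short.
  // After: simply reallocate the long[] on the JVM heap, untracked.
  def growUntracked(old: Array[Long]): Array[Long] = {
    val grown = new Array[Long](math.max(old.length * 2, 64))
    System.arraycopy(old, 0, grown, 0, old.length)
    grown
  }
}
```
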
@SparkQA

SparkQA commented Sep 24, 2015

Test build #1803 has started for PR 8888 at commit ed36351.

@SparkQA

SparkQA commented Sep 24, 2015

Test build #1804 has started for PR 8888 at commit ed36351.

@SparkQA

SparkQA commented Sep 24, 2015

Test build #1799 has finished for PR 8888 at commit ed36351.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 24, 2015

Test build #1806 has started for PR 8888 at commit ed36351.

@SparkQA

SparkQA commented Sep 24, 2015

Test build #1805 has started for PR 8888 at commit ed36351.

@SparkQA

SparkQA commented Sep 24, 2015

Test build #1800 has finished for PR 8888 at commit ed36351.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 24, 2015

Test build #1802 has finished for PR 8888 at commit ed36351.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 24, 2015

Test build #1801 has finished for PR 8888 at commit ed36351.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor Author

By the way @JoshRosen, this seems to run core tests like SecurityManagerSuite even though it touches only SQL stuff. Just FYI.

@chenghao-intel
Contributor

https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java#L138 It seems we reserve the data page only if existingInMemorySorter is unset; however, SPARK-10474 is not that case, right? Since it follows the sort-merge join operator.

@JoshRosen
Contributor

@andrewor14, it's running core tests because you changed core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java

@andrewor14
Contributor Author

@chenghao-intel SPARK-10474 is caused by an aggregate falling back to sort-based aggregation. In this case we don't acquire the page in the constructor, but we do acquire it when we insert into the sorter later.
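
A minimal sketch of that fallback path, with hypothetical names (not the real sorter code): no page is grabbed in the constructor; the page is acquired only when the first record is inserted.

```scala
// Hypothetical sketch of lazy page acquisition, not the actual Spark sorter.
class LazyPageSorterSketch(tryToAcquirePage: Long => Long, pageSizeBytes: Long) {
  private var havePage = false

  def insertRecord(record: Array[Byte]): Unit = {
    if (!havePage) {
      // Nothing was reserved in the constructor; acquire the page on first insert.
      if (tryToAcquirePage(pageSizeBytes) < pageSizeBytes) {
        throw new OutOfMemoryError("unable to acquire a page on first insert")
      }
      havePage = true
    }
    // ... copy `record` into the current page ...
  }
}
```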

@andrewor14
Contributor Author

it's running core tests because you changed core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java

Ah yes never mind. I didn't realize it was in core.

@SparkQA

SparkQA commented Sep 24, 2015

Test build #1807 has started for PR 8888 at commit 2dc34c3.

@chenghao-intel
Contributor

@andrewor14 that's actually what I mean: if we didn't reserve the memory when creating the downstream operator, we might never get the chance to acquire the page when inserting records into the sorter, since the upstream operator (like SMJ) could eat up all of the memory, which could even cause the hash aggregation to switch to sort-based aggregation as soon as the first record comes in. Do you think that's possible?

@andrewor14
Contributor Author

@chenghao-intel but we do reserve the page in advance. See TungstenAggregate:

def preparePartition(): TungstenAggregationIterator = {
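
A hedged sketch of the idea behind that method (simplified names and structure, not the real method body): reserve one page before any input is processed, so an upstream operator such as a sort-merge join cannot leave the aggregation with zero pages.

```scala
object PreparePartitionSketch {
  // `tryToAcquire` stands in for the task's memory-reservation call.
  def preparePartitionSketch(tryToAcquire: Long => Long, pageSizeBytes: Long): Unit = {
    val granted = tryToAcquire(pageSizeBytes)
    if (granted < pageSizeBytes) {
      throw new OutOfMemoryError(
        s"could not reserve $pageSizeBytes bytes for aggregation before reading input")
    }
    // ... the real method then builds a TungstenAggregationIterator that uses
    // this already-reserved page for its hash map ...
  }
}
```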

@SparkQA

SparkQA commented Sep 24, 2015

Test build #42942 has finished for PR 8888 at commit 2dc34c3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor Author

OK, it passed tests. I'm merging this into master and 1.5.

@SparkQA

SparkQA commented Sep 24, 2015

Test build #1808 has finished for PR 8888 at commit 2dc34c3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Sep 24, 2015
…array (round 2)

This patch reverts most of the changes in a previous fix #8827.

The real cause of the issue is that in `TungstenAggregate`'s prepare method we only reserve 1 page, but later when we switch to sort-based aggregation we try to acquire 1 page AND a pointer array. The longer-term fix should be to reserve also the pointer array, but for now ***we will simply not track the pointer array***. (Note that elsewhere we already don't track the pointer array, e.g. [here](https://github.com/apache/spark/blob/a18208047f06a4244703c17023bb20cbe1f59d73/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java#L88))

Note: This patch reuses the unit test added in #8827 so it doesn't show up in the diff.

Author: Andrew Or <andrew@databricks.com>

Closes #8888 from andrewor14/dont-track-pointer-array.

(cherry picked from commit 83f6f54)
Signed-off-by: Andrew Or <andrew@databricks.com>
@andrewor14 andrewor14 closed this Sep 24, 2015
@andrewor14 andrewor14 deleted the dont-track-pointer-array branch September 24, 2015 02:35
asfgit pushed a commit that referenced this pull request Sep 24, 2015
…array (round 2)

This patch reverts most of the changes in a previous fix #8827.

The real cause of the issue is that in `TungstenAggregate`'s prepare method we only reserve 1 page, but later when we switch to sort-based aggregation we try to acquire 1 page AND a pointer array. The longer-term fix should be to reserve also the pointer array, but for now ***we will simply not track the pointer array***. (Note that elsewhere we already don't track the pointer array, e.g. [here](https://github.com/apache/spark/blob/a18208047f06a4244703c17023bb20cbe1f59d73/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java#L88))

Note: This patch reuses the unit test added in #8827 so it doesn't show up in the diff.

Author: Andrew Or <andrew@databricks.com>

Closes #8888 from andrewor14/dont-track-pointer-array.
@SparkQA

SparkQA commented Sep 24, 2015

Test build #1811 has finished for PR 8888 at commit 2dc34c3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 24, 2015

Test build #1810 has finished for PR 8888 at commit 2dc34c3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 24, 2015

Test build #1809 has finished for PR 8888 at commit 2dc34c3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chenghao-intel
Contributor

Thank you @andrewor14 for the explanation. I believe you're talking about reserving the data page in advance via UnsafeFixedWidthAggregationMap.map (a BytesToBytesMap), which reserves the data page in its constructor.
Everything looks reasonable to me now; the only concern is performance, since only a single data page is available for the sort-based aggregation. Anyway, that's not related to this fix.

@andrewor14
Contributor Author

Yes, it's not related. What you mention here is a bigger problem. The current solution only ensures that we don't starve any operators. In the future we can improve this by introducing a forced-spilling mechanism, but that's too big of a change to backport to 1.5.

ashangit pushed a commit to ashangit/spark that referenced this pull request Oct 19, 2016
…array (round 2)

This patch reverts most of the changes in a previous fix apache#8827.

The real cause of the issue is that in `TungstenAggregate`'s prepare method we only reserve 1 page, but later when we switch to sort-based aggregation we try to acquire 1 page AND a pointer array. The longer-term fix should be to reserve also the pointer array, but for now ***we will simply not track the pointer array***. (Note that elsewhere we already don't track the pointer array, e.g. [here](https://github.com/apache/spark/blob/a18208047f06a4244703c17023bb20cbe1f59d73/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java#L88))

Note: This patch reuses the unit test added in apache#8827 so it doesn't show up in the diff.

Author: Andrew Or <andrew@databricks.com>

Closes apache#8888 from andrewor14/dont-track-pointer-array.

(cherry picked from commit 83f6f54)
Signed-off-by: Andrew Or <andrew@databricks.com>
(cherry picked from commit 1f47e68)