Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-7542] [SQL] Support off-heap index/sort buffer #9477

Closed
wants to merge 5 commits into from

Conversation

davies
Copy link
Contributor

@davies davies commented Nov 5, 2015

This brings the support of off-heap memory for array inside BytesToBytesMap and InMemorySorter, then we could allocate all the memory from off-heap for execution.

Closes #8068

@davies davies force-pushed the unsafe_timsort branch 3 times, most recently from 77555e1 to 3cb22d4 Compare November 5, 2015 01:24
@SparkQA
Copy link

SparkQA commented Nov 5, 2015

Test build #45072 has finished for PR 9477 at commit 77555e1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 5, 2015

Test build #45076 has finished for PR 9477 at commit 3cb22d4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * final class ShuffleSortDataFormat extends SortDataFormat<PackedRecordPointer, LongArray>\n * final class UnsafeSortDataFormat extends SortDataFormat<RecordPointerAndKeyPrefix, LongArray>\n

@SparkQA
Copy link

SparkQA commented Nov 5, 2015

Test build #45070 has finished for PR 9477 at commit 862b38f.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Copy link
Contributor Author

davies commented Nov 5, 2015

@JoshRosen This is ready for review

try {
acquireMemory(needed); // could trigger spilling
// could trigger spilling
page = taskMemoryManager.allocatePage(used * 2, this);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implicit here is the fact that the in memory sorter's only source of memory usage is the pointer array itself. That's fine, though.

@JoshRosen
Copy link
Contributor

LGTM overall, but I'd like to address one concern before merging: I'm worried that passing both the MemoryConsumer and TaskMemoryManager to the sorter components will open the potential for bugs; I feel that those classes should allocate their memory using only the public MemoryConsumer methods.

Note that the MemoryConsumer methods actually end up calling the TaskMemoryManager methods which your code calls directly:

protected void acquireMemory(long size) {

I think that we should mark those TaskMemoryManager methods as methods that are only supposed to be called by the memory consumer and not directly by developers. The problem with calling them directly is that the bookkeeping in the MemoryConsumer itself won't have been updated. This current API has such a high potential for this type of misuse that I think we should look at fancy Java-isms to restrict the visibility / callability of those methods such that they can only be called from MemoryConsumer.

If you fix this by having those classes only allocate through their MemoryConsumers then feel free to merge as soon as this passes tests. I can take care of a followup to fix the documentation / code to avoid this type of misuse.

@SparkQA
Copy link

SparkQA commented Nov 5, 2015

Test build #1983 has finished for PR 9477 at commit 3cb22d4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Copy link
Contributor Author

davies commented Nov 5, 2015

@JoshRosen I should had addressed your comments, will merge this once pass the tests.

@SparkQA
Copy link

SparkQA commented Nov 5, 2015

Test build #45096 has finished for PR 9477 at commit 89319e0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * final class ShuffleSortDataFormat extends SortDataFormat<PackedRecordPointer, LongArray>\n * final class UnsafeSortDataFormat extends SortDataFormat<RecordPointerAndKeyPrefix, LongArray>\n

@SparkQA
Copy link

SparkQA commented Nov 5, 2015

Test build #45108 has finished for PR 9477 at commit b42e7db.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * final class ShuffleSortDataFormat extends SortDataFormat<PackedRecordPointer, LongArray>\n * final class UnsafeSortDataFormat extends SortDataFormat<RecordPointerAndKeyPrefix, LongArray>\n

@SparkQA
Copy link

SparkQA commented Nov 5, 2015

Test build #45110 has finished for PR 9477 at commit 854a99f.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * final class ShuffleSortDataFormat extends SortDataFormat<PackedRecordPointer, LongArray>\n * final class UnsafeSortDataFormat extends SortDataFormat<RecordPointerAndKeyPrefix, LongArray>\n

@JoshRosen
Copy link
Contributor

Jenkins, retest this please.

try {
acquireMemory(needed); // could trigger spilling
// could trigger spilling
array = allocateArray(used / 16 * 2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be / 8 * 2 instead, since we want to double the number of slots in the array and each slot requires 8 bytes of memory?

@SparkQA
Copy link

SparkQA commented Nov 5, 2015

Test build #45125 has finished for PR 9477 at commit 854a99f.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * final class ShuffleSortDataFormat extends SortDataFormat<PackedRecordPointer, LongArray>\n * final class UnsafeSortDataFormat extends SortDataFormat<RecordPointerAndKeyPrefix, LongArray>\n

@SparkQA
Copy link

SparkQA commented Nov 5, 2015

Test build #45127 has finished for PR 9477 at commit c35f512.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * final class ShuffleSortDataFormat extends SortDataFormat<PackedRecordPointer, LongArray>\n * final class UnsafeSortDataFormat extends SortDataFormat<RecordPointerAndKeyPrefix, LongArray>\n

@SparkQA
Copy link

SparkQA commented Nov 5, 2015

Test build #1989 has finished for PR 9477 at commit b367daf.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * final class ShuffleSortDataFormat extends SortDataFormat<PackedRecordPointer, LongArray>\n * final class UnsafeSortDataFormat extends SortDataFormat<RecordPointerAndKeyPrefix, LongArray>\n

@davies
Copy link
Contributor Author

davies commented Nov 5, 2015

The failed test is not related, will re-run it.

@SparkQA
Copy link

SparkQA commented Nov 6, 2015

Test build #1993 has finished for PR 9477 at commit b367daf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * final class ShuffleSortDataFormat extends SortDataFormat<PackedRecordPointer, LongArray>\n * final class UnsafeSortDataFormat extends SortDataFormat<RecordPointerAndKeyPrefix, LongArray>\n

@JoshRosen
Copy link
Contributor

LGTM, so I'm going to merge this into master and 1.6. Thanks!

asfgit pushed a commit that referenced this pull request Nov 6, 2015
This brings the support of off-heap memory for array inside BytesToBytesMap and InMemorySorter, then we could allocate all the memory from off-heap for execution.

Closes #8068

Author: Davies Liu <davies@databricks.com>

Closes #9477 from davies/unsafe_timsort.

(cherry picked from commit eec74ba)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
@asfgit asfgit closed this in eec74ba Nov 6, 2015
@SparkQA
Copy link

SparkQA commented Nov 6, 2015

Test build #1994 has finished for PR 9477 at commit b367daf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * final class ShuffleSortDataFormat extends SortDataFormat<PackedRecordPointer, LongArray>\n * final class UnsafeSortDataFormat extends SortDataFormat<RecordPointerAndKeyPrefix, LongArray>\n

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants