rxin commented Aug 2, 2015

The detailed approach is documented in UnsafeKVExternalSorterSuite.testKVSorter(). It works as follows:

  1. Generate random input data for a given key/value schema (the schema itself is randomly drawn from a list of candidate types).
  2. Run UnsafeKVExternalSorter on the generated data.
  3. Collect the output from the sorter and verify that the keys are in ascending order.
  4. Sort both the input and the sorter output by key and value, then compare the two sorted lists to verify that every key/value pair matches.
  5. Check memory allocation to verify that nothing leaked.
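
Steps 1 to 4 can be sketched in plain Java as below. This is only an illustration under stated assumptions, not the code in UnsafeKVExternalSorterSuite: the `KV` class, the fixed int/long schema, and `sortByKey` (a stand-in for the sorter under test) are all hypothetical, and the memory-leak check from step 5 is not modeled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

final class RandomizedSortCheck {
    /** Hypothetical stand-in for one key/value row pair (fixed int key, long value). */
    static final class KV {
        final int key;
        final long value;
        KV(int key, long value) { this.key = key; this.value = value; }
    }

    static final Comparator<KV> BY_KEY = Comparator.comparingInt((KV kv) -> kv.key);
    static final Comparator<KV> BY_KEY_THEN_VALUE =
        BY_KEY.thenComparingLong((KV kv) -> kv.value);

    /** Step 1: generate random input; the small key range forces duplicate keys. */
    static List<KV> generate(Random rng, int n) {
        List<KV> rows = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            rows.add(new KV(rng.nextInt(50), rng.nextLong()));
        }
        return rows;
    }

    /** Step 2: stand-in for the sorter under test; it orders by key only. */
    static List<KV> sortByKey(List<KV> input) {
        List<KV> copy = new ArrayList<>(input);
        copy.sort(BY_KEY);
        return copy;
    }

    /** Step 3: the output keys must be non-decreasing. */
    static boolean keysAscending(List<KV> rows) {
        for (int i = 1; i < rows.size(); i++) {
            if (rows.get(i - 1).key > rows.get(i).key) {
                return false;
            }
        }
        return true;
    }

    /** Step 4: sort both sides by key and value, then compare element by element. */
    static boolean sameContents(List<KV> a, List<KV> b) {
        if (a.size() != b.size()) {
            return false;
        }
        List<KV> x = new ArrayList<>(a);
        List<KV> y = new ArrayList<>(b);
        x.sort(BY_KEY_THEN_VALUE);
        y.sort(BY_KEY_THEN_VALUE);
        for (int i = 0; i < x.size(); i++) {
            if (x.get(i).key != y.get(i).key || x.get(i).value != y.get(i).value) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<KV> input = generate(new Random(42), 1000);
        List<KV> output = sortByKey(input);
        System.out.println("keys ascending: " + keysAscending(output));
        System.out.println("contents match: " + sameContents(input, output));
    }
}
```

Step 4 sorts both sides by key and value because the sorter only guarantees key order; rows with equal keys may come out in any order, so a direct positional comparison of input and output could fail spuriously.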

There is also a spill flag. When it is set to true, the sorter spills at random, on average about once every 100 records.
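
One way to get a spill "roughly every 100 records" is to roll a 1-in-100 chance per inserted record. The sketch below assumes that reading; `shouldSpill`, `countSpills`, and the `avgInterval` parameter are hypothetical names, and the actual test may trigger spills differently.

```java
import java.util.Random;

final class ProbabilisticSpill {
    /** Returns true with probability 1/avgInterval (hypothetical helper). */
    static boolean shouldSpill(Random rng, int avgInterval) {
        return rng.nextInt(avgInterval) == 0;
    }

    /** Counts how many spills would be triggered while inserting `records` rows. */
    static int countSpills(Random rng, int records, int avgInterval) {
        int spills = 0;
        for (int i = 0; i < records; i++) {
            // A real test would insert the record here, then force a spill on a hit.
            if (shouldSpill(rng, avgInterval)) {
                spills++;
            }
        }
        return spills;
    }

    public static void main(String[] args) {
        int spills = countSpills(new Random(42), 100_000, 100);
        System.out.println("spills over 100000 records: " + spills);
    }
}
```

Randomizing the spill points, rather than spilling at fixed offsets, means repeated runs exercise different page and file boundaries in the merge path.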

SparkQA commented Aug 2, 2015

Test build #39443 has finished for PR 7873 at commit 0488b5c.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class FreqSequence[Item](val sequence: Array[Array[Item]], val freq: Long) extends Serializable
    • class PrefixSpanModel[Item](val freqSequences: RDD[PrefixSpan.FreqSequence[Item]])
    • public final class UnsafeKVExternalSorter

rxin commented Aug 2, 2015

The test failure is unrelated to my change.

SparkQA commented Aug 2, 2015

Test build #39446 has finished for PR 7873 at commit a08c251.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public final class UnsafeKVExternalSorter

rxin commented Aug 3, 2015

Merging in master.

asfgit closed this in 9d03ad9 on Aug 3, 2015