[SPARK-7081] Faster sort-based shuffle path using binary processing cache-aware sort #5868

Closed
wants to merge 96 commits into from

Commits on May 1, 2015

  1. WIP on UnsafeSorter

    JoshRosen committed May 1, 2015
    81d52c5
  2. Add basic test case.

    JoshRosen committed May 1, 2015
    abf7bfe
  3. 57a4ea0
  4. e900152
  5. Fix invalid range in UnsafeSorter.

    TODO: write fuzz tests to uncover stuff like this.
    Sorting has nice invariants; should be an easy test
    to write.
    JoshRosen committed May 1, 2015
    767d3ca
  6. 3db12de
  7. WIP

    JoshRosen committed May 1, 2015
    4d2f5e1
  8. Begin code cleanup.

    JoshRosen committed May 1, 2015
    8e3ec20
  9. More cleanup

    JoshRosen committed May 1, 2015
    253f13e
  10. 9c6cf58
  11. e267cee
  12. Expand serializer API and use new function to help control when new UnsafeShuffle path is used.

    JoshRosen committed May 1, 2015
    e2d96ca
  13. d3cc310
  14. Renaming and comments

    JoshRosen committed May 1, 2015
    87e721b
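
The TODO in commit 767d3ca above observes that sorting has invariants that make it easy to fuzz. A minimal sketch of such a test follows. It assumes the sorter can be wrapped as "sort this sub-range in place"; `SortFuzzSketch`, `sortRange`, and the use of `Arrays.sort` as a stand-in for the sorter under test are all hypothetical and not taken from the pull request.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical fuzz harness; Arrays.sort stands in for the sorter under test.
class SortFuzzSketch {

  /** Sorts data[from, to) in place. Replace the body with the real sorter. */
  static void sortRange(long[] data, int from, int to) {
    Arrays.sort(data, from, to);
  }

  /** Returns true if a[from, to) is in non-decreasing order. */
  static boolean isOrdered(long[] a, int from, int to) {
    for (int i = from + 1; i < to; i++) {
      if (a[i - 1] > a[i]) return false;
    }
    return true;
  }

  /** Runs `trials` random cases and returns true if every invariant held. */
  static boolean fuzz(long seed, int trials) {
    Random rnd = new Random(seed);
    for (int t = 0; t < trials; t++) {
      int n = rnd.nextInt(64);
      long[] input = new long[n];
      for (int i = 0; i < n; i++) {
        input[i] = rnd.nextInt(16); // small value range forces duplicates
      }
      int from = rnd.nextInt(n + 1);
      int to = from + rnd.nextInt(n - from + 1);
      long[] actual = input.clone();
      sortRange(actual, from, to);

      // Invariant 1: the requested range is ordered.
      if (!isOrdered(actual, from, to)) return false;
      // Invariant 2: elements outside the range are untouched.
      for (int i = 0; i < from; i++) if (actual[i] != input[i]) return false;
      for (int i = to; i < n; i++) if (actual[i] != input[i]) return false;
      // Invariant 3: the range holds the same multiset of values as before.
      long[] expected = Arrays.copyOfRange(input, from, to);
      Arrays.sort(expected);
      if (!Arrays.equals(expected, Arrays.copyOfRange(actual, from, to))) return false;
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(fuzz(42L, 1000) ? "all invariants held" : "invariant violated");
  }
}
```

Randomizing the range bounds as well as the data is deliberate: an invalid-range bug of the kind this commit fixed would most likely show up as a violation of invariant 2 or 3.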

Commits on May 2, 2015

  1. 0748458
  2. 026b497
  3. 1433b42
  4. 240864c

Commits on May 3, 2015

  1. Add tests for serializer relocation property.

    I verified that the Kryo tests will fail if we remove the auto-reset
    check in KryoSerializer. I also checked that this test fails if we
    mistakenly enable this flag for JavaSerializer. This demonstrates that
    the test case is actually capable of detecting the types of bugs that it's
    trying to prevent.
    
    Of course, it's possible that certain bugs will only surface when serializing
    specific data types, so we'll still have to be cautious when overriding
    `supportsRelocationOfSerializedObjects` for new serializers.
    JoshRosen committed May 3, 2015
    bfc12d3
  2. b8a09fe
  3. Small refactoring of SerializerPropertiesSuite to enable test re-use:

    This lays some groundwork for re-using this test logic for serializers defined
    in other subprojects (those projects can just declare a test-jar dependency
    on Spark core).
    JoshRosen committed May 3, 2015
    c2fca17
  4. Add missing newline

    JoshRosen committed May 3, 2015
    f17fa8f
  5. Fix bug in calculating free space in current page.

    This broke off-heap mode.
    JoshRosen committed May 3, 2015
    8958584
  6. 595923a
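
Commit bfc12d3 above tests a "relocation" property: reordering the serialized bytes of individual records should give the same result as reordering the records before serializing them. The sketch below illustrates the shape of such a check with a deliberately simple stand-in encoding (`DataOutputStream.writeUTF`, which keeps no state across records). `RelocationSketch` and its methods are hypothetical, and this is not the code in `SerializerPropertiesSuite`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; this is not Spark's SerializerPropertiesSuite.
class RelocationSketch {

  /** Writes all records to one stream and returns each record's bytes as a chunk. */
  static List<byte[]> serializeChunks(List<String> records) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(sink);
    List<byte[]> chunks = new ArrayList<>();
    int start = 0;
    for (String record : records) {
      out.writeUTF(record); // length-prefixed, no cross-record state
      out.flush();
      byte[] soFar = sink.toByteArray();
      chunks.add(Arrays.copyOfRange(soFar, start, soFar.length));
      start = soFar.length;
    }
    return chunks;
  }

  /** Concatenates the chunks and reads back one record per chunk. */
  static List<String> deserialize(List<byte[]> chunks) throws IOException {
    ByteArrayOutputStream joined = new ByteArrayOutputStream();
    for (byte[] chunk : chunks) {
      joined.write(chunk, 0, chunk.length);
    }
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(joined.toByteArray()));
    List<String> records = new ArrayList<>();
    for (int i = 0; i < chunks.size(); i++) {
      records.add(in.readUTF());
    }
    return records;
  }

  /** True if reversing the serialized chunks matches reversing the records first. */
  static boolean relocationHolds(List<String> records) {
    try {
      List<byte[]> chunks = serializeChunks(records);
      Collections.reverse(chunks);
      List<String> expected = new ArrayList<>(records);
      Collections.reverse(expected);
      return deserialize(chunks).equals(expected);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(relocationHolds(Arrays.asList("a", "bb", "a")));
  }
}
```

The repeated `"a"` in the example input matters. A serializer that replaces repeated objects with back-references to earlier positions in the stream is the kind of design that would be expected to fail this check, which is consistent with the commit's note that the test fails when the flag is enabled for JavaSerializer.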

Commits on May 4, 2015

  1. 5e100b2
  2. 2776aca
  3. f156a8f
  4. Misc. cleanup

    JoshRosen committed May 4, 2015
    3490512
  5. 3aeaff7

Commits on May 5, 2015

  1. Re-order imports in tests

    JoshRosen committed May 5, 2015
    7ee918e
  2. 69232fd
  3. 57f1ec0
  4. f480fb2
  5. WIP towards testing UnsafeShuffleWriter.

    Unfortunately, this involved a TON of mocks; maybe it would be easier to split
    the writer into more objects, such as a spiller and merger, as I did when the
    sorting code was more generic.
    JoshRosen committed May 5, 2015
    133c8c9
  6. 4f70141

Commits on May 6, 2015

  1. aaea17b
  2. Merge remote-tracking branch 'origin/master' into unsafe-sort

    Conflicts:
    	core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala
    	core/src/main/scala/org/apache/spark/serializer/Serializer.scala
    	core/src/test/scala/org/apache/spark/serializer/SerializerPropertiesSuite.scala
    	sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlSerializer2.scala
    JoshRosen committed May 6, 2015
    b674412

Commits on May 7, 2015

  1. 11feeb6
  2. 8a6fe52
  3. cfe0ec4
  4. Remove upper type bound in ShuffleWriter interface.

    This bound wasn't necessary and was causing IntelliJ to display spurious
    errors when editing UnsafeShuffleWriter.java.
    JoshRosen committed May 7, 2015
    e67f1ea
  5. More minor cleanup

    JoshRosen committed May 7, 2015
    5e8cf75
  6. More minor cleanup

    JoshRosen committed May 7, 2015
    1ce1300
  7. b95e642

Commits on May 8, 2015

  1. 9883e30
  2. 722849b

Commits on May 9, 2015

  1. 7cd013b
  2. 9b7ebed
  3. e8718dd
  4. 1929a74

Commits on May 10, 2015

  1. 01afc74
  2. 8f5061a
  3. 67d25ba
  4. fd4bb9e
  5. 9d1ee7c
  6. fcd9a3c
  7. 27b18b0
  8. 4a01c45

Commits on May 11, 2015

  1. f780fb1
  2. b57c17f
  3. 1ef56c7
  4. Properly implement close() and flush() in DummySerializerInstance.

    It turns out that we actually rely on these flushing the underlying
    stream in order to properly close streams in DiskBlockObjectWriter;
    it was silly of me to not implement these methods.
    
    This should fix a failing LZ4 test in UnsafeShuffleWriterSuite.
    JoshRosen committed May 11, 2015
    b3b1924
  5. Bump up shuffle.memoryFraction to make tests pass.

    We'll want to revisit this before merging, since the large minimum memory
    usage means that minimum memory requirements for shuffle may be fairly
    high for local tests.
    JoshRosen committed May 11, 2015
    0d4d199
  6. ec6d626
  7. ae538dc
  8. ea4f85f
  9. 1e3ad52
  10. 39434f9
  11. e1855e5
  12. 7c953f9
  13. Add tests that automatically trigger spills.

    This bumps up line coverage to 93% in UnsafeShuffleExternalSorter; now,
    the only branches that are missed are exception-handling code.
    JoshRosen committed May 11, 2015
    8531286
  14. 69d5899
  15. d4e6d89

Commits on May 12, 2015

  1. 4f0b770
  2. e58a6b4
  3. e995d1a
  4. 56781a1
  5. 0ad34da
  6. 85da63f
  7. fdcac08
  8. 2d4e4f4
  9. 57312c9
  10. Remove ability to disable spilling in UnsafeShuffleExternalSorter.

    There's no obvious use-case for allowing users to disable spark.shuffle.spill
    and run out of memory. Because this configuration isn't deprecated as of this
    patch, I've added code to log a warning to let users know if their preference
    will be ignored by the new shuffle manager.
    JoshRosen committed May 12, 2015
    6276168
  11. 4a2c785
  12. e3b8855
  13. c2ce78e
  14. Merge remote-tracking branch 'origin/master' into unsafe-sort

    Conflicts:
    	project/MimaExcludes.scala
    JoshRosen committed May 12, 2015
    d5779c6
  15. Track time spent closing / flushing files; split TimeTrackingOutputStream into separate file.

    JoshRosen committed May 12, 2015
    5e189c6
  16. df07699
  17. de40b9d
  18. 4023fa4
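
Commit 5e189c6 above extends write-time tracking to cover closing and flushing. As a rough sketch of the idea, the wrapper below times every call it forwards to the underlying stream, including `flush()` and `close()`. The class name `TimedOutputStreamSketch`, the plain `long` counter, and the `roundTrip` helper are assumptions made for illustration. The pull request's `TimeTrackingOutputStream` is not reproduced here and may report its timings differently.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch; not the TimeTrackingOutputStream from the pull request.
class TimedOutputStreamSketch extends OutputStream {
  private final OutputStream delegate;
  private long elapsedNanos = 0L;

  TimedOutputStreamSketch(OutputStream delegate) {
    this.delegate = delegate;
  }

  /** Total time spent inside the delegate's write, flush, and close calls. */
  long elapsedNanos() {
    return elapsedNanos;
  }

  @Override
  public void write(int b) throws IOException {
    long start = System.nanoTime();
    try {
      delegate.write(b);
    } finally {
      elapsedNanos += System.nanoTime() - start;
    }
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    long start = System.nanoTime();
    try {
      delegate.write(b, off, len);
    } finally {
      elapsedNanos += System.nanoTime() - start;
    }
  }

  @Override
  public void flush() throws IOException {
    long start = System.nanoTime();
    try {
      delegate.flush(); // flushing can dominate for buffered or compressed sinks
    } finally {
      elapsedNanos += System.nanoTime() - start;
    }
  }

  @Override
  public void close() throws IOException {
    long start = System.nanoTime();
    try {
      delegate.close();
    } finally {
      elapsedNanos += System.nanoTime() - start;
    }
  }

  /** Writes data through the wrapper; returns {bytes that reached the sink, nanos}. */
  static long[] roundTrip(byte[] data) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    TimedOutputStreamSketch out = new TimedOutputStreamSketch(sink);
    try {
      out.write(data);
      out.flush();
      out.close();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new long[] {sink.size(), out.elapsedNanos()};
  }

  public static void main(String[] args) {
    long[] result = roundTrip(new byte[] {1, 2, 3});
    System.out.println("bytes=" + result[0] + " nanos=" + result[1]);
  }
}
```

Forwarding `flush()` and `close()` rather than swallowing them also matters for correctness, not just timing: commit b3b1924 in this same list notes that a wrapper which failed to flush the underlying stream broke a compression test.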

Commits on May 13, 2015

  1. 51812a7
  2. Fix some bugs in the address packing code.

    The problem is that TaskMemoryManager expects
    offsets to include the page base address whereas
    PackedRecordPointer did not.
    JoshRosen committed May 13, 2015
    52a9981
  3. Fix deserialization of JavaSerializer instances.

    This caused a failure in a new test; the problem
    occurs when calling ShuffledRDD.setSerializer() with
    a JavaSerializer.
    JoshRosen committed May 13, 2015
    d494ffe
  4. 7610f2f
  5. Fix scalastyle errors

    JoshRosen committed May 13, 2015
    ef0a86e
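
Commit 52a9981 above fixes a mismatch between two conventions for record offsets: one side expected offsets that include the page base address, the other stored offsets without it. The sketch below shows one way to keep such conventions consistent when packing a partition id and a record location into a single `long`. The bit widths (24 / 13 / 27), the class `PackedPointerSketch`, and the `pageBaseAddresses` array are assumptions chosen for illustration; the commit message does not state the layout used in the pull request.

```java
// Hypothetical bit layout, high to low: [partition id | page number | offset in page].
class PackedPointerSketch {
  static final int PARTITION_BITS = 24;
  static final int PAGE_BITS = 13;
  static final int OFFSET_BITS = 27;
  static final long PAGE_MASK = (1L << PAGE_BITS) - 1;
  static final long OFFSET_MASK = (1L << OFFSET_BITS) - 1;

  /** Packs the three fields; offsetInPage is relative to the start of the page. */
  static long pack(int partitionId, int pageNumber, long offsetInPage) {
    if (partitionId < 0 || partitionId >= (1 << PARTITION_BITS)) {
      throw new IllegalArgumentException("partition id out of range: " + partitionId);
    }
    if (pageNumber < 0 || pageNumber > PAGE_MASK) {
      throw new IllegalArgumentException("page number out of range: " + pageNumber);
    }
    if (offsetInPage < 0 || offsetInPage > OFFSET_MASK) {
      throw new IllegalArgumentException("offset out of range: " + offsetInPage);
    }
    return ((long) partitionId << (PAGE_BITS + OFFSET_BITS))
        | ((long) pageNumber << OFFSET_BITS)
        | offsetInPage;
  }

  static int partitionId(long packed) {
    // Unsigned shift: the top bit may be set for large partition ids.
    return (int) (packed >>> (PAGE_BITS + OFFSET_BITS));
  }

  static int pageNumber(long packed) {
    return (int) ((packed >>> OFFSET_BITS) & PAGE_MASK);
  }

  static long offsetInPage(long packed) {
    return packed & OFFSET_MASK;
  }

  /** Converts back to an absolute address by re-adding the page's base address. */
  static long absoluteAddress(long packed, long[] pageBaseAddresses) {
    return pageBaseAddresses[pageNumber(packed)] + offsetInPage(packed);
  }

  public static void main(String[] args) {
    long[] pageBaseAddresses = {0L, 0L, 0L, 4096L};
    long packed = pack(5, 3, 100L);
    System.out.println(partitionId(packed) + " " + pageNumber(packed)
        + " " + absoluteAddress(packed, pageBaseAddresses));
  }
}
```

The point of the sketch is that the base address is subtracted in exactly one place (before `pack`) and re-added in exactly one place (`absoluteAddress`). Mixing relative and absolute offsets across that boundary is the class of bug the commit describes.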