[SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown #28708

Commits on Jun 17, 2020

  1. Add an option to migrate shuffle blocks as well as the current cache blocks during decommissioning

    holdenk committed Jun 17, 2020
    094d584
  2. Update core/src/main/scala/org/apache/spark/storage/BlockManager.scala

    CR feedback "Nit: I think the comment is not needed as your code is self-explanatory here:"
    
    Co-authored-by: Attila Zsolt Piros <2017933+attilapiros@users.noreply.github.com>
    holdenk and attilapiros committed Jun 17, 2020
    cd0781f
  3. Update core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala

    If we have a failure during block migration, log the exception.
    
    Co-authored-by: Attila Zsolt Piros <2017933+attilapiros@users.noreply.github.com>
    holdenk and attilapiros committed Jun 17, 2020
    8e0304f
  4. ac31c90
  5. 4eaf4dc
  6. Improve error logging

    Co-authored-by: Attila Zsolt Piros <2017933+attilapiros@users.noreply.github.com>
    holdenk and attilapiros committed Jun 17, 2020
    fd3354c
  7. cleanup

    Co-authored-by: Attila Zsolt Piros <2017933+attilapiros@users.noreply.github.com>
    holdenk and attilapiros committed Jun 17, 2020
    8054404
  8. cleanup

    Co-authored-by: Attila Zsolt Piros <2017933+attilapiros@users.noreply.github.com>
    holdenk and attilapiros committed Jun 17, 2020
    dcff412
  9. Add more info to debugging

    Co-authored-by: Attila Zsolt Piros <2017933+attilapiros@users.noreply.github.com>
    holdenk and attilapiros committed Jun 17, 2020
    dc9d648
  10. logging string interpolation

    Co-authored-by: Attila Zsolt Piros <2017933+attilapiros@users.noreply.github.com>
    holdenk and attilapiros committed Jun 17, 2020
    493a298
  11. logging string interpolation

    Co-authored-by: Attila Zsolt Piros <2017933+attilapiros@users.noreply.github.com>
    holdenk and attilapiros committed Jun 17, 2020
    c9139ef
  12. logging string interpolation

    Co-authored-by: Attila Zsolt Piros <2017933+attilapiros@users.noreply.github.com>
    holdenk and attilapiros committed Jun 17, 2020
    0eef6fa
  13. Generalize the decom put to check put as stream and shuffle blocks as well (help avoid cascading block migration)

    holdenk committed Jun 17, 2020
    b50de4e
  14. spacing

    holdenk committed Jun 17, 2020
    e2d1057
  15. Fix long line, make our shuffle block threads stop so we don't leak threads during testing.

    holdenk committed Jun 17, 2020
    31dc836
  16. Remove un-needed shuffleStatus.invalidateSerializedMapOutputStatusCache and log the scheduler when asked to decom and we can't

    holdenk committed Jun 17, 2020
    a2d5e64
  17. Always transfer shuffle blocks as put, take out the spark.network.maxRemoteBlockSizeFetchToMem test that we don't need anymore, add back in submitting the thread I accidentally took out in applying some CR feedback (a little fast on the ctrl-k)

    holdenk committed Jun 17, 2020
    20655a4
  18. 5c131ef
  19. 5e11f1b
  20. add shuffle migration test to BlockManagerSuite

    Attila Zsolt Piros authored and holdenk committed Jun 17, 2020
    57965c8
  21. Use StorageLevel.DISK_ONLY instead of manually making our own, and store the execution service being used to run the threads so we can stop them explicitly. Still uses a graceful stop, rather than a thread kill, for other executors that are no longer healthy targets.

    holdenk committed Jun 17, 2020
    93a8a4d
  22. Cleanup thread leaks

    holdenk committed Jun 17, 2020
    4560ebb
  23. Remove excess logging

    holdenk committed Jun 17, 2020
    13aaa49
  24. 7af7492
  25. a7d9238
  26. 953e5f2
  27. d63ca07
  28. 197a524
  29. Rename the two different test suites for decommissioning so the difference is clear from the class name (integration-style/unit-style)

    holdenk committed Jun 17, 2020
    5855eb4
  30. Various small cleanups including renaming BlockManagerDecommissionManager to BlockManagerDecommissioner

    holdenk committed Jun 17, 2020
    c3f8658
  31. 206a3c3
  32. f45a89c
  33. Add a case class for the shuffle block id + map id so we're clearer about what we're passing around than with just random tuples

    holdenk committed Jun 17, 2020
    4d7b5b8
  34. Fix the unit test

    holdenk committed Jun 17, 2020
    d23cbbf
  35. Code review feedback: add test for rejecting blocks with bad shuffle resolver, fix string interpolation, remove un-needed retry logic in decommissioning, etc.

    holdenk committed Jun 17, 2020
    e573116
  36. 9503ca5
  37. Used a (configurable) thread pool for shuffle migrations to allow users to limit the number of concurrent migrations. We still limit to one runnable per peer executor so we don't overwhelm any particular target even at large values of the thread pool.

    holdenk committed Jun 17, 2020
    968418e
  38. 2d61c41
  39. Re-enable rest of k8s tests

    holdenk committed Jun 17, 2020
    ac096f4

Commits on Jun 29, 2020

  1. ebec1a7
  2. CR feedback, make the config names more consistent, change the default number of threads to match the other similar config, add some comments, make isShuffle treat internalShuffles the same (simplifying the code somewhat), separate the shuffle block refresh and RDD block threads, revert un-needed change to SparkContext, and other misc cleanups

    holdenk committed Jun 29, 2020
    d97c6ee
  3. Minor comment cleanup

    holdenk committed Jun 29, 2020
    56a9903

Commits on Jul 14, 2020

  1. Merge branch 'master' into SPARK-20629-copy-shuffle-data-when-nodes-are-being-shutdown-cleaned-up

    holdenk committed Jul 14, 2020
    5a0cd2a
  2. Merge branch 'master' into SPARK-20629-copy-shuffle-data-when-nodes-are-being-shutdown-cleaned-up

    holdenk committed Jul 14, 2020
    546d953

Commits on Jul 15, 2020

  1. Merge branch 'master' into SPARK-20629-copy-shuffle-data-when-nodes-are-being-shutdown-cleaned-up

    holdenk committed Jul 15, 2020
    b63808a
  2. Executor start time variance made this test a bit flaky. Waiting for all the executors to come up first only slows the entire suite down by ~4s and should remove that flake

    holdenk committed Jul 15, 2020
    fe5ba7b
  3. Nits

    holdenk committed Jul 15, 2020
    eb43f20

Commits on Jul 16, 2020

  1. Code review feedback, remove regex we don't need, reduce some log levels, add a Since annotation.

    holdenk committed Jul 16, 2020
    9d210f5

Commits on Jul 17, 2020

  1. Now that we delegate this to the existing blockmanager logic we don't need an explicit test for it here and the block manager is stubbed out in IndexShuffleBlockResolverSuite

    holdenk committed Jul 17, 2020
    2467732
  2. Merge branch 'master' into SPARK-20629-copy-shuffle-data-when-nodes-are-being-shutdown-cleaned-up

    holdenk committed Jul 17, 2020
    16b7376
  3. 8494bdd