[WIP][SPARK-20629][CORE] Copy shuffle data when nodes are being shutdown #28331

Commits on Apr 24, 2020

  1. 4126c1b
  2. Style fixes

    holdenk committed Apr 24, 2020
    8ee8949
  3. afb1b1a
  4. Python style fix

    holdenk committed Apr 24, 2020
    4071ae2

Commits on May 1, 2020

  1. ff620ba
  2. adb03db

Commits on May 2, 2020

  1. Try and update the tests some more, switch migration to not make a new forkjoinpool on each iteration

    holdenk committed May 2, 2020
    be2a5e7

Commits on May 26, 2020

  1. Code cleanups (swap some maps for foreach where we didn't need the results & fix some potential test flakes) as suggested by @attilapiros during review (thanks)

    holdenk committed May 26, 2020
    783114b
  2. Merge branch 'master' into SPARK-20629-copy-shuffle-data-when-nodes-are-being-shutdown-flat

    holdenk committed May 26, 2020
    dbe2418
  3. a240f98

Commits on May 28, 2020

  1. Use NOOP_REDUCE_ID

    holdenk committed May 28, 2020
    ef8fcc5
  2. 838a346
  3. e85c8ef
  4. Make a MigratableResolver interface so custom shuffle implementations can experiment with this.

    holdenk committed May 28, 2020
    2da0f2d
  5. 9d31746
  6. Use block updates to make sure our desired blocks are being moved & also remove a thread sleep

    holdenk committed May 28, 2020
    38ff8be
  7. 13ec43a
  8. a92025c
  9. Tag new APIs

    holdenk committed May 28, 2020
    fe265d7

Commits on May 29, 2020

  1. Increase the number of execs and decrease the thread sleep time while increasing the max allowed job time to try and avoid flakiness.

    holdenk committed May 29, 2020
    70c3871
  2. Update core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala

    Add more information on unexpected block.

    Co-authored-by: Attila Zsolt Piros <2017933+attilapiros@users.noreply.github.com>
    holdenk and attilapiros committed May 29, 2020
    069dd3b
  3. Fix the migration to store ShuffleDataBlockId, check that data and index blocks have both been migrated, check that RDD blocks are duplicated (not just broadcast blocks), make the number of partitions smaller so the test can run faster, avoid the Thread.sleep for all of the tests except the midflight test where we need it, check for the broadcast blocks landing (further along in scheduling) beyond just task start, force fetching the shuffle block to local disk if in shuffle block test mode, and start the job as soon as the first executor comes online.

    holdenk committed May 29, 2020
    6340f9b

Commits on Jun 1, 2020

  1. Merge branch 'master' into SPARK-20629-copy-shuffle-data-when-nodes-are-being-shutdown-flat

    holdenk committed Jun 1, 2020
    4cb0458
  2. We don't need the two set operations; also, sleepyRdd isn't always sleepy, so let's call it baseRdd, and test both small blocksize to mem and not

    holdenk committed Jun 1, 2020
    4cfeb8e
  3. Use the remoteBlockSize param in the tests instead of conditioning on whether we're testing shuffles or not

    holdenk committed Jun 1, 2020
    e81aa5a
  4. Add a part of the test where we kill the original exec and recount. Note: this fails in forced migrate to disk

    holdenk committed Jun 1, 2020
    841d443
  5. Add a part of the test where we kill the original exec and recount. Note: this fails in forced migrate to disk + some logging

    holdenk committed Jun 1, 2020
    ba20ec0
  6. Fix the map output update logic, which was getting trampled on (also the test now passes, so yay), and re-enable the other tests I disabled while debugging. Add a bit more logging.

    holdenk committed Jun 1, 2020
    17a6a3f
  7. 155aeb2

Commits on Jun 2, 2020

  1. 7f93df6
  2. Return faster with shuffle blocks since we don't need the rest of the logic in update block :)

    holdenk committed Jun 2, 2020
    7e32341
  3. Small cleanups

    holdenk committed Jun 2, 2020
    a3aa8eb