
[SPARK-9202] capping maximum number of executor&driver information kept in Worker #7714

Status: Closed (wants to merge 10 commits)

Conversation

@CodingCat (Contributor) opened this pull request.

@SparkQA commented Jul 28, 2015

Test build #38649 has finished for PR 7714 at commit 55c4de4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen (Contributor):

In principle, doesn't the Master also have similar problems with retained applications?


Code under review:

private def trimFinishedExecutorsIfNecessary(): Unit = {
  if (finishedExecutors.size > retainedExecutors) {
    finishedExecutors.take(math.max(finishedExecutors.size / 10, 1)).foreach{
Contributor:

Minor style nit: space after foreach

Commenter:

I was about to work on this same issue until I found this PR already posted. One observation I wanted to bring up for discussion: scala.collection.concurrent.HashMap does not preserve insertion order in its iterator or traversal methods, so take, for example, may remove recent additions when the user would probably prefer to lose only the oldest executors in the list. LinkedHashMap does guarantee that insertion order is preserved for its operations. It comes at the cost of some extra memory overhead, but it may be worth it.
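
(A minimal standalone sketch of the ordering difference, for illustration only; the names and values below are made up and this is not code from the PR:)

import scala.collection.mutable

object OrderingDemo {
  def main(args: Array[String]): Unit = {
    // A plain mutable.HashMap gives no iteration-order guarantee, so
    // take(n) may return (and a trim may then drop) recent insertions.
    val plain = mutable.HashMap((1 to 5).map(i => s"exec-$i" -> i): _*)
    println(plain.take(2))   // order unspecified

    // mutable.LinkedHashMap iterates in insertion order, so take(n)
    // always yields the n oldest entries.
    val linked = mutable.LinkedHashMap((1 to 5).map(i => s"exec-$i" -> i): _*)
    println(linked.take(2))  // the oldest two entries: exec-1, exec-2
  }
}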

@CodingCat (Contributor, Author):

Good point, I agree with you.
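
(Putting this thread together, a self-contained sketch of the trimming pattern under review, switched to LinkedHashMap as suggested; this is hypothetical standalone code with a made-up retainedExecutors value, not the actual patch:)

import scala.collection.mutable

object TrimDemo {
  val retainedExecutors = 8
  val finishedExecutors = mutable.LinkedHashMap.empty[String, Int]

  // Once the cap is exceeded, drop the oldest ~10% of entries (at least one).
  // take(n) on a LinkedHashMap copies the n oldest entries into a new map,
  // so removing them from the original map inside foreach is safe.
  def trimFinishedExecutorsIfNecessary(): Unit = {
    if (finishedExecutors.size > retainedExecutors) {
      finishedExecutors.take(math.max(finishedExecutors.size / 10, 1)).foreach {
        case (execId, _) => finishedExecutors.remove(execId)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    (1 to 10).foreach(i => finishedExecutors += (s"exec-$i" -> i))
    trimFinishedExecutorsIfNecessary()
    println(finishedExecutors.keys.mkString(", "))  // exec-2 through exec-10
  }
}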

@JoshRosen (Contributor):

The basic approach looks okay to me, so this is on the right track. Thanks for choosing to work on this!

@CodingCat (Contributor, Author):

@JoshRosen, I just updated the patch to address your comments, and added test cases and docs. As for Master, I think we already have something capping the memory footprint there, e.g. https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/master/Master.scala#L779 and https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/master/Master.scala#L930

@SparkQA commented Jul 28, 2015

Test build #38750 has finished for PR 7714 at commit 8d0729e.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@CodingCat changed the title from "[WIP][SPARK-9202] capping maximum number of executor&driver information kept in Worker" to "[SPARK-9202] capping maximum number of executor&driver information kept in Worker" on Jul 28, 2015
@SparkQA commented Jul 28, 2015

Test build #38758 has finished for PR 7714 at commit fb3ebb7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 29, 2015

Test build #38769 has finished for PR 7714 at commit 8142028.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 29, 2015

Test build #38788 has finished for PR 7714 at commit 000578d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen (Contributor):

I think this is an important feature to get into 1.5.0, but my review bandwidth is a bit limited right now during the release crunch. I'm going to ping a few other folks to see if any of them have spare cycles to help review. @zsxwing @srowen @sarutak, do any of you have time to take an initial pass on this PR? I'll take a look tomorrow, but I wanted to get some additional eyes on this while I'm asleep :)

Code under review:

  }
}

private[worker] def handleDriverStateChanged(driverStateChanged: DriverStateChanged): Unit = {
Member:

Just to clarify, this and handleExecutorStateChanged are just the result of moving code (and adding the call to trim), and there aren't other changes?

@CodingCat (Contributor, Author):

Yes, I just encapsulated it to make it easier to test.

Contributor:

This is a nice change

@sarutak (Member) commented Jul 29, 2015

@JoshRosen Sure, I'll take a look. Have a good sleep :)


Code under review:

private def trimFinishedDriversIfNecessary(): Unit = {
  if (finishedDrivers.size > retainedDrivers) {
    finishedDrivers.take(math.max(finishedDrivers.size / 10, 1)).foreach {
Member:

I noticed finishedExecutors and finishedDrivers are never accessed by key; they are only used like finishedDrivers.values. So could we use a ListBuffer for finishedExecutors and finishedDrivers and remove elements with finishedDrivers.trimStart(...)?

@CodingCat (Contributor, Author):

We can do that, but at the cost of making .remove(execId) calls more expensive. Is it worth it?
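
(To make the trade-off concrete, a hypothetical sketch of the ListBuffer alternative, illustrative only and not code from the PR: trimStart cheaply drops the oldest entries, but removing a specific executor then needs a linear scan instead of a constant-time map lookup.)

import scala.collection.mutable.ListBuffer

object ListBufferTradeoff {
  case class FinishedExecutor(execId: String)  // hypothetical stand-in record

  def main(args: Array[String]): Unit = {
    val finishedExecutors = ListBuffer.tabulate(10)(i => FinishedExecutor(s"exec-$i"))

    // Buffer order is insertion order, so trimStart drops the oldest entries.
    finishedExecutors.trimStart(math.max(finishedExecutors.size / 10, 1))

    // Removing one executor by id is an O(n) scan, versus O(1) on a HashMap.
    val idx = finishedExecutors.indexWhere(_.execId == "exec-5")
    if (idx >= 0) finishedExecutors.remove(idx)

    println(finishedExecutors.map(_.execId).mkString(", "))
  }
}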

@SparkQA commented Jul 29, 2015

Test build #38841 has finished for PR 7714 at commit 54249ac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@CodingCat force-pushed the SPARK-9202 branch 2 times, most recently from eb0f66e to 1b51a37, on July 30, 2015 at 01:45
@SparkQA commented Jul 30, 2015

Test build #38933 has finished for PR 7714 at commit eb0f66e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@CodingCat (Contributor, Author):

Some flaky tests...

@SparkQA commented Jul 30, 2015

Test build #38944 has finished for PR 7714 at commit 1b51a37.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 30, 2015

Test build #39041 has finished for PR 7714 at commit 23977fb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@CodingCat (Contributor, Author):

Finally... @srowen, @JoshRosen, @sarutak, any more comments?

@srowen (Member) commented Jul 30, 2015

It's looking good to me. Let's leave it open another day or two for comments.

@srowen (Member) commented Jul 31, 2015

@JoshRosen @sarutak, did you want to take another look? Otherwise I think this can go in today. I know there's a huge amount of traffic at the moment, so I wanted to check again.

@sarutak (Member) commented Jul 31, 2015

Yeah, I think this is ready to merge.

@asfgit closed this in c068666 on Jul 31, 2015
@CodingCat (Contributor, Author):

@JoshRosen @sarutak @srowen thanks!
