
[SPARK-9202] capping maximum number of executor&driver information kept in Worker #7714

Status: Closed (wants to merge 10 commits)

Conversation

@CodingCat (Contributor) opened this pull request.

@SparkQA commented Jul 28, 2015

Test build #38649 has finished for PR 7714 at commit 55c4de4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen (Contributor):

In principle, doesn't the Master also have similar problems with retained applications?


Code under review:

private def trimFinishedExecutorsIfNecessary(): Unit = {
  if (finishedExecutors.size > retainedExecutors) {
    finishedExecutors.take(math.max(finishedExecutors.size / 10, 1)).foreach{
Contributor:

Minor style nit: space after foreach

Commenter:

I was about to work on this same issue until I found this PR already posted. One observation I wanted to bring up for discussion: scala.collection.concurrent.HashMap does not preserve insertion order in its iterator or traversal methods, so take, for example, may remove recent additions when the user would probably prefer to lose only the oldest executors in the list. LinkedHashMap does guarantee that insertion order is preserved for its operations. It comes at the cost of some extra memory overhead, but it may be worth it.
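
(A minimal standalone sketch of the ordering difference, for illustration only; the names and values below are made up and this is not code from the PR:)

import scala.collection.mutable

object OrderingDemo {
  def main(args: Array[String]): Unit = {
    // A plain mutable.HashMap gives no iteration-order guarantee, so
    // take(n) may return (and a trim may then drop) recent insertions.
    val plain = mutable.HashMap((1 to 5).map(i => s"exec-$i" -> i): _*)
    println(plain.take(2))   // order unspecified

    // mutable.LinkedHashMap iterates in insertion order, so take(n)
    // always yields the n oldest entries.
    val linked = mutable.LinkedHashMap((1 to 5).map(i => s"exec-$i" -> i): _*)
    println(linked.take(2))  // the oldest two entries: exec-1, exec-2
  }
}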

@CodingCat (Contributor, Author):

Good point, I agree with you.
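
(Putting this thread together, a self-contained sketch of the trimming pattern under review, switched to LinkedHashMap as suggested; this is hypothetical standalone code with a made-up retainedExecutors value, not the actual patch:)

import scala.collection.mutable

object TrimDemo {
  val retainedExecutors = 8
  val finishedExecutors = mutable.LinkedHashMap.empty[String, Int]

  // Once the cap is exceeded, drop the oldest ~10% of entries (at least one).
  // take(n) on a LinkedHashMap copies the n oldest entries into a new map,
  // so removing them from the original map inside foreach is safe.
  def trimFinishedExecutorsIfNecessary(): Unit = {
    if (finishedExecutors.size > retainedExecutors) {
      finishedExecutors.take(math.max(finishedExecutors.size / 10, 1)).foreach {
        case (execId, _) => finishedExecutors.remove(execId)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    (1 to 10).foreach(i => finishedExecutors += (s"exec-$i" -> i))
    trimFinishedExecutorsIfNecessary()
    println(finishedExecutors.keys.mkString(", "))  // exec-2 through exec-10
  }
}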

@JoshRosen (Contributor):

The basic approach looks okay to me, so this is on the right track. Thanks for choosing to work on this!

@CodingCat (Contributor, Author):

@JoshRosen, I just updated the patch to address your comments, and added test cases and docs. As for Master, I think we already have something capping the memory footprint there, e.g. https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/master/Master.scala#L779 and https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/master/Master.scala#L930

@SparkQA commented Jul 28, 2015

Test build #38750 has finished for PR 7714 at commit 8d0729e.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@CodingCat changed the title from "[WIP][SPARK-9202] capping maximum number of executor&driver information kept in Worker" to "[SPARK-9202] capping maximum number of executor&driver information kept in Worker" on Jul 28, 2015
@SparkQA commented Jul 28, 2015

Test build #38758 has finished for PR 7714 at commit fb3ebb7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 29, 2015

Test build #38769 has finished for PR 7714 at commit 8142028.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 29, 2015

Test build #38788 has finished for PR 7714 at commit 000578d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen (Contributor):

I think this is an important feature to get into 1.5.0, but my review bandwidth is a bit limited right now during the release crunch. I'm going to ping a few other folks to see if any of them have spare cycles to help review. @zsxwing @srowen @sarutak, do any of you have time to take an initial pass on this PR? I'll take a look tomorrow, but I wanted to get some additional eyes on this while I'm asleep :)

Code under review:

  }
}

private[worker] def handleDriverStateChanged(driverStateChanged: DriverStateChanged): Unit = {
Member:

Just to clarify, this and handleExecutorStateChanged are just the result of moving code (and adding the call to trim), and there aren't other changes?

@CodingCat (Contributor, Author):

Yes, I just encapsulated it to make it easier to test.

Contributor:

This is a nice change

@sarutak (Member) commented Jul 29, 2015

@JoshRosen Sure, I'll take a look. Have a good sleep :)


Code under review:

private def trimFinishedDriversIfNecessary(): Unit = {
  if (finishedDrivers.size > retainedDrivers) {
    finishedDrivers.take(math.max(finishedDrivers.size / 10, 1)).foreach {
Member:

I noticed finishedExecutors and finishedDrivers are never accessed by key; they are only used like finishedDrivers.values. So could we use a ListBuffer for finishedExecutors and finishedDrivers and remove elements with finishedDrivers.trimStart(...)?

@CodingCat (Contributor, Author):

We can do that, but at the cost of making .remove(execId) calls more expensive. Is it worth it?
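
(To make the trade-off concrete, a hypothetical sketch of the ListBuffer alternative, illustrative only and not code from the PR: trimStart cheaply drops the oldest entries, but removing a specific executor then needs a linear scan instead of a constant-time map lookup.)

import scala.collection.mutable.ListBuffer

object ListBufferTradeoff {
  case class FinishedExecutor(execId: String)  // hypothetical stand-in record

  def main(args: Array[String]): Unit = {
    val finishedExecutors = ListBuffer.tabulate(10)(i => FinishedExecutor(s"exec-$i"))

    // Buffer order is insertion order, so trimStart drops the oldest entries.
    finishedExecutors.trimStart(math.max(finishedExecutors.size / 10, 1))

    // Removing one executor by id is an O(n) scan, versus O(1) on a HashMap.
    val idx = finishedExecutors.indexWhere(_.execId == "exec-5")
    if (idx >= 0) finishedExecutors.remove(idx)

    println(finishedExecutors.map(_.execId).mkString(", "))
  }
}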

@SparkQA commented Jul 29, 2015

Test build #38841 has finished for PR 7714 at commit 54249ac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@CodingCat force-pushed the SPARK-9202 branch 2 times, most recently from eb0f66e to 1b51a37, on July 30, 2015 at 01:45
@SparkQA commented Jul 30, 2015

Test build #38933 has finished for PR 7714 at commit eb0f66e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@CodingCat (Contributor, Author):

Some flaky tests...

@SparkQA commented Jul 30, 2015

Test build #38944 has finished for PR 7714 at commit 1b51a37.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 30, 2015

Test build #39041 has finished for PR 7714 at commit 23977fb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@CodingCat (Contributor, Author):

Finally... @srowen, @JoshRosen, @sarutak, any more comments?

@srowen (Member) commented Jul 30, 2015

It's looking good to me. Let's leave it open another day or two for comments.

@srowen (Member) commented Jul 31, 2015

@JoshRosen @sarutak, did you want to take another look? Otherwise I think this can go in today. I know there's a huge amount of traffic at the moment, so I wanted to check again.

@sarutak (Member) commented Jul 31, 2015

Yeah, I think this is ready to merge.

@asfgit closed this in c068666 on Jul 31, 2015
@CodingCat (Contributor, Author):

@JoshRosen @sarutak @srowen thanks!
