-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-9202] capping maximum number of executor&driver information kept in Worker #7714
Conversation
Test build #38649 has finished for PR 7714 at commit
|
In principle, doesn't the Master also have similar problems with retained applications? |
|
||
private def trimFinishedExecutorsIfNecessary(): Unit = { | ||
if (finishedExecutors.size > retainedExecutors) { | ||
finishedExecutors.take(math.max(finishedExecutors.size / 10, 1)).foreach{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor style nit: space after foreach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was about to work on this same issue until finding this PR already posted. I had one observation that I wanted to bring up for discussion. scala.collection.concurrent.HashMap
does not preserve insertion ordering when you perform operations on it for iterator or traversal methods. So take
, for example may be able to remove recent additions where the user might prefer to just lose the oldest executors from the list. LinkedHashMap does provide the guarantee on insertion ordering being preserved for its operations. It comes at the cost of more memory overhead to provide this guarantee, but it may be worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I agree with you
The basic approach looks okay to me, so this is on the right track. Thanks for choosing to work on this! |
@JoshRosen , I just updated the patch addressing your comments and added test cases and docs...for Master, I think we have something capping the memory footprint there, e.g. https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/master/Master.scala#L779 and https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/master/Master.scala#L930 |
Test build #38750 has finished for PR 7714 at commit
|
Test build #38758 has finished for PR 7714 at commit
|
Test build #38769 has finished for PR 7714 at commit
|
Test build #38788 has finished for PR 7714 at commit
|
I think that this is an important feature to get in for 1.5.0 but my review bandwidth is a little limited right now during the release crunch mode. Therefore, I'm going to try to ping a bunch of other folks to see if any of them have spare cycles to help review. @zsxwing @srowen @sarutak, do any of you have time to take an initial pass on this PR? I'll take a look tomorrow but just wanted to see if I could get some additional eyes on this while I'm asleep :) |
} | ||
} | ||
|
||
private[worker] def handleDriverStateChanged(driverStateChanged: DriverStateChanged): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify, this and handleExecutorStateChanged
are just the result of moving code (and adding the call to trim), and there aren't other changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, just encapsulate it for easy test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice change
@JoshRosen Sure, I'll take a look. Have a good sleep :) |
|
||
private def trimFinishedDriversIfNecessary(): Unit = { | ||
if (finishedDrivers.size > retainedDrivers) { | ||
finishedDrivers.take(math.max(finishedDrivers.size / 10, 1)).foreach { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed finishedExecutors
and finishedDrivers
are never accessed by keys. They are used like finishedDrivers.values
.
So, can we use ListBuffer
for finishedExecutors
and finishedDrivers
and remove elements by finishedDrivers.trimStart(...)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can do that, but at the cost when call .remove(execId)
, is it worth doing that?
Test build #38841 has finished for PR 7714 at commit
|
eb0f66e
to
1b51a37
Compare
Test build #38933 has finished for PR 7714 at commit
|
some flaky tests.... |
Test build #38944 has finished for PR 7714 at commit
|
Test build #39041 has finished for PR 7714 at commit
|
finally....@srowen, @JoshRosen, @sarutak more comments? |
It's looking good to me. Let's leave it open another day or two for comments. |
@JoshRosen @sarutak did you want to look again? otherwise I think this can go in today. I know there's a huge amount of traffic at the moment so wanted to check again |
Yeah, I think this is ready to merge. |
@JoshRosen @sarutak @srowen thanks! |
https://issues.apache.org/jira/browse/SPARK-9202