SPARK-7729: Executor which has been killed should also be displayed on… #6263
Conversation
…layed on Executors Tab." This reverts commit 7a82254.
It looks like this PR and #6644 duplicate / overlap with each other.
Hi @archit279thakur, would you mind adding the logic for a time-based expiry when showing the lost-executor log?
Sure, and should the expiration time be configuration based?
Yeah, making it configurable looks good.
@suyanNone Can you please review my 2nd commit?
val localtestconf = new SparkConf().set(StorageStatusListener.TIME_TO_EXPIRE_KILLED_EXECUTOR, "5s")
val listener = new StorageStatusListener(localtestconf)
listener.removedExecutorIdToStorageStatus.put("1", new StorageStatus(null, 50))
Thread.sleep(5500)
You can avoid sleeping by using cache.setTicker.
For that we'd have to set an arbitrary ticker on the main cache, which we would not want. Creating a new cache in the test would not be testing our functionality -- it would be equivalent to testing just the Guava cache's own code, right? Please correct me if I'm wrong.
I think you can do something in between -- StorageStatusListener can have a private[storage] constructor which takes the ticker, and the public one just defaults to the system ticker. You would not be testing exactly the same behavior, but it tests the important parts.
Sleeping isn't the worst thing in this case -- it often leads to flaky tests, though I don't think that would happen here. Still, 5 seconds is awfully long for a test that should take a tiny fraction of that, and it adds up over all the tests.
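The injectable-ticker pattern being discussed can be sketched without Guava. Everything below -- the `Ticker` trait, `ExpiringSet`, and `Listener` -- is an illustrative stand-in for Guava's `Ticker` and the real `StorageStatusListener`, not Spark's actual code:

```scala
// Minimal sketch of ticker injection for testable time-based expiry.
// All names here are illustrative, not the actual Spark or Guava API.
trait Ticker { def read(): Long } // current time in nanoseconds

object SystemTicker extends Ticker {
  override def read(): Long = System.nanoTime()
}

// An expire-after-write set: entries older than ttlNanos are dropped on read.
class ExpiringSet(ttlNanos: Long, ticker: Ticker) {
  private var entries = Map.empty[String, Long] // id -> insertion time

  def add(id: String): Unit = entries += (id -> ticker.read())

  def current: Set[String] = {
    val now = ticker.read()
    entries = entries.filter { case (_, insertedAt) => now - insertedAt <= ttlNanos }
    entries.keySet
  }
}

// The ticker parameter defaults to the system clock; tests pass a manual one.
class Listener(ttlNanos: Long, ticker: Ticker = SystemTicker) {
  val removed = new ExpiringSet(ttlNanos, ticker)
}

// Test-side: advance time by hand instead of Thread.sleep.
class ManualTicker extends Ticker {
  var t = 0L
  override def read(): Long = t
}

val ticker = new ManualTicker
val l = new Listener(5L * 1000000000L, ticker) // 5-second TTL
l.removed.add("1")
assert(l.removed.current.nonEmpty) // entry still within its TTL
ticker.t = 5000000001L             // just past the 5-second TTL
assert(l.removed.current.isEmpty)  // entry has expired
```

The test runs instantly because expiry is driven by the injected ticker rather than the wall clock, which is the point the reviewer is making.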
@squito Can you please review it again?
@@ -17,26 +17,55 @@
package org.apache.spark.storage

import java.util.concurrent.TimeUnit

import scala.collection.JavaConversions.collectionAsScalaIterable
Avoid using JavaConversions; you should prefer JavaConverters, which forces you to call .asScala, making the transformation much clearer to future code readers. The convention is to import scala.collection.JavaConverters._.
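The difference can be shown with a small self-contained sketch (the collection contents here are illustrative):

```scala
import java.util.{ArrayList => JArrayList}
// With JavaConverters, conversions are explicit: you must call .asScala,
// so readers can see exactly where a Java collection becomes a Scala one.
// (On Scala 2.13+ the equivalent import is scala.jdk.CollectionConverters._.)
import scala.collection.JavaConverters._

val javaList = new JArrayList[String]()
javaList.add("executor-1")
javaList.add("executor-2")

// Explicit conversion point, unlike JavaConversions' invisible implicits.
val scalaSeq = javaList.asScala.toList
assert(scalaSeq == List("executor-1", "executor-2"))
```

JavaConversions converts silently via implicits, which is why the reviewer asks for the explicit form.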
Sure.
@squito Thanks for your comments. I've incorporated them all and also gone through the link.
val listener = new StorageStatusListener(localtestconf, ticker)
listener.removedExecutorIdToStorageStatus.put("1", new StorageStatus(null, 50))
ticker.advance(5, TimeUnit.SECONDS)
assert(listener.removedExecutorIdToStorageStatus.asMap.get("1") == null)
This is more complicated than it needs to be -- no need for an atomic (there is only one thread here); you can just use a long. Also, I'd check the removedExecutorStorageStatusList method rather than the cache itself:
class MyTicker extends Ticker {
  var t = 0L
  override def read(): Long = t
}

val ticker = new MyTicker
val listener = new StorageStatusListener(localtestconf, ticker)
listener.removedExecutorIdToStorageStatus.put("1", new StorageStatus(null, 50))
assert(listener.removedExecutorStorageStatusList.nonEmpty)
ticker.t = 5000000001L
assert(listener.removedExecutorStorageStatusList.isEmpty)
Jenkins, ok to test
@archit279thakur can you also bring this up to date with master, and include before & after screenshots? @suyanNone can you take another look as well?
Test build #45141 has finished for PR 6263 at commit
@@ -150,4 +157,21 @@ class StorageStatusListenerSuite extends FunSuite {
    listener.onUnpersistRDD(SparkListenerUnpersistRDD(1))
    assert(listener.executorIdToStorageStatus("big").numBlocks === 0)
  }

  test("Killed Executor Entry removed after configurable time") {
    val localtestconf = new SparkConf().set(StorageStatusListener.TIME_TO_EXPIRE_KILLED_EXECUTOR, "5s")
nit: line too long
…into SPARK-7729

Conflicts:
  core/src/main/scala/org/apache/spark/ui/SparkUI.scala
  core/src/main/scala/org/apache/spark/ui/exec/ExecutorsPage.scala
  core/src/test/scala/org/apache/spark/storage/StorageStatusListenerSuite.scala
  core/src/test/scala/org/apache/spark/ui/storage/StorageTabSuite.scala
Test build #46414 has finished for PR 6263 at commit
@squito Two things:
Test build #46416 has finished for PR 6263 at commit
Test build #46417 has finished for PR 6263 at commit
Test build #46418 has finished for PR 6263 at commit
Test build #46422 has finished for PR 6263 at commit
Test build #46423 has finished for PR 6263 at commit
Hi @archit279thakur, good questions about what to do with the time the executor was killed. The reason I wanted it included is so the user could put it together with the timeline, to see what stages were running when the executor was killed, to help them debug why it was removed. I don't have strong opinions about where it should go in the UI -- I'm willing to believe that it would just lead to too much clutter. @CodingCat, any thoughts? But in either case, it would still be nice to have in the JSON endpoint.

btw, the MiMa failure is from this:
you can add that line to
@DeveloperApi
object StorageStatusListener {
  val TIME_TO_EXPIRE_KILLED_EXECUTOR = "spark.ui.timeToExpireKilledExecutor"
Do we really need this class? How about just exposing this string to the end user?
@archit279thakur, would you mind uploading some screenshots, so that we have a better sense of the current page structure?
In the current version of the patch, we use an expiration time to prevent too many dead executors from appearing on the UI. That brings inconvenient overhead and makes the UI component depend on Guava. Additionally, there are cases where executors fail and restart again and again within a very short period (I hit this in some of my applications when I introduced some bug; I cannot remember exactly what happened). I'm considering that we might be able to just cap the maximum number of rows in the table, like we do in many other places (master/worker UI, etc.). Even if we stick to the expiration time, TimeStampedHashMap might be a cleaner solution?
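The row-capping alternative suggested here could be sketched as follows; the class, the cap value, and the method names are all illustrative, not Spark's actual implementation:

```scala
import scala.collection.mutable

// Keep at most `maxRetained` dead-executor entries, evicting the oldest
// first, instead of expiring entries by wall-clock time. This avoids both
// the Guava dependency and unbounded growth under rapid executor churn.
// All names here are illustrative, not the actual Spark API.
class BoundedDeadExecutors(maxRetained: Int) {
  private val ids = mutable.Queue.empty[String]

  def add(execId: String): Unit = {
    ids.enqueue(execId)
    while (ids.size > maxRetained) ids.dequeue() // drop the oldest entry
  }

  def retained: Seq[String] = ids.toSeq
}

val dead = new BoundedDeadExecutors(maxRetained = 2)
Seq("1", "2", "3").foreach(dead.add)
assert(dead.retained == Seq("2", "3")) // "1" was evicted as the oldest
```

A fixed cap keeps UI memory bounded even when executors fail and restart repeatedly in a short window, which is the failure mode described above; a time-based expiry, by contrast, can still accumulate arbitrarily many rows within the TTL.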
@squito Regarding the page structure, don't trust my sense of aesthetics :-) Personally, I would prefer to separate the page into two sections, one for alive executors and one for dead ones, or to give the user the ability to sort the entries by status.
@archit279thakur @CodingCat @squito I have created new PR #10058. Can you take a look at it? Thanks.
I'm going to close this pull request. If this is still relevant and you are interested in pushing it forward, please open a new pull request. Thanks!