
SPARK-7729: Executor which has been killed should also be displayed on… #6263

Closed
wants to merge 25 commits

Conversation

archit279thakur

… Executors Tab.

@JoshRosen
Contributor

It looks like this PR and #6644 duplicate / overlap with each other.

@suyanNone
Contributor

Hi @archit279thakur, would you mind adding logic for an expiration time on the lost-executor entries that are shown?

@archit279thakur
Author

Sure. Should the expiration time be configurable?

@suyanNone
Contributor

Yeah, making it configurable looks good.

@archit279thakur
Author

@suyanNone Can you please review my second commit?

val localtestconf = new SparkConf().set(StorageStatusListener.TIME_TO_EXPIRE_KILLED_EXECUTOR,"5s")
val listener = new StorageStatusListener(localtestconf)
listener.removedExecutorIdToStorageStatus.put("1", new StorageStatus(null, 50))
Thread.sleep(5500)
Contributor

you can avoid sleeping by using cache.setTicker
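
(For background, a minimal sketch of driving a Guava cache with a manual Ticker; in Guava the ticker is supplied via CacheBuilder.ticker when the cache is built, and the names below are illustrative, not from this PR.)

    import java.util.concurrent.TimeUnit
    import com.google.common.base.Ticker
    import com.google.common.cache.CacheBuilder

    // A hand-controlled ticker; Guava reads time in nanoseconds from read().
    class ManualTicker extends Ticker {
      var nanos = 0L
      override def read(): Long = nanos
    }

    val ticker = new ManualTicker
    val cache = CacheBuilder.newBuilder()
      .expireAfterWrite(5, TimeUnit.SECONDS)
      .ticker(ticker)                      // inject the fake clock
      .build[String, String]()

    cache.put("1", "executor-1")
    ticker.nanos = TimeUnit.SECONDS.toNanos(5) + 1  // jump just past the window
    assert(cache.getIfPresent("1") == null)         // entry has expired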

Author

For that we'd have to set an arbitrary ticker on the main cache, and we wouldn't want to do that to the original cache. Creating a new cache in the test would not be testing our functionality; it would be equivalent to testing just the Guava cache's code, right? Please correct me if I'm wrong.

Contributor

I think you can do something in between -- StorageStatusListener can have a private[storage] constructor which takes the ticker, and the public one just defaults it to the system ticker. Yes, you would not be testing exactly the same behavior, but it tests the important parts.

Sleeping isn't the worst thing in this case -- often it leads to flaky tests, though I don't think that would be the case here. Still, 5 seconds is awfully long for this test when it should take a tiny fraction of that, and it adds up over all the tests.
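
(A minimal sketch of the constructor pattern being suggested, assuming the field and config key from this PR's diff; the listener body is simplified and the "600s" default is illustrative.)

    package org.apache.spark.storage

    import java.util.concurrent.TimeUnit
    import com.google.common.base.Ticker
    import com.google.common.cache.CacheBuilder
    import org.apache.spark.SparkConf

    class StorageStatusListener private[storage] (conf: SparkConf, ticker: Ticker) {
      // The public constructor keeps production code on the system clock;
      // tests use the package-private one to inject a manual ticker.
      def this(conf: SparkConf) = this(conf, Ticker.systemTicker())

      private[storage] val removedExecutorIdToStorageStatus =
        CacheBuilder.newBuilder()
          .expireAfterWrite(
            conf.getTimeAsSeconds("spark.ui.timeToExpireKilledExecutor", "600s"),
            TimeUnit.SECONDS)
          .ticker(ticker)
          .build[String, AnyRef]()
    }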

@archit279thakur
Author

@squito Can you please review it again.

@@ -17,26 +17,55 @@

package org.apache.spark.storage

import java.util.concurrent.TimeUnit

import scala.collection.JavaConversions.collectionAsScalaIterable
Contributor

Avoid using JavaConversions; prefer JavaConverters, which forces you to call .asScala, making the conversion much clearer to future code readers. The convention is to import scala.collection.JavaConverters._
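
(A tiny sketch of the difference, using nothing beyond the standard libraries: with JavaConverters the conversion is explicit at the call site.)

    import java.util.{ArrayList => JArrayList}
    import scala.collection.JavaConverters._

    val javaList = new JArrayList[String]()
    javaList.add("executor-1")

    // .asScala makes the Java-to-Scala conversion visible to readers,
    // unlike the silent implicit conversions pulled in by JavaConversions.
    val scalaSeq = javaList.asScala
    assert(scalaSeq.head == "executor-1")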

Author

Sure.

@archit279thakur
Copy link
Author

@squito Thanks for your comments. I've incorporated them all and also gone through the link.
Please point out if I missed anything.

val listener = new StorageStatusListener(localtestconf, ticker)
listener.removedExecutorIdToStorageStatus.put("1", new StorageStatus(null, 50))
ticker.advance(5, TimeUnit.SECONDS)
assert(listener.removedExecutorIdToStorageStatus.asMap.get("1") == null)
Contributor

This is more complicated than it needs to be -- there's no need for an atomic (there is only one thread here); you can just use a long. Also, I'd check the removedExecutorStorageStatusList method rather than the cache itself:

    import com.google.common.base.Ticker

    // A manual ticker: Guava reads the current time, in nanoseconds,
    // from Ticker.read().
    class MyTicker extends Ticker {
      var t = 0L
      override def read(): Long = t
    }
    val ticker = new MyTicker
    val localtestconf = new SparkConf()
      .set(StorageStatusListener.TIME_TO_EXPIRE_KILLED_EXECUTOR, "5s")
    val listener = new StorageStatusListener(localtestconf, ticker)
    listener.removedExecutorIdToStorageStatus.put("1", new StorageStatus(null, 50))
    assert(listener.removedExecutorStorageStatusList.nonEmpty)
    ticker.t = 5000000001L  // just past the 5s expiry window, in nanoseconds
    assert(listener.removedExecutorStorageStatusList.isEmpty)

@squito
Contributor

squito commented Nov 5, 2015

Jenkins, ok to test

@squito
Contributor

squito commented Nov 5, 2015

@archit279thakur can you also bring this up to date with master, and include before & after screenshots?
I'd like this to also update the JSON endpoints. Finally, I think that as long as we're storing removed executors, we should store the time they were removed.

@suyanNone can you take another look as well?

@SparkQA

SparkQA commented Nov 5, 2015

Test build #45141 has finished for PR 6263 at commit 1fdffc5.

  • This patch fails to build.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@@ -150,4 +157,21 @@ class StorageStatusListenerSuite extends FunSuite {
listener.onUnpersistRDD(SparkListenerUnpersistRDD(1))
assert(listener.executorIdToStorageStatus("big").numBlocks === 0)
}

test("Killed Executor Entry removed after configurable time") {
val localtestconf = new SparkConf().set(StorageStatusListener.TIME_TO_EXPIRE_KILLED_EXECUTOR,"5s")
Contributor

nit: line too long

archit.thakur added 8 commits November 20, 2015 15:49
…into SPARK-7729

Conflicts:
	core/src/main/scala/org/apache/spark/ui/SparkUI.scala
	core/src/main/scala/org/apache/spark/ui/exec/ExecutorsPage.scala
	core/src/test/scala/org/apache/spark/storage/StorageStatusListenerSuite.scala
	core/src/test/scala/org/apache/spark/ui/storage/StorageTabSuite.scala
@SparkQA

SparkQA commented Nov 20, 2015

Test build #46414 has finished for PR 6263 at commit 3e23321.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@archit279thakur
Author

@squito
In reply to:
@archit279thakur can you also bring this up to date with master, and include before & after screenshots? I'd like this to also update the JSON endpoints. Finally, I think that as long as we're storing removed executors, we should store the time they were removed.

Two things:

  1. For that, I'll have to add a new column to the execTable on the UI. Should it be lastStatusChangedTime (with aliveTime or killedTime, depending on the status) or killedTime (with blank values for the alive executors)?
  2. That value would always be greater than currentTime - spark.ui.timeToExpireKilledExecutor. I'm not really sure we provide any useful insight by showing the time at which the executor died.

@SparkQA

SparkQA commented Nov 20, 2015

Test build #46416 has finished for PR 6263 at commit b827f8f.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 20, 2015

Test build #46417 has finished for PR 6263 at commit 33fc892.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 20, 2015

Test build #46418 has finished for PR 6263 at commit 2cf4f71.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 20, 2015

Test build #46422 has finished for PR 6263 at commit e1577dc.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 20, 2015

Test build #46423 has finished for PR 6263 at commit 826587f.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@squito
Contributor

squito commented Nov 24, 2015

Hi @archit279thakur,

Good questions about what to do with the time it was killed. The reason I wanted it included is so the user could put it together with the timeline, to see what stages were running when the executor was killed, and so help debug why the executor was removed. I don't have strong opinions about where it should go in the UI -- I'm willing to believe that it would just lead to too much clutter. @CodingCat, any thoughts? But in either case, it would still be nice to have in the JSON endpoint.

By the way, the MiMa failure is from this:

[error]  * method this(java.lang.String,java.lang.String,Int,Long,Long,Int,Int,Int,Int,Long,Long,Long,Long,Long,scala.collection.Map)Unit in class org.apache.spark.status.api.v1.ExecutorSummary does not have a correspondent in new version
[error]    filter with: ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.status.api.v1.ExecutorSummary.this")

You can add that line to project/MimaExcludes.scala; that constructor is private, so this is a false positive.
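
(A sketch of the exclusion as it might look in project/MimaExcludes.scala; the surrounding version-specific list structure is omitted.)

    import com.typesafe.tools.mima.core._

    // False positive on a private constructor: exclude it from the check.
    Seq(
      ProblemFilters.exclude[MissingMethodProblem](
        "org.apache.spark.status.api.v1.ExecutorSummary.this")
    )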


@DeveloperApi
object StorageStatusListener {
val TIME_TO_EXPIRE_KILLED_EXECUTOR = "spark.ui.timeToExpireKilledExecutor"
Contributor

Do we really need this class? How about just exposing this string to the end user?
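
(A sketch of that alternative: callers read the documented key straight from the conf. The "600s" default here is illustrative, not a value from this PR.)

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
    // Parses duration strings such as "5s"; falls back to the default.
    val ttlSeconds =
      conf.getTimeAsSeconds("spark.ui.timeToExpireKilledExecutor", "600s")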

@CodingCat
Contributor

@archit279thakur, would you mind uploading some screenshots, so that we have a better sense of the current page structure?

@CodingCat
Contributor

In the current version of the patch, we use an expiration time to prevent too many dead executors from appearing on the UI. That brings overhead and makes the UI component depend on Guava. Additionally, there are cases where executors fail and restart again and again within a very short period (I hit this in some of my applications when I introduced a bug; I can't remember exactly what happened).

I'm considering that we might be able to just cap the maximum number of rows in the table, as we do in many other places (master/worker UI, etc.). Even if we stick with the expiration time, TimeStampedHashMap might be a cleaner solution?
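
(A minimal sketch of the capping idea; the class name and cap parameter are hypothetical, not from the PR.)

    import scala.collection.mutable

    class DeadExecutorBuffer(maxRetained: Int) {
      // Insertion-ordered map: the oldest removed executor comes first.
      private val removed = mutable.LinkedHashMap[String, Long]()

      def add(execId: String, removeTimeMs: Long): Unit = {
        removed(execId) = removeTimeMs
        if (removed.size > maxRetained) {
          removed -= removed.head._1  // evict the oldest entry
        }
      }

      def list: Seq[(String, Long)] = removed.toSeq
    }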

@CodingCat
Contributor

@squito Regarding the page structure, don't trust my sense of aesthetics :-)

Personally, I'd prefer to separate the page into two sections, one for alive executors and one for dead ones; alternatively, give the user the ability to sort the entries by status.

@lianhuiwang
Contributor

@archit279thakur @CodingCat @squito I have created a new PR, #10058. Can you take a look at it? Thanks.

@rxin
Contributor

rxin commented Dec 31, 2015

I'm going to close this pull request. If this is still relevant and you are interested in pushing it forward, please open a new pull request. Thanks!

@asfgit asfgit closed this in 7b4452b Dec 31, 2015