@KaiXinXiaoLei

I set "spark.dynamicAllocation.enabled = true" and ran a big job. After a few minutes, the driver asked for an executor to be removed. The executor was removed successfully and its process no longer exists, but it is still listed on the ExecutorsPage of the web UI.

The driver log:
2015-07-17 11:48:14,543 | INFO | [sparkDriver-akka.actor.default-dispatcher-3] | Removing block manager BlockManagerId(264, 172.1.1.8, 23811)
2015-07-17 11:48:14,543 | INFO | [dag-scheduler-event-loop] | Removed 264 successfully in removeExecutor
2015-07-17 11:48:21,226 | INFO | [sparkDriver-akka.actor.default-dispatcher-3] | Registering block manager 172.1.1.8:23811 with 10.4 GB RAM, BlockManagerId(264, 172.1.1.8, 23811)
2015-07-17 11:48:21,228 | INFO | [sparkDriver-akka.actor.default-dispatcher-3] | Added broadcast_781_piece0 in memory on 172.1.1.8:23811 (size: 38.6 KB, free: 10.4 GB)
2015-07-17 11:48:35,277 | ERROR | [sparkDriver-akka.actor.default-dispatcher-16] | Lost executor 264 on datasight-195: remote Rpc client disassociated
2015-07-17 11:48:35,277 | WARN | [sparkDriver-akka.actor.default-dispatcher-4] | Association with remote system [akka.tcp://sparkExecutor@datasight-195:23929] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
2015-07-17 11:48:35,277 | INFO | [sparkDriver-akka.actor.default-dispatcher-16] | Re-queueing tasks for 264 from TaskSet 415.0
2015-07-17 11:48:35,804 | INFO | [SparkListenerBus] | Existing executor 264 has been removed (new total is 10)

@KaiXinXiaoLei
Author

The executor log for this executor:
2015-07-17 11:48:20,762 | ERROR | [SIGTERM handler] | RECEIVED SIGNAL 15: SIGTERM
2015-07-17 11:48:20,993 | WARN | [driver-heartbeater] | Told to re-register on heartbeat
2015-07-17 11:48:20,993 | INFO | [driver-heartbeater] | BlockManager re-registering with master
2015-07-17 11:48:21,052 | INFO | [Thread-1] | Shutdown hook called
2015-07-17 11:48:21,224 | INFO | [Thread-1] | Shutdown hook called
2015-07-17 11:48:21,184 | INFO | [driver-heartbeater] | Trying to register BlockManager
2015-07-17 11:48:21,227 | INFO | [driver-heartbeater] | Registered BlockManager
2015-07-17 11:48:21,227 | INFO | [driver-heartbeater] | Reporting 16 blocks to the master.

Member

What is this doing? This is inefficient (it creates a regex on every call) and has strange syntax: Boolean || should not be invoked as a method, and there are extra parens.

Author

@srowen
val execInfoSorted = execInfo.filter(x => {
  val numPattern = "[0-9]+".r
  val noIntId = numPattern.findFirstIn(x.id).isEmpty
  listener.getExecutorIds.contains(x.id).||(noIntId)
}).sortBy(_.id)

I intend to filter for active executor IDs. On the ExecutorsPage, an executor ID is either numeric or a string, e.g.
[image: executorspage]
So I use a regex.
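
For reference, the filter quoted above could be written with the regex compiled once and || used as an ordinary infix operator, which is what the review comment asks for. This is only a sketch: ExecRow, ExecutorFilter, and activeIds are hypothetical stand-ins for the elements of execInfo and for listener.getExecutorIds, not names from the patch.

```scala
// Hypothetical stand-in for a row shown on the ExecutorsPage.
final case class ExecRow(id: String)

object ExecutorFilter {
  // Compiled once, instead of on every invocation of the filter closure.
  private val Digits = "[0-9]+".r

  // Keeps rows whose ID is in `activeIds`, or whose ID contains no digits
  // at all (for example "driver"), then sorts by ID like the original snippet.
  def activeRows(rows: Seq[ExecRow], activeIds: Set[String]): Seq[ExecRow] =
    rows.filter { row =>
      val noDigits = Digits.findFirstIn(row.id).isEmpty
      activeIds.contains(row.id) || noDigits
    }.sortBy(_.id)
}
```

With rows "5", "264", and "driver" and only "5" marked active, this keeps "5" and "driver" and drops "264". The behavior is meant to match the quoted snippet; only the style changes.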

@andrewor14
Contributor

add to whitelist

@SparkQA

SparkQA commented Jul 21, 2015

Test build #37960 has finished for PR 7559 at commit f1a20cb.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 24, 2015

Test build #38329 has finished for PR 7559 at commit 0e973e6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@KaiXinXiaoLei
Author

I don't think my change is the cause of this test failure; the test passes on my machine.

@srowen
Member

srowen commented Jul 25, 2015

@KaiXinXiaoLei I think there are still some issues with your code, mostly that you're not explaining the change.

@KaiXinXiaoLei
Author

@srowen Sorry for not being clear. My problem is: an executor is removed successfully, but it is still listed on the ExecutorsPage of the web UI.

From the code, I see that the executor ID on the ExecutorsPage is obtained from its blockManagerId:
val status = listener.storageStatusList(statusId)
val execId = status.blockManagerId.executorId
val hostPort = status.blockManagerId.hostPort

So I suggest adding a variable that stores the active executor IDs, and reading it to show only active executors on the ExecutorsPage.

For example, I add a variable "executorIds" in ExecutorsTab.scala to store the active executor IDs, and in ExecutorsPage.scala I filter for active executors using "executorIds".
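
A minimal sketch of this proposal, under stated assumptions: the event types and the ActiveExecutors class below are hypothetical and only play the role of the "executorIds" variable; they do not model Spark's real listener events or the code in the patch.

```scala
import scala.collection.mutable

// Hypothetical events; Spark's own listener event classes are not modeled here.
sealed trait ExecEvent
final case class ExecAdded(id: String) extends ExecEvent
final case class ExecRemoved(id: String) extends ExecEvent

// Hypothetical tracker standing in for the proposed "executorIds" variable.
final class ActiveExecutors {
  private val ids = mutable.Set.empty[String]

  // Assumed to be called from the listener thread as executors come and go.
  def onEvent(event: ExecEvent): Unit = synchronized {
    event match {
      case ExecAdded(id)   => ids += id
      case ExecRemoved(id) => ids -= id
    }
    ()
  }

  // Immutable snapshot that a page renderer could filter against.
  def snapshot: Set[String] = synchronized { ids.toSet }
}
```

A page would then keep only the rows whose ID is in snapshot. Note that this adds a second record of executor state next to the storage status list, so the two can disagree.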

@srowen
Member

srowen commented Jul 27, 2015

I don't think it makes sense to add yet another piece of bookkeeping to patch over another problem. Update the existing data structures in the listener class. In any event I'm saying this change isn't OK from a style perspective.

... but I'm still not clear from your picture whether the executor is shown in an active state or not. Is it not shown as finished?

@KaiXinXiaoLei
Author

@srowen With my picture, I only wanted to show that an executor ID can be numeric or a string, e.g. the executor ID 5 and "driver" in the picture.

Do you understand my problem now? If my way of fixing it is not OK, can you give me an idea for how to fix it? Thanks.

@srowen
Member

srowen commented Jul 27, 2015

No, that's not what I'm referring to. It is hacky to filter executors based on numeric-or-not name but happens to work now. The problem is that most lines of code in this PR have a problem -- you have a var where a val would do, if you use collection mutation methods; there's an unnecessary getter; you're using methods like :+ with a ., unnecessary parens, missing whitespace, etc.
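
To make the style points concrete, here is a small sketch. StyleSketch, activeIds, add, and appended are hypothetical names chosen for illustration; none of them come from the PR.

```scala
import scala.collection.mutable

object StyleSketch {
  // A val is enough when the collection itself is mutable. Because the val
  // is public, no separate getter method is needed either.
  val activeIds: mutable.Buffer[String] = mutable.ArrayBuffer.empty[String]

  def add(id: String): Unit = {
    activeIds += id // mutates in place; the reference is never reassigned
  }

  // With an immutable collection, :+ is written as an infix operator,
  // with whitespace around it, rather than as ids.:+(id).
  def appended(ids: Seq[String], id: String): Seq[String] = ids :+ id
}
```

The alternative is a var holding an immutable collection that is reassigned on every change. Either choice can work, but combining a var with in-place mutation methods is redundant.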

I think it's best to close this PR and start over, since I don't think the approach is right either. Surely the problem, if there is one, is farther upstream? Why does the listener have StorageStatus for removed executors?
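
One possible reading of the logs at the top of this thread: block manager 264 is removed at 11:48:14, the executor is "Told to re-register on heartbeat" at 11:48:20, and the driver registers the same block manager again at 11:48:21, shortly before the executor is finally lost. If that reading is right, an upstream fix could refuse registration from executors that were already removed. The sketch below only illustrates that idea; StorageRegistry and its methods are hypothetical and do not mirror Spark's actual block manager classes.

```scala
import scala.collection.mutable

// Hypothetical registry; not Spark's real BlockManager bookkeeping.
final class StorageRegistry {
  private val registered = mutable.Set.empty[String]
  private val removed = mutable.Set.empty[String]

  // Assumed to be called when the scheduler removes an executor.
  def removeExecutor(id: String): Unit = synchronized {
    registered -= id
    removed += id
    ()
  }

  // Handles a (re-)registration request. Returns false, and records
  // nothing, when the executor has already been removed.
  def register(id: String): Boolean = synchronized {
    if (removed.contains(id)) false
    else {
      registered += id
      true
    }
  }

  // IDs that a UI page would see.
  def visibleIds: Set[String] = synchronized { registered.toSet }
}
```

Under this assumption, a late re-registration from executor 264 would be rejected, so no storage status for it would reach the page.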

In the JIRA, can you propose any easy reproduction of this case?

@srowen
Member

srowen commented Aug 2, 2015

Do you mind closing this PR? I don't think we're making progress in discussing it, and I don't think this is a correct change.

@KaiXinXiaoLei
Author

OK, I'll close this and look for a better way.
