[SPARK-9209] Using executor allocation, an executor is removed but it still exists in ExecutorsPage of the web UI #7559
The executor log for this executor:
What is this doing? This is inefficient (it creates a regex every time) and has strange syntax: Boolean or is not invoked as a method, and there are extra parens.
@srowen
val execInfoSorted = execInfo.filter(x => {
  val numPattern = "[0-9]+".r
  val noIntId = numPattern.findFirstIn(x.id).isEmpty
  listener.getExecutorIds.contains(x.id).||(noIntId)
}).sortBy(_.id)
I intend to filter for active executor IDs. In ExecutorsPage, an executor ID is either numeric or a string, e.g. 5 or "driver" (shown in a screenshot), so I used a regex.
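A minimal sketch of the same filter rewritten to address the review comment: the regex is compiled once instead of on every predicate call, and `||` is used as an ordinary infix operator. `ExecInfo` and the `activeIds` parameter are hypothetical stand-ins for the page's executor summaries and `listener.getExecutorIds`, so the sketch runs without Spark.

```scala
object ExecutorFilterSketch {
  // Hypothetical stand-in for the per-executor summary rendered on ExecutorsPage
  final case class ExecInfo(id: String)

  // Compiled once, not on every call to the filter predicate
  private val NumericId = "[0-9]+".r

  // Keep executors reported as active, plus IDs containing no digits (e.g. "driver")
  def visibleExecutors(execInfo: Seq[ExecInfo], activeIds: Set[String]): Seq[ExecInfo] =
    execInfo
      .filter(x => activeIds.contains(x.id) || NumericId.findFirstIn(x.id).isEmpty)
      .sortBy(_.id)
}
```

Two behaviours carry over from the original snippet: `findFirstIn` matches digits anywhere in the ID, so an ID like `exec1` counts as numeric, and `sortBy(_.id)` sorts as strings, so `"10"` comes before `"5"`.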
add to whitelist
Test build #37960 has finished for PR 7559 at commit
Test build #38329 has finished for PR 7559 at commit
I don't think my change caused this test failure; the test passes on my machine.
@KaiXinXiaoLei I think there are still some issues with your code, mostly that you're not explaining the change.
@srowen Sorry for not making this clear. My problem is: an executor is removed successfully, but it still appears in ExecutorsPage of the web UI. From the code, I found that the executor ID in ExecutorsPage is taken from its blockManagerId: So I suggest adding a variable that stores the active executor IDs and reading it to decide which executors to show in ExecutorsPage. For example, I added a variable "executorIds" in ExecutorsTab.scala to store active executor IDs, and in ExecutorsPage.scala I filter executors through "executorIds".
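The proposed bookkeeping can be sketched as a set of executor IDs updated on add and remove events, which the page would consult when filtering. The event types and the `ExecutorIdTracker` class are hypothetical simplifications, not Spark's listener classes; the sketch only illustrates the idea of a separately maintained `executorIds` set.

```scala
object ActiveExecutorIdsSketch {
  // Hypothetical, simplified events; Spark's own listener events carry more fields
  sealed trait Event
  final case class ExecutorAdded(executorId: String) extends Event
  final case class ExecutorRemoved(executorId: String) extends Event

  // Hypothetical analogue of the proposed "executorIds" variable in ExecutorsTab.scala
  final class ExecutorIdTracker {
    private var ids = Set.empty[String]

    // Synchronized because the UI thread reads while events are being applied
    def onEvent(event: Event): Unit = synchronized {
      event match {
        case ExecutorAdded(id)   => ids += id
        case ExecutorRemoved(id) => ids -= id
      }
    }

    def executorIds: Set[String] = synchronized { ids }
  }
}
```

Note that this keeps a second record of executor membership alongside the block-manager-derived data the page already reads.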
I don't think it makes sense to add yet another piece of bookkeeping to patch over another problem. Update the existing data structures in the listener class. In any event, I'm saying this change isn't OK from a style perspective. ... but I'm still not clear from your picture whether the executor is shown in an active state or not. Is it not shown as finished?
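A rough sketch of the alternative suggested here, under the assumption that the page reads from a single per-executor map in the listener: removal updates that same map, so no parallel set of IDs is needed. `ExecSummary` and `ExecutorsListenerSketch` are hypothetical and much smaller than Spark's real executors listener.

```scala
object ListenerStateSketch {
  import scala.collection.mutable

  // Hypothetical per-executor record; the real listener tracks much more per executor
  final case class ExecSummary(id: String, hostPort: String, completedTasks: Int)

  // Hypothetical listener in which one map is the only record the page reads from
  final class ExecutorsListenerSketch {
    private val summaries = mutable.Map.empty[String, ExecSummary]

    def onExecutorAdded(id: String, hostPort: String): Unit = synchronized {
      summaries(id) = ExecSummary(id, hostPort, completedTasks = 0)
    }

    def onTaskEnd(id: String): Unit = synchronized {
      summaries.get(id).foreach(s => summaries(id) = s.copy(completedTasks = s.completedTasks + 1))
    }

    // Removal updates the same map, so no separate set of active IDs is kept
    def onExecutorRemoved(id: String): Unit = synchronized {
      summaries -= id
    }

    def executors: Seq[ExecSummary] = synchronized {
      summaries.values.toSeq.sortBy(_.id)
    }
  }
}
```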
@srowen In my picture, I just wanted to show that an executor ID can be numeric or a string, e.g. the executor IDs 5 and "driver" in the picture. Do you understand my problem now? If my way of fixing it is not OK, can you suggest another approach? Thanks.
No, that's not what I'm referring to. It is hacky to filter executors based on whether the name is numeric, though it happens to work now. The problem is that most lines of code in this PR have a problem -- you have a I think it's best to close this PR and start over, since I don't think the approach is right either. Surely the problem, if there is one, is farther upstream? Why does the listener have In the JIRA, can you propose an easy way to reproduce this case?
Do you mind closing this PR? I don't think we're making progress in discussing it, and I don't think this is a correct change.
OK, I'll close this and look for a better way.
I set "spark.dynamicAllocation.enabled = true" and ran a big job. After a few minutes, the driver asked for an executor to be removed. It was removed successfully and the executor's process no longer exists, but the executor still appears in ExecutorsPage of the web UI.
The driver log:
2015-07-17 11:48:14,543 | INFO | [sparkDriver-akka.actor.default-dispatcher-3] | Removing block manager BlockManagerId(264, 172.1.1.8, 23811)
2015-07-17 11:48:14,543 | INFO | [dag-scheduler-event-loop] | Removed 264 successfully in removeExecutor
2015-07-17 11:48:21,226 | INFO | [sparkDriver-akka.actor.default-dispatcher-3] | Registering block manager 172.1.1.8:23811 with 10.4 GB RAM, BlockManagerId(264, 172.1.1.8, 23811)
2015-07-17 11:48:21,228 | INFO | [sparkDriver-akka.actor.default-dispatcher-3] | Added broadcast_781_piece0 in memory on 172.1.1.8:23811 (size: 38.6 KB, free: 10.4 GB)
2015-07-17 11:48:35,277 | ERROR | [sparkDriver-akka.actor.default-dispatcher-16] | Lost executor 264 on datasight-195: remote Rpc client disassociated
2015-07-17 11:48:35,277 | WARN | [sparkDriver-akka.actor.default-dispatcher-4] | Association with remote system [akka.tcp://sparkExecutor@datasight-195:23929] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
2015-07-17 11:48:35,277 | INFO | [sparkDriver-akka.actor.default-dispatcher-16] | Re-queueing tasks for 264 from TaskSet 415.0
2015-07-17 11:48:35,804 | INFO | [SparkListenerBus] | Existing executor 264 has been removed (new total is 10)
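The log points to an ordering issue: block manager 264 is removed at 11:48:14, registers again at 11:48:21, and the executor is only reported as removed at 11:48:35. A minimal replay, assuming (as stated earlier in the discussion) that the ExecutorsPage entry is derived from block manager registrations, and using hypothetical event types rather than Spark's classes, shows why the entry survives.

```scala
object LogReplaySketch {
  // Hypothetical event model distilled from the driver log above (not Spark's own classes)
  sealed trait Event
  final case class BlockManagerRegistered(execId: String) extends Event
  final case class BlockManagerRemoved(execId: String) extends Event
  final case class ExecutorRemoved(execId: String) extends Event

  // A view built only from block manager registrations, ignoring executor removal
  def blockManagerView(events: Seq[Event]): Set[String] =
    events.foldLeft(Set.empty[String]) {
      case (ids, BlockManagerRegistered(id)) => ids + id
      case (ids, BlockManagerRemoved(id))    => ids - id
      case (ids, ExecutorRemoved(_))         => ids
    }

  // Events for executor 264 in log order; the first registration predates the excerpt
  val executor264: Seq[Event] = Seq(
    BlockManagerRegistered("264"), // assumed, before 11:48:14
    BlockManagerRemoved("264"),    // 11:48:14,543
    BlockManagerRegistered("264"), // 11:48:21,226
    ExecutorRemoved("264")         // 11:48:35,804
  )
}
```

This only models the ordering visible in the excerpt; it does not explain why the block manager re-registered after removal, which is the upstream question raised in the review.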