
[ds worker] DS kills tasks with no explicit reason #5244

@machimedes

Description


DolphinScheduler kills tasks on its own without logging any explicit cause.
Some of the operations involved may also be dangerous.

As shown in the screenshot below, two tasks were killed:
[screenshot]

Below are the relevant lines copied from the server logs. They are not very informative or helpful.
Tasks 1149 and 1150 are the killed tasks.

[INFO] 2021-04-09 16:12:44.376 org.apache.dolphinscheduler.server.worker.processor.TaskKillProcessor:[87] - received kill command : TaskKillRequestCommand{taskInstanceId=1149}
[INFO] 2021-04-09 16:12:44.381 org.apache.dolphinscheduler.server.worker.processor.TaskKillProcessor:[87] - received kill command : TaskKillRequestCommand{taskInstanceId=1150}

[INFO] 2021-04-09 16:12:44.452 org.apache.dolphinscheduler.server.worker.processor.TaskKillProcessor:[120] - process id:10294, cmd:sudo kill -9 10294 10296 15 357 1149 10302 10305 10403 10405 10406 10407 10408 10409 10410 10411 10412 10413 10414 10415 10416 10417 10432 10434 10435 10446 10447 10448 10449 10450 10451 10452 10453 10454 10455 10456 10457 10458 10459 10460 10492 10496 10511 10512 10513 10514 10515 10516 10517 10518 10519 10520 10521 10522 10523 10524 10525 10526 10527 10529 10530 10532 10533 10535 10536 10537 10538 10564 10566 10567 10573 10574 10575 10576 10577 10578 10579 10580 10587 10588 10599 10600 10601 10602 10603 10604 10605 10608 10615

[INFO] 2021-04-09 16:12:44.452 org.apache.dolphinscheduler.server.worker.processor.TaskKillProcessor:[120] - process id:10297, cmd:sudo kill -9 10297 10307 15 357 1150 10317 10320 10404 10419 10420 10421 10422 10423 10424 10425 10426 10427 10428 10429 10430 10431 10436 10440 10441 10461 10462 10463 10464 10465 10466 10467 10468 10469 10470 10471 10472 10473 10474 10475 10493 10499 10531 10539 10541 10542 10543 10544 10545 10546 10547 10548 10549 10550 10551 10552 10553 10554 10555 10556 10557 10558 10559 10560 10561 10562 10563 10565 10568 10569 10589 10590 10591 10592 10593 10594 10595 10596 10597 10598 10606 10607 10609 10610 10611 10612 10613 10614 10616

[INFO] 2021-04-09 16:12:44.463 - [taskAppId=TASK-15-357-1149]:[214] - process has exited, execute path:/tmp/dolphinscheduler/exec/process/9/15/357/1149, processId:10294 ,exitStatusCode:0
[INFO] 2021-04-09 16:12:44.464 - [taskAppId=TASK-15-357-1150]:[214] - process has exited, execute path:/tmp/dolphinscheduler/exec/process/9/15/357/1150, processId:10297 ,exitStatusCode:0
[INFO] 2021-04-09 16:12:44.466 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[141] - task instance id : 1149,task final status : KILL
[INFO] 2021-04-09 16:12:44.467 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[141] - task instance id : 1150,task final status : KILL
[INFO] 2021-04-09 16:12:44.467 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[161] - develop mode is: false
[INFO] 2021-04-09 16:12:44.467 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[161] - develop mode is: false
[ERROR] 2021-04-09 16:12:44.467 org.apache.dolphinscheduler.server.worker.processor.TaskKillProcessor:[132] - kill task error
org.apache.dolphinscheduler.common.shell.AbstractShell$ExitCodeException: kill: sending signal to 357 failed: No such process
kill: sending signal to 1150 failed: No such process
kill: sending signal to 10404 failed: No such process
kill: sending signal to 10419 failed: No such process
kill: sending signal to 10420 failed: No such process
kill: sending signal to 10421 failed: No such process
kill: sending signal to 10422 failed: No such process
kill: sending signal to 10423 failed: No such process
kill: sending signal to 10425 failed: No such process
kill: sending signal to 10426 failed: No such process
kill: sending signal to 10427 failed: No such process
kill: sending signal to 10428 failed: No such process
kill: sending signal to 10429 failed: No such process
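The kill commands above include "15 357 1149" and "15 357 1150". These are the same numbers as in taskAppId=TASK-15-357-1149 and TASK-15-357-1150, not child PIDs, and kill then reports "No such process" for 357 and 1150. That pattern is consistent with the kill list being built by scanning pstree -p output for every run of digits, so digits inside a process name get treated as PIDs. Since 15 does not appear among the failures, an unrelated process with PID 15 may actually have been signalled. The sketch below is a minimal illustration under two hypothetical assumptions: the task script is named 15_357_1149_node.sh, and the tree is printed in ASCII form (as with pstree -A -p). It is not DolphinScheduler's code; it only contrasts a naive digit scan with one that keeps just the numbers pstree prints in parentheses.

```python
import re

# HYPOTHETICAL `pstree -A -p` output for the process tree of task instance 1149.
# The script name "15_357_1149_node.sh" is an assumption, not taken from the logs.
SAMPLE_PSTREE = "sudo(10294)---bash(10296)---15_357_1149_node.sh(10302)---python(10305)"


def naive_pids(pstree_output: str) -> list[int]:
    """Collect every run of digits, which also picks up digits inside process names."""
    return [int(m) for m in re.findall(r"\d+", pstree_output)]


def strict_pids(pstree_output: str) -> list[int]:
    """Collect only the numbers pstree prints in parentheses after each name."""
    return [int(m) for m in re.findall(r"\((\d+)\)", pstree_output)]


if __name__ == "__main__":
    print("naive: ", naive_pids(SAMPLE_PSTREE))   # [10294, 10296, 15, 357, 1149, 10302, 10305]
    print("strict:", strict_pids(SAMPLE_PSTREE))  # [10294, 10296, 10302, 10305]
```

With this assumed input, the naive scan yields the same first seven entries as the logged command for process 10294, while the strict scan keeps only real PIDs.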

DolphinScheduler version: 1.3.5

Suggestions:
1. Please log the cause of the kill: is it a resource shortage or an operation conflict?
2. When a task is killed, the log shows a very large number of PIDs passed to kill -9. Why are so many processes involved, and could this cause other issues?
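Regarding the second suggestion: regardless of how pstree output is parsed, one way to keep unrelated numbers out of a kill -9 command is to derive the list from parent/child relationships and keep only the task's root process and its descendants. The sketch below assumes a pid-to-ppid table such as the one printed by ps -eo pid=,ppid=; the sample table and the helper names parse_ps and descendants are hypothetical. The lists in the log may also be long because pstree -p prints thread IDs (threads are shown in braces) as well as process IDs, so a single multithreaded child can contribute many entries.

```python
from collections import defaultdict

# HYPOTHETICAL `ps -eo pid=,ppid=` output. PID 15 stands in for an unrelated
# process (e.g. a kernel thread) and 20001 for an unrelated user process.
SAMPLE_PS = """\
    1     0
    2     0
   15     2
10294     1
10296 10294
10302 10296
10305 10302
20001     1
"""


def parse_ps(output: str) -> dict[int, int]:
    """Turn 'pid ppid' lines into a {pid: ppid} mapping, skipping malformed lines."""
    table = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 2 and all(f.isdigit() for f in fields):
            table[int(fields[0])] = int(fields[1])
    return table


def descendants(root: int, ppid_of: dict[int, int]) -> list[int]:
    """Return root plus every process whose parent chain leads back to root."""
    children = defaultdict(list)
    for pid, ppid in ppid_of.items():
        children[ppid].append(pid)
    seen, stack = set(), [root]
    while stack:
        pid = stack.pop()
        if pid in seen:  # guard against revisiting a PID
            continue
        seen.add(pid)
        stack.extend(children[pid])
    return sorted(seen)


if __name__ == "__main__":
    table = parse_ps(SAMPLE_PS)
    print(descendants(10294, table))  # [10294, 10296, 10302, 10305]
```

A real implementation would still need to handle processes that exit between listing and signalling (kill then reports "No such process", as in the log above), for example by treating that case as already finished rather than as a kill error.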
