Skip to content

[Improvement][Worker] Do not verify the status of yarn in ShellCommandExecutor #5665

Closed
skymsg wants to merge 4 commits intoapache:devfrom
skymsg:dev
Closed

[Improvement][Worker] Do not verify the status of yarn in ShellCommandExecutor #5665
skymsg wants to merge 4 commits intoapache:devfrom
skymsg:dev

Conversation

@skymsg
Copy link

@skymsg skymsg commented Jun 19, 2021

Do not verify the status of yarn in ShellCommandExecutor (#5564)

Purpose of the pull request

fix #5564
Move the code that verify the status of yarn in ShellCommandExecutor to AbstractYarnTask .
My shell command ouput log contains text that match the pattern APPLICATION_REGEX = "application_\d+_\d+" but that task is not a yarn task. These yarn related code cause my shell command task failure.

Brief change log

Verify this pull request

This pull request is code cleanup without any test coverage.

@CalvinKirs CalvinKirs added first time contributor First-time contributor improvement make more easy to user or prompt friendly labels Jun 21, 2021
@CalvinKirs
Copy link
Member

deeply thx for your contribution, code style check fails.you can refer this doc(https://dolphinscheduler.apache.org/en-us/community/development/pull-request.html)

@codecov-commenter
Copy link

codecov-commenter commented Jun 25, 2021

Codecov Report

Merging #5665 (3b4f06a) into dev (cf99df3) will decrease coverage by 0.02%.
The diff coverage is 7.04%.

Impacted file tree graph

@@             Coverage Diff              @@
##                dev    #5665      +/-   ##
============================================
- Coverage     45.34%   45.32%   -0.03%     
+ Complexity     3689     3682       -7     
============================================
  Files           607      607              
  Lines         24860    24850      -10     
  Branches       2825     2824       -1     
============================================
- Hits          11274    11263      -11     
- Misses        12504    12505       +1     
  Partials       1082     1082              
Impacted Files Coverage Δ
...er/server/worker/task/AbstractCommandExecutor.java 22.01% <ø> (+2.01%) ⬆️
...duler/server/worker/task/CommandExecuteResult.java 100.00% <ø> (ø)
...nscheduler/server/worker/task/datax/DataxTask.java 55.31% <ø> (+0.23%) ⬆️
...cheduler/server/worker/task/python/PythonTask.java 10.52% <ø> (+0.26%) ⬆️
...nscheduler/server/worker/task/shell/ShellTask.java 6.89% <ø> (+0.11%) ⬆️
...scheduler/server/worker/task/AbstractYarnTask.java 6.94% <7.04%> (-13.06%) ⬇️
...dolphinscheduler/remote/future/ResponseFuture.java 81.35% <0.00%> (-1.70%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cf99df3...3b4f06a. Read the comment docs.

@CalvinKirs
Copy link
Member

hi, how it is getting?

@skymsg
Copy link
Author

skymsg commented Jun 30, 2021

Sorry , I have a busy schedule recently. I will try to fix these error on the weekend.

@sonarqubecloud
Copy link

sonarqubecloud bot commented Jul 5, 2021

}
} else {
logger.error("process has failure , exitStatusCode:{}, processExitValue:{}, ready to kill ...",
result.getExitStatusCode(), process.exitValue());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may also need to modify the kill method on line 214. In the current implementation, it will find appliacationId from log and kill the application on yarn, for normal tasks, we don't need to kill on yarn.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just remove the killYarnJob call from kill method maybe enough .

  public static void kill(TaskExecutionContext taskExecutionContext) {
      try {
          int processId = taskExecutionContext.getProcessId();
          if (processId == 0) {
              logger.error("process kill failed, process id :{}, task id:{}",
                      processId, taskExecutionContext.getTaskInstanceId());
              return;
          }

          String pidsStr = getPidsStr(processId);
          if (StringUtils.isNotEmpty(pidsStr)) {
              String cmd = String.format("kill -9 %s", pidsStr);
              cmd = OSUtils.getSudoCmd(taskExecutionContext.getTenantCode(), cmd);
              logger.info("process id:{}, cmd:{}", processId, cmd);
              OSUtils.exeCmd(cmd);
          }

      } catch (Exception e) {
          logger.error("kill task failed", e);
      }
      // find log and kill yarn job
      killYarnJob(taskExecutionContext);
 }

I found the TaskExecuteThread would call the cancelApplication to kill the yarn job when AbstractYarnTask handle method throws Exception.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the cancelApplication method of AbstractYarnTask would call ProcessUtils.killYarnJob(taskExecutionContext) for itself

    public void cancelApplication(boolean status) throws Exception {
        cancel = true;
        // cancel process
        shellCommandExecutor.cancelApplication();
        TaskInstance taskInstance = processService.findTaskInstanceById(taskExecutionContext.getTaskInstanceId());
        if (status && taskInstance != null) {
            ProcessUtils.killYarnJob(taskExecutionContext);
        }
    }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skymsg In some case, the task may just exit with false status and won't throw an exception.

@skymsg skymsg closed this Nov 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

first time contributor First-time contributor improvement make more easy to user or prompt friendly

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Improvement][Worker] Do not verify the status of yarn in ShellCommandExecutor

4 participants