I searched the issues and found no similar ones.
What happened
Sometimes, after a workflow is stopped, the status of the killed task still shows as running.
This is a screenshot of the task configuration:
What you expected to happen
The status of the killed task shows as killed.
How to reproduce
I think the cause is that the workflow status and the task status are updated independently. The workflow is set to the stopped state before the task has completely stopped and is then removed from processInstanceMap, so the taskResponse can no longer be processed.
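The suspected ordering can be illustrated with a minimal single-threaded sketch. It does not use any DolphinScheduler classes: `LateResponseDemo`, `handlers`, `startWorkflow`, `finishWorkflow`, and `onTaskResponse` are hypothetical names, and the map only stands in for the per-process-instance bookkeeping described above.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical stand-in for the master-side bookkeeping; not DolphinScheduler code.
public class LateResponseDemo {
    // process instance id -> queue of pending task responses
    private final Map<Integer, Queue<String>> handlers = new HashMap<>();

    public void startWorkflow(int processInstanceId) {
        handlers.put(processInstanceId, new ArrayDeque<>());
    }

    // The workflow reaches its stopped state and its handler is cleaned up right away.
    public void finishWorkflow(int processInstanceId) {
        handlers.remove(processInstanceId);
    }

    // Returns true if the response was queued, false if it was dropped.
    public boolean onTaskResponse(int processInstanceId, String response) {
        Queue<String> queue = handlers.get(processInstanceId);
        if (queue == null) {
            // Corresponds to the "task response persist thread is null" case.
            return false;
        }
        return queue.add(response);
    }

    public static void main(String[] args) {
        LateResponseDemo demo = new LateResponseDemo();
        demo.startWorkflow(8);
        System.out.println(demo.onTaskResponse(8, "RUNNING")); // arrives while the handler exists
        demo.finishWorkflow(8);                                // workflow stops first
        System.out.println(demo.onTaskResponse(8, "KILL"));    // kill result arrives too late
    }
}
```

In this sketch the first response is queued and the second is dropped, so the task's last persisted state would remain the running one.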
To reproduce the problem reliably, the code has to be modified.
Modify line 175 of org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService and add the following code:
    if (null != taskResponsePersistThread) {
        if (taskResponsePersistThread.addEvent(taskResponseEvent))...
    } else {
        logger.error("task response persist thread is null");
    }
Modify line 118 of org.apache.dolphinscheduler.server.worker.processor.TaskKillProcessor and add the following code
The purpose of this change is to simulate another place removing the instance from processInstanceMap.
This causes the instance in processTaskResponseMap to be removed by the following code:
    if (taskResponsePersistThread.eventSize() == 0) {
        if (!processInstanceMap.containsKey(taskResponsePersistThread.getProcessInstanceId())) {
            processTaskResponseMap.remove(taskResponsePersistThread.getProcessInstanceId());
            logger.info("remove process instance: {}", taskResponsePersistThread.getProcessInstanceId());
        }
        continue;
    }
As a result, the TaskResponsePersistThread is null and the following message is printed:
    task response persist thread is null
master log:
[INFO] 2022-12-21 20:44:27.372 org.apache.dolphinscheduler.server.master.processor.CacheProcessor:[52] - received command : CacheExpireCommand{CacheType=PROCESS_DEFINITION, cacheKey=7960963905984}
[INFO] 2022-12-21 20:44:27.390 org.apache.dolphinscheduler.server.master.processor.CacheProcessor:[70] - cache evict, type:processDefinition, key:7960963905984
[INFO] 2022-12-21 20:44:28.363 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[255] - find command 8, slot:0 :
[INFO] 2022-12-21 20:44:28.363 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[208] - find one command: id: 8, type: START_PROCESS
[INFO] 2022-12-21 20:44:28.482 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[221] - handle command end, command 8 process 8 start...
[INFO] 2022-12-21 20:44:28.650 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1370] - add task to stand by list, task name:sleep-20s, task id:0, task code:7960945419712
[INFO] 2022-12-21 20:44:28.685 org.apache.dolphinscheduler.service.process.ProcessService:[1088] - start submit task : sleep-20s, instance id:8, state: RUNNING_EXECUTION
[INFO] 2022-12-21 20:44:28.803 org.apache.dolphinscheduler.service.process.ProcessService:[1101] - end submit task to db successfully:8 sleep-20s state:SUBMITTED_SUCCESS complete, instance id:8 state: RUNNING_EXECUTION
[INFO] 2022-12-21 20:44:28.823 org.apache.dolphinscheduler.server.master.runner.task.CommonTaskProcessor:[164] - master submit success, task id:8, task name:sleep-20s, process id:8
[INFO] 2022-12-21 20:44:28.824 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1384] - remove task from stand by list, id: 8 name:sleep-20s
[INFO] 2022-12-21 20:44:29.043 org.apache.dolphinscheduler.server.master.processor.TaskAckProcessor:[71] - taskAckCommand : TaskExecuteAckCommand{taskInstanceId=8, startTime=Wed Dec 21 20:44:29 CST 2022, host='192.168.20.62:1234', status=1, logPath='/Users/molin/dev/github/dolphinscheduler/logs/7960963905984_2/8/8.log', executePath='/tmp/dolphinscheduler/exec/process/7960944102080/7960963905984_2/8/8', processInstanceId='0'}
[INFO] 2022-12-21 20:44:29.043 org.apache.dolphinscheduler.server.master.processor.TaskAckProcessor:[71] - taskAckCommand : TaskExecuteAckCommand{taskInstanceId=8, startTime=Wed Dec 21 20:44:29 CST 2022, host='192.168.20.62:1234', status=1, logPath='/Users/molin/dev/github/dolphinscheduler/logs/7960963905984_2/8/8.log', executePath='/tmp/dolphinscheduler/exec/process/7960944102080/7960963905984_2/8/8', processInstanceId='8'}
[ERROR] 2022-12-21 20:44:29.044 org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[176] - task response persist thread is null
[INFO] 2022-12-21 20:44:33.368 org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[226] - already exists handler process size:0
[INFO] 2022-12-21 20:44:33.703 org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[233] - persist events 8 succeeded.
[INFO] 2022-12-21 20:44:42.226 org.apache.dolphinscheduler.server.master.processor.StateEventProcessor:[75] - received command : State Event :key: 8-0-8-0 type: PROCESS_STATE_CHANGE executeStatus: READY_STOP task instance id: 0 process instance id: 8 context: null
[INFO] 2022-12-21 20:44:42.975 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[127] - handle process instance : 8 , events count:1
[INFO] 2022-12-21 20:44:42.980 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[130] - already exists handler process size:0
[INFO] 2022-12-21 20:44:42.982 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[310] - process event: State Event :key: 8-0-8-0 type: PROCESS_STATE_CHANGE executeStatus: READY_STOP task instance id: 0 process instance id: 8 context: null
[INFO] 2022-12-21 20:44:42.983 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[497] - process:8 state RUNNING_EXECUTION change to READY_STOP
[INFO] 2022-12-21 20:44:43.013 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1422] - kill called on process instance id: 8, num: 1
[INFO] 2022-12-21 20:44:43.094 org.apache.dolphinscheduler.server.master.runner.task.CommonTaskProcessor:[209] - master kill taskInstance name :sleep-20s taskInstance id:8
[INFO] 2022-12-21 20:44:43.095 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[310] - process event: State Event :key: null type: PROCESS_STATE_CHANGE executeStatus: STOP task instance id: 0 process instance id: 8 context: null
[INFO] 2022-12-21 20:44:43.095 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[497] - process:8 state READY_STOP change to STOP
[INFO] 2022-12-21 20:44:43.107 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1320] - work flow process instance [id: 8, name:test_task_status_error-2-20221221204428415], state change from READY_STOP to STOP, cmd type: STOP
[INFO] 2022-12-21 20:44:43.307 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[139] - process instance 8 finished.
[INFO] 2022-12-21 20:44:43.385 org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[222] - remove process instance: 8
[INFO] 2022-12-21 20:44:43.610 org.apache.dolphinscheduler.server.master.processor.TaskResponseProcessor:[72] - received command : TaskExecuteResponseCommand{taskInstanceId=8, processInstanceId=8, status=9, startTime=null, endTime=Wed Dec 21 20:44:43 CST 2022, host=null, logPath=null, executePath=null, processId=2407, appIds='', varPool=[]}
[ERROR] 2022-12-21 20:44:43.613 org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[176] - task response persist thread is null
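The last lines of the master log above show the ordering: the handler for process instance 8 is removed at 20:44:43.385, and the task response with status=9 is received at 20:44:43.610. The small sketch below only computes that gap from the two timestamps; `LogGap` and `gapMillis` are hypothetical helper names used for illustration.

```java
import java.time.Duration;
import java.time.LocalTime;

// Hypothetical helper for reading the log excerpt; not part of DolphinScheduler.
public class LogGap {
    // Milliseconds from the first timestamp to the second (both HH:mm:ss.SSS).
    public static long gapMillis(String earlier, String later) {
        return Duration.between(LocalTime.parse(earlier), LocalTime.parse(later)).toMillis();
    }

    public static void main(String[] args) {
        // Timestamps copied from the two master log lines quoted above.
        String removed = "20:44:43.385";  // "remove process instance: 8"
        String received = "20:44:43.610"; // TaskExecuteResponseCommand received
        System.out.println(gapMillis(removed, received) + " ms"); // prints "225 ms"
    }
}
```

The response arrives roughly a quarter of a second after the cleanup, which matches the "task response persist thread is null" error logged at 20:44:43.613.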
worker log:
Anything else
No response
Version
2.0.x