Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd.Process.Kill() does not clean up the started process, causing the job to remain running. #1385

Closed
cobolbaby opened this issue Sep 15, 2023 · 2 comments · Fixed by #1389
Closed

Comments

@cobolbaby
Copy link
Contributor

cobolbaby commented Sep 15, 2023

Describe the bug
A clear and concise description of what the bug is.

https://github.com/distribworks/dkron/blob/master/builtin/bins/dkron-executor-shell/shell.go#L119C1-L124

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Init State:

2023-09-18 16-58-21屏幕截图

After cmd.Process.Kill()

1265843327

Specifications:

  • OS: dkron/dkron:3.2.4
  • Version: v3.2.4

Additional context
Add any other context about the problem here.

@yvanoers
Copy link
Collaborator

Have you encountered a situation where the process didn't get killed?
If so, any insights into why?

@cobolbaby
Copy link
Contributor Author

cobolbaby commented Sep 20, 2023

os.Process.Kill() only kill the parent process, not including the child processes.

The following is the code for fix:

	jobTimeout := args.Config["timeout"]
	var jt time.Duration

	if jobTimeout != "" {
		jt, err = time.ParseDuration(jobTimeout)
		if err != nil {
			return nil, errors.New("shell: Error parsing job timeout")
		}
		cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	}

	log.Printf("shell: going to run %s", command)

	err = cmd.Start()
	if err != nil {
		return nil, err
	}

	var jobTimeoutMessage string
	var jobTimedOut bool

	if jt != 0 {
		slowTimer := time.AfterFunc(jt, func() {
			// Kill child process to avoid cmd.Wait()
			err := syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL) // note the minus sign
			if err != nil {
				jobTimeoutMessage = fmt.Sprintf("shell: Job '%s' execution time exceeding defined timeout %v. SIGKILL returned error. Job may not have been killed", command, jt)
			} else {
				jobTimeoutMessage = fmt.Sprintf("shell: Job '%s' execution time exceeding defined timeout %v. Job was killed", command, jt)
			}

			jobTimedOut = true
			return
		})

		defer slowTimer.Stop()
	}

@cobolbaby cobolbaby changed the title No retry mechanism after cmd.Process.Kill() fails, causing the task to remain running. cmd.Process.Kill() does not clean up the started process, causing the job to remain running. Sep 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants