Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: fix process hang when run with dagger run #5712

Merged
merged 1 commit into from
Oct 10, 2023

Conversation

wingyplus
Copy link
Contributor

This issue discover in #5666, the Elixir got hang when running with dagger run. After debugging, the process hang somehow when using SysProcAttr.Setpgid. Remove it fixes the issue.

Luckily, it fixes #5287 that found with Gradle also.

Fixes #5666
Fixes #5287

@vito
Copy link
Contributor

vito commented Aug 29, 2023

Will need to do some careful testing here; the intent is to ensure things like go run foo.go are properly terminated, because go run runs the compiled binary in a sub-process, and if you only kill the parent process it leaks the child.

@wingyplus
Copy link
Contributor Author

That is the one I’m worry about. I didn’t have a chance to test it more and I have no linux machine. :(

@vito
Copy link
Contributor

vito commented Aug 30, 2023

@wingyplus If you have a Mac I think it should be the same code + behavior

@wingyplus
Copy link
Contributor Author

I have tried with Go on macOS.

During running dagger run go run .. I run ps o pid,ppid,group,command, the process is:

  PID  PPID   GID COMMAND
18312 58973    20 /Users/thanabodee/src/github.com/dagger/dagger/bin/dagger run go run .
18748 18312    20 go run .
18798 18748    20 /var/folders/x7/tl6gyzn92vzcr35t94xgt6y00000gp/T/go-build3448096036/b001/exe/dagger_test

After I cancel the pipeline with ctrl+c. The process is gone but it needs to wait a little bit and receive the message signal: terminated like screenshot below:

Screenshot 2566-08-30 at 23 38 16

Compare to Elixir which's receive run canceled:

Screenshot 2566-08-30 at 23 39 56

I'm not sure why it took time to cancel on Go. But both are gracefully stop when execution is done.

@wingyplus
Copy link
Contributor Author

Let me know if we have another cases need to test. :)

@vito
Copy link
Contributor

vito commented Aug 30, 2023

The 10 second delay comes from the WaitDelay we configure. The SIGTERM isn't propagated to the child process of go run. I'm actually not sure how Go manages to stop the child process after 10 seconds, but it does.

Verified with the following:

package main

import (
	"fmt"
	"time"
)

func main() {
	for {
		time.Sleep(time.Second)
		fmt.Println("sup")
	}
}
dagger run go run .
# in another shell:
while sleep 0.1; ps auxf | grep 'go run\|/test' | grep -v grep; end

Then, press q in dagger run.

You'll see the ps output go through the following stages:

# initializing
vito     16562  0.0  0.1 737724 23688 pts/6    Sl+  14:38   0:00                  \_ dagger run go run .

# running subcmd
vito     16562  4.0  0.1 739068 28240 pts/6    Sl+  14:38   0:00                  \_ dagger run go run .
vito     16690  0.0  0.0 1090972 8992 pts/6    Dl+  14:38   0:00                      \_ go run .

# running child process
vito     16562  5.0  0.1 739324 28488 pts/6    Sl+  14:38   0:00                  \_ dagger run go run .
vito     16690  0.0  0.1 2127792 17932 pts/6   Sl+  14:38   0:00                      \_ go run .
vito     16826  0.0  0.0 710252  1844 pts/6    Sl+  14:38   0:00                          \_ /tmp/go-build3457198731/b001/exe/test

# pressing q or Ctrl+C stops `go run .` but not the child
vito     16562  4.3  0.1 739580 29096 pts/6    Sl+  14:38   0:00      |           \_ dagger run go run .
vito     16826  0.0  0.0 710508  1844 pts/6    Sl+  14:38   0:00      \_ /tmp/go-build3457198731/b001/exe/test

# 10 seconds later, everything stops

Maybe we can figure out what WaitDelay is doing after 10 seconds and have it just do that immediately? But with a softer signal, in case it's doing a SIGKILL.

@wingyplus
Copy link
Contributor Author

I still cannot figure out why Go is stuck on wait delay but it seems it exits with SIGTERM instead of run canceled.

For the gradle (https://github.com/wingyplus/dagger-kotlin) exit with code 143 due to SIGTERM instead of run canceled.

@wingyplus
Copy link
Contributor Author

Go cmd will listen the signal SIGINT. If we change to that signal it’ll exit without waiting. But it can’t be used with Elixir because when the VM receive that signal, it’ll open a prompt for reconfirmation. T T

@TomChv
Copy link
Member

TomChv commented Sep 13, 2023

@wingyplus So as far as I understand, we are blocked on the kind of signal to send to end the process?

@vito @sipsma Do you have any idea in mind? This PR is important to fix a major issue with gradle, I'll also check on my own but I'm not an expert in this area haha

This issue discover in dagger#5666, the Elixir got hang when running with
`dagger run`. After debugging, the process hang somehow when using
`SysProcAttr.Setpgid`. Remove it fixes the issue.

Luckily, it fixes dagger#5287 that found with Gradle also.

Fixes dagger#5666
Fixes dagger#5287

Signed-off-by: Thanabodee Charoenpiriyakij <wingyminus@gmail.com>
@TomChv
Copy link
Member

TomChv commented Sep 21, 2023

I rebased the PR, I'm taking a look again at this PR

@TomChv
Copy link
Member

TomChv commented Sep 21, 2023

Something went wrong after the rebase, the process still hang with gradle (cf #5287)

See: #5287 (comment)

@wingyplus
Copy link
Contributor Author

I'll help looking into it tomorrow. 🙇

@wingyplus
Copy link
Contributor Author

wingyplus commented Sep 26, 2023

@wingyplus So as far as I understand, we are blocked on the kind of signal to send to end the process?

@vito @sipsma Do you have any idea in mind? This PR is important to fix a major issue with gradle, I'll also check on my own but I'm not an expert in this area haha

@TomChv I faced 2 problems here:

  1. Setpgid blocked us from sending a signal on some programs such as Elixir and Gradle.
  2. After removing Setpgid, some programs may not respond on the SIGTERM such as, Go using SIGINT (or SIGQUIT on UNIX) to terminate the child process.

@gmile
Copy link
Contributor

gmile commented Sep 27, 2023

Really looking forward to seeing this one fixed. Writing pipelines using Elixir SDK through dagger run is not usable right now, since it always hangs at the end. Need to run elixir my-pipeline.exs instead.

Copy link
Contributor

@vito vito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the radio silence on this, I've got a lot of spinning plates and this is a particular rabbit hole I've been avoiding due to the intersection of various ecosystems and platform-specific behavior. 😅

To be honest my feeling is it's Go who is in the wrong here, and that go run should propagate signals to its child process so we don't have to jump through hoops. There are two bugs open for this upstream (golang/go#54868 and golang/go#40467) but no one has found the time to fix it yet. I suppose it's not a high priority because it doesn't normally matter in a shell, where there's already a process group for each command and Ctrl+C gets sent to the whole group.

At this point I'm OK with just leaving go run just as broken as it was before, since this seems to be causing a harsher pain for those in other ecosystems.

Approving now before I forget, won't merge since it's the weekend, but if anyone wants to go ahead and merge it feel free.

@wingyplus
Copy link
Contributor Author

@vito Thank you. ^^

@vito vito merged commit 41634f3 into dagger:main Oct 10, 2023
39 checks passed
@wingyplus wingyplus deleted the iss-5666-5287 branch October 10, 2023 15:11
@gerhard gerhard added this to the v0.8.8 milestone Oct 11, 2023
schlapzz pushed a commit to schlapzz/dagger that referenced this pull request Oct 24, 2023
This issue discover in dagger#5666, the Elixir got hang when running with
`dagger run`. After debugging, the process hang somehow when using
`SysProcAttr.Setpgid`. Remove it fixes the issue.

Luckily, it fixes dagger#5287 that found with Gradle also.

Fixes dagger#5666
Fixes dagger#5287

Signed-off-by: Thanabodee Charoenpiriyakij <wingyminus@gmail.com>
Signed-off-by: Christian Schlatter <schlatter@puzzle.ch>
schlapzz pushed a commit to schlapzz/dagger that referenced this pull request Nov 24, 2023
This issue discover in dagger#5666, the Elixir got hang when running with
`dagger run`. After debugging, the process hang somehow when using
`SysProcAttr.Setpgid`. Remove it fixes the issue.

Luckily, it fixes dagger#5287 that found with Gradle also.

Fixes dagger#5666
Fixes dagger#5287

Signed-off-by: Thanabodee Charoenpiriyakij <wingyminus@gmail.com>
Signed-off-by: Christian Schlatter <schlatter@puzzle.ch>
schlapzz pushed a commit to schlapzz/dagger that referenced this pull request Nov 24, 2023
This issue discover in dagger#5666, the Elixir got hang when running with
`dagger run`. After debugging, the process hang somehow when using
`SysProcAttr.Setpgid`. Remove it fixes the issue.

Luckily, it fixes dagger#5287 that found with Gradle also.

Fixes dagger#5666
Fixes dagger#5287

Signed-off-by: Thanabodee Charoenpiriyakij <wingyminus@gmail.com>
schlapzz pushed a commit to schlapzz/dagger that referenced this pull request Nov 24, 2023
This issue discover in dagger#5666, the Elixir got hang when running with
`dagger run`. After debugging, the process hang somehow when using
`SysProcAttr.Setpgid`. Remove it fixes the issue.

Luckily, it fixes dagger#5287 that found with Gradle also.

Fixes dagger#5666
Fixes dagger#5287

Signed-off-by: Thanabodee Charoenpiriyakij <wingyminus@gmail.com>
schlapzz pushed a commit to schlapzz/dagger that referenced this pull request Nov 24, 2023
This issue discover in dagger#5666, the Elixir got hang when running with
`dagger run`. After debugging, the process hang somehow when using
`SysProcAttr.Setpgid`. Remove it fixes the issue.

Luckily, it fixes dagger#5287 that found with Gradle also.

Fixes dagger#5666
Fixes dagger#5287

Signed-off-by: Thanabodee Charoenpiriyakij <wingyminus@gmail.com>
schlapzz pushed a commit to schlapzz/dagger that referenced this pull request Nov 24, 2023
This issue discover in dagger#5666, the Elixir got hang when running with
`dagger run`. After debugging, the process hang somehow when using
`SysProcAttr.Setpgid`. Remove it fixes the issue.

Luckily, it fixes dagger#5287 that found with Gradle also.

Fixes dagger#5666
Fixes dagger#5287

Signed-off-by: Thanabodee Charoenpiriyakij <wingyminus@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

🐞 Elixir SDK always hang when running with dagger run dagger run does not work with Gradle
5 participants