fix(python apps): fork child + exec appcmd in child to avoid issues#1580
fix(python apps): fork child + exec appcmd in child to avoid issues#1580cicoyle merged 56 commits intodapr:masterfrom
Conversation
…problems Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: Samantha Coyle <sam@diagrid.io>
…into fix-python-hangs
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: Samantha Coyle <sam@diagrid.io>
|
@sicoyle tests are very red |
Signed-off-by: Samantha Coyle <sam@diagrid.io>
…into fix-python-hangs
Signed-off-by: Samantha Coyle <sam@diagrid.io>
…into fix-python-hangs
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1580 +/- ##
==========================================
- Coverage 15.74% 15.45% -0.29%
==========================================
Files 64 64
Lines 7090 7220 +130
==========================================
Hits 1116 1116
- Misses 5888 6018 +130
Partials 86 86 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: Samantha Coyle <sam@diagrid.io>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
|
|
||
| cli, err := dockerClient.NewClientWithOpts(dockerClient.FromEnv) | ||
| cli, err := dockerClient.NewClientWithOpts(dockerClient.FromEnv, dockerClient.WithVersion("1.48")) |
There was a problem hiding this comment.
Adding a hardcoded Docker API version "1.48" may cause compatibility issues with older Docker installations. The Docker client should ideally negotiate the API version automatically, or this version should be made configurable. Consider whether this version requirement is documented and whether it's the minimum supported version for all features used in these tests.
| cli, err := dockerClient.NewClientWithOpts(dockerClient.FromEnv, dockerClient.WithVersion("1.48")) | |
| cli, err := dockerClient.NewClientWithOpts(dockerClient.FromEnv) |
There was a problem hiding this comment.
I added this as I was getting a compatibility issue due to versions so pinned on one that would work.
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: Samantha Coyle <sam@diagrid.io>
|
ready again @JoshVanL pls |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: Samantha Coyle <sam@diagrid.io>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| err = syscall.Kill(-pgid, syscall.SIGTERM) | ||
| if err != nil { | ||
| // If process group doesn't exist (ESRCH), treat it as already stopped. | ||
| if err == syscall.ESRCH { | ||
| return nil | ||
| } | ||
| errKill := syscall.Kill(-pgid, syscall.SIGKILL) | ||
| // If process group doesn't exist, treat it as already stopped. | ||
| if errKill != nil && errKill != syscall.ESRCH { | ||
| return errKill | ||
| } | ||
| } | ||
| return nil |
There was a problem hiding this comment.
After sending SIGTERM (line 80), if it succeeds, the function returns immediately without waiting for the process to actually exit. This means the caller has no guarantee that the process group has terminated. Consider documenting that this is a fire-and-forget termination, or implementing a wait-and-verify mechanism if process termination confirmation is needed.
| } | ||
| return nil | ||
| } | ||
| // Kill the whole process group. |
There was a problem hiding this comment.
Changed signal from SIGINT to SIGTERM, which is more conventional for graceful shutdown. However, this could potentially affect applications that specifically handle SIGINT differently than SIGTERM. Consider documenting this change in user-facing documentation or release notes to inform users about the signal change.
| // Kill the whole process group. | |
| // Kill the whole process group. | |
| // Use SIGTERM for graceful shutdown of the process group. This differs from using SIGINT | |
| // and may affect applications that handle SIGINT and SIGTERM differently. |
cmd/run_unix.go
Outdated
| return fmt.Errorf("failed to fork/exec app: %w", err) | ||
| } | ||
| if output.AppCMD == nil { | ||
| return fmt.Errorf("app command is nil") |
There was a problem hiding this comment.
After successfully calling ForkExec and before assigning the process, there's a check for output.AppCMD == nil. If this is nil, the function returns an error but the forked child process has already been started (via ForkExec at line 59) and is now orphaned. Consider either ensuring AppCMD is never nil before calling ForkExec, or cleaning up the forked process if AppCMD is unexpectedly nil by calling syscall.Kill on the returned pid.
| return fmt.Errorf("app command is nil") | |
| // Clean up the forked child process to avoid leaving it orphaned. | |
| if killErr := syscall.Kill(pid, syscall.SIGKILL); killErr != nil { | |
| return fmt.Errorf("app command is nil; failed to kill child process %d: %w", pid, killErr) | |
| } | |
| return fmt.Errorf("app command is nil; killed child process %d", pid) |
Signed-off-by: Samantha Coyle <sam@diagrid.io>
fix(python apps): fork child + exec appcmd in child to avoid issues
Description
Fork is unsafe after threading has been used, at least for python. This is a known issue in the python community. See:
Forking after threading has been used can cause deadlocks/hangs when using threading primitives like Lock, Condition, or Queue in a process created via fork. All of this occurs within the internals of durabletask-python. This means it will impact any app using the python sdk, be it dapr agents or general python workflows. I tried various python versions and they all experience hangs, at various points due to this.
This PR corrects this by using syscall.ForkExec to fork a child, then execs python or whatever app language within that child so the app language starts via exec, avoiding it knowing about the fork. I will say a downfall of this is I lose the blue APP prefix stuff on the log lines. So this means changing the UX via terminals unfortunately. I tried adding that back with pipes, but then adding the pipes to give me the log prefix then broke my python app again causing the hang.
This is what it looks like btw with this PR. Blue INFO is from the sidecar and the app logs are in white without the ===APP=== prefix that we used to have:
I need to test this on a windows machine to make sure I don't leave zombies, and also I think at this point this is good enough to fix other OS's. I could leave a note in the docs that running python apps on windows could lead to unpredictable behavior for now and create a follow up issue for that. Thoughts?
Issue reference
Checklist
Please make sure you've completed the relevant tasks for this PR, out of the following list: