Unblock process deletion in worker runtime #8479
Open
xtremerui wants to merge 1 commit into master from issue/8462-aborting-build-stuck
@chenbh Do you mean, why should we delete it here? To be honest, my knowledge of Linux programming is limited, so I'm not sure how it is supposed to work.
My understanding of what happens is:
- `Stop(false)` is called on the worker, which will try to gracefully kill the process (`SIGINT`/`SIGTERM`), but if that doesn't work it sends a `SIGKILL`
- If the process ignores `SIGINT`/`SIGTERM` and refuses to self-terminate, then it should have been killed by the `SIGKILL`
- `p.process.IO().Wait()` eventually returns (because the process is killed and the IO is closed)
- `p.process.Delete()` is called to free up any remaining OS resources reserved by the process

But from #8462, it looks like the resource is misbehaving by ignoring the `SIGINT`/`SIGTERM`, and for some reason the `SIGKILL` wasn't issued, which allowed the process to continue executing and jam up the worker.
The pool resource hasn't been changed for a long time. The change comes from the big refactor in Concourse v7.5.
The problem is in the last part. During my investigation, I cancelled the build while monitoring the process in the container.
Before cancelling:
After cancelling, my SSH session got kicked out and I had to SSH back in to see:
Worker logs after cancelling:
You can see the process was actually killed (v7.4 behaves the same), and the worker logs show that Garden stopped the container, but the IO was still running and outputting those `...`.
@chenbh I have attached a pipeline with steps to reproduce the issue and to use for acceptance.