orphan jobs on the runner #3737

Open

d-bytebase opened this issue Mar 8, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@d-bytebase

I'm encountering an issue with my runners: they show an 'ACTIVE' status but are not processing any jobs. I'm also unable to delete them, because GitHub reports that 'jobs are running' even though no jobs are actually in progress. I'm looking for help resolving this stuck state.

https://github.com/bytebase/bytebase/settings/actions/runners


d-bytebase added the bug label on Mar 8, 2025
@TingluoHuang
Member

Probably related to https://www.githubstatus.com/incidents/m7vl0x8k3j9c

@enescakir
Contributor

Hi @TingluoHuang, there are no incidents on the status page for today, but we do still encounter this issue occasionally. Were there any unannounced incidents today?

When it last happened, the runner machine had a defunct (zombie) process left behind by a job:

$ ps axo stat,ppid,pid,comm | grep -w defunct
Z       1866    3209 esbuild <defunct>

After terminating the zombie processes, the runner script was able to exit properly.
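For anyone else hitting this: a zombie process cannot be killed directly; it only disappears once its parent reaps it (or the parent exits and the zombie is re-parented to init, which reaps it). A minimal cleanup sketch along those lines, assuming it is acceptable to TERM the parents of any defunct processes (the signal choice and the PPID > 1 guard are assumptions, not something from this thread):

$ # Collect the parent PIDs of all defunct (zombie) processes and signal
$ # them; reaping or terminating the parent is what actually clears a zombie.
$ ps axo stat,ppid,pid,comm | awk '$1 ~ /^Z/ && $2 > 1 { print $2 }' \
    | sort -u | xargs -r kill -TERM

In the output above, that would signal PID 1866, the parent of the defunct esbuild process, which matches the observation that the runner could exit once the zombie was cleared.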

enescakir added a commit to ubicloud/ubicloud that referenced this issue Mar 19, 2025
We destroy the runner when GitHub sends the workflow job completed
webhook event.

Recently, the runner script has gotten stuck a few times due to zombie
processes.

This might be related to GitHub incidents, but we don't have enough
information to confirm it.

actions/runner#3737

Since GitHub servers think the script is still running, they don't stop
it.

Customers can force terminate the runner by clicking the button. This
destroys the underlying virtual machine, causing GitHub servers to lose
connection to the runner and mark the job as failed.