Description
Checks
- I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- I am using charts that are officially provided
Controller Version
0.12.1
Deployment Method
Helm
Checks
- This isn't a question or user support case (For Q&A and community support, go to Discussions).
- I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
1. We see this issue when a cluster node evicts the runner pod because the node is under resource pressure.
2. Runner pods fail with "Pod was rejected: Node didn't have enough resources: pods, requested: 1, used: 16, capacity: 16" because the node pool does not have enough resources.
3. Those failed runner pods then hang, leaving Pending runners in the EphemeralRunnerSet/EphemeralRunner resources.
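For anyone checking whether they are hitting the same state, here is a rough sketch of how the affected pods could be picked out of `kubectl get pods -o json` output. It only looks at the standard `status.phase` and `status.message` pod fields and matches the rejection message quoted in step 2. The function name, the sample pod names, and the `SAMPLE` data are made up for illustration and are not part of the controller.

```python
REJECTED_PREFIX = "Pod was rejected"  # prefix of the message quoted in step 2


def find_rejected_pods(pod_list):
    """Return names of pods in phase Failed whose status message starts
    with the node rejection prefix."""
    rejected = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        message = status.get("message") or ""
        if status.get("phase") == "Failed" and message.startswith(REJECTED_PREFIX):
            rejected.append(pod["metadata"]["name"])
    return rejected


# Hypothetical data shaped like `kubectl get pods -o json` output.
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "runner-a"},
            "status": {
                "phase": "Failed",
                "message": "Pod was rejected: Node didn't have enough "
                           "resources: pods, requested: 1, used: 16, capacity: 16",
            },
        },
        {"metadata": {"name": "runner-b"}, "status": {"phase": "Running"}},
        {
            "metadata": {"name": "runner-c"},
            "status": {"phase": "Failed", "message": "some other failure"},
        },
    ]
}

if __name__ == "__main__":
    print(find_rejected_pods(SAMPLE))
```

In practice the input would come from something like `json.load(sys.stdin)` with the `kubectl` output piped in; the sketch only lists the pods and does not delete anything.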
Describe the bug
The runner fails during node pool scaling and then stays stuck in the "Failed" state indefinitely.


Describe the expected behavior
The failed runner should be cleared from the AutoscalingRunnerSet/EphemeralRunnerSet/EphemeralRunner so that the offline runner is also removed from the GitHub UI.
Additional Context
None
Controller Logs
None
Runner Pod Logs
None