Checks

- This isn't a question or user support case (for Q&A and community support, go to Discussions).
- I've read the Changelog before submitting this issue and I'm sure it's not due to any recently introduced backward-incompatible changes.
To Reproduce

1. Use the k8s Cluster Autoscaler.
2. Create a ScaleSet using Kubernetes mode.
3. Run a Docker container-based action.
4. Ensure the cluster does not have the capacity to schedule the action's job pod.
Describe the bug

When the k8s job pod tries to run, Kubernetes cannot find a node to schedule it on and emits the following event:

Node didn't have enough resource: cpu, requested: 2000, used: 13920, capacity: 15890

The k8s Job then reports the following error:

Job has reached the specified backoff limit

This causes the Actions job to fail.
Describe the expected behavior

The job pod should stay pending until the autoscaler brings a new node online (about 45 seconds on average) and then be scheduled there.
Controller Logs

My employer's open source contribution policy prohibits me from posting this information publicly; however, I can post relevant redacted portions upon request.
Runner Pod Logs
My employer's open source contribution policy prohibits me from posting this information publicly; however, I can post relevant redacted portions upon request.
rteeling-evernorth changed the title from "Docker Container Action Jobs failing to schedule" to "Docker Container Action Jobs failing to schedule on autoscaled cluster" on Mar 1, 2024.
This issue is related to the hook ☺️. Are you using the default hook implementation in your container mirror? If so, the hook schedules the job pod on the same node where the runner is running, so the problem is the node's capacity, not the scheduler. By default we skip the scheduler so we can reuse the volume mount from the runner pod. This can be avoided if you use ReadWriteMany volumes, but that would require you to configure the environment variables appropriately.
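To make the comment above concrete, here is a sketch of what the ReadWriteMany alternative might look like in the gha-runner-scale-set chart's values file. The storage class name is illustrative, and `ACTIONS_RUNNER_USE_KUBE_SCHEDULER` is my assumption about which runner-container-hooks environment variable the comment refers to; verify both against your hook and chart versions.

```yaml
# values.yaml sketch for the gha-runner-scale-set chart (not verified against 0.8.2)
containerMode:
  type: "kubernetes"
  kubernetesModeWorkVolumeClaim:
    # ReadWriteMany lets the job pod mount the work volume from any node,
    # so the hook can defer to the Kubernetes scheduler instead of pinning
    # the job pod to the runner's node.
    accessModes: ["ReadWriteMany"]
    storageClassName: "nfs-client"   # illustrative; must support RWX
    resources:
      requests:
        storage: 1Gi

template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
          # Assumption: this tells the k8s hook to let the scheduler place
          # the job pod rather than forcing it onto the runner's node.
          - name: ACTIONS_RUNNER_USE_KUBE_SCHEDULER
            value: "true"
```

With a ReadWriteOnce claim (the chart default), the job pod can only mount the work volume on the node that already holds it, which is why the hook pins it there and the Cluster Autoscaler never gets a chance to add capacity.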
Ah, that would explain it! Everything in my mirror is off-the-shelf for 0.8.2, and I was using the default volume mount in the values file, which is ReadWriteOnce. That would produce exactly the behavior I am seeing. Thank you so much for the info!
Controller Version

0.7.0, 0.8.2

Deployment Method

ArgoCD