ARC should handle OOM killed runners #143

Open
@antoineozenne-at-leocare

Description

Controller Version

0.8.0

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

1. Deploy a release of `gha-runner-scale-set` with an `ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE` to customize the resource requests and limits of the runner.
2. Run a job in GitHub that gets this runner OOMKilled.

Describe the bug

When the runner is OOMKilled, nothing happens and the pod stays in OOMKilled status. The controller doesn't seem to handle this case, and the job eventually times out.

Describe the expected behavior

I think ARC should handle the case where the runner is OOMKilled by stopping the job in GitHub with an error status.
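
To illustrate the kind of check this would need, here is a minimal sketch (not ARC's actual code) that looks for OOMKilled containers in a pod object, such as the JSON printed by `kubectl get pod <name> -o json`. It relies on the standard pod status fields `containerStatuses[].state.terminated.reason` and `lastState.terminated.reason`. The function name and the sample pod, including the container name `job`, are made up for the example.

```python
import json


def oom_killed_containers(pod: dict) -> list[str]:
    """Return the names of containers terminated with reason OOMKilled.

    Checks both the current state and the last state of every init and
    regular container listed in the pod status.
    """
    names = []
    status = pod.get("status") or {}
    for key in ("initContainerStatuses", "containerStatuses"):
        for cs in status.get(key) or []:
            for state_key in ("state", "lastState"):
                terminated = (cs.get(state_key) or {}).get("terminated") or {}
                if terminated.get("reason") == "OOMKilled":
                    names.append(cs["name"])
                    break  # count each container only once
    return names


# Hypothetical, heavily trimmed pod object used only for illustration.
sample_pod = json.loads("""
{
  "metadata": {"name": "example-runner-workflow"},
  "status": {
    "phase": "Failed",
    "containerStatuses": [
      {
        "name": "job",
        "state": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}
      }
    ]
  }
}
""")

print(oom_killed_containers(sample_pod))
```

A controller that detected a non-empty result for the workflow pod could then fail the job instead of waiting for the timeout. How that failure would be reported back to GitHub is outside the scope of this sketch.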

Additional Context

kubectl get pods -n arc-runners
# NAME                                                           READY   STATUS      RESTARTS   AGE
# arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m            1/1     Running     0          13h
# arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m-workflow   0/1     OOMKilled   0          136m

Controller Logs

2024-03-04T00:23:29Z	INFO	EphemeralRunnerSet	Created new ephemeral runner	{"ephemeralrunnerset": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l","namespace":"arc-runners"}, "runner": "arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m"}
2024-03-04T00:23:29Z	INFO	EphemeralRunner	Adding runner registration finalizer	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}
2024-03-04T00:23:29Z	INFO	EphemeralRunner	Successfully added runner registration finalizer	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}
2024-03-04T00:23:29Z	INFO	EphemeralRunner	Adding finalizer	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}
2024-03-04T00:23:29Z	INFO	EphemeralRunner	Successfully added finalizer	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}
2024-03-04T00:23:29Z	INFO	EphemeralRunner	Adding finalizer	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}
2024-03-04T00:23:29Z	INFO	EphemeralRunner	Successfully added finalizer	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}
2024-03-04T00:23:29Z	INFO	EphemeralRunner	Creating new ephemeral runner registration and updating status with runner config	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}
2024-03-04T00:23:29Z	INFO	EphemeralRunner	Creating ephemeral runner JIT config	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}
2024-03-04T00:23:31Z	INFO	EphemeralRunner	Created ephemeral runner JIT config	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}, "runnerId": 5715}
2024-03-04T00:23:31Z	INFO	EphemeralRunner	Updating ephemeral runner status with runnerId and runnerJITConfig	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}
2024-03-04T00:23:31Z	INFO	EphemeralRunner	Updated ephemeral runner status with runnerId and runnerJITConfig	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}
2024-03-04T00:23:31Z	INFO	EphemeralRunner	Creating new ephemeral runner secret for jitconfig.	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}
2024-03-04T00:23:31Z	INFO	EphemeralRunner	Creating new secret for ephemeral runner	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}
2024-03-04T00:23:31Z	INFO	EphemeralRunner	Created new secret spec for ephemeral runner	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}
2024-03-04T00:23:31Z	INFO	EphemeralRunner	Created ephemeral runner secret	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}, "secretName": "arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m"}
2024-03-04T00:23:31Z	INFO	EphemeralRunner	Creating new EphemeralRunner pod.	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}
2024-03-04T00:23:31Z	INFO	EphemeralRunner	Creating new pod for ephemeral runner	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}
2024-03-04T00:23:31Z	INFO	EphemeralRunner	Created new pod spec for ephemeral runner	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}
2024-03-04T00:23:31Z	INFO	EphemeralRunner	Created ephemeral runner pod	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}, "runnerScaleSetId": 9, "runnerName": "arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m", "runnerId": 5715, "configUrl": "https://github.com/XXX", "podName": "arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m"}
2024-03-04T00:23:31Z	INFO	EphemeralRunner	Waiting for runner container status to be available	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}
2024-03-04T00:23:31Z	INFO	EphemeralRunner	Waiting for runner container status to be available	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}
2024-03-04T00:23:59Z	INFO	EphemeralRunner	Waiting for runner container status to be available	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}
2024-03-04T00:23:59Z	INFO	EphemeralRunner	Ephemeral runner container is still running	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}
2024-03-04T00:23:59Z	INFO	EphemeralRunner	Updating ephemeral runner status with pod phase	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}, "phase": "Pending", "reason": "", "message": ""}
2024-03-04T00:23:59Z	INFO	EphemeralRunner	Updated ephemeral runner status with pod phase	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}
2024-03-04T00:23:59Z	INFO	EphemeralRunner	Ephemeral runner container is still running	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}
2024-03-04T00:24:13Z	INFO	EphemeralRunner	Ephemeral runner container is still running	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}
2024-03-04T00:24:13Z	INFO	EphemeralRunner	Updating ephemeral runner status with pod phase	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}, "phase": "Running", "reason": "", "message": ""}
2024-03-04T00:24:13Z	INFO	EphemeralRunner	Updated ephemeral runner status with pod phase	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}
2024-03-04T00:24:13Z	INFO	EphemeralRunner	Ephemeral runner container is still running	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}
2024-03-04T11:27:43Z	INFO	EphemeralRunner	Ephemeral runner container is still running	{"ephemeralrunner": {"name":"arc-runner-set-aks-stg-fc-001-at-f2g6l-runner-59x5m","namespace":"arc-runners"}}

Runner Pod Logs

...
[WORKER 2024-03-04 13:49:04Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2024-03-04 13:49:04Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2024-03-04 13:49:04Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[RUNNER 2024-03-04 13:49:14Z INFO JobDispatcher] Successfully renew job request 93068, job is valid till 03/04/2024 13:59:14
[WORKER 2024-03-04 13:49:14Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2024-03-04 13:49:14Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2024-03-04 13:49:14Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
...

Labels

bug, k8s