Description
Checks
- I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- I am using charts that are officially provided
Controller Version
0.10.1
Deployment Method
Helm
Checks
- This isn't a question or user support case (For Q&A and community support, go to Discussions).
- I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
1. Configure dind runners with a custom template configuration like this:
   metadata:
     annotations:
       karpenter.sh/do-not-disrupt: "true"
   spec:
     tolerations:
       - key: example.com/distribution
         value: bottlerocket
         effect: NoSchedule
     nodeSelector:
       example.com/distribution: "bottlerocket"
     initContainers:
       - name: init-dind-externals
         image: docker.private.example.com/private-actions-runnerset-runner:v1.2.1
         command:
           ["cp", "-r", "-v", "/home/runner/externals/.", "/home/runner/tmpDir/"]
         volumeMounts:
           - name: dind-externals
             mountPath: /home/runner/tmpDir
       - name: dind
         image: docker.private.example.com/docker:dind
         args:
           - dockerd
           - --host=unix:///run/docker/docker.sock
           - --group=$(DOCKER_GROUP_GID)
         env:
           - name: DOCKER_GROUP_GID
             value: "123"
         securityContext:
           privileged: true
         restartPolicy: Always
         volumeMounts:
           - name: work
             mountPath: /home/runner/_work
           - name: dind-sock
             mountPath: /run/docker
           - name: dind-externals
             mountPath: /home/runner/externals
     containers:
       - name: runner
         image: docker.private.example.com/private-actions-runnerset-runner:v1.2.1
         command: ["/home/runner/run.sh"]
         env:
           - name: DOCKER_HOST
             value: unix:///run/docker/docker.sock
         volumeMounts:
           - name: work
             mountPath: /home/runner/_work
           - name: dind-sock
             mountPath: /run/docker
             readOnly: true
         resources:
           requests:
             cpu: '{{`{{ .resources.runner.requests.cpu | default "1" }}`}}'
             memory: '{{`{{ .resources.runner.requests.memory| default "2G" }}`}}'
           limits:
             cpu: '{{`{{ .resources.runner.limits.cpu | default "2" }}`}}'
             memory: '{{`{{ .resources.runner.limits.memory| default "4G" }}`}}'
     volumes:
       - name: work
         emptyDir: {}
       - name: dind-sock
         emptyDir: {}
       - name: dind-externals
         emptyDir: {}
   githubServerTLS:
     certificateFrom:
       configMapKeyRef:
         name: mw-root
         key: mw-root.crt
     runnerMountPath: /usr/local/share/ca-certificates/
2. Use a custom Docker image based on ghcr.io/actions/actions-runner:2.322.0.
3. Create two workflows that run the same steps. One triggers on a branch push and the other triggers on a pull request, so that opening a pull request runs both as checks on the same PR.
4. Open a pull request. One workflow succeeds and the other fails.
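
For step 3, a pair of workflows along these lines should be enough to exercise the setup. This is only a sketch: the file names, job name, and steps are hypothetical, and the `runs-on` value assumes the runner scale set is named after the repository (terraform-aws-karpenter), which the issue does not state explicitly.

```yaml
# .github/workflows/on-push.yml (hypothetical file name)
name: checks-on-push
on: push                               # runs for every branch push
jobs:
  build:
    runs-on: terraform-aws-karpenter   # assumed runner scale set name
    steps:
      - uses: actions/checkout@v4
      - run: docker version            # placeholder step that talks to the dind sidecar
---
# .github/workflows/on-pull-request.yml (hypothetical file name)
name: checks-on-pull-request
on: pull_request                       # runs when the PR is opened or updated
jobs:
  build:
    runs-on: terraform-aws-karpenter   # assumed runner scale set name
    steps:
      - uses: actions/checkout@v4
      - run: docker version            # same placeholder step as the push workflow
```

Pushing a branch and then opening a pull request from it should start both jobs at roughly the same time, each on its own runner pod.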
Describe the bug
One check fails with the message:
"lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error."
The other completes successfully.
Describe the expected behavior
Both checks should succeed.
Additional Context
releaseName: '{{`{{.name}}`}}'
parameters:
  - name: "githubConfigSecret"
    value: "github-token"
  - name: "githubConfigUrl"
    value: 'https://github.example.com/{{`{{.repo}}`}}'
valuesObject:
  minRunners: '{{`{{ .minRunners | default "0" }}`}}'
  controllerServiceAccount:
    name: arc-gha-rs-controller
    namespace: arc-systems
  template:
    metadata:
      annotations:
        karpenter.sh/do-not-disrupt: "true"
    spec:
      tolerations:
        - key: example.com/distribution
          value: bottlerocket
          effect: NoSchedule
      nodeSelector:
        example.com/distribution: "bottlerocket"
      initContainers:
        - name: init-dind-externals
          image: exp-docker.repositories.example.com/custom-actions-runnerset-runner:v1.2.1
          command:
            ["cp", "-r", "-v", "/home/runner/externals/.", "/home/runner/tmpDir/"]
          volumeMounts:
            - name: dind-externals
              mountPath: /home/runner/tmpDir
        - name: dind
          image: exp-docker.repositories.example.com/docker:dind
          args:
            - dockerd
            - --host=unix:///run/docker/docker.sock
            - --group=$(DOCKER_GROUP_GID)
          env:
            - name: DOCKER_GROUP_GID
              value: "123"
          securityContext:
            privileged: true
          restartPolicy: Always
          volumeMounts:
            - name: work
              mountPath: /home/runner/_work
            - name: dind-sock
              mountPath: /run/docker
            - name: dind-externals
              mountPath: /home/runner/externals
      containers:
        - name: runner
          image: exp-docker.repositories.example.com/custom-actions-runnerset-runner:v1.2.1
          command: ["/home/runner/run.sh"]
          env:
            - name: DOCKER_HOST
              value: unix:///run/docker/docker.sock
          volumeMounts:
            - name: work
              mountPath: /home/runner/_work
            - name: dind-sock
              mountPath: /run/docker
              readOnly: true
          resources:
            requests:
              cpu: '{{`{{ .resources.runner.requests.cpu | default "1" }}`}}'
              memory: '{{`{{ .resources.runner.requests.memory| default "2G" }}`}}'
            limits:
              cpu: '{{`{{ .resources.runner.limits.cpu | default "2" }}`}}'
              memory: '{{`{{ .resources.runner.limits.memory| default "4G" }}`}}'
      volumes:
        - name: work
          emptyDir: {}
        - name: dind-sock
          emptyDir: {}
        - name: dind-externals
          emptyDir: {}
  githubServerTLS:
    certificateFrom:
      configMapKeyRef:
        name: exp-root
        key: exp-root.crt
    runnerMountPath: /usr/local/share/ca-certificates/
The placeholders are there because I use an Argo CD ApplicationSet to create a runner set for each of several repositories.
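
To show where values such as `.name`, `.repo`, and `.minRunners` could come from, here is a rough sketch of an ApplicationSet with a list generator. Everything in it apart from the placeholder names, the Helm parameters, and the chart version is an assumption for illustration: the ApplicationSet name, namespaces, project, organization, and the OCI chart location are not taken from my actual setup. The placeholders are written unescaped here; in the snippet above they carry an extra layer of Helm escaping.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: arc-runner-sets                  # hypothetical name
  namespace: argocd                      # assumed Argo CD namespace
spec:
  goTemplate: true                       # enables the {{ .name }} style placeholders
  generators:
    - list:
        elements:
          # One element per repository; each becomes one runner scale set.
          - name: terraform-aws-karpenter
            repo: example-org/terraform-aws-karpenter   # hypothetical organization
            minRunners: "0"
  template:
    metadata:
      name: '{{ .name }}'
    spec:
      project: default                   # assumed project
      source:
        repoURL: ghcr.io/actions/actions-runner-controller-charts   # assumed chart location
        chart: gha-runner-scale-set
        targetRevision: 0.10.1
        helm:
          releaseName: '{{ .name }}'
          parameters:
            - name: githubConfigSecret
              value: github-token
            - name: githubConfigUrl
              value: 'https://github.example.com/{{ .repo }}'
          valuesObject:
            minRunners: '{{ .minRunners | default "0" }}'
            # The remaining values (template, githubServerTLS, ...) go here.
      destination:
        server: https://kubernetes.default.svc
        namespace: arc-runners           # assumed runner namespace
```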
Controller Logs
The logs in this gist cover the window from one minute before the jobs in question failed to one minute after. The repository is terraform-aws-karpenter and the runner is terraform-aws-karpenter-wrgr9-runner-ndhr7.
https://gist.github.com/jwilkicki/c555b6eea0df28e30320f96a72ce1a79
Runner Pod Logs
Full logs for runner terraform-aws-karpenter-wrgr9-runner-ndhr7: https://gist.github.com/jwilkicki/1c9e2aecf75b579c1890f5bd00a64101