Concurrent checks running for a PR result in one passing and the other losing communication with the server #3983

Open
4 tasks done
jwilkicki opened this issue Mar 21, 2025 · 1 comment
Labels
bug Something isn't working gha-runner-scale-set Related to the gha-runner-scale-set mode needs triage Requires review from the maintainers

jwilkicki commented Mar 21, 2025

Checks

Controller Version

0.10.1

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

1. Configure dind runners with a custom template configuration like this:
   
              metadata:
                annotations:
                  karpenter.sh/do-not-disrupt: "true"              
              spec:                
                tolerations:
                  - key: example.com/distribution
                    value: bottlerocket
                    effect: NoSchedule
                nodeSelector:
                    example.com/distribution: "bottlerocket"
                initContainers:
                  - name: init-dind-externals
                    image: docker.private.example.com/private-actions-runnerset-runner:v1.2.1
                    command:
                      ["cp", "-r", "-v", "/home/runner/externals/.", "/home/runner/tmpDir/"]
                    volumeMounts:
                      - name: dind-externals
                        mountPath: /home/runner/tmpDir
                  - name: dind
                    image: docker.private.example.com/docker:dind
                    args:
                      - dockerd
                      - --host=unix:///run/docker/docker.sock
                      - --group=$(DOCKER_GROUP_GID)
                    env:
                      - name: DOCKER_GROUP_GID
                        value: "123"
                    securityContext:
                      privileged: true
                    restartPolicy: Always
                    volumeMounts:
                      - name: work
                        mountPath: /home/runner/_work
                      - name: dind-sock
                        mountPath: /run/docker
                      - name: dind-externals
                        mountPath: /home/runner/externals                   
                containers:
                  - name: runner
                    image: docker.private.example.com/private-actions-runnerset-runner:v1.2.1
                    command: ["/home/runner/run.sh"]
                    env:
                      - name: DOCKER_HOST
                        value: unix:///run/docker/docker.sock
                    volumeMounts:
                      - name: work
                        mountPath: /home/runner/_work
                      - name: dind-sock
                        mountPath: /run/docker                        
                        readOnly: true
                    resources:
                      requests:
                        cpu: '{{`{{ .resources.runner.requests.cpu | default "1" }}`}}'
                        memory: '{{`{{ .resources.runner.requests.memory | default "2G" }}`}}'
                      limits:
                        cpu: '{{`{{ .resources.runner.limits.cpu | default "2" }}`}}'
                        memory: '{{`{{ .resources.runner.limits.memory | default "4G" }}`}}'
                volumes:
                  - name: work
                    emptyDir: {}
                  - name: dind-sock
                    emptyDir: {}
                  - name: dind-externals
                    emptyDir: {}
            githubServerTLS:
              certificateFrom:
                configMapKeyRef:
                  name: mw-root
                  key: mw-root.crt
              runnerMountPath: /usr/local/share/ca-certificates/
   
2. Use a custom Docker image based on ghcr.io/actions/actions-runner:2.322.0.
3. Create two workflows that run the same steps. One triggers on a branch push and the other triggers on a pull request. When the pull request is opened, both run as checks on the same PR.
4. One workflow will succeed and the other will fail.
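
For illustration, the two workflows in step 3 might look roughly like the sketch below. This is not the reporter's actual configuration: the file names, workflow and job names, the `docker info` step, and the `runs-on` value (assumed here to be the scale set name `terraform-aws-karpenter`, inferred from the runner name in the logs) are all assumptions.

```yaml
# File 1 (hypothetical path): .github/workflows/on-push.yml
name: checks-on-push            # hypothetical name
on:
  push:                         # runs on every branch push
jobs:
  build:
    runs-on: terraform-aws-karpenter   # assumed runner scale set name
    steps:
      - name: Exercise the dind sidecar
        run: docker info        # placeholder for the real, shared steps
---
# File 2 (hypothetical path): .github/workflows/on-pull-request.yml
# Shown after a document separator only for brevity; it is a separate file.
name: checks-on-pull-request    # hypothetical name
on:
  pull_request:                 # runs when the PR is opened or updated
jobs:
  build:
    runs-on: terraform-aws-karpenter   # assumed runner scale set name
    steps:
      - name: Exercise the dind sidecar
        run: docker info        # same steps as the push workflow
```

Pushing a branch and then opening a pull request from it triggers both workflows for the same commit, so two jobs are queued against the scale set at about the same time.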

Describe the bug

One check fails with the message:

"lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error."

The other completes successfully.

Describe the expected behavior

Both checks should succeed.

Additional Context

          releaseName: '{{`{{.name}}`}}'
          parameters:
            - name: "githubConfigSecret"
              value: "github-token"
            - name: "githubConfigUrl"
              value: 'https://github.example.com/{{`{{.repo}}`}}'
          valuesObject:
            minRunners: '{{`{{ .minRunners | default "0" }}`}}'
            controllerServiceAccount:
              name: arc-gha-rs-controller
              namespace: arc-systems
            template:
              metadata:
                annotations:
                  karpenter.sh/do-not-disrupt: "true"              
              spec:                
                tolerations:
                  - key: example.com/distribution
                    value: bottlerocket
                    effect: NoSchedule
                nodeSelector:
                    example.com/distribution: "bottlerocket"
                initContainers:
                  - name: init-dind-externals
                    image: exp-docker.repositories.example.com/custom-actions-runnerset-runner:v1.2.1
                    command:
                      ["cp", "-r", "-v", "/home/runner/externals/.", "/home/runner/tmpDir/"]
                    volumeMounts:
                      - name: dind-externals
                        mountPath: /home/runner/tmpDir
                  - name: dind
                    image: exp-docker.repositories.example.com/docker:dind
                    args:
                      - dockerd
                      - --host=unix:///run/docker/docker.sock
                      - --group=$(DOCKER_GROUP_GID)
                    env:
                      - name: DOCKER_GROUP_GID
                        value: "123"
                    securityContext:
                      privileged: true
                    restartPolicy: Always
                    volumeMounts:
                      - name: work
                        mountPath: /home/runner/_work
                      - name: dind-sock
                        mountPath: /run/docker
                      - name: dind-externals
                        mountPath: /home/runner/externals                   
                containers:
                  - name: runner
                    image: exp-docker.repositories.example.com/custom-actions-runnerset-runner:v1.2.1
                    command: ["/home/runner/run.sh"]
                    env:
                      - name: DOCKER_HOST
                        value: unix:///run/docker/docker.sock
                    volumeMounts:
                      - name: work
                        mountPath: /home/runner/_work
                      - name: dind-sock
                        mountPath: /run/docker                        
                        readOnly: true
                    resources:
                      requests:
                        cpu: '{{`{{ .resources.runner.requests.cpu | default "1" }}`}}'
                        memory: '{{`{{ .resources.runner.requests.memory | default "2G" }}`}}'
                      limits:
                        cpu: '{{`{{ .resources.runner.limits.cpu | default "2" }}`}}'
                        memory: '{{`{{ .resources.runner.limits.memory | default "4G" }}`}}'
                volumes:
                  - name: work
                    emptyDir: {}
                  - name: dind-sock
                    emptyDir: {}
                  - name: dind-externals
                    emptyDir: {}
            githubServerTLS:
              certificateFrom:
                configMapKeyRef:
                  name: exp-root
                  key: exp-root.crt
              runnerMountPath: /usr/local/share/ca-certificates/


The template placeholders are there because I am using an Argo CD ApplicationSet to create a runner set for one of several repositories.
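
As a rough sketch of where those placeholders would sit, an ApplicationSet along the following lines could wrap the values above. Everything outside the `helm:` block is an assumption made for illustration: the generator type and its elements, the ApplicationSet name and namespaces, the project, and the chart source. The placeholders are shown as Argo CD would see them, without the outer `{{` and backtick escaping, which suggests (but does not confirm) that the manifest is itself rendered through Helm first.

```yaml
# Sketch only; not the reporter's manifest.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: runner-scale-sets          # hypothetical
  namespace: argocd                # assumed Argo CD namespace
spec:
  goTemplate: true                 # assumed; the {{.name}} syntax is Go-template style
  generators:
    - list:                        # assumed generator type
        elements:
          - name: terraform-aws-karpenter              # repository named in the issue
            repo: example-org/terraform-aws-karpenter  # hypothetical org path
  template:
    metadata:
      name: '{{.name}}'
    spec:
      project: default             # assumed
      destination:
        server: https://kubernetes.default.svc
        namespace: arc-runners     # assumed
      source:
        repoURL: ghcr.io/actions/actions-runner-controller-charts  # assumed chart source
        chart: gha-runner-scale-set
        targetRevision: 0.10.1     # controller version reported above
        helm:
          releaseName: '{{.name}}'
          parameters:
            - name: githubConfigSecret
              value: github-token
            - name: githubConfigUrl
              value: 'https://github.example.com/{{.repo}}'
          valuesObject:
            minRunners: '{{ .minRunners | default "0" }}'
            # controllerServiceAccount, template, githubServerTLS: as listed above
```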

Controller Logs

The logs in this gist cover a window from one minute before the failure of the jobs in question to one minute after it. The repository is terraform-aws-karpenter and the runner is terraform-aws-karpenter-wrgr9-runner-ndhr7.

https://gist.github.com/jwilkicki/c555b6eea0df28e30320f96a72ce1a79

Runner Pod Logs

Logs for the entire runner terraform-aws-karpenter-wrgr9-runner-ndhr7: https://gist.github.com/jwilkicki/1c9e2aecf75b579c1890f5bd00a64101
@kriangkraiw-tw

Same issue here. I tried upgrading to version 0.11.0, but the problem is still not resolved.
