Lost communication with the server due to the GitHub API returning a BadRequest or Forbidden error #3519

Closed
4 tasks done
Thiry1 opened this issue May 15, 2024 · 7 comments
Labels
bug (Something isn't working), gha-runner-scale-set (Related to the gha-runner-scale-set mode), needs triage (Requires review from the maintainers)

Comments

@Thiry1

Thiry1 commented May 15, 2024

Checks

Controller Version

0.8.3

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

The issue occurs randomly, so a specific reproduction method has not been identified.

Describe the bug

The following error is displayed in the GitHub Actions execution log:

The self-hosted runner: runner-bgkcw-runner-rh47x lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.

When checking the runner's pod logs, I found the following two errors:

[RUNNER 2024-05-15 07:40:21Z ERR  GitHubActionsService] GET request to https://pipelinesghubeus10.actions.githubusercontent.com/XXX/_apis/distributedtask/pools/1/messages?sessionId=34d4403f-0a18-48ad-a541-f773c0e60b88&status=Online&runnerVersion=2.316.1&os=Linux&architecture=X64&disableUpdate=true failed. HTTP Status: Forbidden
[RUNNER 2024-05-15 07:40:29Z ERR  GitHubActionsService] POST request to https://pipelinesghubeus10.actions.githubusercontent.com/XXX/_apis/oauth2/token failed. HTTP Status: BadRequest

These errors occur simultaneously in jobs running on different nodes at the same time.
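To check whether the failures really line up across pods, the failed requests can be pulled out of each runner log and compared by timestamp. The following is only a minimal sketch: the regular expression is derived from the two log lines quoted above and may not match other runner log shapes, and the names `parse_failed_requests` and `count_by_status` are made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this issue (an assumption,
# not an official log format). \s+ tolerates one or two spaces after ERR.
LOG_RE = re.compile(
    r"\[RUNNER (?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}Z) ERR\s+GitHubActionsService\] "
    r"(?P<method>[A-Z]+) request to (?P<url>\S+) failed\. HTTP Status: (?P<status>\w+)"
)


def parse_failed_requests(lines):
    """Return one dict (ts, method, url, status) per matching log line."""
    records = []
    for line in lines:
        match = LOG_RE.search(line)
        if match:
            records.append(match.groupdict())
    return records


def count_by_status(records):
    """Count failures per (HTTP method, status) pair."""
    return Counter((r["method"], r["status"]) for r in records)


# The two lines from the runner pod log above, with the same XXX placeholder.
SAMPLE_LINES = [
    "[RUNNER 2024-05-15 07:40:21Z ERR  GitHubActionsService] GET request to "
    "https://pipelinesghubeus10.actions.githubusercontent.com/XXX/_apis/distributedtask/pools/1/messages"
    "?sessionId=34d4403f-0a18-48ad-a541-f773c0e60b88&status=Online&runnerVersion=2.316.1"
    "&os=Linux&architecture=X64&disableUpdate=true failed. HTTP Status: Forbidden",
    "[RUNNER 2024-05-15 07:40:29Z ERR  GitHubActionsService] POST request to "
    "https://pipelinesghubeus10.actions.githubusercontent.com/XXX/_apis/oauth2/token "
    "failed. HTTP Status: BadRequest",
]

if __name__ == "__main__":
    for record in parse_failed_requests(SAMPLE_LINES):
        print(record["ts"], record["method"], record["status"])
```

Running the same function over the logs of every affected pod and sorting by `ts` would show whether the Forbidden and BadRequest responses cluster within the same few seconds.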

Describe the expected behavior

The job completes without errors.

Additional Context

ghaRunnerScaleSetValues:
  runnerScaleSetName: "XXX"
  githubConfigUrl: "https://github.com/XXX"
  githubConfigSecret: "github-apps-secret"
  maxRunners: 8
  minRunners: 1
  containerMode:
    type: "kubernetes"
    kubernetesModeWorkVolumeClaim:
      accessModes:
        - "ReadWriteOnce"
      storageClassName: "gp3"
      resources:
        requests:
          storage: "10Gi"
    kubernetesModeServiceAccount:
      annotations: null
  template:
    spec:
      initContainers:
        - name: "kube-init"
          image: "XXX"
          command: ["sudo", "chown", "-R", "1001:1001", "/home/runner/_work"]
          volumeMounts:
            - name: "work"
              mountPath: "/home/runner/_work"
      containers:
        - name: "runner"
          image: "XXX"
          command: ["/bin/bash", "-c"]
          args: ["sudo nohup containerd & sudo nohup buildkitd & /home/runner/run.sh"]
          resources:
            requests:
              cpu: 5
              memory: "16Gi"
            limits:
              cpu: 8
              memory: "32Gi"
          env:
            - name: "ACTIONS_RUNNER_CONTAINER_HOOKS"
              value: "/home/runner/k8s/index.js"
            - name: "ACTIONS_RUNNER_POD_NAME"
              valueFrom:
                fieldRef:
                  fieldPath: "metadata.name"
            - name: "ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER"
              value: "false"
          volumeMounts:
            - name: "work"
              mountPath: "/home/runner/_work"
          securityContext:
            privileged: true
      volumes:
        - name: "work"
          ephemeral:
            volumeClaimTemplate:
              spec:
                accessModes:
                  - "ReadWriteOnce"
                storageClassName: "gp3"
                resources:
                  requests:
                    storage: "10Gi"

Controller Logs

https://gist.github.com/Thiry1/ab048d389e8801946dd85ff4b221bffd

Runner Pod Logs

https://gist.github.com/Thiry1/6ab73735da6d1202dd0000243b04c953
@Thiry1 Thiry1 added bug Something isn't working gha-runner-scale-set Related to the gha-runner-scale-set mode needs triage Requires review from the maintainers labels May 15, 2024
Contributor

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

@cb-krishnapatel

cb-krishnapatel commented May 16, 2024

Hi, I'm also facing a similar issue, with this error: Http response code: NotFound from 'POST https://api.github.com/actions/runner-registration'. Because of this, the runners shut down altogether. Can someone help us out here? Also, a speculation: I'm using a GitHub App for authentication, and the generated token expires after a few hours. Could this be the reason?
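On the token speculation: GitHub App installation access tokens are short-lived (documented as expiring after one hour), so expiry on its own is expected and the token has to be re-issued periodically. Below is a rough sketch of checking how much lifetime a token has left, assuming you have the `expires_at` timestamp returned when the token was created. The function names and the five-minute margin are assumptions for illustration, not anything ARC itself exposes.

```python
from datetime import datetime, timezone


def seconds_until_expiry(expires_at, now=None):
    """Seconds left before an ISO 8601 UTC timestamp such as 2024-05-16T10:30:00Z."""
    expiry = datetime.strptime(expires_at, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return (expiry - now).total_seconds()


def needs_refresh(expires_at, now=None, margin_seconds=300):
    """True when the token is expired or within margin_seconds of expiring."""
    return seconds_until_expiry(expires_at, now) <= margin_seconds


if __name__ == "__main__":
    # Hypothetical timestamp, for illustration only.
    print(needs_refresh("2024-05-16T10:30:00Z"))
```

If the failures always start roughly one token lifetime after the runner or listener came up, that would support the expiry theory; if they appear at arbitrary times, it probably points elsewhere.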

@ps78674

ps78674 commented May 16, 2024

We have the same problem. After the last step in the job, the runner sends an HTTP DELETE to the GitHub API and (sometimes) gets a 403 status.
[RUNNER 2024-05-16 11:51:53Z ERR GitHubActionsService] DELETE request to https://pipelinesghubeus22.actions.githubusercontent.com/XXX/_apis/distributedtask/pools/1/sessions/55495043-e955-4249-88c6-6b68672da670 failed. HTTP Status: Forbidden

@nikola-jokic
Member

Hey everyone, are you still experiencing the problem? Maybe it was a temporary error.
@ps78674 the error you posted is definitely a temporary error, and is fixed in 0.9.2 ☺️

@Thiry1
Author

Thiry1 commented May 22, 2024

@nikola-jokic

Yes, the error has persisted from the time I created this issue until today. Unfortunately, I migrated to CodeBuild a few hours ago to circumvent this problem. Therefore, it is difficult for me to provide additional information on this issue.

Since many other people do not seem to be experiencing this problem, it might be an issue specific to my environment. Though it may not be relevant, let me explain my configuration: I set up a cluster on AWS EKS, used Karpenter for auto-scaling, and ran ARC runners on EC2 spot instances. Since there are no logs indicating forced termination of spot instances by AWS, it does not seem to be a problem with the spot instances.

Another possible factor might be that I am using ARC for an organization that has a GitHub Enterprise Cloud contract.

@joaoluiznaufel

Check whether the runners have a sidecar or an init container. These can have side effects on the runner when it tries to communicate with GitHub. Init containers can be injected by some operators, for example Dynatrace. :)
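One way to do that check is to dump a live runner pod with `kubectl get pod <pod-name> -o json` and compare its containers against the ones declared in the Helm values. This is only a sketch: the expected names `runner` and `kube-init` come from the configuration posted in this issue, while `find_unexpected_containers` and the `agent-*` names in the sample are hypothetical.

```python
import json


def find_unexpected_containers(pod, expected_containers=("runner",), expected_init=("kube-init",)):
    """List container names in a pod manifest that the Helm values did not declare."""
    spec = pod.get("spec", {})
    extra_init = [
        c["name"] for c in spec.get("initContainers", []) if c["name"] not in expected_init
    ]
    extra_main = [
        c["name"] for c in spec.get("containers", []) if c["name"] not in expected_containers
    ]
    return {"initContainers": extra_init, "containers": extra_main}


# Hypothetical pod manifest: the agent-* containers stand in for whatever
# an operator or admission webhook might inject.
SAMPLE_POD = {
    "spec": {
        "initContainers": [{"name": "kube-init"}, {"name": "agent-init"}],
        "containers": [{"name": "runner"}, {"name": "agent-sidecar"}],
    }
}

if __name__ == "__main__":
    # In practice, load the output of `kubectl get pod <pod-name> -o json`
    # with json.load instead of using SAMPLE_POD.
    print(json.dumps(find_unexpected_containers(SAMPLE_POD), indent=2))
```

Anything reported here was added outside the scale set template and is worth ruling out before blaming ARC.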

@nikola-jokic
Member

Since it seems like other components that are not related to ARC are causing this issue, and the forbidden error is fixed in the latest release, I will close this issue. And please submit another issue if you find the root cause or confirm that ARC is causing this problem.
