ARC in k8s mode not working with ResourceQuotas. Jobs fail instead of queuing. #3630

Open
4 tasks done
ropelli opened this issue Jul 1, 2024 · 1 comment
Labels
bug (Something isn't working), gha-runner-scale-set (Related to the gha-runner-scale-set mode), needs triage (Requires review from the maintainers)

Comments

@ropelli

ropelli commented Jul 1, 2024

Checks

Controller Version

0.9.3

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

In k8s mode with a container hook template, have the workflow pods request more CPU, memory, storage, etc. than the resource quota allows. For example:

  1. Define a resource quota with a hard limit for CPU/memory, e.g. 8 CPUs:
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: arc-runners-quota
  namespace: arc-runners
spec:
  hard:
    requests.cpu: "8"
```
  2. Set up ARC with an AutoscalingRunnerSet in k8s mode with a container hook template:
```yaml
apiVersion: actions.github.com/v1alpha1
kind: AutoscalingRunnerSet
metadata:
  name: self-hosted-k8s
  namespace: arc-runners
spec:
...
  template:
    spec:
      containers:
      - name: runner
        resources:
          requests:
            cpu: "0"
        env:
        ...
        - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
          value: /home/runner/workflow-pod-config/workflow-pod-config.yaml
        ...
        volumeMounts:
        ...
        - mountPath: /home/runner/workflow-pod-config
          name: workflow-pod-config
      volumes:
      ...
      - name: workflow-pod-config
        configMap:
          name: workflow-pod-config
          items:
            - key: workflow-pod-config.yaml
              path: workflow-pod-config.yaml
...
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-pod-config
  namespace: arc-runners
data:
  workflow-pod-config.yaml: |
    apiVersion: v1
    kind: PodTemplate
    metadata:
      labels:
        app: runner-pod-template
    spec:
      serviceAccountName: kube-mode-workflow
      containers:
      - name: $job
        resources:
          requests:
            cpu: "5"
```
  3. Run a workflow with two jobs whose runs-on label matches the AutoscalingRunnerSet name:
```yaml
on:
  push:
    branches: [ main ]
jobs:
  job1:
    runs-on: self-hosted-k8s
    container:
      image: ubuntu:22.04
    steps:
    - run: sleep 60
  job2:
    runs-on: self-hosted-k8s
    container:
      image: ubuntu:22.04
    steps:
    - run: sleep 60
```

You can also reproduce this with a single job whose workflow pod requests more than the resource quota allows, but the scenario above is more likely.

Describe the bug

In the example provided, one job runs successfully, while the other fails when creating its workflow pod because the resource quota is temporarily exceeded.

In general, jobs fail whenever the quota is temporarily exceeded, instead of waiting for capacity to free up.
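To make the failure in this example concrete, here is a simplified Python model of how a ResourceQuota on requests.cpu admits pods in creation order. It is a sketch under stated assumptions, not Kubernetes' actual admission code: only requests.cpu is limited, quantities are whole CPUs, admitted pods keep their CPU for the whole simulation, and nothing else in the arc-runners namespace consumes quota. The names admit and simulate are hypothetical. Because the runner pods request cpu "0", only the two workflow pods (5 CPUs each) count: 0 + 5 ≤ 8 admits the first, but 5 + 5 = 10 > 8 rejects the second.

```python
# Simplified model of ResourceQuota admission for requests.cpu.
# Assumptions (not real Kubernetes code): only requests.cpu is limited,
# quantities are whole CPUs, and no other pods in the namespace use quota.
HARD_REQUESTS_CPU = 8  # hard limit from the arc-runners-quota example


def admit(used_cpu: int, requested_cpu: int, hard_cpu: int = HARD_REQUESTS_CPU) -> bool:
    """Return True if a pod requesting requested_cpu still fits under the quota."""
    return used_cpu + requested_cpu <= hard_cpu


def simulate(requests: list[int], hard_cpu: int = HARD_REQUESTS_CPU) -> list[bool]:
    """Create pods in order; an admitted pod keeps its CPU until the end."""
    used = 0
    results = []
    for cpu in requests:
        ok = admit(used, cpu, hard_cpu)
        if ok:
            used += cpu  # admitted pods consume quota
        results.append(ok)
    return results


# Two runner pods request cpu: "0", then two workflow pods request cpu: "5".
print(simulate([0, 0, 5, 5]))  # → [True, True, True, False]
```

The same model covers the single-job variant: simulate([0, 9]) rejects a 9-CPU workflow pod outright, and in that case no amount of waiting would let it in.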

Describe the expected behavior

In the example provided, the jobs should queue properly and run one at a time, completing one after the other and leading to a successful build.

In general, when the quota is temporarily exceeded, workflow pod creation should be retried after a while, preferably through a queue. It is also a bit wasteful that the runner starts even when there is no room for the workflow pod, but I am not sure what could be done about that without significantly changing ARC's architecture.
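The retry asked for here could look roughly like the following Python sketch: a simple retry loop with capped exponential backoff and jitter around whatever call creates the workflow pod, rather than a full queue. This is not ARC's or the container hook's code. The names create_with_retry, create_pod, and QuotaExceededError and the default delays are hypothetical, and the sketch assumes the caller can tell a quota rejection apart from other failures. The sleep function is injectable so the behavior can be tested without actually waiting.

```python
import random
import time


class QuotaExceededError(Exception):
    """Hypothetical error signalling that pod creation was rejected by a ResourceQuota."""


def create_with_retry(create_pod, max_attempts=10, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Call create_pod() until it succeeds, retrying only on QuotaExceededError.

    The wait before retry n (0-based) is min(max_delay, base_delay * 2**n),
    scaled by a random factor in [0.5, 1.0] so that several waiting jobs do
    not all retry at the same moment. Other exceptions propagate immediately,
    and the last QuotaExceededError is re-raised once max_attempts is reached.
    """
    for attempt in range(max_attempts):
        try:
            return create_pod()
        except QuotaExceededError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the quota error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))


# Usage: a fake create_pod that is rejected twice (as if job1's workflow pod
# were still holding 5 of the 8 CPUs) and then admitted.
attempts = {"n": 0}


def fake_create_pod():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise QuotaExceededError("simulated quota rejection")
    return "workflow pod created"


print(create_with_retry(fake_create_pod, sleep=lambda seconds: None))
# → workflow pod created
```

Retrying only helps when the quota is temporarily full. A workflow pod whose request alone exceeds the hard limit (the single-job variant mentioned in the reproduction steps) will never be admitted, so a real implementation would likely want to detect that case and fail immediately instead of exhausting its retries.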

Additional Context

This can be considered a separate issue from #3629: I believe the code needs to change in a different place, and the behavior is slightly different. This is why I created two issues.

Controller Logs

https://gist.github.com/ropelli/2260d7303e4a09c75170105ee1afdaac

Runner Pod Logs

https://gist.github.com/ropelli/8d8ea405ad54128c24e42292a3aaeb09
ropelli added the bug, gha-runner-scale-set, and needs triage labels on Jul 1, 2024
@jonathan-fileread

Thanks @ropelli for mentioning this. Very curious on a fix!
