Runner to workflow pods take 3 minutes to start on RWX & containerMode: Kubernetes #207

Open
@jonathan-fileread

Description

Controller Version

0.9.3

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

Set up an ARC runner scale set with containerMode: kubernetes.
Use an NFS-based storage class to back the runner work volumes.
Build a Docker image from a GitHub Actions (GHA) job using kaniko.

Describe the bug

After the runner pod initializes (which is nearly immediate), the GitHub Actions jobs (6 of them) appear to get stuck polling for 2-3 minutes while waiting for the workflow pod to spin up so the job can continue.

The runner pod logs show the same line repeating every 5-10 seconds for 2-3 minutes before the container hook is called and the workflow pod is spun up.

See lines 6-52 in the scale set logs gist below; the following line is logged every few seconds:
[WORKER 2024-12-03 19:21:58Z INFO HostContext] Well known directory 'Root': '/home/runner'
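To put a number on the stall, the repeated line can be extracted from a saved copy of the runner pod log and the gap between its first and last occurrence computed. The following is a minimal sketch, assuming every repeat uses the same timestamp format as the line quoted above; the function names `polling_window` and `stall_seconds` are hypothetical helpers, not part of ARC or the runner.

```python
import re
from datetime import datetime, timezone

# Matches the timestamp of lines such as:
# [WORKER 2024-12-03 19:21:58Z INFO HostContext] Well known directory 'Root': '/home/runner'
LINE_RE = re.compile(
    r"\[WORKER (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})Z INFO HostContext\] "
    r"Well known directory 'Root'"
)


def polling_window(log_lines):
    """Return (first_seen, last_seen, count) for the repeated line, or None if absent."""
    stamps = []
    for line in log_lines:
        match = LINE_RE.search(line)
        if match:
            parsed = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            stamps.append(parsed.replace(tzinfo=timezone.utc))
    if not stamps:
        return None
    return min(stamps), max(stamps), len(stamps)


def stall_seconds(log_lines):
    """Seconds between the first and last occurrence of the repeated line."""
    window = polling_window(log_lines)
    if window is None:
        return 0.0
    first, last, _count = window
    return (last - first).total_seconds()
```

With the runner pod log saved to a file (for example via `kubectl logs`), calling `stall_seconds(open("runner.log"))` gives a rough duration for the polling phase, which can then be compared between the RWO and RWX setups.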

This bug started occurring when we switched to RWX with a new storage class using NFS-based Azure Files. I suspect the cause is that provisioning a PVC on Azure Files is slower than on the traditional disk-based setup with RWO.
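One way to check the provisioning hypothesis is to compare the timestamps of the PVC events in the runner namespace. The sketch below works on the `items` list from `kubectl get events -o json` and assumes the CSI provisioner emits `Provisioning` and `ProvisioningSucceeded` events with `firstTimestamp` set; those reason names and the helper `provisioning_seconds` are assumptions for illustration and may need adjusting for a given cluster.

```python
from datetime import datetime


def _parse(ts):
    # Kubernetes serialises these timestamps like "2024-12-03T19:21:58Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def provisioning_seconds(events,
                         start_reason="Provisioning",
                         end_reason="ProvisioningSucceeded"):
    """Map PVC name -> seconds between its first start and first end event.

    `events` is a list of Event dicts. The default reason names are
    assumptions about what the provisioner emits.
    """
    started, finished = {}, {}
    for ev in events:
        obj = ev.get("involvedObject", {})
        if obj.get("kind") != "PersistentVolumeClaim":
            continue
        ts = ev.get("firstTimestamp")
        if not ts:
            continue  # some events only carry eventTime; skipped in this sketch
        reason = ev.get("reason")
        if reason == start_reason:
            bucket = started
        elif reason == end_reason:
            bucket = finished
        else:
            continue
        name, when = obj.get("name"), _parse(ts)
        # Keep the earliest timestamp seen for each PVC.
        if name not in bucket or when < bucket[name]:
            bucket[name] = when
    return {
        name: (finished[name] - started[name]).total_seconds()
        for name in started
        if name in finished
    }
```

If the values returned here are close to the 2-3 minute stall, the delay is likely in volume provisioning rather than in the runner or the container hook.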

Describe the expected behavior

After the runner pod initializes for a new GitHub Actions job, the workflow pod should spin up almost immediately to process the Docker build from each GHA job.

Additional Context

Here is the ARC runner scale set code:
    initContainers:
      - name: kube-init
        image: ghcr.io/actions/actions-runner:latest
        command: ["/bin/sh", "-c"]
        args:
          - |
            sudo chown -R ${local.github_runner_user_gid}:123 /home/runner/_work
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
    securityContext:
      fsGroup: 123 ## needed to resolve permission issues with mounted volume. https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors#error-access-to-the-path-homerunner_work_tool-is-denied

    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
        - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
          value: /home/runner/pod-templates/default.yml
        - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
          value: "false"  ## To allow jobs without a job container to run, set ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER to false on your runner container. This instructs the runner to disable this check.
        - name: ACTIONS_RUNNER_USE_KUBE_SCHEDULER     # Flag enables separate scheduling for worker pods
          value: "true"
        volumeMounts:
        - name: pod-templates
          mountPath: /home/runner/pod-templates
          readOnly: true
    volumes:
      - name: work
        ephemeral:
          volumeClaimTemplate:
            spec:
              accessModes: ["ReadWriteMany"]
              storageClassName: ${local.storage_class_name}
              resources:
                requests:
                  storage: ${local.volume_claim_size}
      - name: pod-templates
        configMap:
          name: "runner-pod-template"


containerMode:
  type: "kubernetes"  ## type can be set to dind or kubernetes
  ## the following is required when containerMode.type=kubernetes
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteMany"]
    storageClassName: ${local.storage_class_name}
    resources:
      requests:
        storage: ${local.volume_claim_size}
    EOF
  ]
}

locals {
  job_template_name = "runner-pod-template"
}

resource "kubernetes_config_map" "job_template" {
  metadata {
    name      = local.job_template_name
    namespace = local.gha_runner_namespace
  }
  data = {
    "default.yml" = yamlencode({
      apiVersion = "v1"
      kind       = "PodTemplate"
      metadata = {
        name = local.job_template_name
      }
      spec = {
        containers = [
          {
            name  = "$job"
            resources = {
              requests = {
                cpu = "3000m"
              }
              limits = {
                cpu = "3000m"
              }
            }
          }
        ]
      }
    })
  }
}



# GHA job
          /kaniko/executor --dockerfile=".Dockerfilehere" \
            --context="${{ github.repositoryUrl }}#${{ github.ref }}#${{ github.sha }}"  \
            --destination="randomcontainerregistry:taghere" \
            --use-new-run \
            --snapshot-mode=redo \
            --compressed-caching=false \
            --registry-mirror=mirror.gcr.io \
            --cache=true --cache-copy-layers=false --cache-ttl=500h \
            --push-retry 5
# Storage class

resource "kubernetes_manifest" "csi_storage_class" {
  manifest = {
    apiVersion = "storage.k8s.io/v1"
    kind       = "StorageClass"
    metadata = {
      name = "storageclassawesome"
    }
    provisioner      = "file.csi.azure.com"
    allowVolumeExpansion = true
    parameters = {
      resourceGroup  = "yup"
      storageAccount = "yup"
      skuName        = "Premium_LRS"
      location       = "sdfsf"
      server         = "test.net"
    }
    reclaimPolicy      = "Delete"
    volumeBindingMode  = "Immediate"
    mountOptions       = [
      "dir_mode=0777",
      "file_mode=0777",
      "uid=1000",
      "gid=1000",
      "mfsymlinks",
      "cache=strict",
      "nosharesock",
      "actimeo=30"
    ]
  }
}

Controller Logs

ARC Controller & Scaleset Logs: https://gist.github.com/jonathan-fileread/fd0978bef66784e20d6b50bce50cd3b9

Runner Pod Logs

ARC Controller & Scaleset Logs: https://gist.github.com/jonathan-fileread/fd0978bef66784e20d6b50bce50cd3b9

Labels: bug