Workflow pod forced on the same node in Kubernetes mode, even with RWX volume #227

Open
@LeonoreMangold

Description

Controller Version

0.11.0

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

1. Deploy a runner scale set in Kubernetes mode, using the ReadWriteMany access mode for `kubernetesModeWorkVolumeClaim`
2. Set the environment variable `ACTIONS_RUNNER_USE_KUBE_SCHEDULER` to `"true"` on the runner pod to enable separate scheduling of the workflow pod
3. Trigger a workflow using this runner and look at the manifest of the created workflow pod

Describe the bug

In Kubernetes mode, a nodeAffinity is set on the workflow pod so that it gets scheduled on the same node as the runner:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - <runnerNodeName>

This is not the desired behavior when using a ReadWriteMany volume for the `_work` volume, because we expect the runner and workflow pods to be schedulable on different nodes depending on their resource requests.

Describe the expected behavior

The workflow pod is handled by the Kubernetes scheduler separately from the runner pod and can be placed on any node with enough available resources. The work volume is shared between the two pods even when they land on different nodes, thanks to the ReadWriteMany access mode.
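
For illustration, a minimal sketch of the workflow pod we would expect in this setup: no nodeAffinity, only the shared RWX work claim mounted. Names, images, and mount paths are placeholders, not the exact values generated by the container hook.

# Hypothetical workflow pod, assuming the hook simply omits the nodeAffinity
# when ACTIONS_RUNNER_USE_KUBE_SCHEDULER is "true" and the work claim is RWX.
apiVersion: v1
kind: Pod
metadata:
  name: <runner-pod-name>-workflow
spec:
  # No nodeAffinity pinning the pod to the runner's node; the default
  # kube-scheduler is free to pick any node with enough free resources.
  containers:
  - name: job
    image: <job-container-image>
    volumeMounts:
    - name: work
      mountPath: /__w                    # placeholder work directory path
  volumes:
  - name: work
    persistentVolumeClaim:
      claimName: <work-volume-claim>     # RWX claim created from kubernetesModeWorkVolumeClaim

Because the claim is ReadWriteMany, both pods can mount it regardless of which nodes they are scheduled on.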

Additional Context

containerMode:
  type: "kubernetes"
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteMany"]
    storageClassName: <azure-file-nfs-sc>
    resources:
      requests:
        storage: 1Ti

template:
  spec:
    containers:
    - name: runner
      image: <container-image>
      command: ["/home/runner/run.sh"]
      securityContext:
        privileged: true
      env:
      - name: ACTIONS_RUNNER_USE_KUBE_SCHEDULER
        value: "true"

Controller Logs

Not relevant

Runner Pod Logs

Not relevant

Labels: bug (Something isn't working)