Using hostPID: true + nsenter in argoexec causes host processes (e.g. sshd, systemd-resolved) to be killed when a Pod is deleted #14489

@hywell-h

Pre-requisites

  • I have double-checked my configuration
  • I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened? What did you expect to happen?

Our workflow executor (the “emissary”/argoexec container) runs in a Pod with hostPID: true. Inside the container, we launch host-level commands with something like:
`nsenter --target 1 --mount --uts --ipc --net --pid -- /usr/bin/some-binary …`
The intention is to run operations directly in the node’s namespaces, including its PID namespace. However, as soon as Kubernetes deletes the Pod, critical host processes on the node (e.g. sshd, systemd-resolved) also receive SIGTERM/SIGKILL and are shut down. We expected only the processes started by the Pod to be terminated.
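
For anyone reproducing this, here is a small shell sketch that may help show which namespaces `nsenter -t 1` actually switches. It is not part of Argo, and `ns_diff` is a hypothetical helper name. It assumes a Linux `/proc` and enough privilege to read `/proc/1/ns/*`, which the privileged container in the manifest below should have. Because the Pod sets hostPID: true, the container already shares the host PID namespace, so `pid` should not appear in the output.

```shell
# Hypothetical helper (not part of Argo): print the namespace types in which
# two PIDs differ. Each /proc/<pid>/ns/<type> symlink resolves to an
# identifier such as "mnt:[4026531840]"; equal identifiers mean the two
# processes share that namespace.
ns_diff() {
    for ns in mnt uts ipc net pid; do
        a=$(readlink "/proc/$1/ns/$ns") || return 2   # PID missing or not readable
        b=$(readlink "/proc/$2/ns/$ns") || return 2
        [ "$a" = "$b" ] || echo "$ns"
    done
}
```

Running `ns_diff $$ 1` inside the container lists the namespaces that differ between the current shell and the node's PID 1. Any namespace it does not list is already shared with the host before `nsenter` runs.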

Version(s)

v3.6.5

Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: v1
kind: Pod
metadata:
  name: argoexec-ssh-mp-test02
spec:
  containers:
  - image: 10.10.16.36:31373/ubuntu-argoexec:latest
    imagePullPolicy: IfNotPresent
    name: main
    command:
    - /usr/bin/argoexec
    - emissary
    - --loglevel
    - warning
    - --log-format
    - text
    - --gloglevel
    - "0"
    - --
    - nsenter
    - -t
    - "1"
    - -m
    - -u
    - -i
    - -n
    - -p
    - --
    - bash
    - -c
    args:
    - sleep infinity
    env:
    - name: ARGO_POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: ARGO_POD_UID
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.uid
    - name: ARGO_TEMPLATE
      value: '{}'
    resources: {}
    securityContext:
      privileged: true
    volumeMounts:
    - name: host
      mountPath: /host
  dnsPolicy: ClusterFirstWithHostNet
  enableServiceLinks: true
  hostPID: true
  nodeName: mp-test02
  restartPolicy: Never
  schedulerName: default-scheduler
  volumes:
  - name: host
    hostPath:
      path: /

Logs from the workflow controller

-

Logs from your workflow's wait container

-
