[EKS] [Fargate] [BUG] - Mechanism creating Fargate pods does not respect selector within Kubernetes MutatingWebhookConfiguration object #2606

@MindAwakeBodyAsleep

Hello,

Issue description:
In an EKS cluster, we deploy opa-gatekeeper with the following MutatingWebhookConfiguration:

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  annotations:
    meta.helm.sh/release-name: gatekeeper
    meta.helm.sh/release-namespace: gatekeeper-system
  creationTimestamp: "2024-10-09T05:14:31Z"
  generation: 18
  labels:
    app: gatekeeper
    app.kubernetes.io/managed-by: Helm
    chart: gatekeeper
    gatekeeper.sh/system: "yes"
    heritage: Helm
    release: gatekeeper
  name: gatekeeper-mutating-webhook-configuration
  resourceVersion: "919873785"
  uid: 6654cbb4-73e7-4858-8182-409db2a597ef
webhooks:
- admissionReviewVersions:
  - v1
  - v1beta1
  clientConfig:
    caBundle: **************************
    service:
      name: gatekeeper-webhook-service
      namespace: gatekeeper-system
      path: /v1/mutate
      port: 443
  failurePolicy: Fail
  matchPolicy: Exact
  name: mutation.gatekeeper.sh
  namespaceSelector:
    matchExpressions:
    - key: admission.gatekeeper.sh/ignore
      operator: DoesNotExist
  objectSelector: {}
  reinvocationPolicy: Never
  rules:
  - apiGroups:
    - '*'
    apiVersions:
    - '*'
    operations:
    - CREATE
    resources:
    - '*'
    scope: '*'
  - apiGroups:
    - '*'
    apiVersions:
    - '*'
    operations:
    - UPDATE
    resources:
    - services
    scope: '*'
  sideEffects: None
  timeoutSeconds: 1

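The gatekeeper-system namespace has the label admission.gatekeeper.sh/ignore: no-self-managing, so the namespaceSelector above (operator DoesNotExist on that key) should exclude this namespace from the webhook.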
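
To make the expected behaviour concrete, here is a minimal Python sketch of how a Kubernetes label selector with matchExpressions is evaluated against a namespace's labels. It is an illustration of the documented selector semantics only, not the API server's or Fargate's actual code; the function names are hypothetical.

```python
# Illustrative sketch of Kubernetes label-selector semantics.
# Function names are hypothetical; this is not the API server's implementation.

def expression_matches(labels, expr):
    """Evaluate one matchExpressions entry against a label dict."""
    key = expr["key"]
    operator = expr["operator"]
    values = expr.get("values", [])
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    if operator == "In":
        return key in labels and labels[key] in values
    if operator == "NotIn":
        # A missing key also satisfies NotIn.
        return key not in labels or labels[key] not in values
    raise ValueError(f"unsupported operator: {operator}")


def selector_matches(labels, selector):
    """All matchLabels and matchExpressions entries must hold (logical AND)."""
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    return all(
        expression_matches(labels, expr)
        for expr in selector.get("matchExpressions", [])
    )


# namespaceSelector from the webhook configuration above.
namespace_selector = {
    "matchExpressions": [
        {"key": "admission.gatekeeper.sh/ignore", "operator": "DoesNotExist"}
    ]
}

# Label carried by the gatekeeper-system namespace.
gatekeeper_ns_labels = {"admission.gatekeeper.sh/ignore": "no-self-managing"}

# False: the key exists, so the webhook should not apply to this namespace.
webhook_applies = selector_matches(gatekeeper_ns_labels, namespace_selector)
print(webhook_applies)
```

Under these semantics, requests for objects in gatekeeper-system should bypass mutation.gatekeeper.sh entirely, which is the behaviour this issue expects from Fargate pod creation as well.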

The opa-gatekeeper Fargate profile is deployed using Terraform with the following configuration:

    opa-gatekeeper = {
      name = "opa-gatekeeper"
      selectors = [
        {
          namespace = "gatekeeper-system"
          labels = {
            "app.kubernetes.io/name" = "gatekeeper"
            "control-plane"          = "controller-manager"
          }
        }
      ]
      timeouts = {
        create = "20m"
        delete = "20m"
      }
      tags = local.common_tags
    }
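
For reference, a Fargate profile selector picks up a pod when the pod's namespace matches and the pod carries every label listed in the selector. The Python sketch below illustrates that rule under simplifying assumptions: it uses exact namespace comparison only, the function name is hypothetical, and the pod labels are assumed for illustration rather than taken from the real gatekeeper-controller-manager pod.

```python
# Illustrative sketch of Fargate profile selector matching.
# Simplification: namespaces are compared exactly; the function name is hypothetical.

def profile_selects_pod(selectors, pod_namespace, pod_labels):
    """Return True if any selector matches the namespace and all of its labels."""
    for selector in selectors:
        if selector["namespace"] != pod_namespace:
            continue
        required = selector.get("labels", {})
        if all(pod_labels.get(key) == value for key, value in required.items()):
            return True
    return False


# Selectors from the opa-gatekeeper profile above.
profile_selectors = [
    {
        "namespace": "gatekeeper-system",
        "labels": {
            "app.kubernetes.io/name": "gatekeeper",
            "control-plane": "controller-manager",
        },
    }
]

# Assumed pod labels, for illustration only.
controller_pod_labels = {
    "app.kubernetes.io/name": "gatekeeper",
    "control-plane": "controller-manager",
    "pod-template-hash": "6c5c99fb84",
}

runs_on_fargate = profile_selects_pod(
    profile_selectors, "gatekeeper-system", controller_pod_labels
)
print(runs_on_fargate)
```

A pod missing either label, or created in a different namespace, would not be selected by this profile.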

Problem: When gatekeeper-controller-manager runs as a Fargate pod and the number of replicas is 1, the pod cannot be provisioned again.

A Fargate node was nominated, but the pod still hangs in the Pending state:

kubectl get pods -n gatekeeper-system -w -o wide
NAME                        READY   STATUS    RESTARTS   AGE    IP       NODE     NOMINATED NODE                                READINESS GATES
gatekeeper-controller-manager-6c5c99fb84-tnrc2   0/1     Pending   0          5m4s   <none>   <none>   ee1851c444-15e3d73c35244fc4a90fd176a93f9f8a   <none>

I see the following error in the Kubernetes events:

102s        Warning   FailedScheduling       pod/gatekeeper-controller-manager-6c5c99fb84-tnrc2     Pod provisioning timed out (will retry) for pod: gatekeeper-system/gatekeeper-controller-manager-6c5c99fb84-tnrc2

PS1: The issue does not occur when the application is deployed on an EC2 instance.
PS2: The issue also occurs with other applications provisioned on Fargate; tested with Karpenter.
PS3: failurePolicy has been set to Fail on purpose; it is a business requirement.
