Sidecar-injector.dapr.io admission webhook deny access to AKS nodes #3699

Closed
fraozy opened this issue Sep 21, 2021 · 21 comments
Labels
kind/bug Something isn't working P0

Comments

@fraozy

fraozy commented Sep 21, 2021

In what area(s)?

/area runtime

What version of Dapr?

1.4.0 (chart: dapr-1.4.0)

Expected Behavior

Using the recommended Azure procedure, I should be able to connect to AKS nodes (https://docs.microsoft.com/en-us/azure/aks/ssh).

Actual Behavior

I tried to connect to a worker node of my AKS cluster using the recommended command (kubectl debug node/ -it --image=mcr.microsoft.com/aks/fundamental/base-ubuntu:v0.0.11), but the connection is denied by the "sidecar-injector.dapr.io" admission webhook.

Steps to Reproduce the Problem

In an AKS cluster (version 1.20.7) with Dapr 1.4.0 installed, try to connect to a node:

kubectl debug node/aks-default-42010746-vmss0000a5 -it --image=mcr.microsoft.com/aks/fundamental/base-ubuntu:v0.0.11
Creating debugging pod node-debugger-aks-default-42010746-vmss0000a5-tpg7z with container debugger on node aks-default-42010746-vmss0000a5.
Error from server: admission webhook "sidecar-injector.dapr.io" denied the request: service account 'masterclient' not on the list of allowed controller accounts


I tried to connect to the same node after draining it, but got the same error.

C:\Users\cae7ca>kubectl drain aks-default-42010746-vmss0000a5 --ignore-daemonsets
node/aks-default-42010746-vmss0000a5 already cordoned
WARNING: ignoring DaemonSet-managed Pods: infrastructure/datadog-jm27l, infrastructure/kured-77hb8, kube-system/azure-cni-networkmonitor-4mpz5, kube-system/azure-ip-masq-agent-5kl6x, kube-system/kube-proxy-rcd8n

C:\Users\cae7ca>kubectl get nodes | findstr aks-default-42010746-vmss0000a5
NAME                              STATUS                     ROLES   AGE   VERSION
aks-default-42010746-vmss0000a5   Ready,SchedulingDisabled   agent   26d   v1.20.7

C:\Users\cae7ca>kubectl debug node/aks-default-42010746-vmss0000a5 -it --image=mcr.microsoft.com/aks/fundamental/base-ubuntu:v0.0.11
Creating debugging pod node-debugger-aks-default-42010746-vmss0000a5-6fg9x with container debugger on node aks-default-42010746-vmss0000a5.
Error from server: admission webhook "sidecar-injector.dapr.io" denied the request: service account 'masterclient' not on the list of allowed controller accounts

I tried to find the service account 'masterclient' in all namespaces, but it does not exist.

I tested the same command to connect to an AKS node on a system with Dapr 1.2.0 (same AKS version 1.20.7 and configuration), and it worked.
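The denial is easier to reason about with a concrete model of an allowlist-style admission check. The Go sketch below is hypothetical and is not Dapr's implementation: the function `checkRequester` and the allowlist entries (modeled on kube-controller-manager's per-controller service accounts) are assumptions. It shows that if the requesting username is compared against a fixed list on every pod creation, a pod created directly by the AKS admin user (`masterclient`, as `kubectl debug node/...` does here) is rejected with a message of this shape.

```go
package main

import "fmt"

// allowedControllers is a HYPOTHETICAL allowlist of usernames that built-in
// controllers use when they create pods. The real Dapr list may differ.
var allowedControllers = []string{
	"system:serviceaccount:kube-system:replicaset-controller",
	"system:serviceaccount:kube-system:statefulset-controller",
	"system:serviceaccount:kube-system:job-controller",
}

// checkRequester returns nil if username is on the allowlist, and otherwise
// an error worded like the webhook denial in this issue. Hypothetical sketch.
func checkRequester(username string) error {
	for _, a := range allowedControllers {
		if username == a {
			return nil
		}
	}
	return fmt.Errorf("service account '%s' not on the list of allowed controller accounts", username)
}

func main() {
	for _, u := range []string{
		"system:serviceaccount:kube-system:replicaset-controller", // pods created for a Deployment
		"masterclient", // kubectl debug run with AKS admin credentials
	} {
		if err := checkRequester(u); err != nil {
			fmt.Println("denied:", err)
		} else {
			fmt.Println("allowed:", u)
		}
	}
}
```

Under this model, the debug pod never gets as far as sidecar injection: it is rejected purely because of who created it, which matches the report that the same command works on Dapr 1.2.0.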

@fraozy fraozy added the kind/bug Something isn't working label Sep 21, 2021
@daixiang0
Member

daixiang0 commented Sep 22, 2021

Similar to #3571.

Could you upgrade and try dapr/docs#1799?

@stefanJ-hub

Looking at #3571 (comment), I get the impression that 1.3.1 did not have this issue.
Also, the milestone has been set to 1.5 (https://github.com/dapr/dapr/milestone/11).

Could this also be brought into a 1.4 hotfix?
We do NOT want to go back to 1.3, and 1.5 is too far ahead.

@SamuelMcAravey

We are seeing the same error being triggered by cert-manager trying to verify a TLS endpoint with Let's Encrypt:

admission webhook \"sidecar-injector.dapr.io\" denied the request: service account 'system:serviceaccount:cluster-svcs:cert-manager' not on the list of allowed controller accounts

Reverting to 1.3.1 worked, but this means we can't upgrade to 1.4.0 until this issue is resolved.

@yaron2
Member

yaron2 commented Sep 22, 2021

/cc @artursouza @msfussell

This might warrant a 1.4.1 release, with notes telling users to upgrade if they encounter this error.

@yaron2
Member

yaron2 commented Sep 22, 2021

@SamuelMcAravey we will release a 1.4.1 hotfix today.

@SamuelMcAravey

Great, we'll give it a shot once it's released.

@artursouza
Member

I am cutting a hotfix now for this. We will have 1.4.1.

@yaron2
Member

yaron2 commented Sep 22, 2021

You can try the :edge version now if you want as it contains the fix. Will help us validate.

@artursouza artursouza mentioned this issue Sep 22, 2021
@artursouza artursouza added this to the v1.4 milestone Sep 22, 2021
@SamuelMcAravey

Unfortunately we are still seeing that error message. We just upgraded our cluster from 1.3.1 to 'edge' using the 1.4.0 helm chart (values below). We set the tag to 'edge' and imagePullPolicy to 'Always' and upgraded. Here is the output from the describe command on a Dapr container showing we are using the edge version:

...
    State:          Running
      Started:      Wed, 22 Sep 2021 11:23:40 -0700
...
Containers:
  dapr-sidecar-injector:
    Container ID:  containerd://39576c200680864d0caff5aab9ceaa682d16f1660e45d48ed1af3cafccd16255
    Image:         docker.io/daprio/dapr:edge
...

The error we are seeing is from cert-manager when spinning up a new container to verify the HTTP route. That new container is associated with the ASP.NET Core app that is using Dapr, and they are also in different namespaces.

Error:
E0922 19:11:46.210960 1 controller.go:163] cert-manager/controller/challenges "msg"="re-queuing item due to error processing" "error"="admission webhook \"sidecar-injector.dapr.io\" denied the request: service account 'system:serviceaccount:cluster-svcs:cert-manager' not on the list of allowed controller accounts" ...

Helm Chart config:

global:
  registry: docker.io/daprio
  tag: "edge"
  dnsSuffix: ".cluster.local"
  logAsJson: false
  imagePullPolicy: Always #IfNotPresent
  imagePullSecrets: ""
  nodeSelector: {}
  ha:
    enabled: true
    replicaCount: 3
    disruption:
      minimumAvailable: ""
      maximumUnavailable: "25%"
  prometheus:
    enabled: true
    port: 9090
  mtls:
    enabled: true
    workloadCertTTL: 24h
    allowedClockSkew: 15m
  daprControlPlaneOs: linux

@yaron2
Member

yaron2 commented Sep 22, 2021

We need to understand why the cert manager service account is showing up here.

@fraozy
Author

fraozy commented Sep 22, 2021

I also tried this on my end and still see the same issue. I tried to follow the steps documented at dapr/docs#1799 as a workaround, but it did not work.

C:\Users\>kubectl create clusterrolebinding dapr-masterclient --clusterrole=dapr-operator-admin --user masterclient
clusterrolebinding.rbac.authorization.k8s.io/dapr-masterclient created

C:\Users\>kubectl get clusterrolebinding dapr-masterclient -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: "2021-09-22T16:25:04Z"
  managedFields:
  - apiVersion: rbac.authorization.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:roleRef:
        f:apiGroup: {}
        f:kind: {}
        f:name: {}
      f:subjects: {}
    manager: kubectl-create
    operation: Update
    time: "2021-09-22T16:25:04Z"
  name: dapr-masterclient
  resourceVersion: "62603172"
  uid: e53eab02-bd0a-49f0-83c6-371088eb2944
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: dapr-operator-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: masterclient

C:\Users\>kubectl debug node/aks-default-42010746-vmss0000al -it --image=mcr.microsoft.com/aks/fundamental/base-ubuntu:v0.0.11
Creating debugging pod node-debugger-aks-default-42010746-vmss0000al-wbblw with container debugger on node aks-default-42010746-vmss0000al.
Error from server: admission webhook "sidecar-injector.dapr.io" denied the request: service account 'masterclient' not on the list of allowed controller accounts

I tried creating a service account called 'masterclient' (the one "sidecar-injector.dapr.io" is complaining about) and adding it to a ClusterRoleBinding (bound to the ClusterRole dapr-operator-admin), but the error persists.

C:\Users\>kubectl -n dapr-system create sa masterclient
serviceaccount/masterclient created

C:\Users\>kubectl edit clusterrolebinding dapr-masterclient -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: "2021-09-22T16:25:04Z"
  managedFields:
  - apiVersion: rbac.authorization.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:roleRef:
        f:apiGroup: {}
        f:kind: {}
        f:name: {}
    manager: kubectl-create
    operation: Update
    time: "2021-09-22T16:25:04Z"
  - apiVersion: rbac.authorization.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:subjects: {}
    manager: kubectl-edit
    operation: Update
    time: "2021-09-22T17:02:20Z"
  name: dapr-masterclient
  resourceVersion: "62612193"
  uid: e53eab02-bd0a-49f0-83c6-371088eb2944
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: dapr-operator-admin
subjects:
- kind: ServiceAccount
  name: masterclient
  namespace: dapr-system

C:\Users\>kubectl debug node/aks-default-42010746-vmss0000al -it --image=mcr.microsoft.com/aks/fundamental/base-ubuntu:v0.0.11
Creating debugging pod node-debugger-aks-default-42010746-vmss0000al-z544h with container debugger on node aks-default-42010746-vmss0000al.
Error from server: admission webhook "sidecar-injector.dapr.io" denied the request: service account 'masterclient' not on the list of allowed controller accounts

I also tried moving the service account from the dapr-system namespace to default, but that did not fix the issue either.
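These workarounds could not have matched, for reasons a short sketch makes concrete. Creating a service account does not change the identity `kubectl debug` uses, which still comes from the kubeconfig. RBAC bindings grant permissions but do not change the username a request carries, so a check that compares usernames is unaffected by them. And Kubernetes authenticates a service account as `system:serviceaccount:<namespace>:<name>`, so an account named `masterclient` in `dapr-system` would present `system:serviceaccount:dapr-system:masterclient`, never the bare `masterclient`. The helper names below are hypothetical. Only the username format is standard Kubernetes behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// saPrefix is the prefix Kubernetes uses for service-account usernames.
const saPrefix = "system:serviceaccount:"

// serviceAccountUsername builds the username a service account authenticates as.
func serviceAccountUsername(namespace, name string) string {
	return saPrefix + namespace + ":" + name
}

// parseServiceAccount splits a service-account username into namespace and
// name. ok is false for any other kind of user, such as a client-certificate user.
func parseServiceAccount(username string) (namespace, name string, ok bool) {
	if !strings.HasPrefix(username, saPrefix) {
		return "", "", false
	}
	parts := strings.SplitN(strings.TrimPrefix(username, saPrefix), ":", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", false
	}
	return parts[0], parts[1], true
}

func main() {
	created := serviceAccountUsername("dapr-system", "masterclient")
	fmt.Println(created)                   // system:serviceaccount:dapr-system:masterclient
	fmt.Println(created == "masterclient") // false: the bare name never matches

	for _, u := range []string{
		"masterclient",
		"system:serviceaccount:cluster-svcs:cert-manager",
	} {
		if ns, name, ok := parseServiceAccount(u); ok {
			fmt.Printf("%s -> namespace=%s name=%s\n", u, ns, name)
		} else {
			fmt.Printf("%s -> not a service account\n", u)
		}
	}
}
```

The same parsing explains the cert-manager report in this thread. There, the rejected identity really is a service account (`cert-manager` in `cluster-svcs`), but it is still not one of the allowed controllers.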

@yaron2
Member

yaron2 commented Sep 22, 2021

@SamuelMcAravey I just want to make sure: are sidecars not getting injected, or is the issue just these error messages showing up in the logs?

@SamuelMcAravey

No sidecars are being injected with the cert-manager containers. We have it annotated to include dapr.io/enabled: "false" on those just to be sure.

From what I can tell, cert-manager goes through the admission webhooks when it spins up the resources it needs. It looks like Dapr 1.4.0 introduced a new webhook check that fails when it looks at the service account. That would explain why we are seeing this issue with cert-manager and @fraozy is seeing it with AKS.
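A pod explicitly annotated with `dapr.io/enabled: "false"` is still denied, which suggests the 1.4.0 check ran before considering whether the pod had opted in to Dapr at all. The Go sketch below is a hypothetical illustration of the gating users expected here, not the actual 1.4.1 or 1.4.2 fix. It skips the requester allowlist unless the pod carries `dapr.io/enabled: "true"`. The function `admitPod` and the allowlist are assumptions. Treating only the exact string `"true"` as opt-in is also a simplification.

```go
package main

import "fmt"

// allowed is a HYPOTHETICAL allowlist; the real Dapr list may differ.
var allowed = map[string]bool{
	"system:serviceaccount:kube-system:replicaset-controller":  true,
	"system:serviceaccount:kube-system:statefulset-controller": true,
}

// admitPod decides whether a pod creation request should be allowed.
// Pods that do not opt in to Dapr are admitted without checking who created
// them; only opted-in pods are subject to the allowlist.
func admitPod(username string, annotations map[string]string) (bool, string) {
	// Reading a key from a nil map is safe in Go and yields "".
	if annotations["dapr.io/enabled"] != "true" {
		return true, "pod does not opt in to Dapr; nothing to inject"
	}
	if !allowed[username] {
		return false, fmt.Sprintf("service account '%s' not on the list of allowed controller accounts", username)
	}
	return true, "inject sidecar"
}

func main() {
	cases := []struct {
		user        string
		annotations map[string]string
	}{
		{"system:serviceaccount:cluster-svcs:cert-manager", map[string]string{"dapr.io/enabled": "false"}},
		{"masterclient", nil}, // kubectl debug node/...: no Dapr annotations
		{"system:serviceaccount:kube-system:replicaset-controller", map[string]string{"dapr.io/enabled": "true"}},
		{"masterclient", map[string]string{"dapr.io/enabled": "true"}},
	}
	for _, c := range cases {
		ok, reason := admitPod(c.user, c.annotations)
		fmt.Printf("%-60s allowed=%v (%s)\n", c.user, ok, reason)
	}
}
```

With the annotation check first, the allowlist only constrains pods that would actually receive a sidecar. Under this model, both the cert-manager solver pods and the AKS node-debugger pod would be admitted unchanged.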

Also, to be sure, we rolled back to 1.3.1 and then upgraded to the edge version again: on 1.3.1 everything works, and on edge we see that error.

@yaron2
Member

yaron2 commented Sep 22, 2021

Alright, elevating this to P0 status.

@yaron2 yaron2 added the P0 label Sep 22, 2021
@pkedy
Member

pkedy commented Sep 22, 2021

I confirmed that the latest commit to master (bb97511) fixes the issue.

I compared master against 0b105361c1ba47910c33bf388506f3f877e53258 using the following build/deploy steps.

export DAPR_REGISTRY=docker.io/????
export DAPR_TAG=dev
export DAPR_NAMESPACE=dapr-system
export TARGET_OS=linux
export TARGET_ARCH=amd64
export GOOS=linux
export GOARCH=amd64

make && make docker-push

helm install \
dapr --wait --timeout 5m0s \
-n $DAPR_NAMESPACE \
--set dapr_sidecar_injector.sidecarImagePullPolicy=Always \
--set global.imagePullPolicy=Always \
--set global.prometheus.enabled=true \
--set global.ha.enabled=false --set-string global.tag=$DAPR_TAG-$TARGET_OS-$TARGET_ARCH \
--set-string global.registry=$DAPR_REGISTRY --set global.logAsJson=true \
--set global.daprControlPlaneOs=$TARGET_OS --set global.daprControlPlaneArch=$TARGET_ARCH \
--set dapr_placement.logLevel=debug \
--set dapr_placement.cluster.forceInMemoryLog=true ./charts/dapr

export NODENAME=$(kubectl get nodes --no-headers | awk '$2 == "Ready" {print $1; exit}')

kubectl debug node/$NODENAME -it --image=mcr.microsoft.com/aks/fundamental/base-ubuntu:v0.0.11

I also discovered that if I comment out this line in 1.4.0, it works:
https://github.com/dapr/dapr/pull/3503/files#diff-d5342f32a749b172ddd2344afebd0f1a4ccaea7b4cf01b17f5a24e22677e2a96R44

@yaron2
Member

yaron2 commented Sep 23, 2021

@SamuelMcAravey the fix is out with 1.4.1.

@lizzzcai

Hi @yaron2, I am still facing a similar error on 1.4.1.

error message:

  - lastTransitionTime: '2021-09-24T07:48:19Z'
    message: >-
      failed to create task run pod
      "function-sample-builder-cgmn8-buildrun-nfkb8-nlbjs": admission webhook
      "sidecar-injector.dapr.io" denied the request: service account
      'system:serviceaccount:tekton-pipelines:tekton-pipelines-controller' not
      on the list of allowed controller accounts. Maybe invalid TaskSpec
    reason: CouldntGetTask
    status: 'False'
    type: Succeeded

@yaron2
Copy link
Member

yaron2 commented Sep 24, 2021

Thanks for reporting.

/cc @pkedy @wcs1only @artursouza

@yaron2
Member

yaron2 commented Sep 24, 2021

Is the pod getting created without the Dapr sidecar, or is the pod not created at all?

@artursouza
Member

Tracking this as a new issue since the original problem reported was fixed in 1.4.1.

@artursouza
Member

@lizzzcai Please try 1.4.2-rc.1 and let us know if it addresses the issue. Then we will cut the 1.4.2 release.


8 participants