Sidecar-injector.dapr.io admission webhook deny access to AKS nodes #3699
Comments
Similar to #3571. Could you upgrade and try dapr/docs#1799? |
Looking at #3571 (comment), I get the impression that 1.3.1 did not have this problem. Could the fix also be brought into a 1.4 hotfix? |
We are seeing the same error being triggered by cert-manager trying to verify a TLS endpoint with Let's Encrypt:
Reverting to 1.3.1 worked, but this means we can't upgrade to 1.4.0 until this issue is resolved. |
This might warrant a 1.4.1 release, with notes advising an upgrade if this error is encountered. |
@SamuelMcAravey we will release a 1.4.1 hotfix today. |
Great, we'll give it a shot once it's released. |
I am cutting a hotfix now for this. We will have 1.4.1. |
You can try the :edge version now if you want as it contains the fix. Will help us validate. |
Unfortunately we are still seeing that error message. We just upgraded our cluster from 1.3.1 to 'edge' using the 1.4.0 helm chart (values below). We set the tag to 'edge' and imagePullPolicy to 'Always' and upgraded. Here is the output from the describe command on a Dapr container showing we are using the edge version:
The error we are seeing is from cert-manager when spinning up a new container to verify the HTTP route. That new container is associated with the ASP.NET Core app that is using Dapr, and they are also in different namespaces.
Error:
Helm Chart config:
global:
  registry: docker.io/daprio
  tag: "edge"
  dnsSuffix: ".cluster.local"
  logAsJson: false
  imagePullPolicy: Always #IfNotPresent
  imagePullSecrets: ""
  nodeSelector: {}
  ha:
    enabled: true
    replicaCount: 3
    disruption:
      minimumAvailable: ""
      maximumUnavailable: "25%"
  prometheus:
    enabled: true
    port: 9090
  mtls:
    enabled: true
    workloadCertTTL: 24h
    allowedClockSkew: 15m
  daprControlPlaneOs: linux |
We need to understand why the cert manager service account is showing up here. |
I also tried on my end and still see the same issue. I followed the steps documented in dapr/docs#1799 as a workaround, but it did not work.
I created a service account called 'masterclient' (the one "sidecar-injector.dapr.io" complains about) and added it to a ClusterRoleBinding (bound to the ClusterRole dapr-operator-admin), but the error continues.
I also tried moving the service account from the dapr-system namespace to default, but that did not fix the issue. |
@SamuelMcAravey I just want to make sure: are sidecars not getting injected, or is the issue just these error messages showing up in the logs? |
No sidecars are being injected with the cert-manager containers. We have it annotated to include From what I can tell cert-manager makes use of the admission webhooks to spin up the resources needed for cert-manager. It looks like with Dapr 1.4.0 there was a new webhook introduced which is failing when looking at the service account. That appears to explain why we are seeing this issue with cert-manager and @fraozy with AKS. Also, we rolled back to 1.3.1 and upgraded again to the edge version to be sure, and on 1.3.1 everything works, and on the edge version, we see that error. |
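For readers following along, here is a minimal sketch of the kind of check described above: an admission webhook that compares the requesting identity against an allowlist and denies anything else. This is not Dapr's actual code. The allowlist contents and the function name `is_allowed` are hypothetical, and the sketch assumes the webhook sees the requester as a plain username string.

```shell
# Hypothetical allowlist of controller identities, space-separated.
# The real list used by the injector is not shown in this thread.
ALLOWED_ACCOUNTS="system:serviceaccount:kube-system:replicaset-controller system:serviceaccount:kube-system:deployment-controller"

# is_allowed USERNAME
# Returns 0 if USERNAME exactly matches an allowlist entry, 1 otherwise.
is_allowed() {
  for entry in $ALLOWED_ACCOUNTS; do
    if [ "$entry" = "$1" ]; then
      return 0
    fi
  done
  return 1
}

# Compare a listed controller identity with one that is not listed.
for user in "system:serviceaccount:kube-system:replicaset-controller" "masterclient"; do
  if is_allowed "$user"; then
    echo "allow: $user"
  else
    echo "deny: $user"
  fi
done
```

If the webhook worked this way, a pod created directly by an identity outside the list (cert-manager's service account, or the identity behind `kubectl debug`) would be rejected even when the pod does not ask for a sidecar, which would be consistent with the reports in this thread.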
Alright, elevating this to P0 status. |
I confirmed that the latest commit to
I confirmed using the following build/deploy steps between
export DAPR_REGISTRY=docker.io/????
export DAPR_TAG=dev
export DAPR_NAMESPACE=dapr-system
export TARGET_OS=linux
export TARGET_ARCH=amd64
export GOOS=linux
export GOARCH=amd64
make && make docker-push
helm install \
dapr --wait --timeout 5m0s \
-n $DAPR_NAMESPACE \
--set dapr_sidecar_injector.sidecarImagePullPolicy=Always \
--set global.imagePullPolicy=Always \
--set global.prometheus.enabled=true \
--set global.ha.enabled=false --set-string global.tag=$DAPR_TAG-$TARGET_OS-$TARGET_ARCH \
--set-string global.registry=$DAPR_REGISTRY --set global.logAsJson=true \
--set global.daprControlPlaneOs=$TARGET_OS --set global.daprControlPlaneArch=$TARGET_ARCH \
--set dapr_placement.logLevel=debug \
--set dapr_placement.cluster.forceInMemoryLog=true ./charts/dapr
export NODENAME=`kubectl get nodes | grep Ready | grep -v Disabled | head -n 1 | cut -d " " -f 1`
kubectl debug node/$NODENAME -it --image=mcr.microsoft.com/aks/fundamental/base-ubuntu:v0.0.11
I also discovered that if I comment this issue out in 1.4.0, it works. |
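One detail in the steps above: `grep Ready` also matches nodes whose status is `NotReady`. A slightly stricter variant, sketched below, matches the STATUS column exactly. The helper name `first_ready_node` is made up for illustration, and it assumes the default `kubectl get nodes` column layout (NAME first, STATUS second).

```shell
# first_ready_node
# Reads `kubectl get nodes` output on stdin and prints the name of the first
# node whose STATUS column is exactly "Ready". The header row is skipped, and
# statuses such as "NotReady" or "Ready,SchedulingDisabled" do not match.
first_ready_node() {
  awk 'NR > 1 && $2 == "Ready" { print $1; exit }'
}

# Usage against a live cluster (not executed here):
#   NODENAME=$(kubectl get nodes | first_ready_node)
#   kubectl debug node/$NODENAME -it --image=mcr.microsoft.com/aks/fundamental/base-ubuntu:v0.0.11
```

Because the helper only reads stdin, it can be tried against saved `kubectl get nodes` output without touching a cluster.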
@SamuelMcAravey the fix is out with |
Hi @yaron2, I am still facing a similar error on error message:
|
Thanks for reporting. |
Is the pod getting created without the Dapr sidecar, or is the pod not created at all? |
Tracking this as a new issue since the original problem reported was fixed in 1.4.1. |
@lizzzcai Please, try 1.4.2-rc.1 and let us know if it addressed the issue. Then we will cut 1.4.2 release. |
In what area(s)?
/area runtime
What version of Dapr?
1.4.0 (chart: dapr-1.4.0)
Expected Behavior
Using the recommended Azure procedure, I should be able to connect to AKS nodes (https://docs.microsoft.com/en-us/azure/aks/ssh).
Actual Behavior
I tried to connect to a worker node of my AKS cluster using the recommended command (kubectl debug node/ -it --image=mcr.microsoft.com/aks/fundamental/base-ubuntu:v0.0.11), but the connection is denied due to the "sidecar-injector.dapr.io" admission webhook.
Steps to Reproduce the Problem
In an AKS cluster (version 1.20.7) with Dapr 1.4.0 installed, try to connect to a node:
I tried to connect to the same node after draining it, but got the same error.
I searched for the service account 'masterclient' in all namespaces, but it does not exist.
I ran the same command to connect to an AKS node on a system with Dapr 1.2.0 (same AKS version 1.20.7 and configuration) and it worked.
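The service account search described above can be sketched as a small filter over the output of `kubectl get serviceaccounts --all-namespaces`. The helper name `find_service_account` is hypothetical, and the sketch assumes the default column layout (NAMESPACE first, NAME second).

```shell
# find_service_account NAME
# Reads `kubectl get serviceaccounts --all-namespaces` output on stdin and
# prints "namespace/name" for every service account whose NAME column equals
# NAME. Prints nothing when no namespace contains such an account.
find_service_account() {
  awk -v name="$1" 'NR > 1 && $2 == name { print $1 "/" $2 }'
}

# Usage against a live cluster (not executed here):
#   kubectl get serviceaccounts --all-namespaces | find_service_account masterclient
```

An empty result here corresponds to the observation above that no such service account exists in any namespace.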