
Error from server (InternalError): Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": No agent available #6457

Open
geekette86 opened this issue Oct 30, 2023 · 5 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@geekette86

Env: private GKE cluster
Error: Error from server (InternalError): Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": No agent available
It is not a firewall issue; I tested that.
The "No agent available" part is what I can't understand. Even if I port-forward the webhook service I get the same error.
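A few checks that can help narrow down where the failure sits (a rough sketch only; the namespace, deployment name, and pod label assume a standard Helm install of cert-manager, and some-certificate.yaml is just a placeholder manifest):

# Confirm the webhook pod is running and ready
kubectl -n cert-manager get pods -l app.kubernetes.io/name=webhook

# Look for TLS or startup errors in the webhook logs
kubectl -n cert-manager logs deploy/cert-manager-webhook

# Trigger the admission webhook from the API server side with a server-side dry run
kubectl apply --dry-run=server -f some-certificate.yaml

If the pod is healthy and the dry run still fails with "No agent available", the problem is on the path from the API server to the cluster rather than in the webhook itself.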

@geekette86
Author

Downgrading from 1.13.1 to 1.11.1 solved the problem.
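For reference, a downgrade with Helm looks roughly like this (a sketch assuming the chart was installed from the jetstack repo into the cert-manager namespace with the release name cert-manager):

helm repo update
helm upgrade cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --version v1.11.1 \
  --set installCRDs=true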

@wallrj
Member

wallrj commented Nov 10, 2023

The only reference I can find to the error message "No agent available" is in konnectivity / apiserver-network-proxy:

https://github.com/kubernetes-sigs/apiserver-network-proxy/blob/1a6a315c087281ce026a2821397c6a4b546f1d67/pkg/server/backend_manager.go#L358

My guess is that GKE is using that to allow their Kubernetes API server (which runs on an isolated network)
to connect to services like the cert-manager webhook which run in the cluster.

GKE will gradually begin using the Konnectivity service for versions 1.19.4-gke.200 and later. Konnectivity replaces SSH tunnels between the control plane and nodes with a more secure TCP proxy. The change will first be introduced for non-private clusters.
-- https://cloud.google.com/kubernetes-engine/docs/release-notes#January_22_2021

So you may find that it wasn't a cert-manager bug. Try upgrading again and let us know how it goes.

@maelvls This might be another candidate for: https://cert-manager.io/docs/troubleshooting/webhook/
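If the Konnectivity path is what is failing here, the agents should be visible from the cluster side. Something like the following can confirm whether they are running (a sketch; the k8s-app=konnectivity-agent label is an assumption about how GKE currently deploys the agent in kube-system):

# Are the Konnectivity agents scheduled and ready?
kubectl -n kube-system get pods -l k8s-app=konnectivity-agent

# Any connection errors back to the proxy server on the control plane?
kubectl -n kube-system logs -l k8s-app=konnectivity-agent --tail=50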

@zoosmand

If you look at the URL carefully, you'll see it is not complete.
"cert-manager-webhook.cert-manager.svc" is the short service name, while the full URL should look like "cert-manager-webhook.cert-manager.svc.cluster.domain".
The k8s control plane somehow drops the domain name.
Though I haven't found the cause of this problem on one of my clusters yet, I agree it lies in the connectivity.
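For what it's worth, a quick way to see how that short name resolves from inside the cluster (a throwaway-pod sketch; the busybox tag is only an example, and this shows pod-side DNS, not the path the control plane uses to reach the service):

kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- \
  nslookup cert-manager-webhook.cert-manager.svc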

@jetstack-bot
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale

@jetstack-bot jetstack-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 12, 2024
@cert-manager-bot
Contributor

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle stale

@cert-manager-prow cert-manager-prow bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 12, 2024