Waiting for http-01 challenge propagation: failed to perform self check GET request #3238
Can you access /triage support |
@meyskens I was also getting the same issue. Providing the result. I was using the HAProxy ingress controller, but created the ingress with nginx-based annotations. |
Hi, I'm having the same problem (with nginx ingress). I tried curling the validation URL from a pod in the cluster and got the following response:
Whereas doing the same from outside the cluster returns the secret as expected. Does this mean there's something strange going on with DNS in Kubernetes? |
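For anyone wanting to reproduce that inside-vs-outside comparison, a rough sketch (the domain and token are placeholders, not values from this thread):

```shell
# Curl the challenge URL from a throwaway pod inside the cluster...
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl \
  --command -- curl -v http://example.com/.well-known/acme-challenge/TOKEN

# ...and compare with the same request made from outside the cluster:
curl -v http://example.com/.well-known/acme-challenge/TOKEN
```

If the outside request returns the challenge key but the in-cluster one times out, the problem is in-cluster routing or DNS, not cert-manager itself.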
Interestingly, I can access external domains from the pod, but I can't seem to access any of the domains that are hosted inside this cluster. For example:
vs:
|
I'm facing the same problem as @nabsul |
Based on some other threads I've been reading, this seems to be related to a bug in Kubernetes DNS, not directly to cert-manager (though it certainly affects cert-manager heavily). In case this is helpful to others stuck on this issue, I've unblocked myself by manually generating certs and uploading them to my cluster. (Note: I prefer to use a VM or pod to do the following, because the source IP address gets logged to public records at Let's Encrypt.)
It's tedious, but at least my certs won't expire while we're waiting for this bug to get fixed. Docs: https://certbot.eff.org/docs/using.html#manual |
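The manual flow described above might look roughly like this (domain, secret name, and namespace are placeholders; certbot prompts you to serve the challenge token yourself):

```shell
# Issue the certificate manually with certbot (interactive: it prints a
# token you must make reachable at the challenge URL before continuing).
certbot certonly --manual --preferred-challenges http -d example.com

# Upload the resulting cert/key pair as a Kubernetes TLS secret that
# your ingress can reference.
kubectl create secret tls example-com-tls \
  --cert=/etc/letsencrypt/live/example.com/fullchain.pem \
  --key=/etc/letsencrypt/live/example.com/privkey.pem \
  -n default
```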
In the meantime, I wonder how hard/bad it would be to hack cert-manager itself to skip the "self check GET request" step altogether. It's a great idea to do this check, but I don't think it's absolutely necessary to the cert renewal process. |
WARNING: This is a random idea that I haven't fully thought through; attempt at your own risk :-). Although I would love to, I most likely don't have time to mess with this idea, but if anyone wants to give it a shot, I would try replacing the testReachability() function here with a simple no-op. You'd then need to build a Docker image, upload it to Docker Hub, and use it instead of the official image in your cluster. Again, if this works at all, it should be considered a temporary workaround until a formal fix comes out. |
I strongly recommend not doing that. Have you tried https://cert-manager.io/docs/usage/certificate/#temporary-certificates-whilst-issuing ? |
@meyskens what do you recommend? |
Hi all. Has there been any news on this issue? I gave up on looking for a solution and instead figured out how to manually renew Let's Encrypt certs in Kubernetes. In case any of you are still stuck, I've shared code and instructions to do this here: https://github.com/nabsul/k8s-letsencrypt The instructions are long for the sake of clarity, but it's actually not that bad to do it manually. |
Is there any new development on this issue?
I've lost faith in cert-manager due to the lack of progress on this issue. I'm writing a replacement that automates what I described in my previous comment. |
I am also having this issue. It's far too on-and-off.
I also have the same issue. |
If anyone is stuck and willing to try out some experimental code, please reach out to me (LinkedIn and Twitter username is the same as here). |
@nabsul Hi there. Are you referring to https://github.com/nabsul/k8s-letsencrypt ? I'd like to give it a go. Need to find time this week. I'd be happy to move the conversation over to your repo if that works for you. |
Hi here, I got exactly the same issue while deploying the GitLab Helm charts, which rely on Jetstack's cert-manager (version v0.10.1). The logs showed the same self-check failure. I was working with the French Kubernetes-as-a-service provider Scaleway (https://www.scaleway.com/en/kubernetes-kapsule/). The default network setup uses Cilium as the container network interface (CNI). On a cluster with the same provider using the Calico CNI, the problem is gone and certificates are properly issued. |
Finally I've got a solution: I followed the tutorial on DigitalOcean, and step number 5 solved the issue.
@nimerfarahty Do you know if the solution suggested by DigitalOcean works if you have multiple domains on the same load balancer? |
@just1689 It's actually a new project based on the one you referenced. I'll be making the code public today and will tag you from there. |
Same issue here (DigitalOcean), which probably points to a bug upstream. I'm not sure if this issue only started occurring after installation of |
Seems to be the same issue here: #466 |
This PR (implementation of the KEP) might help: kubernetes/kubernetes#92312 |
Hi @meyskens, is it possible for cert-manager to make its requests over HTTPS instead of HTTP? For example, over https:
< HTTP/1.1 200 OK |
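For reference, one way to make that HTTP-vs-HTTPS comparison yourself (domain and token are placeholders):

```shell
# Compare the solver response over plain HTTP and over HTTPS.
# -k skips certificate verification, since the valid certificate is
# exactly what we're still trying to issue.
curl -v  http://example.com/.well-known/acme-challenge/TOKEN
curl -vk https://example.com/.well-known/acme-challenge/TOKEN
```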
@nimerfarahty's suggestion solved my problem. I agree this needs to be addressed ASAP. |
I'm facing the same issue with Linode. Step 5 of the DigitalOcean solution above didn't solve it for me. Any ideas? |
I was able to fix this. The chain of issues started as follows: I had the following annotations on my ingress:
nginx.ingress.kubernetes.io/use-regex: "true"
nginx.ingress.kubernetes.io/rewrite-target: /
This caused all URLs to be rewritten to /. Commenting out these two lines made things work. |
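To confirm that a rewrite annotation is what breaks the challenge, a hedged sketch (ingress name, domain, and token are placeholders):

```shell
# With rewrite-target: / in place, the challenge URL returns the wrong
# body. Remove the two annotations (a trailing "-" on kubectl annotate
# deletes an annotation) and the challenge key should be served again.
kubectl annotate ingress my-ingress \
  nginx.ingress.kubernetes.io/rewrite-target- \
  nginx.ingress.kubernetes.io/use-regex-
curl -s http://example.com/.well-known/acme-challenge/TOKEN
```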
Hi guys, to fix the problem on a DOKS cluster, it's enough to add these annotations on the ingress-nginx-controller service:
|
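The annotations themselves weren't captured above, but the commonly cited DOKS workaround is DigitalOcean's hostname annotation, which lets in-cluster traffic hairpin back through the load balancer. A sketch; namespace, service name, and hostname are placeholders for your own setup:

```shell
# Tell the DO load balancer to advertise a hostname instead of an IP,
# so pods inside the cluster can reach it.
kubectl -n ingress-nginx annotate service ingress-nginx-controller \
  service.beta.kubernetes.io/do-loadbalancer-hostname=example.com
```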
@bjornarhagen I'm sorry, but I don't remember the exact thread. What I remember is that, for some reason, DNS would fail to resolve correctly, so a pod couldn't access a domain hosted in the same cluster (see this comment: #3238 (comment)). Does this help at all? #656 (comment) I ended up writing my own cert manager. I was interested in learning Let's Encrypt, and so far it's been running flawlessly for close to a year. It's not production quality, but at least I understand it and can fix anything that goes wrong. My next project is to replace nginx-controller, and then my Kubernetes cluster will be perfect! |
@nabsul Thank you so much for your help on this :) |
I ran into the same problem. Some debugging tips for others:
|
It seems like a patch was provided on the Kubernetes side (kubernetes/kubernetes#92312); however, we still face the issue with 1.21. Any ideas how to solve this with an AWS NLB, without the hairpin-proxy solution? |
In my setup of bare-metal Kubernetes + MetalLB + nginx-ingress, I was able to resolve the issue by adding
I found this issue in our setup as well. The issue in our setup is the network policy. I needed to do the following:
If you install everything in a single namespace, I suspect you don't have to do this.
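The kind of policy change involved might look like the sketch below. It assumes cert-manager's solver pod label (acme.cert-manager.io/http01-solver) and solver port 8089; verify both against your cert-manager version, and the namespace name is a placeholder:

```shell
# Allow inbound traffic to cert-manager's HTTP-01 solver pods in the
# application namespace, so the challenge request can reach them.
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-acme-solver
  namespace: my-app
spec:
  podSelector:
    matchLabels:
      acme.cert-manager.io/http01-solver: "true"
  policyTypes:
    - Ingress
  ingress:
    - ports:
        - port: 8089   # port the solver pod listens on
          protocol: TCP
EOF
```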
Issues go stale after 90d of inactivity. |
Stale issues rot after 30d of inactivity. |
Rotten issues close after 30d of inactivity. |
@jetstack-bot: Closing this issue. In response to this:
agree with @nabsul, I am able to reach DNS
|
Same problem here in a standalone Kubernetes cluster. |
Same problem here, on a bare-metal k3s cluster. Running traefik 2.9.5 and cert-manager 1.10.1, both via helm. |
If you're up experimenting, I've created an alternative here: https://kcert.dev |
I am also facing the same problem. I am using a GCP internal HTTPS load balancer, which only supports one type of traffic, either HTTP or HTTPS.
Since this ingress only accepts HTTPS, the plain-HTTP self check cannot get through. |
After reading all this, it seems to me that some basic networking knowledge is missing. I support the decision not to provide a possibility to disable the check, because you can very easily get blocked by Let's Encrypt when you send too many invalid requests! If you have a standalone cluster, you probably have a problem like this. Example:
Now, the NAT is most likely configured so that the address translation is only done for traffic coming from outside. So basically, your node cannot reach its own external address from the inside. You should be able to verify that by comparing the result of the same request made from inside and from outside the network. Possible solutions
More information: https://wiki.kptree.net/doku.php?id=linux_router:nftables#hairpin_nat |
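One of the workarounds in that category (split-horizon DNS) can be done in CoreDNS: have in-cluster DNS resolve the public domain to the ingress controller's in-cluster address, so traffic never needs to hairpin through the NAT. A sketch; service names, the IP, and the domain are placeholders:

```shell
# Find the ingress controller's ClusterIP...
kubectl -n ingress-nginx get svc ingress-nginx-controller \
  -o jsonpath='{.spec.clusterIP}'

# ...then edit the CoreDNS config and add a hosts block before the
# forward plugin in the Corefile:
kubectl -n kube-system edit configmap coredns
#   hosts {
#     10.96.0.50 example.com   # ClusterIP from above (placeholder)
#     fallthrough
#   }
```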
Major 4head moment. You're absolutely right. After fighting with this all morning, the solution was to add test.domain.com > node IP to my local DNS. Worked like a charm after that. |
There can be many reasons for timeouts; we first need to trace where the failure lies: https://cert-manager.io/docs/troubleshooting/#troubleshooting-a-failed-certificate-request. For me, the reason was DNS requests being dropped, leading to exceeding the context deadline for the self-check. After inspecting each object, I traced it down to the client. In my client, this timeout was actually caused by DNS. I was able to trace it by doing the
Note the addition of
As I've a
Note, As there was a TLDR: It can't be dns.... it's always DNS! 😂 |
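For others wanting to run a similar DNS trace from inside the cluster, a sketch using the dnsutils image from the Kubernetes debugging docs (the DNS service IP 10.96.0.10 and domain are placeholders; check `kubectl -n kube-system get svc kube-dns` for yours):

```shell
# Time lookups via the pod's default resolver and directly against the
# cluster DNS service; a large gap or timeouts point at dropped DNS traffic.
kubectl run dnsutils --rm -it --restart=Never \
  --image=registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 \
  --command -- sh -c 'time dig example.com; time dig example.com @10.96.0.10'
```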
Not sure how to proceed. I'm facing this problem as well. As in @shadyabhi's scenario, I was able to find more information related to the problem in the "challenge" CRD, but the information there is not very useful for understanding the problem. I have this reason: And I have these Events:
The problem was some netpol misconfiguration. |
I ran into this problem yesterday and today. After debugging, I switched over to DNS-01 as the validation method and it works great. Kubernetes 1.27.3-00. I tried to read through RFC 8555, and my best guess is that my situation is a race condition between the HTTP-01 challenger and external-dns updates.
Hi, just adding my two cents in case it helps someone. I ran into this issue, with the following symptoms:
In my case, it's an issue with our haproxy ingress controller because we enabled proxy protocol. Disabling it made the challenge solve immediately. We'll now have to look at how we can keep proxy protocol AND solved challenges, but that's not the topic of this issue. |
Hi, I was facing a similar issue with EKS, and adding the following to the ingress annotations solved it:
|
Status:
Presented: true
Processing: true
Reason: Waiting for http-01 challenge propagation: failed to perform self check GET request 'http://abc.com/.well-known/acme-challenge/Oej8tloD2wuHNBWS6eVhSKmGkZNfjLRemPmpJoHOPkA': Get "http://abc.com/.well-known/acme-challenge/Oej8tloD2wuHNBWS6eVhSKmGkZNfjLRemPmpJoHOPkA": dial tcp 18.192.17.98:80: connect: connection timed out
State: pending
Installation -
I am using AWS EKS.
I cloned ingress-nginx locally and then installed it on EKS using the annotation
service.beta.kubernetes.io/aws-load-balancer-type: nlb
I installed certbot using Helm.
I applied an Issuer and an Ingress resource. So far I haven't created any application deployment.
When I run kubectl describe challenge, I get the error message above.
I am doing nothing extra. I have tried every possible approach, but it's not working. Can anyone help here?
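A general triage sequence for this class of failure (resource names and the token are placeholders; the domain matches the example in the status above):

```shell
kubectl get challenges -A                          # find the pending challenge
kubectl describe challenge CHALLENGE -n NAMESPACE  # read the failure reason
kubectl get pods -A | grep acme-solver             # solver pod should be Running

# Test the challenge URL from inside the cluster...
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl \
  --command -- curl -v http://abc.com/.well-known/acme-challenge/TOKEN
# ...and from outside; if outside works but inside times out, suspect
# hairpin NAT or cluster DNS, as discussed in the comments above.
curl -v http://abc.com/.well-known/acme-challenge/TOKEN
```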