Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ACME challenge fails with nginx.ingress.kubernetes.io/permanent-redirect #11315

Open
renepupil opened this issue Apr 25, 2024 · 13 comments
Open
Labels
kind/support Categorizes issue or PR as a support question. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@renepupil
Copy link

renepupil commented Apr 25, 2024

What happened:

Error: Error accepting authorization: acme: authorization error for our-domain.ch: 403 urn:ietf:params:acme:error:unauthorized: During secondary validation: <ip>: Invalid response from http://our-domain.ch/.well-known/acme-challenge/NoqmY7zPrO4y7EQB9r6gmlgQUMXFJg9GNtum9FgYn8s: 403

What you expected to happen:

I would expect the ingress to configure nginx in such a way that it doesn't redirect on accessing http://our-domain.ch/.well-known/acme-challenge/NoqmY7zPrO4y7EQB9r6gmlgQUMXFJg9GNtum9FgYn8s, but this case is handled specially, so that the acme challenge is successful...

This http://our-domain.ch/.well-known/acme-challenge/NoqmY7zPrO4y7EQB9r6gmlgQUMXFJg9GNtum9FgYn8s redirects to https://foo.com after the failed acme challenge, but likely also during the acme challenge.

As we are on a managed kubernetes cluster, I can not take a look at the created nginx config, or the cluster-issuer configuration, but I strongly suggest this problem lays in the wrong handling of the permanent-redirect by the ingress nginx, as other certificates in our cluster are created without problem.

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):

helm.sh/chart=ingress-nginx-4.9.1

Kubernetes version (use kubectl version):

WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:53:42Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.15+rke2r1", GitCommit:"da6089da4974a0a180c226c9353e1921fa3c248a", GitTreeState:"clean", BuildDate:"2023-10-18T16:31:24Z", GoVersion:"go1.20.10 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.27) and server (1.25) exceeds the supported minor version skew of +/-1

Environment:

  • Cloud provider or hardware configuration: not sure if I can give that

  • OS (e.g. from /etc/os-release): my local system

  • Kernel (e.g. uname -a): 22.04.4 LTS (Jammy Jellyfish)

  • Install tools: managed, don't know

  • Basic cluster related info:

    • kubectl version: See above
    • kubectl get nodes -o wide: Can not give that because of IPs.
  • How was the ingress-nginx-controller installed: Probably helm, don't know.

  • Current State of the controller: Not sure, if I'm allowed to share this info.

  • Current state of ingress object, if applicable: Not sure, if I'm allowed to share this info.

  • Others:

kubectl --namespace=rest-ingress describe challenge

Name:         ingress-cert-ppzsw-363926471-2456233727
Namespace:    rest-ingress
Labels:       <none>
Annotations:  <none>
API Version:  acme.cert-manager.io/v1
Kind:         Challenge
Metadata:
  Creation Timestamp:  2024-04-25T16:29:12Z
  Finalizers:
    finalizer.acme.cert-manager.io
  Generation:  1
  Owner References:
    API Version:           acme.cert-manager.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Order
    Name:                  ingress-cert-ppzsw-363926471
    UID:                   537efb1f-9b4d-4de8-be46-6e278a6f0a82
  Resource Version:        345155008
  UID:                     ed3f5cf8-e2f8-49d7-a902-a3b2f9d38a64
Spec:
  Authorization URL:  https://acme-v02.api.letsencrypt.org/acme/authz-v3/342936176147
  Dns Name:           our-domain.ch 
  Issuer Ref:
    Group:  cert-manager.io
    Kind:   ClusterIssuer
    Name:   letsencrypt-prod
  Key:      QHVv0l6NXBnzHCNTqmkaI9OyYJ3gIQyEXMmyjGE8U9M.Y4nXKJHjF4S08EPIIlBf0KSxnwzNB9ws_Oa1FD-00nI
  Solver:
    http01:
      Ingress:
        Pod Template:
          Metadata:
          Spec:
            Tolerations:
              Operator:  Exists
  Token:                 QHVv0l6NXBnzHCNTqmkaI9OyYJ3gIQyEXMmyjGE8U9M
  Type:                  HTTP-01
  URL:                   https://acme-v02.api.letsencrypt.org/acme/chall-v3/342936176147/oc5jsA
  Wildcard:              false
Status:
  Presented:   false
  Processing:  false
  Reason:      Error accepting authorization: acme: authorization error for ourdomain.ch: 403 urn:ietf:params:acme:error:unauthorized: During secondary validation: 85.131.147.68: Invalid response from http://our-domain.ch /.well-known/acme-challenge/QHVv0l6NXBnzHCNTqmkaI9OyYJ3gIQyEXMmyjGE8U9M: 403
  State:       invalid
Events:        <none>

How to reproduce this issue:

Apply a minimal ingress like this:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rest-ingress
  namespace: our-rest-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    acme.cert-manager.io/http01-edit-in-place: "false" # Error with or without
    cert-manager.io/issue-temporary-certificate: "true" # Error with or without
    nginx.ingress.kubernetes.io/permanent-redirect: https://foo.com
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - our-domain.ch 
    secretName: ingress-cert
  rules:
  - host: our-domain.ch 

Is there something wrong with the configuration?

@renepupil renepupil added the kind/bug Categorizes issue or PR as related to a bug. label Apr 25, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 25, 2024
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If Ingress contributors determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@renepupil renepupil changed the title ACME challenges fails with nginx.ingress.kubernetes.io/permanent-redirect ACME challenge fails with nginx.ingress.kubernetes.io/permanent-redirect Apr 25, 2024
@longwuyuan
Copy link
Contributor

  • cert-manager works for tons of users already so this requires more detail
  • the template of a new bug report asks questions that collect info for readers to look at data to base comments on
  • look at the template and answer the questions in md format

/remove-kind bug
/kind support
/triage needs-information

@k8s-ci-robot k8s-ci-robot added kind/support Categorizes issue or PR as a support question. triage/needs-information Indicates an issue needs more information in order to work on it. and removed kind/bug Categorizes issue or PR as related to a bug. labels Apr 25, 2024
@renepupil
Copy link
Author

cert-manager works for tons of users already so this requires more detail

The question is still if the redirect is expected behavior and as others said in the linked issue, they expect different behavior, as well as I do, so I just think it's a bug.

the template of a new bug report asks questions that collect info for readers to look at data to base comments on

Yeah, sry, force of habit to empty the template on issue creation.
I added all info I "could", meaning some details especially regarding setup I con not provide, because we are managed, also not sure if I can just expose or IPs.

I think the issue is general and not a specific version bug, but again could be layer 8 here.

look at the template and answer the questions in md format

Done that.

@longwuyuan Can you or some of your team just try the minimal configuration?

I think it could be fastly discovered that the given behaviour exist in newer kubernetes/ingress configuration...

@longwuyuan
Copy link
Contributor

Thanks for the info. Your thought process is a bit clearer now, but in my opinion, its not super helpful for all readers. Just my opinion and others will opine differently. Basically the info that helps readers is ;

  • k get all -n ingress-nginx
  • k -n ingress-nginx describe po $ingress-nginx-controller
  • k -n ingress-nginx logs $ingress-nginx-controller-podname
  • k -n $appnamespace get svc,ing
  • k -n $appnamespace describe ing $ingname
  • Your complete curl command as used for real with a -v
  • Other such info

Redacted info also works as there is some insight on the state of the resources and the log of the events. But since its redirection issue, it helps if you cognitively replace FQDNs as appropriate with real behaviour, so context is retained without the real-hostname used.

I am assuming that you want to use permanent-redirect annotation, but you want it to NOT do a blanket-redirection of all the requests, that match that specific ingress rule. You want that annotation to do selective intelligent redirection by NOT-redirecting acme traffic, because you want the cert, but you want it to redirect everything else. Is this a fair view of what you expect. And you suspect this is a bug ?

@renepupil
Copy link
Author

renepupil commented Apr 26, 2024

@longwuyuan Thanks for you feedback

You want that annotation to do selective intelligent redirection by NOT-redirecting acme traffic, because you want the cert, but you want it to redirect everything else. Is this a fair view of what you expect. And you suspect this is a bug ?

Exactly, this was already requested in #6853:

Most likely, this could be solved by ensuring that a redirect is not added if the location starts with ".well-known/acme-challenge".

A workaround was suggested by @danton721

I have done it a bit differently if it is still valid for discussion:

nginx.ingress.kubernetes.io/server-snippet: |
  if ($request_uri !~* ^/.well-known) {
    return 301 $scheme://my.site.com$request_uri;
  }

AFAIK, without this, for HTTPS, there can not be created a valid certificate, so browsers will not redirect traffic when accessing https://our-domain.ch, but complain about the missing certificate.

I would consider this a "bug" if the user has to use the .well-known workaround, at least it should be documented for nginx.ingress.kubernetes.io/permanent-redirect, that is doesn't work for HTTPS.

@longwuyuan
Copy link
Contributor

ok. that info is helpful.

Can you explain this paradox that first you create a ingress with a specific hostname. And then instead of routing traffic to a backend pod behind that FQDN, you want to do seemingly contradictory things simultaneously on that rule being matched. Why generate a certificate for a FQDN when you are going to permanently-redirect from there with a 30X response.

@renepupil
Copy link
Author

you want to do seemingly contradictory things simultaneously on that rule being matched.

That is a valid question (I will try a solution by DNS later on, but I think this will not work):

  1. We deploy tenant websites like <tenant>.our-domain.ch in a custom namespace for each tenant, within such a tenant namespaces there is an ingress routing requests to <tenant>.our-domain.ch to the backend service of that tenant. For each tenant ingress, we create a certificate using the cert manager.
  2. For the "main" domain our-domain.ch we want to redirect the traffic to our company homepage (the target of the permanent redirect) https://foo.com, as currently accessing our-domain.ch without a tenant gives 404. This works when accessing http://our-domain.ch, but when accessing https://our-domain.ch the browser is complaining about missing certificate, as we only have auto certificates for each subdomain, not a single wildcard certificate for all subdomains.

Does that clarify our case?

As long as there is not valid certificate for the "main" domain our-domain.ch, we can not redirect traffic from it, when accessing https://our-domain.ch.

But if you ask yourself why we want this, shouldn't you ask yourself why the nginx.ingress.kubernetes.io/permanent-redirect annotation exists in the first place?

It seems like you question the very existence of the annotation you maintain...

@longwuyuan
Copy link
Contributor

Thanks for the information. Its very helpful in understanding your use case.

  • The target of asking you questions is to ascertain actionable items in short and long-term

  • I think there is some fairness to expect that permanent-redirect annotation should redirect selectively so that it can integrate with cert-manager

  • But annotations in the eingress-nginx controller are implementations around ingress and hence HTTP & HTTPS protocols

  • So annotations in the controller can not practically integrate with every single HTTP/HTTPS related software out there. It is practically impossible to implement and maintain such integration

  • Also the general approach is dedicated focus on different parts of the ecosystem so its less efficient that one software does everything

Please wait for comments from otners. Maybe someone has a practical solution to selectively do redirection

@renepupil
Copy link
Author

I think there is some fairness to expect that permanent-redirect annotation should redirect selectively so that it can integrate with cert-manager

Thanks for acknowledgement.

can not practically integrate with every single HTTP/HTTPS related software out there.

Fair point, I wouldn't expect this to have a Prio...

@longwuyuan Can we maybe hint that nginx.ingress.kubernetes.io/permanent-redirect doesn't work out of the box with HTTPS in the docs?

@longwuyuan
Copy link
Contributor

The summary is ;

  • upstream nginx offers returning code 301 and that is the function of this annotation

  • some people can potentially use nginx if and do selective redirection but that translates to a snippet in the controller

  • this controller does not integrate with another popular & critical tool called external-dnsfor another feature i.e. internal+external LB together in one install. The controller fails to return the internal-LB FQDN to external-dns. this is another example of the difficulty in external tool integrations

  • selective redirection while responding with a 301 is corner-case use not widely demanded by multitude of users and there is acute shortage of developer-time here so the likely solution in future is the gateway-api

  • if the cert-manager team wants to put a PR to NOT route the ACME url for this annotation, the PR will get reviewed but some user has to take up the initiative to talk to cert-manager team and follow it up etc

  • Your suggestion for docs improvement is fair if the content is informative about the use-case and does not imply that selective redirection is a standard that this annotation is deviating from. Please submit PR

  • Such integrations with other tools are desirable & most welcome as PRs but the effort to develop it to maturity and then testing/maintenance requires massive resources that are just not available in the project

@longwuyuan
Copy link
Contributor

/remove-triage needs-information

@k8s-ci-robot k8s-ci-robot removed the triage/needs-information Indicates an issue needs more information in order to work on it. label Apr 26, 2024
@rouke-broersma
Copy link

@renepupil Disabling http01-edit-in-place (default) should work, because cert-manager will generate a new ingress with a more specific path instead of editing your existing ingress. This more specific path takes precedence and does not contain the redirect annotation.

You said

This http://our-domain.ch/.well-known/acme-challenge/NoqmY7zPrO4y7EQB9r6gmlgQUMXFJg9GNtum9FgYn8s redirects to https://foo.com after the failed acme challenge, but likely also during the acme challenge.

This is not a fair assumption because cert-manager may have removed the ingress by the time you tested this after the failed acme challenge. If cert-manager has already removed the acme challenge ingress due to failures, only the redirect ingress remains so of course the acme challenge url would redirect to foo.com.

It's interesting that the error mentions 403 as the response code, while entirely possible that your corporate website returns Forbidden for .well-known I would rather expect 404 instead.

Have you confirmed that the acme challenge actually hits your corporate website?

@renepupil
Copy link
Author

renepupil commented May 8, 2024

Have you confirmed that the acme challenge actually hits your corporate website?

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: redirect-to-another
  namespace: our-redirects
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    acme.cert-manager.io/http01-ingress-class: nginx
    acme.cert-manager.io/http01-edit-in-place: "false"
    nginx.ingress.kubernetes.io/permanent-redirect: https://another-domain.ch
    nginx.ingress.kubernetes.io/from-to-www-redirect: "false"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - our-domain.ch
    secretName: ingress-cert
  rules:
  - host: our-domain.ch

So, we found out that our WAF was blocking the acme challenge cause Let's encrypt uses google services that we now allow. The ingress has an event, that the certificate could get created successfully:

Normal CreateCertificate 21s cert-manager-ingress-shim Successfully created Certificate "ingress-cert"

But when calling https://our-domain.ch in Chrome, I still get NET::ERR_CERT_COMMON_NAME_INVALID. (Redirect works from non https page http://our-domain.ch.

Subject: *.our-domain.ch
Issuer: R3
Expires on: Jul 8, 2024
Current date: May 8, 2024

Maybe it's the wildcard in the certificate, but I would have expected the certificate creation to create a certificate that works without subdomain...

@rouke-broersma Any idea what the problem could be?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/support Categorizes issue or PR as a support question. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.
Projects
Development

No branches or pull requests

4 participants