Feature Request: Implement delay on ACME registration (TLS Provider) #2174

Open
micahhausler opened this issue Sep 25, 2017 · 11 comments

@micahhausler micahhausler commented Sep 25, 2017

I'm using Kubernetes external-dns with Traefik as an Ingress controller with ACME, and I'm running into an issue where external-dns has not yet created the DNS record when Traefik tries to obtain the ACME cert.

I'm wary of using onDemand as it seems it will create an ACME request for any domain (forged Host headers too), and I don't want to open myself to being rate limited.

Even if I used a DNS-based challenge, the domain might still not exist yet. I could set acme.delayDontCheckDNS = 61, but I'd prefer not to involve my DNS provider.

My current workaround is to:

  • Create an Ingress rule
  • Wait 60 seconds to allow the DNS to get created
  • Delete the ingress rule
  • Recreate the ingress rule
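
The fixed 60-second wait in this workaround could be replaced by polling until the record actually resolves. The following is only a minimal sketch, not something Traefik or external-dns provides: `wait_for_dns` and its parameters are hypothetical names, and it assumes the machine running the script sees the same DNS answers the ACME server will see, which is not guaranteed while a record is still propagating.

```python
import socket
import time


def wait_for_dns(hostname, timeout=120.0, interval=5.0,
                 resolve=socket.getaddrinfo, sleep=time.sleep,
                 clock=time.monotonic):
    """Poll until `hostname` resolves; return its addresses, or [] on timeout.

    `resolve`, `sleep` and `clock` are injectable so the loop can be
    exercised without real DNS lookups or real waiting.
    """
    deadline = clock() + timeout
    while True:
        try:
            infos = resolve(hostname, 443)
            # Each getaddrinfo entry ends with a sockaddr tuple whose
            # first element is the IP address.
            return sorted({info[4][0] for info in infos})
        except socket.gaierror:
            # Name not resolvable yet: give up at the deadline, else retry.
            if clock() >= deadline:
                return []
            sleep(interval)
```

A deployment script could call `wait_for_dns("coffee.skuid.com")` after creating the DNS record and create the Ingress rule only once it returns a non-empty list.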

Do you want to request a feature or report a bug?

Feature Request

What did you do?

I created an ingress rule with ACME turned on.

What did you expect to see?

Traefik to retry or wait until my new domain name propagates.
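
To make the "retry" half of this expectation concrete, here is a rough sketch of retrying with exponential backoff. It is not how Traefik behaves today; `retry_with_backoff` and its default delays are assumptions chosen for illustration, and `operation` stands in for whatever call requests the certificate.

```python
import time


def retry_with_backoff(operation, attempts=5, base_delay=10.0,
                       max_delay=300.0, sleep=time.sleep):
    """Call `operation` until it succeeds, doubling the wait after each failure.

    Re-raises the last exception once `attempts` calls have failed.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise
            sleep(delay)
            # Cap the delay so a long outage does not grow it without bound.
            delay = min(delay * 2, max_delay)
```

Keeping the number of attempts small and the delays long matters here, because every failed validation counts against the CA's rate limits.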

What did you see instead?

time="2017-09-15T20:21:02Z" level=error msg="map[coffee.skuid.com:acme: Error 400 - urn:acme:error:unknownHost - No valid IP addresses found for coffee.skuid.com
Error Detail:
	Validation for coffee.skuid.com:443
	Resolved to:
		
	Used: 

]" 
time="2017-09-15T20:21:02Z" level=error msg="Error getting ACME certificates [coffee.skuid.com] : Cannot obtain certificates map[coffee.skuid.com:acme: Error 400 - urn:acme:error:unknownHost - No valid IP addresses found for coffee.skuid.com
Error Detail:
	Validation for coffee.skuid.com:443
	Resolved to:
		
	Used: 

]+v" 

Output of traefik version: (What version of Traefik are you using?)

Version:      v1.4.0-rc3
Codename:     roquefort
Go version:   go1.9
Built:        2017-09-18_04:38:27PM
OS/Arch:      linux/amd64

What is your environment & configuration (arguments, toml, provider, platform, ...)?

Kubernetes DaemonSet (1 pod per node) + AWS

# traefik.toml
logLevel = "INFO"
defaultEntryPoints = ["http","https"]

[entryPoints]
  [entryPoints.https]
    address = ":443"
    proxyprotocol = true
    [entryPoints.https.tls]
  [entryPoints.http]
    address = ":80"
    proxyprotocol = true
    compress = false

[kubernetes]
  labelselector = "kubernetes.io/ingress.class = traefik"

[accessLogs]
  format = "json"

[acme]
  email = "me@example.com"
  storage = "traefik/acme/account"
  entryPoint = "https"
  onHostRule = true

[web]
  address = ":8080"
  [web.metrics.prometheus]
    Buckets = [0.1,0.3,1.2,5.0]

[consul]
  prefix = "traefik"
  endpoint = "consul.default.svc.cluster.local"

@dtomcej dtomcej (Member) commented Oct 1, 2017

Could this not be accomplished by a k8s job instead?

@micahhausler micahhausler commented Oct 2, 2017

No, I'm not sure I follow. What are you suggesting a k8s job do?

@allamand allamand commented Dec 4, 2017

If we could get Traefik to wait for some sort of timeout at startup, to let networks become available before it attempts any ACME challenge, that would be very helpful.

@djeeg djeeg (Contributor) commented Dec 12, 2017

Same situation on docker-for-azure when provisioning by stacks

With a config like this:

  traefikedge:
    image: traefik:1.4.3-alpine
    command: "--acme.domains='www.domain.com,static.domain.com'"
    ports:
      - "80:80"
      - "443:443"

It takes 10-30s for the exposed ports to be registered with the Azure load balancer.

Sometimes the ACME challenge works; sometimes I get 400 errors similar to these.

My current workaround is to initially provision the service with zero replicas:

    deploy:
      mode: replicated
      replicas: 0

Then wait for Azure to provision the port mappings, and then scale up the traefikedge container.

A configurable delay similar to delayDontCheckDNS would be quite helpful.

@srbry srbry commented Feb 9, 2018

Has any progress been made on this? I have also hit this case when spinning up traefik instances on VMs behind an AWS ELB. The ELB needs to healthcheck that traefik is listening but this often takes ~20 seconds. In that time traefik has already tried and failed to validate.
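
One way to observe when the load balancer has started routing is to probe its public address from outside until a TCP connection succeeds. This is only a sketch under assumptions: `wait_for_port` is a hypothetical helper, it has to run from a host that reaches Traefik through the load balancer, and a successful connect only shows that something accepts connections on that port, not that the ACME challenge will pass.

```python
import socket
import time


def wait_for_port(host, port, timeout=90.0, interval=3.0,
                  connect=socket.create_connection, sleep=time.sleep,
                  clock=time.monotonic):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    `connect`, `sleep` and `clock` are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        try:
            conn = connect((host, port), timeout=interval)
            conn.close()
            return True
        except OSError:
            # Refused, unreachable or timed out: retry until the deadline.
            if clock() >= deadline:
                return False
            sleep(interval)
```

A wrapper script could hold back the certificate request (or the start of Traefik itself) until `wait_for_port(public_hostname, 443)` returns True, where `public_hostname` is the name the load balancer is reachable under.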

@tpdownes tpdownes commented Dec 1, 2018

Update: the idea below doesn't work because external-dns can't know the external IP without learning it from the Ingress. I am deploying multiple services behind a Traefik daemon set. So I think the options are:

  • switch to DNS challenge with acme.dnsProvider.$VAR and acme.delayBeforeCheck set
  • manually create DNS record as part of deployment (which is not the point)

I'm spinning up a service that would be helped by this. It might be sufficient to create a configuration knob that tells Traefik to check for the existence of a DNS record first (A/CNAME?) prior to trying the challenge. This would be limiting in environments where external DNS cannot be queried (not my case).

I'm guessing that a workaround for me will be to deploy the service with a k8s annotation... wait 1-2 minutes, then deploy the ingress with the host rule. I'm using k8s+external-dns+traefik, with external-dns defaulting to creating DNS entries by searching the traefik ingress for host rules. But it can also search for service annotations.

e.g. this:

apiVersion: v1
kind: Service
metadata:
  name: nginx
  annotations:
    external-dns.alpha.kubernetes.io/hostname: nginx.external-dns-test.my-org.com.

then wait 2 minutes before deploying the ingress with the host rule, which triggers the certificate request in Traefik configured with acme.onHostRule. external-dns must also be configured to look at both services and ingresses:

        args:
        - --source=service
        - --source=ingress

and Traefik must also be configured with (Helm values):

kubernetes:
  ingressEndpoint:
    useDefaultPublishedService: true

@tsmgeek tsmgeek commented Jan 31, 2019

> Has any progress been made on this? I have also hit this case when spinning up traefik instances on VMs behind an AWS ELB. The ELB needs to healthcheck that traefik is listening but this often takes ~20 seconds. In that time traefik has already tried and failed to validate.

Yup, exactly the same issue I'm having with ELB (NLB): Traefik starts instantly, but nothing is routed to the new backends for up to 90s.

@tpdownes tpdownes commented Feb 8, 2019

The way I read the code is that the HTTP Challenge (and TLS challenge) could be modified to have an addPreCheck feature similar to the DNS challenge. It looks like the DNS challenge defaults to a pre-check function that performs actual DNS queries and the delay check knob in Traefik replaces that with a simple delay.

Just a starting point if anyone wants to slap together a PR.
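
As a language-neutral illustration of that idea, the sketch below shows a challenge flow with a pluggable pre-check, where a plain delay is just one possible pre-check. All names here (`PreCheck`, `fixed_delay`, `solve_with_precheck`, `present`, `notify_ca`) are hypothetical and are not the API of Traefik or of its ACME library; the real code is Go and is structured differently.

```python
import time
from typing import Callable

# A pre-check takes the domain and reports whether it is safe to proceed.
PreCheck = Callable[[str], bool]


def fixed_delay(seconds, sleep=time.sleep) -> PreCheck:
    """Pre-check that only waits, mirroring a simple delay knob."""
    def check(domain):
        sleep(seconds)
        return True
    return check


def solve_with_precheck(domain, present, pre_check, notify_ca):
    """Publish the challenge response, run the pre-check, then ask the CA to validate.

    Returns False without contacting the CA if the pre-check fails, so a
    validation that is known to be premature is never attempted.
    """
    present(domain)
    if not pre_check(domain):
        return False
    notify_ca(domain)
    return True
```

With this shape, an HTTP or TLS challenge could use a pre-check that resolves the domain or connects to the public endpoint, while `fixed_delay(61)` would reproduce the behaviour of a plain delay setting.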

@pabloxio pabloxio commented Mar 19, 2019

I need this feature 😢

@porridge porridge commented Dec 2, 2019

I only read @tpdownes's comment after throwing together this ugly hack, which seems to do the trick for me. Perhaps someone else will find it useful.

@soupdiver soupdiver commented Dec 2, 2019

This has been an issue for 2+ years now.
It seems that for most people a timeout before certs are requested would work.
Would a config option like timeoutBeforeCertRequest break some guidelines, or is it not that easy for some reason? 🤔

Coming from: #3652
