Traefik should check for stale ACME locks in the backend KV store #2671

Open
igoratencompass opened this issue Jan 7, 2018 · 4 comments

@igoratencompass

commented Jan 7, 2018

Do you want to request a feature or report a bug?

I would say it is a bug

What did you do?

I launched a Traefik Pod in Kubernetes via a Deployment, with etcd as the backend. The etcd KV store had been pre-populated using a Kubernetes Job. If I increase the instance count to 2 in the Deployment, a new Traefik Pod is launched that throws the following error:

time="2018-01-07T05:11:19Z" level=debug msg="Building ACME client..." 
time="2018-01-07T05:11:19Z" level=error msg="Error building ACME client &{Email: Registration:<nil> PrivateKey:[] DomainsCertificate:{Certs:[] lock:{w:{state:0 sema:0} writerSem:0 readerSem:0 readerCount:0 readerWait:0}} ChallengeCerts:map[]}: private key was nil"

I assume this is due to the /traefik/acme/account/lock___lock key created by the first Pod. If I then delete the first Pod, the lock is not removed, so the ACME settings stay unavailable both to the second Pod and to the new Pod that the Deployment launches to replace the first one.

What did you expect to see?

I would expect to see some kind of lock management mechanism in the code that periodically checks for stale locks in order to enable a replacement Pod to remove it and take over the ACME client functionality.

What did you see instead?

Output of traefik version: (What version of Traefik are you using?)

v1.5.0-rc4

What is your environment & configuration (arguments, toml, provider, platform, ...)?

# (paste your configuration here)

If applicable, please paste the log output in debug mode (--debug switch)

time="2018-01-07T05:11:19Z" level=debug msg="Building ACME client..." 
time="2018-01-07T05:11:19Z" level=error msg="Error building ACME client &{Email: Registration:<nil> PrivateKey:[] DomainsCertificate:{Certs:[] lock:{w:{state:0 sema:0} writerSem:0 readerSem:0 readerCount:0 readerWait:0}} ChallengeCerts:map[]}: private key was nil"
@vladimirtiukhtin


commented Feb 16, 2018

Same issue. If I redeploy the DaemonSet, ACME stops working on every node.

@ldez

Member

commented Feb 16, 2018

Remember the gentle way to participate:

  • add a "reaction" on the first message of the issue.
  • add more useful and detailed information on the subject.
  • resolve the issue by making a PR.
@dennybaa


commented Dec 29, 2018

By the way, it happens when the Pod restarts while another one has not yet shut down. (Using the latest official chart.)
https://gist.github.com/dennybaa/c67ba07a88a51fbd7a7eb1813909751f

@rnsv


commented Feb 10, 2019

Any updates on this issue? I see the same problem. I am running traefik:v1.7.8 on Kubernetes v1.13.2.

level=info msg="Creating in-cluster Provider client"
level=info msg="Node dadc79c6-cfca-4cb5-a7e4-32305d673abcd elected worker ♝"
level=error msg="Cannot unmarshall private key []"
level=error msg="Error building ACME client &{Email: Registration:<nil> PrivateKey:[] KeyType: DomainsCertificate:{Certs:[] lock:{w:{state:0 sema:0} writerSem:0 readerSem:0 readerCount:0 readerWait:0}} ChallengeCerts:map[] HTTPChallenge:map[]}: private key was nil"
level=info msg="Node dadc79c6-cfca-4cb5-a7e4-32305d673abcd elected leader ♚"
level=info msg="Starting ACME renew job..."
level=info msg="The key type is empty. Use default key type 4096."

Acme Config

[acme]
  email = "admin@my-domain.com"
  storage = "/traefik/acme/account"
  caServer = "https://acme-v02.api.letsencrypt.org/directory"
  entryPoint = "https"
  [acme.dnsChallenge]
    provider = "route53"
  [[acme.domains]]
    main = "*.my-domain.com"
    sans = ["demo.my-domain.com"]