
Segfault when getting certificate - Traefik on Swarm with Consul #2556

Closed
Horgix opened this issue Dec 11, 2017 · 5 comments
Labels
area/acme, contributor/waiting-for-feedback, kind/bug/possible (a possible bug that needs analysis before it is confirmed or fixed), status/5-frozen-due-to-age

Comments

@Horgix

Horgix commented Dec 11, 2017

Hello,

I would like to report a segmentation fault in recent versions of Traefik when trying to get an ACME certificate. Traefik is running in Docker on a swarm mode cluster, with its configuration stored in Consul.

What did you do?

Tried to access some endpoint through Traefik, with HTTPS and certificate generation via Let's Encrypt.

What did you expect to see?

Expected to get a certificate generated and served.

What did you see instead?

Segfault while loading the ACME certificate during challenge resolution:

traefik_traefik.1.w4x67i9e5ig5@host.example.org    | time="2017-12-10T22:45:31Z" level=debug msg="Look for provided certificate to validate [traefik.example.work]..."
traefik_traefik.1.w4x67i9e5ig5@host.example.org    | legolog: 2017/12/10 22:45:31 [INFO][traefik.example.work] acme: Obtaining bundled SAN certificate
traefik_traefik.1.w4x67i9e5ig5@host.example.org    | time="2017-12-10T22:45:31Z" level=debug msg="No provided certificate found for domains [traefik.example.work], get ACME certificate."
traefik_traefik.1.w4x67i9e5ig5@host.example.org    | time="2017-12-10T22:45:31Z" level=debug msg="Challenge GetCertificate traefik.example.work"
traefik_traefik.1.w4x67i9e5ig5@host.example.org    | time="2017-12-10T22:45:31Z" level=debug msg="Loading ACME certificates [traefik.example.work]..."
traefik_traefik.1.w4x67i9e5ig5@host.example.org    | panic: runtime error: invalid memory address or nil pointer dereference
traefik_traefik.1.w4x67i9e5ig5@host.example.org    | [signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x9e8207]
traefik_traefik.1.w4x67i9e5ig5@host.example.org    |
traefik_traefik.1.w4x67i9e5ig5@host.example.org    | goroutine 154 [running]:
traefik_traefik.1.w4x67i9e5ig5@host.example.org    | github.com/containous/traefik/vendor/github.com/xenolf/lego/acme.(*Client).getChallenges.func1(0x0, 0xc420967ec0, 0xc420967e60, 0xc42032dc60, 0x13)
traefik_traefik.1.w4x67i9e5ig5@host.example.org    | 	/go/src/github.com/containous/traefik/vendor/github.com/xenolf/lego/acme/client.go:550 +0xd7
traefik_traefik.1.w4x67i9e5ig5@host.example.org    | created by github.com/containous/traefik/vendor/github.com/xenolf/lego/acme.(*Client).getChallenges
traefik_traefik.1.w4x67i9e5ig5@host.example.org    | 	/go/src/github.com/containous/traefik/vendor/github.com/xenolf/lego/acme/client.go:547 +0x133
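
One way to read the `addr=0x40` in the trace above: it is consistent with code reading a struct field that sits 0x40 bytes into a struct through a nil pointer. The sketch below only illustrates that general mechanism. The `registration` type, its padding, and the `url` field are hypothetical and are not taken from lego or Traefik.

```go
package main

import (
	"fmt"
	"unsafe"
)

// registration is a HYPOTHETICAL struct, not lego's real type. The 64 bytes
// of padding place url at offset 0x40, matching the addr in the trace.
type registration struct {
	_   [64]byte
	url string
}

// urlOffset reports where url lives relative to the start of the struct.
func urlOffset() uintptr {
	var r registration
	return unsafe.Offsetof(r.url)
}

// readURL reads r.url without a nil check. If r is nil the read panics;
// the deferred recover turns that panic into an ordinary error.
func readURL(r *registration) (url string, err error) {
	defer func() {
		if p := recover(); p != nil {
			err = fmt.Errorf("recovered: %v", p)
		}
	}()
	return r.url, nil
}

func main() {
	fmt.Printf("offset of url: %#x\n", urlOffset())
	_, err := readURL(nil)
	fmt.Println(err)
}
```

If that reading is right, the question becomes which pointer was nil at client.go:550, which this thread does not establish.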

Traefik version

Reproduced with 3 different official Traefik images:

  • traefik:v1.4.3-alpine
  • traefik:v1.4.5-alpine
  • traefik:v1.5.0-rc2-alpine

Version output for traefik:v1.5.0-rc2-alpine:

Version:      v1.5.0-rc2
Codename:     cancoillotte
Go version:   go1.9.2
Built:        2017-12-06_03:07:42PM
OS/Arch:      linux/amd64

What is your environment & configuration (arguments, toml, provider, platform, ...)?

  • 3-node cluster with round-robin (RR) DNS for the record I'm trying to get a certificate for
  • Traefik is running as a Docker Swarm service on this 3-node cluster
  • Traefik is getting its configuration from a Consul cluster (and thus storing its SSL certificates in Consul too)
  • I tried 2 different swarm modes for running Traefik (both result in this segfault)
    • global mode, with a 3-node cluster, so 3 Traefik instances
    • replicated mode with a single replica. I switched to this because I was afraid global mode could be causing my trouble with the challenge: with RR DNS, the challenge request could reach a different Traefik instance than the one that initiated it (even though the other instances might still be able to resolve it, since they all store everything in Consul)
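
The round-robin concern in the list above can be pictured with a toy model. This is a sketch under assumptions, not Traefik's implementation: `sharedStore`, `instance`, `begin`, and `answer` are hypothetical names, and the thread does not confirm how Traefik actually shares challenge state between replicas.

```go
package main

import "fmt"

// sharedStore stands in for a KV backend such as Consul. Hypothetical type.
type sharedStore map[string]string

// instance models one proxy replica behind round-robin DNS. Hypothetical type.
type instance struct {
	local  map[string]string
	shared sharedStore
}

func newInstance(shared sharedStore) *instance {
	return &instance{local: map[string]string{}, shared: shared}
}

// begin records the challenge response for domain, either only in this
// replica's memory or in the store that all replicas can read.
func (in *instance) begin(domain, response string, useShared bool) {
	if useShared {
		in.shared[domain] = response
		return
	}
	in.local[domain] = response
}

// answer looks up the response as a validation request would need it:
// first in local memory, then in the shared store.
func (in *instance) answer(domain string) (string, bool) {
	if r, ok := in.local[domain]; ok {
		return r, true
	}
	r, ok := in.shared[domain]
	return r, ok
}

func main() {
	kv := sharedStore{}
	a, b := newInstance(kv), newInstance(kv)

	a.begin("local.example.work", "resp-1", false)
	_, ok := b.answer("local.example.work")
	fmt.Println("local only, other replica can answer:", ok)

	a.begin("shared.example.work", "resp-2", true)
	_, ok = b.answer("shared.example.work")
	fmt.Println("shared store, other replica can answer:", ok)
}
```

In this model a single replica sidesteps the question entirely, which matches the reasoning for trying replicated mode, although the segfault occurred in both modes.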

traefik.toml that has been stored in Consul through traefik storeconfig (hit #927 btw):

debug = true
logLevel = "DEBUG"
defaultEntryPoints = ["http", "https"]

[acme]
email = "foobar@example.org"
storage = "/root/acme.json" # or "traefik/acme/account" if using KV store
entryPoint = "https"
acmeLogging = true
onDemand = true
OnHostRule = true

[entryPoints]
  [entryPoints.http]
  address = ":80"
    [entryPoints.http.redirect]
      entryPoint = "https"
  [entryPoints.https]
  address = ":443"
    [entryPoints.https.tls]

[web]
address = ":8080"
ReadOnly = true

[docker]
endpoint = "unix:///var/run/docker.sock"
domain = "example.work"
watch = true
swarmmode = true
exposedbydefault = false

[consul]
endpoint = "consul.example.org:8500"
watch = true
prefix = "traefik"

docker-compose.yml used to spawn the service:

version: '3.1'

services:
  traefik:
    image: traefik:v1.5.0-rc2-alpine
    ports:
      - "80:80"
      - "443:443"
      - "8080:8080"
    networks:
      - distributed
      - consul
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      mode: replicated
      labels:
        traefik.enable:         "true"
        traefik.port:           "8080"
        traefik.backend:        "traefik"
        traefik.frontend.rule:  "Host:traefik.example.work"
        traefik.docker.network: "traefik_distributed"
    command:
      - "--consul"
      - "--consul.endpoint=consul.example.org:8500"

networks:
  distributed:
    driver: overlay
  consul:
    external:
      name: consul_consul_distributed

Note

For the record, I finally solved the segfault by changing storage to storagefile (at least I think it's what solved it) and by playing with what was stored in Consul

@ldez added the area/acme, kind/bug/possible (a possible bug that needs analysis before it is confirmed or fixed), and status/0-needs-triage labels Dec 11, 2017
@emilevauge
Member

emilevauge commented Dec 11, 2017

Thanks @Horgix for reporting. For the record, does this segfault make Traefik crash, or does Traefik recover from it?

@Horgix
Author

Horgix commented Dec 11, 2017

@emilevauge :

> For the record, does this segfault make traefik crash or does Traefik recover from it?

From what I remember, it plainly crashes, and Docker launched a new Traefik instance right away to replace it, since it runs as a swarm service.
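
A full crash fits how Go handles panics: a panic that is not recovered inside the goroutine where it occurs terminates the whole process, and a `recover` in the code that started the goroutine does not catch it. The sketch below shows the recover-inside-the-goroutine pattern with hypothetical names (`account`, `fetch`). It is not code from Traefik or lego.

```go
package main

import (
	"errors"
	"fmt"
)

// account is a HYPOTHETICAL type used only for this illustration.
type account struct{ email string }

// fetch does its work in a separate goroutine. The deferred recover runs
// inside that goroutine, so a nil pointer panic becomes an error instead
// of killing the process.
func fetch(acc *account) error {
	errc := make(chan error, 1)
	go func() {
		defer func() {
			if p := recover(); p != nil {
				errc <- fmt.Errorf("goroutine panicked: %v", p)
			}
		}()
		if acc.email == "" { // panics here when acc is nil
			errc <- errors.New("empty email")
			return
		}
		errc <- nil
	}()
	return <-errc
}

func main() {
	fmt.Println(fetch(nil))
	fmt.Println(fetch(&account{email: "foobar@example.org"}))
}
```

Without the deferred `recover` in the goroutine, the first call would end the program, after which a supervisor such as a swarm service would start a replacement.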

@softkot

softkot commented Dec 21, 2017

I should note that I have the same issue with the etcd KV backend.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x806c37]

goroutine 124 [running]:
github.com/containous/traefik/vendor/github.com/xenolf/lego/acme.(*Client).getChallenges.func1(0x0, 0xc4205fbd40, 0xc4205fbce0, 0xc42035deb0, 0x10)
	/go/src/github.com/containous/traefik/vendor/github.com/xenolf/lego/acme/client.go:540 +0xd7
created by github.com/containous/traefik/vendor/github.com/xenolf/lego/acme.(*Client).getChallenges
	/go/src/github.com/containous/traefik/vendor/github.com/xenolf/lego/acme/client.go:537 +0x133

P.S. I'm running 1.4.5-alpine.

@nmengin
Contributor

nmengin commented Dec 22, 2017

Hello @Horgix , @softkot.

> For the record, I finally solved the segfault by changing storage to storagefile (at least I think it's what solved it) and by playing with what was stored in Consul

We recently fixed a problem with storage and storageFile during the ACME configuration migration to KV stores in PR #2598.
The fix has been merged into version 1.5-rc3.

Moreover @softkot, you use the deprecated option onDemand in your ACME configuration.
This option is not necessary and can cause problems.

Can you try launching your environment:

  • with the image traefik:1.5
  • with the option onDemand set to false (or removed)?

@traefiker
Contributor

Hi! I'm Træfiker 🤖 the bot in charge of tidying up the issues.

I have to close this one because of its lack of activity 😞

Feel free to re-open it or join our Slack workspace for more community #support.

@traefik traefik locked and limited conversation to collaborators Sep 1, 2019