Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Caddy hangs after: Started certificate maintenance routine #2561

Closed
tevino opened this issue Apr 11, 2019 · 4 comments
Closed

Caddy hangs after: Started certificate maintenance routine #2561

tevino opened this issue Apr 11, 2019 · 4 comments

Comments

@tevino
Copy link

tevino commented Apr 11, 2019

1. Which version of Caddy are you using (caddy -version)?

Caddy 1.0.0-beta1

2. What are you trying to do?

Deploy an HTTPS proxy.

3. What is your Caddyfile?

https://example.com:443 {
    tls me@example.com
    gzip
    root /srv
    forwardproxy {
        basicauth USER PASS
        hide_ip
    }
} 

4. How did you run Caddy (give the full command and describe the execution environment)?

By using an ansible role

5. Please paste any relevant HTTP request(s) here.

6. What did you expect to see?

Caddy starts successfully.

7. What did you see instead (give full error messages and/or log)?

  1. No port was listened by caddy
  2. Caddy hangs after [INFO][FileStorage:/etc/ssl/caddy] Started certificate maintenance routine
  3. strace -ifp CADDYS_PID shows the following:
strace: Process 22027 attached with 7 threads
[pid 22033] [000000000045e6b3] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
[pid 22032] [000000000045e860] epoll_pwait(4,  <unfinished ...>
[pid 22031] [000000000045e6b3] futex(0x27cb518, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 22030] [000000000045e6b3] futex(0xc0000eebc8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 22029] [000000000045e6b3] futex(0x27cb600, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 22028] [000000000045e6b3] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
[pid 22027] [000000000045e6b3] futex(0x27af228, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 22033] [000000000045e6b3] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 22033] [000000000045e6b3] futex(0x27ae5d0, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 22028] [000000000045e6b3] <... restart_syscall resumed> ) = 0
[pid 22033] [000000000045e6b3] futex(0x27af228, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 22028] [000000000045e7d7] sched_yield( <unfinished ...>
[pid 22033] [000000000045e6b3] <... futex resumed> ) = 1
[pid 22028] [000000000045e7d7] <... sched_yield resumed> ) = 0
[pid 22027] [000000000045e6b3] <... futex resumed> ) = 0
[pid 22033] [000000000045e6b3] futex(0x27b2640, FUTEX_WAIT_PRIVATE, 0, {3587, 856927871} <unfinished ...>
[pid 22028] [000000000045e6b3] futex(0x27ae4d0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 22027] [00000000004b446a] newfstatat(AT_FDCWD, "/etc/ssl/caddy/locks",  <unfinished ...>
[pid 22028] [000000000045e6b3] <... futex resumed> ) = 0
[pid 22027] [00000000004b446a] <... newfstatat resumed> {st_mode=S_IFDIR|0700, st_size=4096, ...}, 0) = 0
[pid 22028] [000000000045e11d] nanosleep({0, 20000},  <unfinished ...>
[pid 22027] [00000000004b446a] openat(AT_FDCWD, "/etc/ssl/caddy/locks/cert_acme_example.com_httpsacme-v02.api.letsencrypt.orgdirectory.lock", O_RDONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0644 <unfinished ...>
[pid 22028] [000000000045e11d] <... nanosleep resumed> NULL) = 0
[pid 22027] [00000000004b446a] <... openat resumed> ) = -1 EEXIST (File exists)
[pid 22028] [000000000045e11d] nanosleep({0, 20000},  <unfinished ...>
[pid 22027] [00000000004b446a] newfstatat(AT_FDCWD, "/etc/ssl/caddy/locks/cert_acme_example.com_httpsacme-v02.api.letsencrypt.orgdirectory.lock",  <unfinished ...>
[pid 22028] [000000000045e11d] <... nanosleep resumed> NULL) = 0
[pid 22027] [00000000004b446a] <... newfstatat resumed> {st_mode=S_IFREG|0644, st_size=0, ...}, 0) = 0
[pid 22028] [000000000045e11d] nanosleep({0, 20000},  <unfinished ...>
[pid 22027] [000000000045e6b3] futex(0x27b2640, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 22033] [000000000045e6b3] <... futex resumed> ) = 0
[pid 22027] [000000000045e6b3] <... futex resumed> ) = 1
[pid 22033] [000000000045e6b3] futex(0xc000354848, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 22028] [000000000045e11d] <... nanosleep resumed> NULL) = 0
[pid 22027] [000000000045e6b3] futex(0xc000354848, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 22033] [000000000045e6b3] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
[pid 22028] [000000000045e11d] nanosleep({0, 20000},  <unfinished ...>
[pid 22027] [000000000045e6b3] <... futex resumed> ) = 0
[pid 22033] [000000000045e6b3] futex(0xc000354848, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 22028] [000000000045e11d] <... nanosleep resumed> NULL) = 0
[pid 22027] [000000000045e6b3] futex(0x27b2640, FUTEX_WAIT_PRIVATE, 0, {0, 999677281} <unfinished ...>
[pid 22028] [000000000045e6b3] futex(0x27ae5d0, FUTEX_WAIT_PRIVATE, 0, {60, 0} <unfinished ...>
[pid 22027] [000000000045e6b3] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 22027] [000000000045e6b3] futex(0x27ae5d0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 22028] [000000000045e6b3] <... futex resumed> ) = 0
[pid 22027] [000000000045e6b3] <... futex resumed> ) = 1
[pid 22028] [000000000045e7d7] sched_yield( <unfinished ...>
[pid 22027] [000000000045e6b3] futex(0xc000354848, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 22033] [000000000045e6b3] <... futex resumed> ) = 0
[pid 22028] [000000000045e7d7] <... sched_yield resumed> ) = 0
[pid 22027] [000000000045e6b3] <... futex resumed> ) = 1
[pid 22033] [00000000004b446a] newfstatat(AT_FDCWD, "/etc/ssl/caddy/locks",  <unfinished ...>
[pid 22028] [000000000045e6b3] futex(0x27ae4d0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 22027] [000000000045e6b3] futex(0x27b2640, FUTEX_WAIT_PRIVATE, 0, {3586, 853836672} <unfinished ...>
[pid 22033] [00000000004b446a] <... newfstatat resumed> {st_mode=S_IFDIR|0700, st_size=4096, ...}, 0) = 0
[pid 22028] [000000000045e6b3] <... futex resumed> ) = 0
[pid 22033] [00000000004b446a] openat(AT_FDCWD, "/etc/ssl/caddy/locks/cert_acme_example.com_httpsacme-v02.api.letsencrypt.orgdirectory.lock", O_RDONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0644 <unfinished ...>
[pid 22028] [000000000045e11d] nanosleep({0, 20000},  <unfinished ...>
[pid 22033] [00000000004b446a] <... openat resumed> ) = -1 EEXIST (File exists)
[pid 22028] [000000000045e11d] <... nanosleep resumed> NULL) = 0
[pid 22033] [00000000004b446a] newfstatat(AT_FDCWD, "/etc/ssl/caddy/locks/cert_acme_example.com_httpsacme-v02.api.letsencrypt.orgdirectory.lock",  <unfinished ...>
[pid 22028] [000000000045e11d] nanosleep({0, 20000},  <unfinished ...>
[pid 22033] [00000000004b446a] <... newfstatat resumed> {st_mode=S_IFREG|0644, st_size=0, ...}, 0) = 0
[pid 22033] [000000000045e6b3] futex(0x27b2640, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 22028] [000000000045e11d] <... nanosleep resumed> NULL) = 0
[pid 22033] [000000000045e6b3] <... futex resumed> ) = 1
[pid 22028] [000000000045e11d] nanosleep({0, 20000},  <unfinished ...>
[pid 22027] [000000000045e6b3] <... futex resumed> ) = 0
[pid 22033] [000000000045e6b3] futex(0xc000354848, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 22028] [000000000045e11d] <... nanosleep resumed> NULL) = 0
[pid 22027] [000000000045e6b3] futex(0xc000354848, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 22033] [000000000045e6b3] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
[pid 22028] [000000000045e11d] nanosleep({0, 20000},  <unfinished ...>
[pid 22027] [000000000045e6b3] <... futex resumed> ) = 0
[pid 22033] [000000000045e6b3] futex(0xc000354848, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 22027] [000000000045e6b3] futex(0x27b2640, FUTEX_WAIT_PRIVATE, 0, {0, 999829394} <unfinished ...>
[pid 22028] [000000000045e11d] <... nanosleep resumed> NULL) = 0
[pid 22028] [000000000045e6b3] futex(0x27ae5d0, FUTEX_WAIT_PRIVATE, 0, {60, 0} <unfinished ...>
[pid 22027] [000000000045e6b3] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 22027] [000000000045e6b3] futex(0x27ae5d0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 22028] [000000000045e6b3] <... futex resumed> ) = 0
[pid 22027] [000000000045e6b3] <... futex resumed> ) = 1
[pid 22028] [000000000045e7d7] sched_yield( <unfinished ...>
[pid 22027] [000000000045e6b3] futex(0xc000354848, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 22033] [000000000045e6b3] <... futex resumed> ) = 0
[pid 22028] [000000000045e7d7] <... sched_yield resumed> ) = 0
[pid 22027] [000000000045e6b3] <... futex resumed> ) = 1
[pid 22033] [00000000004b446a] newfstatat(AT_FDCWD, "/etc/ssl/caddy/locks",  <unfinished ...>
[pid 22028] [000000000045e6b3] futex(0x27ae4d0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 22027] [000000000045e6b3] futex(0x27b2640, FUTEX_WAIT_PRIVATE, 0, {3585, 852267403} <unfinished ...>
[pid 22033] [00000000004b446a] <... newfstatat resumed> {st_mode=S_IFDIR|0700, st_size=4096, ...}, 0) = 0
[pid 22028] [000000000045e6b3] <... futex resumed> ) = 0
[pid 22033] [00000000004b446a] openat(AT_FDCWD, "/etc/ssl/caddy/locks/cert_acme_example.com_httpsacme-v02.api.letsencrypt.orgdirectory.lock", O_RDONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0644 <unfinished ...>
[pid 22028] [000000000045e11d] nanosleep({0, 20000},  <unfinished ...>
[pid 22033] [00000000004b446a] <... openat resumed> ) = -1 EEXIST (File exists)
[pid 22028] [000000000045e11d] <... nanosleep resumed> NULL) = 0
[pid 22033] [00000000004b446a] newfstatat(AT_FDCWD, "/etc/ssl/caddy/locks/cert_acme_example.com_httpsacme-v02.api.letsencrypt.orgdirectory.lock",  <unfinished ...>
[pid 22028] [000000000045e11d] nanosleep({0, 20000},  <unfinished ...>
[pid 22033] [00000000004b446a] <... newfstatat resumed> {st_mode=S_IFREG|0644, st_size=0, ...}, 0) = 0
[pid 22033] [000000000045e6b3] futex(0x27b2640, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 22028] [000000000045e11d] <... nanosleep resumed> NULL) = 0
[pid 22033] [000000000045e6b3] <... futex resumed> ) = 1
[pid 22028] [000000000045e11d] nanosleep({0, 20000},  <unfinished ...>
[pid 22027] [000000000045e6b3] <... futex resumed> ) = 0
[pid 22033] [000000000045e6b3] futex(0xc000354848, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 22028] [000000000045e11d] <... nanosleep resumed> NULL) = 0
[pid 22027] [000000000045e6b3] futex(0xc000354848, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 22033] [000000000045e6b3] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
[pid 22028] [000000000045e11d] nanosleep({0, 20000},  <unfinished ...>
[pid 22027] [000000000045e6b3] <... futex resumed> ) = 0
[pid 22033] [000000000045e6b3] futex(0xc000354848, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 22028] [000000000045e11d] <... nanosleep resumed> NULL) = 0
[pid 22027] [000000000045e6b3] futex(0x27b2640, FUTEX_WAIT_PRIVATE, 0, {0, 999704236} <unfinished ...>
[pid 22028] [000000000045e6b3] futex(0x27ae5d0, FUTEX_WAIT_PRIVATE, 0, {60, 0} <unfinished ...>
[pid 22027] [000000000045e6b3] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 22027] [000000000045e6b3] futex(0x27ae5d0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 22028] [000000000045e6b3] <... futex resumed> ) = 0
[pid 22027] [000000000045e6b3] <... futex resumed> ) = 1
[pid 22028] [000000000045e7d7] sched_yield( <unfinished ...>
[pid 22027] [000000000045e6b3] futex(0xc000354848, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 22033] [000000000045e6b3] <... futex resumed> ) = 0
[pid 22028] [000000000045e7d7] <... sched_yield resumed> ) = 0
[pid 22027] [000000000045e6b3] <... futex resumed> ) = 1
[pid 22033] [00000000004b446a] newfstatat(AT_FDCWD, "/etc/ssl/caddy/locks",  <unfinished ...>
[pid 22028] [000000000045e6b3] futex(0x27ae4d0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 22027] [000000000045e6b3] futex(0x27b2640, FUTEX_WAIT_PRIVATE, 0, {3584, 850027712} <unfinished ...>
[pid 22033] [00000000004b446a] <... newfstatat resumed> {st_mode=S_IFDIR|0700, st_size=4096, ...}, 0) = 0
[pid 22028] [000000000045e6b3] <... futex resumed> ) = 0
[pid 22033] [00000000004b446a] openat(AT_FDCWD, "/etc/ssl/caddy/locks/cert_acme_example.com_httpsacme-v02.api.letsencrypt.orgdirectory.lock", O_RDONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0644 <unfinished ...>
[pid 22028] [000000000045e11d] nanosleep({0, 20000},  <unfinished ...>
[pid 22033] [00000000004b446a] <... openat resumed> ) = -1 EEXIST (File exists)
[pid 22028] [000000000045e11d] <... nanosleep resumed> NULL) = 0
[pid 22033] [00000000004b446a] newfstatat(AT_FDCWD, "/etc/ssl/caddy/locks/cert_acme_example.com_httpsacme-v02.api.letsencrypt.orgdirectory.lock",  <unfinished ...>
[pid 22028] [000000000045e11d] nanosleep({0, 20000},  <unfinished ...>
[pid 22033] [00000000004b446a] <... newfstatat resumed> {st_mode=S_IFREG|0644, st_size=0, ...}, 0) = 0
[pid 22028] [000000000045e11d] <... nanosleep resumed> NULL) = 0
[pid 22033] [000000000045e6b3] futex(0x27b2640, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 22028] [000000000045e11d] nanosleep({0, 20000},  <unfinished ...>
[pid 22033] [000000000045e6b3] <... futex resumed> ) = 1
[pid 22027] [000000000045e6b3] <... futex resumed> ) = 0
[pid 22033] [000000000045e6b3] futex(0xc000354848, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 22028] [000000000045e11d] <... nanosleep resumed> NULL) = 0
[pid 22027] [000000000045e6b3] futex(0xc000354848, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 22033] [000000000045e6b3] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
[pid 22028] [000000000045e11d] nanosleep({0, 20000},  <unfinished ...>
[pid 22027] [000000000045e6b3] <... futex resumed> ) = 0
[pid 22033] [000000000045e6b3] futex(0xc000354848, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 22027] [000000000045e6b3] futex(0x27b2640, FUTEX_WAIT_PRIVATE, 0, {0, 999722753} <unfinished ...>
[pid 22028] [000000000045e11d] <... nanosleep resumed> NULL) = 0

8. Why is this a bug, and how do you think this should be fixed?

9. What are you doing to work around the problem in the meantime?

  1. remove anything inside CADDYPATH which is /etc/ssl/caddy in my case
  2. replace /usr/local/bin/caddy with an older(v0.11.5) one
  3. setcap CAP_NET_BIND_SERVICE=+eip /usr/local/bin/caddy to allow the new caddy listens on lower port
  4. systemctl restart caddy

it works now.

10. Please link to any related issues, pull requests, and/or discussion.

Bonus: What do you use Caddy for? Why did you choose Caddy?

@mholt
Copy link
Member

mholt commented Apr 11, 2019

Was Caddy forcefully quit at all within the last 2 hours? Esp. during this phase?

@tevino
Copy link
Author

tevino commented Apr 12, 2019

I don't think so.

Caddy was always stopped by systemctl stop caddy in this case, if it ever got stopped.

However, eventually systemd may quit Caddy forcefully if it hangs too long after the stop.

@mholt
Copy link
Member

mholt commented Apr 12, 2019

My guess is that Caddy is waiting on some locks (in the $CADDYPATH/acme/locks folder) to be released. This seems likely to me... and locks are designed to expire after 2 hours because this can be a problem sometimes. I'll see if we can improve on this, but in the future, it's much better to just delete the locks folder if you are sure that no other Caddy instances which access that folder are running.

Thanks for the detailed report!

@tevino
Copy link
Author

tevino commented Apr 12, 2019

My guess is that Caddy is waiting on some locks (in the $CADDYPATH/acme/locks folder) to be released.

I had the same thought, and yes, there are no other Caddy instances.

I've tried to remove the locks and other files in $CADDYPATH to make Caddy work, but after that the rate limit of Let's encrypt got reached, then I never dug deeper.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants