Fix certbot process getting stuck in dstack-proxy #2143
**Problem**: if obtaining a certificate times out, `subprocess.run` cancels the certbot command with SIGKILL. However, the command is run with sudo, so SIGKILL kills only the sudo process, while its child certbot process is adopted by init and keeps running indefinitely. This breaks adding or updating services on the gateway, since certbot refuses to run if another certbot process exists. Attempts to implement graceful cancellation by sending SIGTERM to the sudo process were not successful: for some reason sudo ignores SIGTERM originating from its parent. The solution in this commit uses the `timeout` shell command to apply the timeout to the actual certbot process instead of the sudo process.
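For illustration, here is a minimal sketch of the approach described above. It is not the code from this PR. It assumes GNU coreutils `timeout` (with its `--kill-after` option and its documented exit statuses 124 and 137), and the names `build_certbot_command` and `run_certbot`, as well as the certbot arguments in the usage note, are hypothetical.

```python
import subprocess
from typing import List, Sequence

# GNU timeout exits with 124 when the time limit is hit, and with
# 128 + 9 = 137 when it had to follow up with SIGKILL.
TIMEOUT_EXIT_CODES = (124, 137)


def build_certbot_command(
    certbot_args: Sequence[str],
    timeout_seconds: int,
    kill_after_seconds: int = 5,
) -> List[str]:
    """Wrap certbot in `timeout` *inside* sudo (hypothetical helper).

    `timeout` runs as a child of sudo and signals certbot directly,
    so no signal has to pass through the sudo process.
    """
    return [
        "sudo",
        "timeout",
        f"--kill-after={kill_after_seconds}",  # SIGKILL if SIGTERM is ignored
        str(timeout_seconds),                  # SIGTERM after this many seconds
        "certbot",
        *certbot_args,
    ]


def run_certbot(certbot_args: Sequence[str], timeout_seconds: int) -> None:
    """Run certbot and translate a `timeout` expiry into TimeoutError."""
    cmd = build_certbot_command(certbot_args, timeout_seconds)
    # No `timeout=` argument here: the time limit is enforced by the
    # `timeout` command, not by subprocess.run killing sudo.
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode in TIMEOUT_EXIT_CODES:
        raise TimeoutError(f"certbot did not finish in {timeout_seconds}s")
    proc.check_returncode()  # raise CalledProcessError on other failures
```

A call such as `run_certbot(["certonly", "--domain", "example.com"], timeout_seconds=60)` would then execute `sudo timeout --kill-after=5 60 certbot certonly --domain example.com`. The point of this ordering is that the process receiving the signal is certbot itself, so it cannot be orphaned the way it is when only sudo is killed.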
I think the core reason why
I tried it first, but it didn't work for some reason.
I can terminate the sudo process manually by sending SIGTERM from a shell, but when the proxy application was trying to do the same thing with |
Problem: if obtaining a certificate times out,
`subprocess.run` cancels the certbot command with
SIGKILL. However, the command is run with sudo, so
SIGKILL actually kills the sudo process, while its
child certbot process is adopted by init and keeps
running indefinitely. This breaks adding or
updating services on the gateway, since certbot
refuses to run if another certbot process exists.
Attempts to implement graceful cancelling by
sending SIGTERM to the sudo process were not
successful - for some reason sudo ignores SIGTERM
originating from its parent.
The solution in this commit uses the `timeout`
shell command to set the timeout on the actual
certbot process instead of the sudo process.
#1595