Skip to content

Add flag to treat SIGHUP/SIGUSR1 as termination#269

Closed
alangefe wants to merge 1 commit intoCESNET:devel-serverfrom
alangefe:sighup-term
Closed

Add flag to treat SIGHUP/SIGUSR1 as termination#269
alangefe wants to merge 1 commit intoCESNET:devel-serverfrom
alangefe:sighup-term

Conversation

@alangefe
Copy link
Copy Markdown
Contributor

netopeer2-server's signal handler currently attempts to restart netopeer2-server if it receives SIGHUP or SIGUSR1. Unfortunately, we've noticed several problems with the server restart mechanism. For example, libssh's thread.c does not properly clean up its state, so netopeer2-server crashes with a segmentation fault when restarting.

This pull request does not attempt to address any of the problems with the server restart mechanism. Instead, it adds an optional configuration flag DISABLE_RESTART_SIGNALS that allows us to treat SIGHUP and SIGUSR1 as signals that will terminate the server instead of restarting it. The flag is disabled by default, so the existing behavior remains unchanged.

`netopeer2-server`'s signal handler currently attempts to restart
netopeer2-server if it receives `SIGHUP` or `SIGUSR1`. Unfortunately, we've
noticed several problems with the server restart mechanism. For example,
libssh's thread.c does not properly clean up its state, so netopeer2-server
crashes with a segmentation fault when restarting.

This pull request does not attempt to address any of the problems with the
server restart mechanism. Instead, it adds an optional configuration flag
`DISABLE_RESTART_SIGNALS` that allows us to treat `SIGHUP` and `SIGUSR1` as
signals that will _terminate_ the server instead of restarting it. The flag
is disabled by default, so the existing behavior remains unchanged.
michalvasko added a commit to CESNET/libnetconf2 that referenced this pull request Jun 1, 2018
michalvasko added a commit that referenced this pull request Jun 1, 2018
@michalvasko
Copy link
Copy Markdown
Member

Hi Andrew,
there were also some problems in libnetconf2 and netopeer2-server which should be solved now so I successfully restarted the server. I am using libssh 0.7.5 and valgrind did not detect any memory problems but it can depend also on OpenSSL version and I know these problems occur in libssh.

On the other hand, we would not want to add a lot of options with little use so may I ask what exactly is the problem? That the restart does not work perfectly? Or you do not want to allow it at all? Based on libssh mailing list I believe that the new version should fix many bugs and especially be compatible with most recent OpenSSL even though we may still wait for it for quite some time.

Regards,
Michal

@alangefe
Copy link
Copy Markdown
Contributor Author

alangefe commented Jun 1, 2018

Hi Michal,

Our main concern is the libssh initialization problem after the restart. We don't presently have a need for the server restart on SIGHUP/SIGUSR1, so we were opting for a clean termination instead of the libssh segfault.

The libssh problem is replicable when using libssh 0.7.5 and the openssl 1.0.2 branch. Here's a stack trace of the crash after server restart:

ssh_pthread_mutex_lock(void ** lock) (libssh/src/threads/pthread.c:70)
libcrypto_lock_callback(int mode, int i, const char * file, int line) (libssh/src/threads.c:113)
CRYPTO_lock(int mode, int type, const char * file, int line) (openssl/crypto/cryptlib.c:596)
int_err_get(int create) (openssl/crypto/err/err.c:360)
int_err_set_item(ERR_STRING_DATA * d) (openssl/crypto/err/err.c:406)
err_load_strings(int lib, ERR_STRING_DATA * str) (openssl/crypto/err/err.c:674)
ERR_load_ERR_strings() (openssl/crypto/err/err.c:661)
ERR_load_crypto_strings() (openssl/crypto/err/err_all.c:114)
SSL_load_error_strings() (openssl/ssl/ssl_err2.c:66)
nc_ssh_tls_init() (libnetconf2/src/session.c:1422)
nc_init() (libnetconf2/src/session.c:1478)
nc_server_init(struct ly_ctx * ctx) (libnetconf2/src/session_server.c:476)
server_init() (netopeer2/server/main.c:951)
main(int argc, char ** argv) (netopeer2/server/main.c:1331)

ssh_pthread_mutex_lock() is being called with an uninitialized lock, namely, libssh's libcrypto_mutxes in its threads.c. This is the access that causes the segfault.

Looking upstack, we can see that ssh_init() hasn't been called yet. It's about to be called as soon as the OpenSSL library initialization (like SSL_load_error_strings()) is done.

Regards,
Andrew

michalvasko added a commit that referenced this pull request Jun 4, 2018
This feature required proper init/cleanup of libssh and
OpenSSL which was not achieved with all version combinations.
System scripts that may need to restart netopeer2 can
simply terminate it and start it again.

Fixes #269
@michalvasko
Copy link
Copy Markdown
Member

Hi Andrew,
you are right, I tried to fix it but it seems there would be changes needed in libssh. So, we decided to remove the feature.

Regards,
Michal

@michalvasko michalvasko closed this Jun 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants