[#376] Disable SSL_MODE_AUTO_RETRY due to SSL_read() hangs since TLS 1.3 sends more non-application data records after handshake

The online_rollback/gtm8421 subtest (and the online_rollback/basic subtest) failed on systems where TLS 1.3 was in use in the system-installed OpenSSL library (OpenSSL 1.1.1). The real issue seems to be a hang in the source server. An strace showed the C-stack below at the time of the hung read() system call. The read() call was eventually interrupted by a timer (after 15 seconds) and so returned from SSL_read() to the caller gtm_tls_recv(), but it is effectively a hang.

    /usr/lib/libc-2.28.so(__read+0x15) [0xec775]
    /usr/lib/libcrypto.so.1.1(BIO_sock_should_retry+0x60) [0xb0100]
    /usr/lib/libcrypto.so.1.1(BIO_number_written+0xba) [0xab33a]
    /usr/lib/libcrypto.so.1.1(ERR_load_BIO_strings+0x1b3) [0xaa1c3]
    /usr/lib/libcrypto.so.1.1(BIO_read+0x23) [0xaa793]
    /usr/lib/libssl.so.1.1(SSL_rstate_string+0x270) [0x22500]
    /usr/lib/libssl.so.1.1(SSL_rstate_string+0x448b) [0x2671b]
    /usr/lib/libssl.so.1.1(SSL_rstate_string+0x1e1f) [0x240af]
    /usr/lib/libssl.so.1.1(SSL_rstate_string+0x92c6) [0x2b556]
    /usr/lib/libssl.so.1.1(SSL_get_default_timeout+0x8c) [0x361ac]
    /usr/lib/libssl.so.1.1(SSL_read+0x24) [0x362b4]
    plugin/libgtmtls.so(gtm_tls_recv+0xaf) [0xb613]
    libyottadb.so(intrsafe_gtm_tls_recv+0x62) [0x4eca31]
    libyottadb.so(repl_recv+0x184) [0x3b9fba]
    libyottadb.so(gtmsource_recv_ctl+0x2e7) [0xc0b70]
    libyottadb.so(gtmsource_process+0x35ec) [0xc5345]
    libyottadb.so(gtmsource+0x275e) [0xbb27c]
    libyottadb.so(mupip_main+0x229) [0x2e1a3]
    mupip(dlopen_libyottadb+0x6bd) [0x1917]
    mupip(main+0x3b) [0x1244]
    /usr/lib/libc-2.28.so(__libc_start_main+0xf3) [0x24223]
    mupip(_start+0x2e) [0x113e]

-------------
https://wiki.openssl.org/index.php/TLS1.3 says the following about TLS 1.3.

TLSv1.3 sends more non-application data records after the handshake is finished. At least the session ticket and possibly a key update is send after the finished message. With TLSv1.2 it happened in case of renegotiation.

SSL_read() has always documented that it can return SSL_ERROR_WANT_READ after processing non-application data, even when there is still data that can be read. When SSL_MODE_AUTO_RETRY is set using SSL_CTX_set_mode() OpenSSL will try to process the next record, and so not return SSL_ERROR_WANT_READ while it still has data available. Because many applications did not handle this properly, SSL_MODE_AUTO_RETRY has been made the default. If the application is using blocking sockets and SSL_MODE_AUTO_RETRY is enabled, and select() is used to check if a socket is readable this results in SSL_read() processing the non-application data records, but then try to read an application data record which might not be available and hang.
-------------

The YottaDB TLS plugin had previously enabled SSL_MODE_AUTO_RETRY. Going through the code, I do not see any reason why this is necessary, as the callers of gtm_tls_recv() (e.g. repl_recv()) seem to be coded to handle the GTMTLS_WANT_READ and GTMTLS_WANT_WRITE return codes (the TLS plugin's names for the underlying SSL error codes SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE).

YottaDB uses blocking sockets for the replication servers as well as for the SOCKET device. Now that we know TLS 1.3 can cause hangs in SSL_read(), SSL_MODE_AUTO_RETRY is disabled. Note that it is not enough to remove the SSL_CTX_set_mode() call that previously enabled SSL_MODE_AUTO_RETRY, because this mode is on by default in OpenSSL 1.1.1 (the release that added TLS 1.3 support). Hence the SSL_CTX_set_mode() call is replaced with SSL_CTX_clear_mode().
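
For illustration only, here is a minimal sketch of the caller-side pattern that makes clearing SSL_MODE_AUTO_RETRY safe: when the receive call reports "want read" (for example after OpenSSL consumed a session ticket or key update record), the caller waits for the socket to become readable and retries instead of blocking inside SSL_read(). This is not the YottaDB code. Every name below (DEMO_WANT_READ, demo_recv_fn, demo_wait_fn, demo_recv_retrying, and the stub connection) is hypothetical and stands in for gtm_tls_recv(), GTMTLS_WANT_READ, and the caller's select()-style wait.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes standing in for the plugin's GTMTLS_WANT_READ
 * and a generic failure; the real values live in the TLS plugin headers. */
enum { DEMO_WANT_READ = -2, DEMO_ERROR = -1 };

/* Hypothetical receive callback: returns the number of bytes read (> 0),
 * DEMO_WANT_READ, or DEMO_ERROR. */
typedef int (*demo_recv_fn)(void *conn, char *buf, size_t len);

/* Hypothetical wait callback: returns 1 once the socket is readable,
 * 0 on timeout. */
typedef int (*demo_wait_fn)(void *conn, int timeout_ms);

/* Retry the receive for as long as it reports "want read", waiting for
 * readability between attempts. With auto-retry cleared, a record that
 * carries no application data surfaces here instead of hanging in read(). */
int demo_recv_retrying(void *conn, char *buf, size_t len,
                       demo_recv_fn recv_cb, demo_wait_fn wait_cb,
                       int timeout_ms)
{
    for (;;) {
        int rc = recv_cb(conn, buf, len);
        if (rc != DEMO_WANT_READ)
            return rc;              /* application data or a real error */
        if (wait_cb(conn, timeout_ms) <= 0)
            return DEMO_WANT_READ;  /* timed out; let the caller decide */
    }
}

/* Stub connection that simulates a peer sending some non-application
 * records before the first application data record. */
struct demo_conn {
    int pending_non_app_records;
    int recv_calls;
    int wait_calls;
};

static int demo_stub_recv(void *p, char *buf, size_t len)
{
    struct demo_conn *c = p;
    static const char msg[] = "hello";
    size_t n = sizeof msg - 1;

    c->recv_calls++;
    if (c->pending_non_app_records > 0) {
        c->pending_non_app_records--;
        return DEMO_WANT_READ;      /* record consumed, nothing to deliver */
    }
    if (n > len)
        n = len;
    memcpy(buf, msg, n);
    return (int)n;
}

static int demo_stub_wait(void *p, int timeout_ms)
{
    struct demo_conn *c = p;
    (void)timeout_ms;
    c->wait_calls++;
    return 1;                       /* pretend the socket became readable */
}
```

With the stub configured for two pending non-application records, demo_recv_retrying() calls the receive callback three times and the wait callback twice before returning the payload. In a real caller the wait callback would wrap select() or poll() on the connection's file descriptor.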