[dev.icinga.com #12003] Hang in TlsStream::Handshake #3902

Closed
icinga-migration opened this issue Jun 21, 2016 · 4 comments

@icinga-migration

commented Jun 21, 2016

This issue has been migrated from Redmine: https://dev.icinga.com/issues/12003

Created by spjmurray on 2016-06-21 14:42:06 +00:00

Assignee: spjmurray
Status: Resolved (closed on 2016-06-22 07:25:52 +00:00)
Target Version: 2.5.0
Last Update: 2016-06-22 07:25:52 +00:00 (in Redmine)

Icinga Version: 2.4.10
Backport?: Not yet backported
Include in Changelog: 1

This is the annoying issue that has plagued me when restarting an icinga2 satellite under heavy load during log replay. Here is what I've managed to derive from a GDB session:

  • Agent establishes TCP connection to parent
  • Parent goes down uncleanly/FIN packet never arrives
  • Agent ApiListener thread is sat waiting for the TLS handshake to complete/fail
  • All SocketEvents threads are sat happily in epoll_wait(). Why the established socket is never ready for POLLOUT I have no idea; feel free to discuss
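
As a rough illustration of the last bullet: epoll only wakes a waiter once the kernel itself has noticed something on the socket. The sketch below is Linux-only and is not Icinga 2 code; the helper name `WaitForErrorEvents` is hypothetical, and a local socket pair stands in for the TCP connection. It registers a descriptor with an empty event mask, so only the always-reported EPOLLERR/EPOLLHUP conditions can come back.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <system_error>

#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: waits up to timeoutMs for an error/hang-up condition on fd.
// Returns the reported event mask, or 0 if nothing was reported in time.
uint32_t WaitForErrorEvents(int fd, int timeoutMs)
{
    assert(fd >= 0);

    int epfd = epoll_create1(0);
    if (epfd < 0)
        throw std::system_error(errno, std::generic_category(), "epoll_create1");

    epoll_event ev{};
    ev.events = 0;      // EPOLLERR and EPOLLHUP are reported even when not requested
    ev.data.fd = fd;

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) != 0) {
        int saved = errno;
        close(epfd);
        throw std::system_error(saved, std::generic_category(), "epoll_ctl");
    }

    epoll_event out{};
    int n = epoll_wait(epfd, &out, 1, timeoutMs);
    close(epfd);

    return n > 0 ? out.events : 0;
}
```

Called on one end of a `socketpair(AF_UNIX, SOCK_STREAM, ...)`, this should report nothing while the other end is open and a hang-up once the other end is closed. A parent that disappears without a FIN never produces that second step, so the waiter has nothing to wake up for.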

What I propose: enable keep-alive packets on all TcpSockets. Hopefully this will generate EPOLLERR, SSL_do_handshake will fail in TlsStream::OnEvent, and the next iteration of ApiListener::ApiTimerHandler should then work... probably maybe :)
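
A minimal sketch of what enabling keep-alive on a socket looks like on a POSIX system. This is not the attached patch; the helper names `EnableKeepAlive` and `IsKeepAliveEnabled` are hypothetical and only illustrate the `setsockopt()` call involved.

```cpp
#include <cassert>
#include <cerrno>
#include <system_error>

#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: asks the kernel to send TCP keep-alive probes on fd.
void EnableKeepAlive(int fd)
{
    assert(fd >= 0);

    const int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0)
        throw std::system_error(errno, std::generic_category(), "setsockopt(SO_KEEPALIVE)");
}

// Hypothetical helper: reads the option back so the setting can be verified.
bool IsKeepAliveEnabled(int fd)
{
    assert(fd >= 0);

    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &value, &len) != 0)
        throw std::system_error(errno, std::generic_category(), "getsockopt(SO_KEEPALIVE)");

    return value != 0;
}
```

Note that with Linux defaults the first probe is only sent after 7200 seconds of idle time (`net.ipv4.tcp_keepalive_time`), so detection can be slow unless the system settings or the per-socket options `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are tuned. Whether the patch touches those is not stated here.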

I've got plenty of debug output should you need it, plus for a short while I still have the GDB session open.

Attachments

Changesets

2016-06-22 07:25:00 +00:00 by spjmurray e3645aa

Fix hanging API connections

There was a problem identified where an upstream API connection was found hanging, waiting
for a TLS handshake to complete. Seemingly the TCP connection was ESTABLISHED locally but
not cleanly terminated remotely. Oddly, the socket events layer never triggered the TLS
handshake. This change enables TCP keep-alive packets to detect broken connections, raising
EPOLLERR and breaking the deadlock condition so that the agent will attempt to reconnect
at a later time.

fixes #12003

Signed-off-by: Gunnar Beutner <gunnar.beutner@netways.de>

2016-07-05 11:16:14 +00:00 by mfriedrich 85afec8

Fix setsockopt() error on Windows

refs #12003

Relations:

@icinga-migration

commented Jun 21, 2016

Updated by spjmurray on 2016-06-21 14:48:01 +00:00

  • File added 0001-Fix-Hanging-API-Connections.patch

From ae4f933cda89b3a4530c87f0fe673fccebce9aec Mon Sep 17 00:00:00 2001
From: Simon Murray <spjmurray@yahoo.co.uk>
Date: Tue, 21 Jun 2016 15:46:53 +0100
Subject: [PATCH] Fix Hanging API Connections

There was a problem identified where an upstream API connection was found hanging, waiting
for a TLS handshake to complete. Seemingly the TCP connection was ESTABLISHED locally but
not cleanly terminated remotely. Oddly, the socket events layer never triggered the TLS
handshake. This change enables TCP keep-alive packets to detect broken connections, raising
EPOLLERR and breaking the deadlock condition so that the agent will attempt to reconnect
at a later time.

refs #12003

@icinga-migration

commented Jun 22, 2016

Updated by gbeutner on 2016-06-22 07:25:45 +00:00

  • Status changed from New to Assigned
  • Assigned to set to spjmurray
  • Target Version set to 2.5.0
@icinga-migration

commented Jun 22, 2016

Updated by spjmurray on 2016-06-22 07:25:52 +00:00

  • Status changed from Assigned to Resolved
  • Done % changed from 0 to 100

Applied in changeset e3645aa.

@icinga-migration

commented Jun 22, 2016

Updated by mfriedrich on 2016-06-22 09:42:50 +00:00

  • Relates set to 11865