Reduce the time it takes for Octopus Server to determine that a network connection to Tentacle is dropped. #8517

Closed
LukeButters opened this issue Dec 5, 2023 · 2 comments

LukeButters commented Dec 5, 2023

The enhancement

The Need

Octopus Server and Tentacle establish long-lived TCP connections to transfer Requests and Responses. The timeouts for Reads and Writes over the TCP connection can be up to 10 minutes by default.

If the TCP connection between Tentacle and Server fails and is not closed cleanly by both parties (FIN or RST packets are not sent and received), it can take up to 10 minutes for Tentacle to detect the TCP connection has failed and to terminate a waiting Read or Write. We refer to a connection in this state as dropped.

This leads to undesirable delays when processing Requests and Responses. For example, it can take 10 minutes for the Octopus Server to detect the dropped connection. Since 10 minutes exceeds the default RPC retry duration of 2.5 minutes, the communication to Tentacle will not be retried, resulting in a failed Deployment, Runbook or Health Check.
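The arithmetic behind this can be sketched in a few lines of Python. This is a simplified model, not Octopus Server's actual retry logic: it assumes every attempt on a dropped connection blocks for the full detection timeout, that a retry is started only while the retry duration has not yet elapsed, and it ignores any backoff between attempts. The function name and the 60 second value are hypothetical, chosen only for illustration.

```python
import math

# Values taken from the issue text.
DEFAULT_TIMEOUT_S = 10 * 60   # read/write timeout: up to 10 minutes
RETRY_DURATION_S = 2.5 * 60   # default RPC retry duration: 2.5 minutes


def attempts_on_dropped_connection(detect_timeout_s: float, retry_duration_s: float) -> int:
    """Count attempts made when every attempt hits a dropped connection.

    Simplified model (an assumption, not the real implementation):
    attempt k fails at time k * detect_timeout_s, and another attempt
    is started only if that time is still below retry_duration_s.
    The first attempt always happens.
    """
    return max(1, math.ceil(retry_duration_s / detect_timeout_s))
```

Under this model, `attempts_on_dropped_connection(DEFAULT_TIMEOUT_S, RETRY_DURATION_S)` gives a single attempt, so there is no retry. A hypothetical 60 second detection timeout would leave room for three attempts inside the same retry duration.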

Solution

Lower TCP read/write timeouts were recently added to Halibut. These settings enable Octopus Server to detect a dropped connection in less time, allowing Octopus Server to potentially recover from a dropped connection by retrying the RPC call.

This has been available behind an opt-in feature toggle since version 2024.1.3308.
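To illustrate the general idea of a read timeout bounding how long a dropped connection goes unnoticed, here is a minimal sketch using Python's standard `socket` module. It is not Halibut's API (Halibut is a .NET library) and the helper names are hypothetical. A peer that accepts the connection and then stays silent stands in for a dropped connection, since no FIN or RST ever arrives.

```python
import socket
import threading


def read_with_timeout(host: str, port: int, timeout_s: float) -> str:
    """Connect, wait for data, and report how the wait ended.

    Returns "timeout" if nothing arrives within timeout_s, "closed" if
    the peer closed the connection cleanly, and "data" otherwise.
    """
    with socket.create_connection((host, port), timeout=timeout_s) as conn:
        conn.settimeout(timeout_s)  # applies to the blocking recv below
        try:
            data = conn.recv(1024)
        except socket.timeout:
            return "timeout"
        return "closed" if data == b"" else "data"


def start_silent_server():
    """Listen on an ephemeral localhost port, accept once, never reply.

    Hypothetical stand-in for a dropped connection: the accepted socket
    is kept open and sends nothing, so the client sees no FIN or RST.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    held = []  # keeps the accepted socket referenced so it stays open

    def accept_and_hold():
        conn, _ = server.accept()
        held.append(conn)

    threading.Thread(target=accept_and_hold, daemon=True).start()
    return server, held
```

With a short `timeout_s`, the caller regains control quickly and can decide whether to retry; with a timeout of several minutes, the same read would block for that whole period.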

Links

LukeButters self-assigned this Dec 5, 2023
LukeButters changed the title from "Reduce the time it takes for Octopus Server to determine a network connections to Tentacle are dropped, by using Halibut's new recommended timeouts." to "Reduce the time it takes for Octopus Server to determine a network connections to Tentacle are dropped." Dec 5, 2023
LukeButters changed the title from "Reduce the time it takes for Octopus Server to determine a network connections to Tentacle are dropped." to "Reduce the time it takes for Octopus Server to determine a network connections to Tentacle is dropped." Dec 5, 2023
@octoreleasebot

Release Note: Improved detection of Network Errors when communicating with Tentacles, allowing Octopus Server to re-attempt communication.


Octobob commented Feb 19, 2024

🎉 The fix for this issue has been released in:

| Release stream | Release |
| --- | --- |
| 2024.1 | 2024.1.8781 |
| 2024.2+ | all releases |
