Polling worker Tentacles are slower when running many concurrent deployments (or concurrent steps). #8670

Closed
LukeButters opened this issue Mar 6, 2024 · 3 comments
Labels
kind/bug This issue represents a verified problem we are committed to solving

LukeButters commented Mar 6, 2024

Severity

No response

Version

2023.3.8103 (with feature toggle) or since at least 2023.4.1479

Latest Version

None

What happened?

The problem

When running concurrent deployments which all use a single polling worker, deployments take longer.

Context:

  1. Polling tentacles are limited to a single TCP connection to Octopus Server and can do at most one RPC at a time.
  2. ScriptServiceV2 introduced a concept of waiting for the script to finish. see also PR
  3. Octopus is using that durationToWaitForScriptToFinish with a value of 5s.
  4. Tentacle workers usually run many scripts concurrently for many deployments concurrently.

Cause

The 5s delay set via durationToWaitForScriptToFinish creates a bottleneck when starting scripts: multiple deployments may each send a script to the worker, but because the polling worker handles one RPC at a time, each StartScript call must wait up to 5s for every other script the worker is already starting.

Suggested Fix

Set durationToWaitForScriptToFinish to null when starting scripts on polling workers.
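
The cause and the suggested fix can be illustrated with a toy model. This is only a sketch, not Tentacle or Halibut code: the function name and parameters are hypothetical, all StartScript calls are assumed to be queued at time 0 on one connection, and RPC overhead is assumed to be zero. Each call holds the connection until its script finishes or the wait elapses, whichever comes first.

```python
from typing import List, Optional, Sequence


def start_script_latencies_ms(
    script_durations_ms: Sequence[int],
    wait_for_finish_ms: Optional[int],
) -> List[int]:
    """Latency seen by each StartScript call on a single shared connection.

    Hypothetical model: calls are served one at a time, in order, and each
    one occupies the connection for min(script duration, wait), or for no
    time at all when the wait is None.
    """
    connection_free_at = 0
    latencies = []
    for duration in script_durations_ms:
        if wait_for_finish_ms is None:
            hold = 0  # StartScript returns without waiting for the script
        else:
            hold = min(duration, wait_for_finish_ms)
        connection_free_at += hold
        latencies.append(connection_free_at)
    return latencies


# Two long scripts queued ahead of a 1.3s script, with a 5s wait:
print(start_script_latencies_ms([10_000, 10_000, 1_300], 5_000))
# [5000, 10000, 11300]

# Same workload with the wait set to None:
print(start_script_latencies_ms([10_000, 10_000, 1_300], None))
# [0, 0, 0]
```

Under these assumptions, the third call waits behind two 5s holds before its own 1.3s script runs, and removing the wait removes the queueing delay from StartScript entirely.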

Reproduction

Create many deployments to a single polling worker, each with many steps, where every step takes longer than 5 seconds.

Error and Stacktrace

Start Octopus with the environment variable: OCTOPUS__Feature__LogTentacleRpcTimedOperationsWhenLongerThan_ms=0

Run the deployments as above.

Look for logs like:

Halibut RPC to tentacle calling IScriptServiceV2.StartScript succeeded after 11321ms.

That took 11.3s, which suggests the call had to wait for two other scripts to start (2 x 5s) and then took 1.3s to run its own script, roughly 9x slower than the 1.3s it needed on its own.
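
To find affected calls in a larger log, something like the following sketch could be used. The regular expression is an assumption based solely on the single sample line in this issue, and the function name and the 5000ms default threshold are hypothetical choices, not part of Octopus.

```python
import re
from typing import Iterable, List

# Assumed format, derived from the one sample log line in this issue.
START_SCRIPT_LOG = re.compile(
    r"Halibut RPC to tentacle calling IScriptServiceV2\.StartScript "
    r"succeeded after (\d+)ms"
)


def slow_start_script_calls(
    log_lines: Iterable[str], threshold_ms: int = 5000
) -> List[int]:
    """Return durations (ms) of StartScript RPCs at or above threshold_ms."""
    durations = []
    for line in log_lines:
        match = START_SCRIPT_LOG.search(line)
        if match and int(match.group(1)) >= threshold_ms:
            durations.append(int(match.group(1)))
    return durations


sample = [
    "Halibut RPC to tentacle calling IScriptServiceV2.StartScript succeeded after 11321ms.",
    "Halibut RPC to tentacle calling IScriptServiceV2.StartScript succeeded after 800ms.",
]
print(slow_start_script_calls(sample))  # [11321]
```

Durations well above 5000ms are a hint that a call queued behind other StartScript calls, since a single call should wait at most about 5s for its own script.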

More Information

No response

Workaround

Configure the polling Tentacle to poll Octopus Server over multiple TCP connections by specifying additional URLs/ports to poll. Hosted users can additionally poll the standard 443 port by configuring their Tentacle with:

/path/to/tentacle/Tentacle poll-server --server https://<yoururl>.octopus.app --apikey=API-APIKEY01 --server-comms-address "https://polling.<yoururl>.octopus.app" --server-comms-port=443
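
Why extra connections help can be sketched with a small queueing model. This is an illustration only: it assumes each additional polled address gives an independent connection that carries one RPC at a time, that all StartScript calls are queued at time 0, and that each call goes to whichever connection frees up first. The function name and parameters are hypothetical.

```python
import heapq
from typing import List, Sequence


def latencies_with_connections_ms(
    script_durations_ms: Sequence[int],
    wait_ms: int,
    connections: int,
) -> List[int]:
    """StartScript latency per call when spread over several connections."""
    # Min-heap of the times at which each connection next becomes free.
    free_at = [0] * connections
    heapq.heapify(free_at)
    latencies = []
    for duration in script_durations_ms:
        start = heapq.heappop(free_at)  # earliest available connection
        done = start + min(duration, wait_ms)
        heapq.heappush(free_at, done)
        latencies.append(done)
    return latencies


workload = [10_000, 10_000, 1_300]
print(latencies_with_connections_ms(workload, 5_000, 1))  # [5000, 10000, 11300]
print(latencies_with_connections_ms(workload, 5_000, 2))  # [5000, 5000, 6300]
print(latencies_with_connections_ms(workload, 5_000, 3))  # [5000, 5000, 1300]
```

In this model the 1.3s script only reaches its unqueued latency once there are as many connections as concurrent StartScript calls, so the workaround reduces the delay rather than removing its cause.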
@LukeButters LukeButters added the kind/bug This issue represents a verified problem we are committed to solving label Mar 6, 2024

LukeButters commented Jun 5, 2024

Fixed by: https://github.com/OctopusDeploy/OctopusDeploy/pull/23234 available since: 2024.2.2393

Related to: #8860

Octobob commented Jun 12, 2024

🎉 The fix for this issue has been released in:

| Release stream | Release |
| --- | --- |
| 2024.2 | 2024.2.2393 |
| 2024.3+ | all releases |
