Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add timeout support to SilentProcessRunner via special variable to terminate stuck processes #468

Closed
wants to merge 3 commits into from

Conversation

miguelelvir
Copy link

Could use a documentation update if accepted?

Issue

Tentacles wait indefinitely on actions which run scripts. For the most part, this is desirable by design and handled here. There are many valid cases were a timeout on scripts should be supported to prevent tentacles from being indefinitely stuck processing action scripts (and thus preventing other operations such as clean up via Script Console, Runbooks, or other deployments).

One example of this is described in this SO post and this IIS post when calling CommitChanges on IIS's ServerManager objects. This is important for Octopus scripts such as IISWebSite_BeforePostDeploy that utilize the WebAdministration PowerShell module and are run on systems with high load. Cmdlets from this module internally call appcmd, which then calls the CommitChanges method of the ServerManager object. CommitChanges and appcmd may exit before the Set-ItemProperty method has called a Wait on it. As a result, random deployments can get stuck waiting for appcmd.
We have seen this behavior in many site deployments when using virtual paths.

Samples

We captured some sample dumps and images showing this behavior. Images are included below:

Debugging Calamari dump showing that it gets stuck in SilentProcessRunner

Calamari Shared_WaitsForever

Powershell dump showing that it gets stuck in Set-ItemProperty setting a virtual path

powershell_IISWebSite_BeforePostDeploy_Set-ItemProperty_VirtualPath_Stuck

ProcessExplorer investigation to confirm IISWebSite_BeforePostDeploy is stuck on appcmd.
ProcessExplorerTree

PR Changes

This PR adds the special variable Octopus.Action.Script.Timeout to allow a millisecond timeout to be defined via a process's variables that will be used to limit the amount of time a process can wait. If no timeout is defined, the default behavior of waiting indefinitely will be used. When it is defined, a message will be displayed on the Verbose log indicating a timeout value was used.
If the timeout is hit

Tests

A test was added using Windows' Timeout command to test that the timeout values are honored. Also deployed a Calamari package with the changes internally to our deployment server and confirmed it honors the timeout values when present.

Output after timeout (redacted):
image

Copy link
Contributor

@droyad droyad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution. A few changes please.

@droyad
Copy link
Contributor

droyad commented Jan 22, 2020

These kinds of timeouts have a habit of not working on Linux, a test should uncover that though.

…on workflow.

Add *Nix test to CommandLineRunnerFixture that is functional equivalent of Windows timeout test.
@miguelelvir
Copy link
Author

These kinds of timeouts have a habit of not working on Linux, a test should uncover that though.

Do you mean the timeout test command or the use of timeout in the Process.WaitForExit call? In *Nix the timeout one does indeed behave different, so the test would have failed. If needed, I'll have to spin up a linux tentacle container to test the Process.WaitForExit though.

@miguelelvir
Copy link
Author

Thanks for the contribution. A few changes please.

Thanks for taking the time @droyad :) Really appreciate it.

Copy link
Author

@miguelelvir miguelelvir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made requested changes

@CLAassistant
Copy link

CLA assistant check
All committers have signed the CLA.

@johnsimons johnsimons removed their request for review August 31, 2020 23:01
@droyad droyad removed their request for review September 24, 2020 23:00
@miguelelvir
Copy link
Author

Closing in favor of #688

Copy link
Contributor

@droyad droyad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I commented but didn't submit

@@ -149,11 +152,21 @@ static SilentProcessRunner()
process.BeginOutputReadLine();
process.BeginErrorReadLine();

process.WaitForExit();
// TimeSpan.TotalMilliseconds can have a value > int.MaxValue, so we just assume wait for ever if this happens.
var timeoutMilliseconds = timeout == null || timeout.Value.TotalMilliseconds > int.MaxValue ? -1 : (int)(timeout.Value.TotalMilliseconds);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should work:

Suggested change
var timeoutMilliseconds = timeout == null || timeout.Value.TotalMilliseconds > int.MaxValue ? -1 : (int)(timeout.Value.TotalMilliseconds);
var timeoutMilliseconds = (int) (timeout ?? Timeout.InfiniteTimeSpan).TotalMilliseconds;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timeout.InfiniteTimeSpan is -1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants