Feature idea: signal failure if command completes too quickly #5

Closed
cuu508 opened this issue May 13, 2020 · 5 comments

@cuu508
Contributor

cuu508 commented May 13, 2020

Sometimes apps and scripts fail early, but still return exit code 0. And legacy systems can be hard to fix.

Perhaps there could be an optional feature where runitor measures the execution time of the command, and signals failure if the command completes too quickly? Something like:

# signals failure (the command finished in under 5s)
runitor -min-time=5s -uuid 6e1fbf8f-c17e-4749-af44-0c81461bdd19 -- sleep 1

# signals success (the command ran for at least 5s)
runitor -min-time=5s -uuid 6e1fbf8f-c17e-4749-af44-0c81461bdd19 -- sleep 6
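
The intended semantics can be approximated today with a small shell function, without any change to runitor. This is only a sketch under stated assumptions: `run_with_min_time` is a hypothetical helper name, `-min-time` is the flag proposed in this issue rather than an existing one, and `date +%s` gives whole-second resolution, so the measured duration can be off by up to a second.

```shell
# Hypothetical helper (not part of runitor): run a command and treat it as
# failed if it exits non-zero OR finishes in fewer than min_seconds seconds.
run_with_min_time() {
    local min_seconds="$1"
    shift

    local start
    start=$(date +%s)          # wall-clock start, whole seconds
    "$@"
    local status=$?
    local elapsed=$(( $(date +%s) - start ))

    if [ "$status" -ne 0 ]; then
        return "$status"       # genuine failure: keep the original exit code
    fi
    if [ "$elapsed" -lt "$min_seconds" ]; then
        echo "finished in ${elapsed}s, expected at least ${min_seconds}s" >&2
        return 1               # suspiciously fast: report failure
    fi
    return 0
}
```

Wrapping this function with runitor (or calling it from a script that runitor runs) would turn a too-quick exit into a failure ping, because runitor reports the exit code it receives.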
@cuu508
Contributor Author

cuu508 commented May 13, 2020

P.S. This could also be done on the server side; there's a ticket for that: healthchecks/healthchecks#236

One benefit of doing it on client side is more precise timing. HTTP requests have latency – if you measure second-long events using HTTP requests that can also sometimes take seconds, you'll sometimes get false positives and false negatives.

@bdd
Owner

bdd commented May 13, 2020

Huh. Initially I thought about a possible max-runtime feature to catch runaway processes but that's not very Unix when we already have the right tool for it, timeout from GNU coreutils.

This one is the other way around. I'm curious, how much of a common problem is this?

bdd added a commit that referenced this issue May 14, 2020
First pass implementation of the feature idea from #5.

README.md and doc.go still need updates.
@bdd
Owner

bdd commented May 14, 2020

Here's a one-minute implementation: e0af841

Honestly it doesn't feel like this feature fits in along with the others. Other than stdout & stderr routing, runitor just implements healthchecks.io Ping API features. Nothing less, nothing more.

Would you consider client measured run duration to be passed as a parameter for success pings (and for symmetry also failure)? This way users can define "inverse of grace period" as mentioned in healthchecks/healthchecks#236, and the act of lifting the signal to failure is done on the server side.

@cuu508
Contributor Author

cuu508 commented May 14, 2020

> I'm curious, how much of a common problem is this?

It's not a common request. It has been requested a few times (in healthchecks/healthchecks#236 and by email). Measuring run time on the client was never explicitly suggested; that was my idea.

Also, if you have a legacy environment where you cannot fix the exit code, chances are you also cannot add a wrapper (runitor) around your command.

I think this feature would fit in if you're OK with a Swiss-army-knife kind of tool, one with various niche but sometimes useful features (examples: uwsgi, curl, ImageMagick, caddy). If you'd rather keep it small and focused, then it's not a good fit.

> Would you consider client measured run duration to be passed as a parameter for success pings (and for symmetry also failure)?

The API currently doesn't support that, but it could be added. The reported execution time would take priority over the execution time measured on the server. Dead Man's Snitch's API and Field Agent work like that (no start signal, execution time measured on the client and reported after the job completes).

@cuu508
Contributor Author

cuu508 commented Oct 27, 2022

> Honestly it doesn't feel like this feature fits in along with the others.

Yep, and there's always the option to wrap the legacy app in a custom shell script, with custom success/failure testing criteria (how much time it took? what output did it produce? did it produce file X on the filesystem? etc.).
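
A rough sketch of what such a wrapper could look like. The function names, the `ERROR` marker, and all paths are placeholders chosen for illustration, not anything from runitor or the legacy app; the only external pieces relied on are `curl` and the Healthchecks ping URLs (`https://hc-ping.com/<uuid>` for success, with `/fail` appended for failure).

```shell
# Decide success from more than the exit code (all criteria are examples).
job_succeeded() {
    local log_file="$1"
    local expected_file="$2"
    shift 2

    "$@" >"$log_file" 2>&1 || return 1      # non-zero exit code
    if grep -q "ERROR" "$log_file"; then    # error text despite exit code 0
        return 1
    fi
    [ -s "$expected_file" ] || return 1     # output file missing or empty
    return 0
}

# Report the verdict: plain ping URL for success, "/fail" suffix for failure.
report() {
    local ping_url="$1"
    local verdict="$2"
    if [ "$verdict" -eq 0 ]; then
        curl -fsS -m 10 --retry 3 -o /dev/null "$ping_url"
    else
        curl -fsS -m 10 --retry 3 -o /dev/null "$ping_url/fail"
    fi
}

# Example usage (UUID and paths are placeholders):
#   job_succeeded /tmp/job.log /var/data/report.csv /opt/legacy/run.sh
#   report "https://hc-ping.com/your-uuid-here" $?
```

A run-time check could be added to `job_succeeded` in the same way, by comparing timestamps taken before and after the command.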

cuu508 closed this as not planned on Oct 27, 2022