This repository has been archived by the owner on May 12, 2021. It is now read-only.

METRON-670 Monit Incorrectly Reports Status #422

Closed
wants to merge 1 commit into from


nickwallen
Contributor

In a constrained environment, like 'Quick Dev', Monit will often incorrectly report the status of a Metron topology. This occurs when the environment is under load and a query of topology status exceeds the default timeout of 30 seconds.

Added a parameter so that the timeout for a status check can be extended under these conditions. This was previously done for starting and stopping a topology, but not for status checks.
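
As a rough illustration only (this is not the diff from this PR), a Monit program check with an extended status timeout might look something like the fragment below. The check name, script paths, and the 60-second values are hypothetical placeholders; only the `check program`, `start program`, `stop program`, `with timeout`, and `group` keywords are standard Monit syntax.

```
# Hypothetical sketch: check name and script paths are placeholders, not Metron's actual files.
# The "with timeout" on the check line bounds how long the status script may run.
check program bro_topology with path "/usr/local/monit/status_bro_topology.sh" with timeout 60 seconds
  # Start and stop get their own timeouts, separate from the status check.
  start program = "/usr/local/monit/start_bro_topology.sh" with timeout 60 seconds
  stop program = "/usr/local/monit/stop_bro_topology.sh" with timeout 60 seconds
  # A non-zero exit status from the status script means the topology is not running.
  if status != 0 then restart
  group bro
```

With a `group bro` line on each related check, a command such as `monit -g bro start` acts on every check in that group.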

This was tested in 'Quick Dev' and made starting, stopping and reporting status of the topologies using Monit work much better. Previously Monit would erroneously report some of the topologies as not running when they were. This would also interfere with your ability to start/stop the same topologies.

For example, starting all of the services required to consume Bro telemetry works much better with this change.

monit -g bro start

@ottobackwards
Contributor

There are also problems with shutting down topologies, since we don't pass the wait time in to Storm (at least we didn't; I need to find the PR/JIRA for that). All external calls from Monit need clear timeout accounting from both Monit's point of view and the external agent's.

@justinleet
Contributor

+1. I spun this up in Quick Dev and it seems to work well; the UI reports that the timeout on status is 60 seconds. Thanks for grabbing this.

@dlyle65535
Contributor

All of these files will go away as a result of PR-436. Would you guys be willing to hold off until I can get it completed and through the process?

@justinleet
Contributor

@dlyle65535 I'm fine with holding off on it. I wasn't sure of the timing on that, and I have been annoyed by this issue in my own testing in the interim.

@nickwallen You okay with holding off on this?

@nickwallen
Contributor Author

I found myself having to manually do this all the time. I thought it might be worthwhile to put the fix in, so that at least we have a record of it working at some point in time. Then after @dlyle65535 gets his PR done, he can remove it.

I don't have a strong opinion either way though. Meh.

@dlyle65535
Contributor

It creates a bunch of rebase conflicts for me when I bring it in. Fwiw, I've got one more hard problem + a review until I'm done.

@nickwallen
Contributor Author

No longer needed.

@nickwallen nickwallen closed this Mar 17, 2017
@nickwallen nickwallen deleted the METRON-670 branch June 5, 2017 19:04
4 participants