Add a "surveillance system" for detecting anomalies #4
Comments
For nodes, we have finished an alarm system that can send email when a node's modem stops working and/or a node is gone (the alarms trigger at the same intervals as the node and link colors in the maintenance portal/inventory interface). This is probably only useful for stationary nodes, though. The alarms can be sent directly to both users of the maintenance portal and the contact for the node. We also have Slack integration, but I am not sure whether I broke it in the last round of changes. If someone is using Slack for monitoring I can have a look; if not, I will spend the time on other things. Changes will be pushed once I'm back in Norway.
I think it would also be good to have a human-readable overview (in the visualization?) of which nodes are running the base experiments and reporting results. If we have to check the database, then no one will check as often as we need.
Can your system be extended to report the running Docker containers, how many files were last synced, etc.? I do not think Slack integration is top priority right now. In my view, a minimal alert system that can inform me (monroe-devel?) that things have gone pear-shaped, without me having to monitor a surveillance system (which I will forget), would satisfy my needs (others may have other needs).
The thing with alert systems is that you have to train them what your pears look like.
I understand the problem but to start with I think we have pretty easy to Top of my head I come up with these :
2016-08-30 9:27 GMT+02:00 Thomas Hirsch notifications@github.com:
Jonas Karlsson Karlstad University twitter.com/kau
None of these will be implemented in the inventory alert system, as the inventory alert system is only concerned with node state (i.e., node up, modems up + connectivity). However, implementing your own alert system shouldn't be too hard. Doesn't Cassandra have all these nice triggers/events you can listen to?
Yes, I think you should be able to monitor all of these from the database side. The nodes can send SYSEVENT metadata, e.g. whenever they try to restart a container (when it is not up). If you get these events once a minute, the container is crashing.
Be careful about depending on timing ("once a minute"), since you might have a slow, congested, ... connection. It is better to say "not received within X".
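To make the "not received within X" idea concrete, here is a minimal sketch in Python. It assumes the most recent entry time per node has already been fetched from the database. The node IDs, the 30-minute threshold and the function name `find_silent_nodes` are made up for illustration; none of them come from the existing code.

```python
from datetime import datetime, timedelta, timezone


def find_silent_nodes(last_seen, now, max_silence):
    """Return the IDs of nodes whose latest entry is older than max_silence.

    last_seen: dict mapping node ID -> datetime of the most recent entry.
    The check is "nothing received within X", so a slow or congested link
    that merely delays data does not raise an alarm until X has passed.
    """
    return sorted(
        node for node, seen in last_seen.items() if now - seen > max_silence
    )


# Hypothetical example data (not real node IDs).
now = datetime(2016, 8, 30, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "node-1": now - timedelta(minutes=2),
    "node-2": now - timedelta(minutes=45),
}
silent = find_silent_nodes(last_seen, now, max_silence=timedelta(minutes=30))
print(silent)
```

X should probably be chosen per data type, since something reported every few seconds and something reported once an hour need different limits.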
I can write a small script to count the number of pings/https/etc that
On 30-Aug-16 09:27, Thomas Hirsch wrote:
Miguel Peón-Quirós
Maybe you can also list which nodes inserted the entries so we get a
2016-08-30 12:37 GMT+02:00 mikepeon-imdea notifications@github.com:
Jonas Karlsson Karlstad University twitter.com/kau
I am getting more and more fond of your idea, @mikepeon-imdea, of extracting the data from the (Cassandra) database. The node tests might still be interesting from a debugging viewpoint, but if we can get an alert system based on the data we import, that would be great.
I'll work on it a bit today and tomorrow...
On 30-Aug-16 16:30, Jonas Karlsson wrote:
Miguel Peón-Quirós
"As described on the mailing list": the script parses the database, extracts the data inserted into the db by node/operator, and validates the timestamps against what they should be (GPS information, modem metadata, HTTP download and RTT results for 3 operators, etc.) for the specified timespan.
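For readers who have not seen the script, the timestamp validation could look roughly like the sketch below. This is not the actual script. It assumes the rows have already been read from the database as `(node_id, operator, unix_timestamp)` tuples, and the names `find_gaps`, `expected_sources` and `tolerance` are hypothetical.

```python
def find_gaps(rows, expected_sources, start, end, expected_interval, tolerance=1.5):
    """Report time ranges in which a (node, operator) pair inserted nothing.

    rows: iterable of (node_id, operator, unix_timestamp) tuples.
    expected_sources: (node_id, operator) pairs that should be reporting;
        listing them up front lets completely silent sources show up too.
    A gap is any stretch longer than expected_interval * tolerance.
    """
    by_source = {source: [] for source in expected_sources}
    for node_id, operator, ts in rows:
        if start <= ts <= end:
            by_source.setdefault((node_id, operator), []).append(ts)

    limit = expected_interval * tolerance
    gaps = {}
    for source, stamps in by_source.items():
        # Include the window edges so missing data at either end is caught.
        points = [start] + sorted(stamps) + [end]
        found = [(a, b) for a, b in zip(points, points[1:]) if b - a > limit]
        if found:
            gaps[source] = found
    return gaps


# Hypothetical data: node 42 reports on two operators, node 43 is silent.
rows = [
    (42, "op-a", 0), (42, "op-a", 60), (42, "op-a", 300),
    (42, "op-b", 30), (42, "op-b", 90), (42, "op-b", 150),
    (42, "op-b", 210), (42, "op-b", 270),
]
expected = [(42, "op-a"), (42, "op-b"), (43, "op-a")]
gaps = find_gaps(rows, expected, start=0, end=300, expected_interval=60)
print(gaps)
```

The tolerance factor is there so that small delays do not count as missing data; the right value depends on the experiment.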
I think a mail or similar should go out if, e.g.:
1. We receive or import no "jsons" in an hour
2. Disk space runs out on the nodes
3. The "well known" Docker containers are not running
4. ....
Let's discuss options.
For point 1, I can easily implement this in the importer, and I am sure the other checks are just as easily implemented (although on the nodes we might have trouble sending emails if we have no internet connectivity; or maybe we can get some of this info from the inventory?).
The other way is to implement a more generic option that parses the database and searches for the relevant info, if it exists in the db(s).
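As a starting point for the discussion, points 1 to 3 could be sketched in Python as below. This is only an illustration: the 10% disk threshold, the container names and the function names are assumptions, and in practice the import check would run next to the importer while the disk and container checks would run on (or be reported by) the nodes. Actually sending the mail is left out.

```python
import shutil
from datetime import datetime, timedelta, timezone


def free_fraction(path="/"):
    """Fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def collect_alerts(last_import, now, disk_free, running, expected,
                   max_import_age=timedelta(hours=1), min_free=0.10):
    """Return a list of alert messages; an empty list means all is well."""
    alerts = []
    # Point 1: nothing imported within the allowed time.
    if now - last_import > max_import_age:
        alerts.append(f"no import since {last_import.isoformat()}")
    # Point 2: disk space is running out (threshold is an assumption).
    if disk_free < min_free:
        alerts.append(f"only {disk_free:.0%} disk space free")
    # Point 3: expected containers that are not running.
    missing = sorted(set(expected) - set(running))
    if missing:
        alerts.append("containers not running: " + ", ".join(missing))
    return alerts


# Hypothetical values; the container names are placeholders.
now = datetime(2016, 8, 30, 12, 0, tzinfo=timezone.utc)
alerts = collect_alerts(
    last_import=now - timedelta(hours=2),
    now=now,
    disk_free=0.05,
    running=["ping"],
    expected=["ping", "http-download"],
)
for line in alerts:
    print(line)
```

Keeping the checks as a plain function that returns messages means the same code can feed email, Slack or a log, whichever we end up agreeing on.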