SensorCacheJob issue #67
Comments
This has been an ongoing issue for me. I worked with mephux a bit on IRC a while back, and it sounded like the issue might be that I have a large number of alerts in my database (millions). I looked at the stack traces and tracked the problem to the fetch_src_ip_metrics and fetch_dst_ip_metrics functions in lib/snorby/jobs/cache_helper.rb. For me, the issue is on the second-to-last line of these two functions:
It seems that there are cases where x.ip can be nil, so I get errors about the ip_src method not existing on a NilClass object. I didn't dig too deeply to find the cause, but I implemented the following fix. (My Ruby/Rails skills are very rusty, so I'm sure this is not optimal; also, this is against git version 2.2.5.) In lib/snorby/jobs/cache_helper.rb, I changed the above-mentioned line to the following sequence, and made the same change to the corresponding line in the fetch_dst_ip_metrics function:
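For illustration only (this is not the exact patch described above), a nil guard of this kind might look like the sketch below. The `Ip` and `Event` structs and the `count_src_ips` method are hypothetical stand-ins, not Snorby's real models or the real body of fetch_src_ip_metrics; the only point is skipping records whose `ip` is nil before calling `ip_src` on it.

```ruby
# Hypothetical stand-ins for Snorby's Event and IP models (assumptions,
# not the real classes from the Snorby code base).
Ip = Struct.new(:ip_src, :ip_dst)
Event = Struct.new(:ip)

# Count events per source address, skipping any event whose ip is nil.
def count_src_ips(events)
  counts = Hash.new(0)
  events.each do |event|
    # Guard: without this, event.ip.ip_src raises NoMethodError on nil.
    next if event.ip.nil?
    counts[event.ip.ip_src.to_s] += 1
  end
  counts
end

events = [
  Event.new(Ip.new("10.0.0.1", "10.0.0.2")),
  Event.new(nil), # an event with no IP record attached
  Event.new(Ip.new("10.0.0.1", "10.0.0.3"))
]

p count_src_ips(events)
```

The same guard would apply to a destination-address variant by reading `ip_dst` instead of `ip_src`. Skipping nil records hides the symptom rather than explaining why the IP data is missing, so it is best treated as a workaround.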
This may be redundant, but I also commented out some lines in the load function in lib/snorby/model/types/numeric_ip_addr.rb in the interest of making absolutely sure that the NumericIPAddr variables inside of the IP objects belonging to Event objects are never instantiated to nil:
Just for good measure, I also set up cron jobs that run every minute, restart any workers that may have crashed, and email me whenever this happens. Since implementing the two fixes above, I have not seen a crash. However, the worker processes now use a lot of CPU and RAM: usually 100% of one CPU each, and I've seen sustained memory usage regularly spike above 8 GB. For the time being I've moved my Snorby install to its own VM with a ton of RAM and four virtual CPUs, and things seem to be all good now. On the plus side, my dashboard is now stable and I've got pretty graphs to show to management. Our CISO loves the dashboard, by the way, and one of his first feature requests was of course to make the graphs clickable so that you can interactively drill down into the event data (already open as an earlier feature request) :-)
I may have spoken too soon... After some further rule tuning I decided to clear out the cache using "Snorby::Jobs.clear_cache(true)" and let things rebuild. Unfortunately, this seems to have wiped out my events too (at least in Snorby; they still seem to exist in my Snort database). [EDIT: The events are back now after the workers ran for a bit; the rest is still rebuilding.] Since then, the worker jobs have crashed a few more times, leaving my dashboard nearly empty, this time with a different error that I'm not sure how to fix. I've increased the size of the relevant fields in MySQL from TEXT to MEDIUMTEXT, to no avail:
This issue has been fixed in Snorby 2.3.1.
Hi,
I have a problem with SensorCacheJob. When this job is supposed to start, it goes down, and I have to turn it back on manually in the administration menu. As a result, my dashboard is never up to date. Please help.
Thanks