Dashboard / Caching #54

Closed
corelanc0d3r opened this issue Feb 28, 2011 · 48 comments

@corelanc0d3r

It looks like the dashboard / caching is not working reliably for me.
Events are delivered to the database (table events), and I can see them in the events view in snorby.
The worker processes are running... but the dashboard does not get updated.

When I remove the jobs & add them again, nothing changes.

When I clear the caches table, remove the jobs & add them again, the cache is rebuilt.
After the cache is rebuilt, it does not seem to update anymore.

How can I troubleshoot this?

@djcas9
Contributor

djcas9 commented Feb 28, 2011

@corelanc0d3r the cache is built in 30-minute increments. Can you check the caches/daily_caches tables and make sure the timestamps are correct and in the proper timezone?
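To make "30-minute increments" concrete: a cache row stamped 22:30:00 covers the half-hour bucket that starts then. The Ruby sketch below rounds a time down to its bucket start. `bucket_start` is a hypothetical helper, not Snorby's code, and the epoch-based floor assumes a zone whose UTC offset is a whole multiple of 30 minutes.

```ruby
# Hypothetical helper (not part of Snorby): round a Time down to the start
# of its 30-minute bucket, e.g. 22:47:12 -> 22:30:00.
# Assumes the zone's UTC offset is a multiple of 30 minutes.
BUCKET_SECONDS = 30 * 60

def bucket_start(time)
  # Time#to_i is seconds since the Unix epoch; dropping the remainder
  # moves back to the previous bucket boundary.
  floored = Time.at(time.to_i - (time.to_i % BUCKET_SECONDS))
  # Re-express the result with the input's UTC offset so it reads the same way.
  floored.getlocal(time.utc_offset)
end
```

For example, `bucket_start(Time.new(2011, 2, 28, 22, 47, 12, "-05:00"))` gives 22:30:00 at -05:00.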

@corelanc0d3r
Author

Timestamps in caches are correct and in the proper timezone.
daily_caches seems to be empty.

Sample database entry: 2011-02-28 22:30:00

The dashboard, however, shows the date & time differently:
Last Updated: 02/28/11 10:30:00 PM
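The two values above are the same instant, 22:30 on 28 February 2011. Only the presentation differs: a 24-hour clock in the database and a US-style date with a 12-hour clock on the dashboard. Here is a minimal Ruby sketch of that rendering. The `strftime` pattern is an assumption chosen to match the displayed string, not taken from Snorby's source.

```ruby
require "time"

# Value as stored in the caches table (24-hour clock).
stored = "2011-02-28 22:30:00"
t = Time.strptime(stored, "%Y-%m-%d %H:%M:%S")

# Assumed display pattern: MM/DD/YY with a 12-hour clock and AM/PM.
shown = t.strftime("%m/%d/%y %I:%M:%S %p")
puts "Last Updated: #{shown}" # prints "Last Updated: 02/28/11 10:30:00 PM"
```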

@djcas9
Contributor

djcas9 commented Feb 28, 2011

Are the run_at times being calculated properly in the delayed_jobs table? When a job runs and completes, it schedules its next run by adding 30 minutes to its completion time for the caches table, and 24 hours for the daily_caches table.
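Here is a hedged Ruby sketch of the rescheduling rule just described. The job keys and `next_run_at` are hypothetical names, and plain seconds stand in for ActiveSupport's `30.minutes` so the snippet runs without Rails.

```ruby
# Hypothetical names; intervals follow the rule described above.
INTERVALS = {
  sensor_cache: 30 * 60,      # caches table: every 30 minutes
  daily_cache:  24 * 60 * 60  # daily_caches table: every 24 hours
}.freeze

# Next run_at = completion time + the job's interval.
def next_run_at(job, completed_at)
  completed_at + INTERVALS.fetch(job)
end
```

If `run_at` in delayed_jobs does not advance like this after each run, the job is most likely not completing.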

@corelanc0d3r
Author

yeah, it looks like they are incremented properly, and the record shows the date & time in the correct format.

@corelanc0d3r
Author

I also noticed that the Last Updated time is ahead of the current time... so it looks like there is some kind of date/time formatting/interpretation issue.

@matherej

I had a similar problem; try this:
Go to the main snorby directory (in my case it is /var/www/snorby),
then execute this command: sudo rails c
and then run these commands:
Snorby::Jobs.clear_cache(true)
Snorby::Jobs.run_now!

and then it should work.

Good luck

@cyberconsole

I wrote about this the other day on IRC... not sure if you got a chance to read it. I actually tracked the problem down to the query that the job runs to update the cache. It looks for events that happen in the future, instead of between the present time and 30 minutes in the past, which I assume is the intended behavior. Clearing the cache and manually invoking the job will make the worker look through your entire event list and update the caches table and, subsequently, the dashboard. However, any future events will still be plagued by the problem. You can tail your log to see this behavior, but you will only notice the erroneous query when the worker runs its job on the 30-minute schedule, not when the job is invoked manually.

Also,
I noticed that the worker actually uses the database defined under development in your database.yml. I assume this isn't a problem for most people, since they leave them all at the default of snorby. You'll also notice that the worker logs show up in development.log. I'm not sure if this is a one-off with my build, but it may warrant looking into.

Hope this helps.
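The window problem described in this comment can be illustrated with a small Ruby sketch. The intended window is the last 30 minutes. A window anchored in the future finds nothing, so scheduled runs add no new events. `Event` and both filter methods are hypothetical, not Snorby's actual query.

```ruby
# Hypothetical event record; Snorby's real models differ.
Event = Struct.new(:id, :timestamp)

WINDOW = 30 * 60 # seconds

# Intended: events in (now - 30 min, now].
def events_in_last_window(events, now)
  events.select { |e| e.timestamp > now - WINDOW && e.timestamp <= now }
end

# Described bug: a window that starts in the future, (now, now + 30 min].
def events_in_future_window(events, now)
  events.select { |e| e.timestamp > now && e.timestamp <= now + WINDOW }
end

now = Time.utc(2011, 3, 1, 12, 0, 0)
events = [
  Event.new(1, now - 45 * 60), # older than the window
  Event.new(2, now - 10 * 60), # within the last 30 minutes
  Event.new(3, now - 60)       # within the last 30 minutes
]
recent = events_in_last_window(events, now).map(&:id)   # [2, 3]
future = events_in_future_window(events, now).map(&:id) # []
```

A manual full rebuild hides the bug because it scans every event rather than a 30-minute window.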

@corelanc0d3r
Author

What would be the fix for looking for events in the future?
Also, why is the Last Updated timestamp in the wrong format?

@corelanc0d3r
Author

any ideas on how to fix the issue & make the dashboard work properly ?
tx

@dcarrith

dcarrith commented Mar 3, 2011

I tried to fix this all weekend. I think I just came across a post that could explain what's happening. Apparently there is a bug in DataMapper in how it reads dates from MySQL. It can write the dates correctly, but when it reads them back out (i.e. when it uses the value @sensor.cache.last.ran_at) they are off by one hour. I'm on the East Coast, so when the sensor_cache_job.rb script gets the time from Time.now, it shows -05:00 since it's EST. However, the value read from the ran_at field shows -04:00. I believe it's because of this bug that was just recently discovered and fixed: datamapper/do@9e369b7#commitcomment-271516

@dcarrith
Copy link

dcarrith commented Mar 3, 2011

Well, now I'm confused. It looks like Snorby is using do_mysql-0.10.3.gem. That's the latest with the DST fix.

@dcarrith

dcarrith commented Mar 3, 2011

I went ahead and tried to adjust it by one hour anyway, even though the DST bug was supposedly fixed. I can share this sensor_cache_job.rb file with anyone who's interested, but I need to clean it up first. I tried a lot of stuff before I got to this point. But basically, doing this seems to have fixed it. I'm still waiting for the SensorCacheJob to run on its own (in 6 minutes), but it worked in my manual test. I think this will enable the dashboard to refresh as it is supposed to.

start_time = Time.parse("#{@sensor.cache.last.ran_at}") + 1.hour
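This Ruby sketch shows why a fixed `+ 1.hour` can compensate for the mismatch described above, and why it is fragile. When the same wall-clock reading is tagged -04:00 instead of -05:00, it denotes an instant one hour earlier. Adding an hour recovers the intended instant only while that misread happens. If `ran_at` is already read correctly, the same adjustment pushes it an hour into the future. The dates are illustrative; plain seconds replace ActiveSupport's `1.hour`.

```ruby
# Same wall-clock reading, two different UTC offsets.
wall = [2011, 3, 3, 10, 30, 0]
as_est = Time.new(*wall, "-05:00") # offset Time.now reported (EST)
as_edt = Time.new(*wall, "-04:00") # offset the ran_at read-back carried

# 10:30 at -04:00 is 14:30 UTC; 10:30 at -05:00 is 15:30 UTC.
gap = as_est - as_edt # 3600.0 seconds

# The "+ 1.hour" workaround, in plain Ruby: fixes the misread value...
corrected = as_edt + 60 * 60
# ...but overshoots by an hour if the value was already read correctly.
overshoot = as_est + 60 * 60
```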

@dcarrith

dcarrith commented Mar 3, 2011

I just wanted to follow up on the comments I posted last night. I made the changes as noted above (for the dst thing) as well as some small changes to some of the logic of the sensor_cache_job.rb. My dashboard has been working consistently since those changes were put into place. After I clean up the code and do a diff on the original file, I can recommend these minor changes to the Snorby devs.

@corelanc0d3r
Author

that's good news - and as soon as the devs have reviewed it, I would be more than happy to test it

@djcas9
Contributor

djcas9 commented Mar 4, 2011

@dcarrith good work.. I look forward to seeing your changes.

@djcas9
Contributor

djcas9 commented Mar 4, 2011

@dcarrith not sure about the +1.hour issue... this just sounds like your system time/sensor times are not in the correct timezones. For example, if I added start_time = Time.parse("#{@sensor.cache.last.ran_at}") + 1.hour to my setup, all my times would be an hour in the future.

@dcarrith

dcarrith commented Mar 4, 2011

Snort and Snorby are on the same box. And, the timezone is set correctly to EST. I did notice that my router (running dd-wrt) is set to UTC. So, I'll need to fix that, but I don't think that's what was causing the issues I was seeing. I'm not sure what would happen if I just made that one change. It's likely that you would also need to make the other changes I made. How should I go about sending this updated file? I guess I could create a git repository and upload it to that. I'll look into it tonight and let you know.

@idawson21

In my setup everything is on one box. The timestamps for the system as well as the DB are correct.

When I logged in today, it was showing Last Updated 2 hours in the future and no results for today, although there were events listed.

I ran:
Snorby::Jobs.clear_cache(true)
Snorby::Jobs.run_now!

This forced it to update correctly; however, it now shows the Last Updated time 1 hour in the future, just as dcarrith describes.

@idawson21

30 minutes rolled around, and the Last Updated time (though still 1 hour in the future) incremented by 30 minutes, and I can see that the job ran. The dashboard, however, does not contain new events that occurred since I last forced an update.

So I cleared the cache again and ran the job as described above and the dashboard updated and still shows 1 hour in the future as the last updated time.

Hopefully some of this info helps. I think Snorby 2 is awesome and just want to help out.

@dcarrith

dcarrith commented Mar 4, 2011

I determined earlier that my dd-wrt based router was set to the wrong DST setting. So, I fixed it, and it didn't have an impact on the "fix" I put into place on my Snorby box. I still get events updated every 30 minutes as I should. So, I don't think that was the cause of the problem. I really think it was the datamapper issue I mentioned earlier. Perhaps they thought they fixed it, but only fixed it for whatever test case they were using. I'm still going to do some more digging and I seem to have broken the event counts on the top signatures list...so I need to fix that too.

@matherej

matherej commented Mar 7, 2011

I am really confused by this discussion. So, how can we fix this problem?

@dcarrith

dcarrith commented Mar 7, 2011

I've been meaning to reply to this post, but haven't had time to fix the counts on the "Last 5 Unique Events" that I seem to have broken while trying to fix the dashboard. My dashboard is regularly updating though. I'll try to track down the counts issue and reply back within the week.

@djcas9
Contributor

djcas9 commented Mar 7, 2011

@dcarrith thank you for all your hard work on this issue. I'm still a bit confused about the fix. I have been in production since I released Snorby, and all of my times have been accurate (numerous installs on pretty much every OS besides Windows). I am not disputing that there is an issue, but I'm confused about what causes it and why I have never experienced it.

Let's work together on this once you post your fix. Thanks a lot dcarrith, and great work man.

@cyberconsole

I wonder if this is a glitch that only appears for certain users based on their timezone or time configuration. I'm using NTP and have my time zone set appropriately. I wonder if this bug will disappear if I just use UTC. I will try that tonight and report back.
btw I'm using Ubuntu Server 10.04

@dcarrith

dcarrith commented Mar 7, 2011

I'm running Ubuntu Desktop 10.04 64-bit.

cat /etc/timezone

America/New_York

date +"%:z"

-05:00

dpkg-reconfigure tzdata

Local time is now: Mon Mar 7 11:16:33 EST 2011.
Universal Time is now: Mon Mar 7 16:16:33 UTC 2011.

Does anyone see anything wrong with that output? I live in VA.

Here is some more info about my install:

apache2ctl -v

Server version: Apache/2.2.14 (Ubuntu)
Server built: Nov 18 2010 21:19:09

passenger -v

Phusion Passenger version 3.0.3
"Phusion Passenger" is a trademark of Hongli Lai & Ninh Bui.

mysql --version

mysql Ver 14.14 Distrib 5.1.41, for debian-linux-gnu (x86_64) using readline 6.1

ruby -v

ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-linux]

gem -v

1.3.7

rails -v

Rails 3.0.3

DataMapper MySQL adapter:

dm-mysql-adapter-1.0.2.gem

@betteroff

Doubt I can be much help, but I've been experiencing the same issue for some time now. My dashboard is currently 30 minutes into the future and will increase by 30 min every time I run the jobs manually. It resets back to only 30 minutes in the future overnight. I don't have much of an understanding of the Snorby backend, but I'll be glad to provide any information to help track down the cause.

Fedora 12
mysql Ver 14.14 Distrib 5.1.47, for redhat-linux-gnu (i386) using readline 5.1
ruby 1.9.2p0 (2010-08-18) [i686-linux]
Rails 3.0.3
Have 4 snort sensors (1 of which also hosts Snorby and database)
CST -06:00

@matherej

matherej commented Mar 8, 2011

I was playing with the timezones on the machine where Snorby runs, and I noticed that the dashboard works correctly when the timezone is set to anything except CET. When the timezone is set to CET, the Last Updated information on the dashboard is 1 hour in the future. So I think the bug must be in the timezone handling.

@cyberconsole

I reset my timezone to UTC with sudo dpkg-reconfigure tzdata, dropped the snorbyDB and rebuilt with sudo rake snorby:setup RAILS_ENV=production.

It seems to be working properly now.

I would still prefer to have things display in the appropriate time zone, so I will play around some more to see if I can narrow the problem down.

Maybe it has something to do with DST.

@matherej - I was using CST and dcarrith was using EST, so I don't think it's necessarily just one time zone that is having the issue.

@dcarrith

dcarrith commented Mar 8, 2011

@cyberconsole: take a look at my post above: https://github.com/Snorby/snorby/issues/#issue/54/comment/828861

I also think it's a DST issue. If that's the case, then this Sunday when the DST switches back (or forth...whatever) the issue should go away. At that point, I may have to revert the changes I made. We'll find out this weekend I guess.

@matherej

matherej commented Mar 8, 2011

My timezone is CET. When I set the timezone to CET in Ubuntu, the Snorby dashboard doesn't work correctly: Last Updated is always 1 hour in the future. But when I change the timezone to GMT-1, which shows the same time, the dashboard works. So something must be wrong with the timezone settings.
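Several reports in this thread tie the symptom to specific zones and to DST. For some people it went away once their clocks changed. This Ruby sketch shows how a DST-observing zone (CET) and a fixed +1 offset agree in early March and then diverge after the switch. It uses POSIX `TZ` strings, so no tzdata files are needed. Note the POSIX sign convention: `-1` means one hour *east* of UTC, which is why "GMT-1" (as in `Etc/GMT-1`) matches CET in winter. `offset_hours` is a hypothetical helper, and this does not reproduce the Snorby bug itself.

```ruby
# POSIX TZ strings (no tzdata lookup needed):
#   "CET-1CEST,M3.5.0,M10.5.0/3" - EU rules: UTC+1, UTC+2 from last Sunday of March
#   "UTC-1"                      - fixed UTC+1, no DST rules
CET_RULES = "CET-1CEST,M3.5.0,M10.5.0/3"
FIXED_PLUS_ONE = "UTC-1"

# Hypothetical helper: UTC offset (in hours) of a local time under a given TZ.
def offset_hours(tz, *local)
  old = ENV["TZ"]
  ENV["TZ"] = tz # Ruby re-reads the zone when TZ changes
  Time.local(*local).utc_offset / 3600
ensure
  ENV["TZ"] = old # restore (nil removes the variable)
end

before_cet   = offset_hours(CET_RULES, 2011, 3, 8, 12)      # 1
after_cet    = offset_hours(CET_RULES, 2011, 4, 8, 12)      # 2
after_fixed  = offset_hours(FIXED_PLUS_ONE, 2011, 4, 8, 12) # 1
```

In 2011 the EU switch happened on 27 March, which lines up with the report in this thread that the problem disappeared then.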

@ermalm

ermalm commented Mar 9, 2011

@matherej: This does fix the update issue on the dashboard (GMT-2 for me), but the alerts are still in the future. The graph shows alerts 2 hours ahead (for me).

@betteroff

Just wanted to jump in here and say that now that I've moved from CST to CDT, my dashboard seems to be working correctly and has not jumped to future timestamps since.

@corelanc0d3r
Author

What is the fix? I don't want to mess up timestamps or timezones, and I am still left with a broken system.

@ermalm

ermalm commented Mar 23, 2011

Yeah, won't the Snorby team release a fix for this soon? I can change the time on my server manually, but that's not what I really want to do, since this seems to be more of a Snorby issue.

@ermalm

ermalm commented Mar 27, 2011

Never mind. Due to the clock change here in Europe last night, the problem solved itself.

@wmjosiah

wmjosiah commented Apr 5, 2011

This problem still exists for me. I set my timezone to UTC, dropped the snorby database and rebuilt with rake snorby:setup RAILS_ENV=production in /var/www/snorby. I then rebooted the box. Still wasn't working, so I removed the caches and updated them via the method above (post #6 in this thread). I waited for them to update, then even restarted the worker for good measure. I still have nothing in the dashboard, but everything else seems to work fine.

@wmjosiah

wmjosiah commented Apr 8, 2011

So I started over and rolled my own - had been using Insta-Snorby before. I did the bundle install of the gems it wanted, so I'm using do_mysql 0.10.3. I'm in EST, and my time shows correctly on the dashboard (last updated is the immediately previous 30 minute time tick - correct). However, I have nothing in the dashboard. My jobs are running properly, as far as I can tell, and I have events and severities working properly, they just don't show up in the dashboard. My caches table gets a new entry every half hour, but my daily_caches table never gets anything. I tried the Snorby::Jobs.clear_cache(true)
Snorby::Jobs.run_now!
fix, but it didn't change anything. Any ideas?

@wmjosiah

I think I'm having an entirely different problem. My dashboard times are now correct, but when I try to manually start the sensor cache job, I get:

Loading development environment (Rails 3.0.5)
irb(main):001:0> Snorby::Jobs::SensorCacheJob.new(true).perform;
irb(main):002:0* Snorby::Jobs::SensorCacheJob.new(false).perform;
irb(main):003:0* Snorby::Jobs::SensorCacheJob.new(true).perform
Sensor 1: Looking for events...
Sensor 1: Found events - processing...
Sensor 1: Found last cache...
Sensor 1: Building cache attributes
Sensor 1: - fetching sensor metrics
Sensor 1: - building proto counts
Sensor 1: - fetch_event_count
Sensor 1: - fetching tcp count
Sensor 1: - fetching udp count
Sensor 1: - fetching icmp count
Sensor 1: - fetching severity metrics
Sensor 1: - fetching src ip metrics
undefined method `ip_src' for nil:NilClass
undefined method `ip_src' for nil:NilClass
Sensor 1: Looking for events...
Sensor 1: Found events - processing...
Sensor 1: Found last cache...
Sensor 1: Building cache attributes
Sensor 1: - fetching sensor metrics
Sensor 1: - building proto counts
Sensor 1: - fetch_event_count
Sensor 1: - fetching tcp count
Sensor 1: - fetching udp count
Sensor 1: - fetching icmp count
Sensor 1: - fetching severity metrics
Sensor 1: - fetching src ip metrics
undefined method `ip_src' for nil:NilClass
=> nil
irb(main):004:0>

Any ideas as to why?

@rvineyard

I've discussed my progress fighting with that particular issue ^^^ here:
#67 (comment)

Now my workers don't crash, but they also don't seem to work properly (the graphs are broken) unless I'm supervising them, it's bizarre.

I have noticed a lot of NULL src_ips and dst_ips records in the caches table in my database, and when I delete them things seem to improve a bit but I feel like I'm shooting in the dark.

@wmjosiah

Thanks! At least that lets me know that it's the NULL src_ips and dst_ips in the database that cause the problem. It seems to me that this is a bug. Unfortunately I don't know Ruby, but I would think it would be trivial to add code that wouldn't barf on them. I suppose I could make a cron job that automatically ran a MySQL query like "use snorby; delete from caches where src_ips IS NULL;", but it sure seems like there would be a better fix! Maybe just change the database schema so that the default value for src_ips and dst_ips is no longer NULL? However, I think there is actually a different problem, because the entries where src_ips and dst_ips are NULL are clearly junk entries. They have no data in them, and both columns are always NULL together. So the real problem is that those entries are somehow junk, and whatever is producing them has a bug in it. Thanks for the tip.
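The trace above fails at "fetching src ip metrics" with `ip_src` called on `nil`. That suggests some event had no associated IP header record when the metrics were built. Here is a hedged Ruby sketch of the "don't barf on them" idea: count source addresses, but skip events without an IP header instead of raising `NoMethodError`. `IpHeader`, `SnortEvent`, and `src_ip_counts` are hypothetical and not Snorby's models. As noted later in this thread, skipping only hides the symptom; it does not explain why such events or cache rows are created.

```ruby
# Hypothetical data model: an event may lack its IP header record.
IpHeader   = Struct.new(:ip_src, :ip_dst)
SnortEvent = Struct.new(:sid, :cid, :ip)

# Count events per source address, skipping events with no IP header
# (calling event.ip.ip_src on those would raise NoMethodError on nil).
def src_ip_counts(events)
  events.each_with_object(Hash.new(0)) do |event, counts|
    next if event.ip.nil?
    counts[event.ip.ip_src] += 1
  end
end
```

For example, two events from 10.0.0.1 plus one event with `ip = nil` give `{"10.0.0.1" => 2}`.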

@wmjosiah

That doesn't work anyway, as the junk data just gets regenerated by the SensorCache job as soon as I delete it from the database. I guess it's just corruption elsewhere. But why? Doing a hard reset with rake snorby:hard_reset in the snorby root dir fixes it temporarily, but then I lose all my old data! I'm starting to think about moving to a different front-end, but this one is so nice in so many ways!

@rvineyard

Actually, that's exactly what I'm doing right now. I have scripts set to run every minute to restart any of the worker jobs if they're not running, and a mysql delete nulls query running every minute as well... so far, so good. I'm also running the SensorCacheJob with (true) instead of (false) and everything's dandy - it's like the watched pot that never boils. It's a horrendous hack, but it seems to work for me to some extent.
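The restart scripts mentioned here are not shown in the thread. This is a minimal Ruby sketch of only the liveness check such a script might perform. It assumes the worker records its PID in a file; the path and both method names are hypothetical. `Process.kill(0, pid)` sends no signal and only checks whether the process exists.

```ruby
# Signal 0 performs only the existence/permission check; nothing is delivered.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true  # exists, but owned by another user
end

# Hypothetical PID file written by the worker (path is an assumption).
def worker_running?(pid_file)
  return false unless File.exist?(pid_file)
  pid = File.read(pid_file).to_i
  pid > 0 && process_alive?(pid)
end
```

A cron-driven script could call `worker_running?` and restart the worker only when it returns false.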

@rvineyard

also to wmjosiah: I think the database NULLs are a side effect of the real problem. The objects that the database records correspond to should not be getting created with these properties set to nil (the ruby equivalent to NULL).
to mephux: Thanks for the awesome work so far, this is giving me an excuse to roll up my sleeves and dust off my ruby skills. I can't remember the last time I've put this much effort into keeping something running just because I liked the results so much. If the graphs were reliable and interactive, this would be the killer app - but even with a few bugs it's way better than anything else I've used.

@wmjosiah

Yes, I agree - this is indeed an awesome frontend. I've already used it to find many, many problems on our network and fix them. It would be nice if the graphs worked, but it's not essential - it's still better than anything else. @rvineyard: when I delete the NULL records from the caches table and then run the SensorCacheJob, they come right back and crash the SensorCacheJob again, so I might have a slightly different problem than you, though clearly quite related. If you fix it in the Ruby code (I agree with you about the root of the problem), then I'd love to hear about it! Until then, I'll just live without the dashboard.

@rvineyard

For me the graphs are pretty much essential to manage my volume of alerts. I've made many modifications to the code in hopes of fixing this, but I can't seem to get anything to stay working for more than a few days. It's to the point that I'm doing my best to get up to speed on the current version of Rails so that I can hopefully have a better idea what I'm looking at - this is such a critical feature for me that trying to keep these graphs generating has become almost a full-time job for me :-(

@dcarrith

Since I haven't had any more time to work on this, I thought I would just make the file I modified (sensor_cache_job.rb) available to people. That way, you can give it a shot and see if it gets your dashboard to a working state. My dashboard is currently working and has been for a few weeks now.

Here's the git repo I set up to contribute this file. It should be public.
https://github.com/dcarrith/Snorby_contributions

I think the following steps will be necessary to get your dashboard back up and running after copying this file into place:

cd /var/www/snorby (or wherever you have put the snorby directory)

Before you do anything, you might want to stop snort and barnyard2 if you have those guys running.

sudo rails c

Snorby::Worker.stop
Snorby::Jobs.clear_cache(true)
Snorby::Jobs::SensorCacheJob.new(true).perform
Snorby::Jobs::DailyCacheJob.new(true).perform
Snorby::Worker.start

If you run into the ip_src = nil issue during processing and the job terminates, you will have to connect to your database and find the most recent caches table entry and delete it. Then, re-run either the SensorCacheJob or DailyCacheJob, whichever one terminated prematurely. That's the only way I've found to get past the ip_src = nil thing.

@wmjosiah

Even if I delete the entire caches table, I still run into the "undefined method `ip_src' for nil:NilClass
=> nil" thing, unfortunately. I tried "delete from caches where src_ips IS NULL;", I tried deleting the most recent one, etc, but it just keeps coming back.
I don't understand enough about what is going on to delete whatever data is causing this in the first place. Thanks for putting that up and thanks for all of your effort. Perhaps it does fix my problem, but I don't know enough to allow it to do so :)

@djcas9
Contributor

djcas9 commented Jul 25, 2011

This issue has been fixed in Snorby 2.3.1

djcas9 closed this as completed Jul 25, 2011

10 participants