
rake snorby:soft_reset task is broken #102

Closed
vineyard opened this Issue July 30, 2011 · 10 comments

Robert Vineyard

So I logged in to Snorby this morning and noticed that yet again my dashboard was "stuck" - events were flowing in, but the numbers and graphs were not getting updated, this time not even after manually restarting the delayed_job and CacheJob tasks. I did a bit of digging and ran across the :soft_reset task in lib/tasks/snorby.rake; however, in its present form it is broken.

On line 63, the code currently reads:
Snorby::Worker.reset_cache(:all, true)

Considering that the :reset_cache method is defined in the Jobs class, this is probably more appropriate:
Snorby::Jobs.reset_cache(:all, true)
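For context, the task then looks roughly like this - only the reset_cache call is quoted from the actual file; the surrounding namespace/desc wrapper below is my own reconstruction for illustration:

# lib/tasks/snorby.rake (sketch) -- only the reset_cache call is taken from
# the real file; the wrapper around it is reconstructed here for context.
namespace :snorby do
  desc 'Rebuild the Snorby caches without touching the underlying event data'
  task :soft_reset => :environment do
    Snorby::Jobs.reset_cache(:all, true)  # was Snorby::Worker.reset_cache(:all, true)
  end
end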

Now, after making that change, things initially look as I would expect, but then it errors out with the dreaded "undefined method `ip_src' for nil:NilClass" error that I haven't otherwise seen since the upgrade to 2.3.2.

I say upgrade, but really it was a pave and rebuild - I dropped my snort database, deleted my old snorby directory, pulled a fresh copy from the git master and started clean with a "rake snorby:setup".

Here's a trace of my modified :soft_reset task:

bash-3.2$ rake --trace snorby:soft_reset

<--- snip --->

Processing Batch 1 of 1...
Sensor 12: - fetching sensor metrics
Sensor 12: - building proto counts
Sensor 12: - fetch_event_count
Sensor 12: - fetching tcp count
Sensor 12: - fetching udp count
Sensor 12: - fetching icmp count
Sensor 12: - fetching severity metrics
Sensor 12: - fetching src ip metrics
rake aborted!
undefined method `ip_src' for nil:NilClass
/var/www/snorby/ezprint/ruby/1.9.1/gems/activesupport-3.0.5/lib/active_support/whiny_nil.rb:48:in `method_missing'
/var/www/snorby/lib/snorby/jobs/cache_helper.rb:144:in `block in fetch_src_ip_metrics'
/var/www/snorby/ezprint/ruby/1.9.1/gems/dm-core-1.1.0/lib/dm-core/collection.rb:507:in `block in each'
/var/www/snorby/ezprint/ruby/1.9.1/gems/dm-core-1.1.0/lib/dm-core/support/lazy_array.rb:411:in `block in each'
/var/www/snorby/ezprint/ruby/1.9.1/gems/dm-core-1.1.0/lib/dm-core/support/lazy_array.rb:411:in `each'
/var/www/snorby/ezprint/ruby/1.9.1/gems/dm-core-1.1.0/lib/dm-core/support/lazy_array.rb:411:in `each'
/var/www/snorby/ezprint/ruby/1.9.1/gems/dm-core-1.1.0/lib/dm-core/collection.rb:504:in `each'
/var/www/snorby/lib/snorby/jobs/cache_helper.rb:144:in `group_by'
/var/www/snorby/lib/snorby/jobs/cache_helper.rb:144:in `fetch_src_ip_metrics'
/var/www/snorby/lib/snorby/jobs/daily_cache_job.rb:140:in `block in build_cache'
/var/www/snorby/ezprint/ruby/1.9.1/gems/dm-chunked_query-0.3.0/lib/dm-chunked_query/chunks.rb:99:in `block in each'
/var/www/snorby/ezprint/ruby/1.9.1/gems/dm-chunked_query-0.3.0/lib/dm-chunked_query/chunks.rb:98:in `times'
/var/www/snorby/ezprint/ruby/1.9.1/gems/dm-chunked_query-0.3.0/lib/dm-chunked_query/chunks.rb:98:in `each'
/var/www/snorby/ezprint/ruby/1.9.1/gems/dm-chunked_query-0.3.0/lib/dm-chunked_query/mixin.rb:46:in `each_chunk'
/var/www/snorby/lib/snorby/jobs/daily_cache_job.rb:125:in `build_cache'
/var/www/snorby/lib/snorby/jobs/daily_cache_job.rb:41:in `block in perform'
/var/www/snorby/ezprint/ruby/1.9.1/gems/dm-core-1.1.0/lib/dm-core/collection.rb:507:in `block in each'
/var/www/snorby/ezprint/ruby/1.9.1/gems/dm-core-1.1.0/lib/dm-core/support/lazy_array.rb:411:in `block in each'
/var/www/snorby/ezprint/ruby/1.9.1/gems/dm-core-1.1.0/lib/dm-core/support/lazy_array.rb:411:in `each'
/var/www/snorby/ezprint/ruby/1.9.1/gems/dm-core-1.1.0/lib/dm-core/support/lazy_array.rb:411:in `each'
/var/www/snorby/ezprint/ruby/1.9.1/gems/dm-core-1.1.0/lib/dm-core/collection.rb:504:in `each'
/var/www/snorby/lib/snorby/jobs/daily_cache_job.rb:39:in `perform'
/var/www/snorby/lib/snorby/jobs.rb:103:in `reset_cache'
/var/www/snorby/lib/tasks/snorby.rake:63:in `block (2 levels) in <top (required)>'
/usr/local/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/task.rb:205:in `call'
/usr/local/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/task.rb:205:in `block in execute'
/usr/local/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/task.rb:200:in `each'
/usr/local/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/task.rb:200:in `execute'
/usr/local/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/task.rb:158:in `block in invoke_with_call_chain'
/usr/local/lib/ruby/1.9.1/monitor.rb:201:in `mon_synchronize'
/usr/local/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/task.rb:151:in `invoke_with_call_chain'
/usr/local/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/task.rb:144:in `invoke'
/usr/local/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/application.rb:112:in `invoke_task'
/usr/local/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/application.rb:90:in `block (2 levels) in top_level'
/usr/local/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/application.rb:90:in `each'
/usr/local/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/application.rb:90:in `block in top_level'
/usr/local/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/application.rb:129:in `standard_exception_handling'
/usr/local/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/application.rb:84:in `top_level'
/usr/local/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/application.rb:62:in `block in run'
/usr/local/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/application.rb:129:in `standard_exception_handling'
/usr/local/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/application.rb:59:in `run'
/usr/local/lib/ruby/gems/1.9.1/gems/rake-0.9.2/bin/rake:32:in `<top (required)>'
/usr/local/bin/rake:19:in `load'
/usr/local/bin/rake:19:in `<main>'
Tasks: TOP => snorby:soft_reset
Robert Vineyard

Oh, and I technically have 32 sensors (really one box with 32 copies of snort) and about 8000 alerts in the database now, for what that's worth. Most of the problems I was experiencing previously may have been due to having several million alerts, but I've tuned my signatures substantially since then. Notice the cache rebuild dies on Sensor 12...

Dustin Webber
Owner
mephux commented July 31, 2011

@vineyard It looks like you have an event with no source_ip - I'll need to figure out a way to handle events that are stored without all the proper information. I can't seem to figure out why some people's setups store events like that. What was the signature for that event?
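Probably something along these lines in the cache helper (just an untested sketch; the event.ip association and the variable name here are illustrative, not the current cache_helper.rb code):

# Untested sketch -- skip events whose iphdr row is missing instead of
# calling ip_src on nil while grouping the source-IP metrics.
src_ip_groups = events.reject { |event| event.ip.nil? }
                      .group_by { |event| event.ip.ip_src }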

Robert Vineyard

@mephux - I did a bit of digging and there are quite a few impossible events in my database. I'm guessing it's something wonky in barnyard2 - either the fact that I've got IPv6 turned on (and the schema's not ready for it yet) or that my sensors don't always shut down gracefully, especially while I'm developing / debugging new code. I don't see any v6 addresses in my alert logs though, so I'm leaning toward the latter. This may take some doing to track down - you mentioned others are having this problem too?

Robert Vineyard

@mephux, since you asked, here are what I believe to be the 43 offending events (assuming my SQL is correct):

mysql> select event.sid, event.cid, signature.sig_sid, signature.sig_name, inet_ntoa(iphdr.ip_src) as ip_src, inet_ntoa(iphdr.ip_dst) as ip_dst, event.timestamp from iphdr natural join event join signature on (event.signature=signature.sig_id) where (event.sid, event.cid) in (select iphdr.sid, iphdr.cid from iphdr where ip_src = 0 or ip_dst = 0) order by event.timestamp asc;
+-----+------+---------+--------------------------------------------------------+---------+---------+---------------------+
| sid | cid  | sig_sid | sig_name                                               | ip_src  | ip_dst  | timestamp           |
+-----+------+---------+--------------------------------------------------------+---------+---------+---------------------+
|   1 |  156 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 4.0.0.0 | 0.0.0.0 | 2011-07-30 05:32:33 |
|  15 |  224 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-30 15:14:23 |
|  15 |  225 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-30 15:14:23 |
|  17 |  265 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-30 15:14:23 |
|  17 |  266 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-30 15:14:23 |
|  15 |  226 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-30 15:14:26 |
|  15 |  227 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-30 15:14:26 |
|  17 |  267 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-30 15:14:26 |
|  17 |  268 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-30 15:14:26 |
|  17 |  269 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-30 15:14:28 |
|  15 |  228 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-30 15:14:29 |
|  15 |  229 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-30 15:14:29 |
|  17 |  270 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-30 15:14:29 |
|  15 |  230 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-30 15:14:31 |
|  15 |  231 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-30 15:14:31 |
|  17 |  271 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-30 15:14:31 |
|  17 |  272 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-30 15:14:31 |
|  22 |  830 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 4.0.0.0 | 0.0.0.0 | 2011-07-30 16:42:07 |
|  29 |  243 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 4.0.0.0 | 0.0.0.0 | 2011-07-30 16:42:07 |
|  22 |  831 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 4.0.0.0 | 0.0.0.0 | 2011-07-30 16:42:08 |
|  22 |  832 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 4.0.0.0 | 0.0.0.0 | 2011-07-30 16:42:35 |
|  29 |  244 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 4.0.0.0 | 0.0.0.0 | 2011-07-30 16:42:36 |
|   6 | 1173 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 4.0.0.0 | 0.0.0.0 | 2011-07-31 00:33:26 |
|   7 |  364 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-31 00:58:31 |
|   7 |  365 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-31 00:58:31 |
|  25 |  328 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-31 02:58:36 |
|  25 |  329 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-31 02:58:36 |
|  17 |  434 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-31 13:09:29 |
|  15 |  361 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-31 13:09:30 |
|  15 |  362 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-31 13:09:30 |
|  17 |  435 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-31 13:09:30 |
|  15 |  363 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-31 13:09:32 |
|  15 |  364 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-31 13:09:32 |
|  17 |  436 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-31 13:09:32 |
|  17 |  437 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-31 13:09:33 |
|  17 |  438 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-31 13:09:35 |
|  17 |  439 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-31 13:09:35 |
|  15 |  365 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-31 13:09:36 |
|  15 |  366 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-31 13:09:36 |
|  15 |  367 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-31 13:09:38 |
|  15 |  368 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-31 13:09:38 |
|  17 |  440 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-31 13:09:38 |
|  17 |  441 |      10 | frag3: Bogus fragmentation packet. Possible BSD attack | 0.0.0.0 | 0.0.0.0 | 2011-07-31 13:09:39 |
+-----+------+---------+--------------------------------------------------------+---------+---------+---------------------+
43 rows in set (0.12 sec)
Robert Vineyard

Ok, that first one can't be me messing around... there's no way I was up at 5:32am on Saturday :-p

Robert Vineyard

Here's another dirty hack (I still haven't figured out why I'm getting the bogus records in my database) that seems to keep the cache-building workers happy. I'm sure there's a better way to do this, but I got tired of working through all the JOINs required to accomplish it in a single query. Since I'm using InnoDB on the backend, the next best thing is to run it all as a single transaction. I've got a cron job feeding this into mysql every ten minutes:

begin;
delete from iphdr where ip_src = 0 or ip_dst = 0;
delete from caches where (sid, cid) not in (select iphdr.sid as sid, iphdr.cid as cid from iphdr);
delete from daily_caches where (sid, cid) not in (select iphdr.sid as sid, iphdr.cid as cid from iphdr);
delete from data where (sid, cid) not in (select iphdr.sid as sid, iphdr.cid as cid from iphdr);
delete from event where (sid, cid) not in (select iphdr.sid as sid, iphdr.cid as cid from iphdr);
delete from favorites where (sid, cid) not in (select iphdr.sid as sid, iphdr.cid as cid from iphdr);
delete from notes where (sid, cid) not in (select iphdr.sid as sid, iphdr.cid as cid from iphdr);
delete from opt where (sid, cid) not in (select iphdr.sid as sid, iphdr.cid as cid from iphdr);
delete from icmphdr where (sid, cid) not in (select iphdr.sid as sid, iphdr.cid as cid from iphdr);
delete from tcphdr where (sid, cid) not in (select iphdr.sid as sid, iphdr.cid as cid from iphdr);
delete from udphdr where (sid, cid) not in (select iphdr.sid as sid, iphdr.cid as cid from iphdr);
commit;
Dustin Webber
Owner

@vineyard I am also going to build proper handling of this into the cache jobs. Just to make sure.

Robert Vineyard

@mephux when you say "proper handling" - do you mean deleting the bogus records like I am doing (which is probably the right way to handle them, since none of their other attributes make sense), or just skipping over them when generating the caches?

I used to experience the NilClass crash much more often. Before I realized the connection between this error and the bogus events in my snort database, one of the fixes I implemented was to ensure that required fields were always checked for nil before populating, so that an invalid event object was never created.
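Roughly along these lines (a simplified illustration of the idea, not my actual patch; the helper and association names are made up):

# Simplified illustration only. Treat an event as usable for the caches
# only if its iphdr row and both address fields are actually present.
def usable_event?(event)
  ip = event.ip
  !ip.nil? && !ip.ip_src.nil? && !ip.ip_dst.nil?
end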

Unfortunately, this had severe performance implications (at least the way I implemented it) and did not completely alleviate the problem.

On the plus side, after running the SQL commands above and modifying the code for the soft_reset task, it appears that I can now rebuild my caches and charts at will whenever something gets fubared.

Crossing my fingers I won't have to do that very often now that most of the other issues that were affecting me have been resolved. :-)

vrou
vrou commented August 04, 2011

I had the same problem with my install, and after using vineyard's query it all works again. Thank you, vineyard, for sharing this with us :)

Let's hope that somebody locates the problem and fixes it the right way.

Chase Venters

I have the same problem with Snorby 2.3.11. The SQL query above seems to help, but it looks like I'm also going to need a cron job to re-engage the sensor cache job after it fails.

Dustin Webber (mephux) closed this May 04, 2013