Merge branch 'maintenance' into develop

wagner-certat committed Dec 14, 2018
2 parents 3b2d715 + 5382c11 commit a77d29e39204f29c89d72a65dd9896c9c3218ab5
Showing with 47 additions and 2 deletions.
  1. +4 −1
  2. +9 −0
  3. +1 −0 debian/docs
  4. +1 −1 docs/
  5. +32 −0 docs/
@@ -146,11 +146,14 @@ CHANGELOG
- Handle not installed dependency library `requests` gracefully.

### Documentation
- FAQ: Explanation and solution on orphaned queues.
- Explanation and solution on orphaned queues.
- Section on how and why to remove `raw` data.
- Add or fix the tables of contents for all documentation files.
- Feeds:
- Fix Autoshun Feed URL (#1325).
- Add parameters `name` and `provider` to `intelmq/etc/feeds.yaml`, `docs/` and `intelmq/bots/BOTS` (#1321).
- Add file.

### Packaging
- Change the maintainer from Sascha Wilde to Sebastian Wagner (#1320).
@@ -0,0 +1,9 @@
IntelMQ Security Notes

Found a security issue?

In case you find security-relevant bugs in IntelMQ, please contact
More information including the PGP key can be found on ['s website](

@@ -2,4 +2,5 @@ AUTHORS
@@ -962,7 +962,7 @@ Please check this [README](../intelmq/bots/experts/deduplicator/ file.

#### Configuration Parameters:
* `type` - either `"whitelist"` or `"blacklist"`
* `keys` - a list of key names (strings)
* `keys` - Can be a JSON-list of field names (`["raw", "source.account"]`) or a string with a comma-separated list of field names (`"raw,source.account"`).

##### Whitelist

@@ -41,6 +41,38 @@ In most cases the bottlenecks are look-up experts. In these cases you can easily

See also this discussion on a possible enhanced load balancing:

### Removing raw data for higher performance and less space usage

If you do not need the raw data, you can safely remove it. For events (after parsers), it keeps the original data, e.g. a line of a CSV file. In reports it keeps the actual data to be parsed, so don't delete the raw field in reports, i.e. between collectors and parsers.

The raw data makes up about 30% to 50% of a message's size (depending, of course, on how much additional data you add to the event and how much data the report includes). Dropping it improves speed, as less data needs to be transferred and processed at each step.
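To check how much of a message the raw field actually takes up in your own setup, you can compare the serialized size with and without it. The following is only a rough sketch: the helper `raw_share` and the sample event are made up for illustration and are not part of IntelMQ.

```python
import base64
import json


def raw_share(event: dict) -> float:
    """Return the fraction of the JSON-serialized event taken up by the raw field."""
    full = len(json.dumps(event))
    without_raw = len(json.dumps({k: v for k, v in event.items() if k != "raw"}))
    return (full - without_raw) / full


# Made-up sample event; the values are illustrative only.
sample_event = {
    "feed.name": "Example Feed",
    "source.ip": "192.0.2.1",
    "classification.type": "malware",
    "time.source": "2018-07-01T00:00:00+00:00",
    "raw": base64.b64encode(b"192.0.2.1,malware,2018-07-01 00:00:00").decode(),
}

print(f"raw share: {raw_share(sample_event):.0%}")
```

Running this against a few real events from your own feeds gives a better estimate than any general figure.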

#### In a bot

You can do this for example by using the *Field Reducer Expert*. The configuration could be:

* `type`: `blacklist`
* `keys`: `raw`
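The effect of this configuration on an event can be pictured with a small Python sketch. It is not the bot's actual implementation; `reduce_fields` and its `mode` argument are hypothetical names that only mimic the `type` and `keys` parameters, including the comma-separated string form of `keys`.

```python
def reduce_fields(event: dict, mode: str, keys) -> dict:
    """Return a copy of event with fields dropped (blacklist) or kept (whitelist)."""
    # `keys` may be a list of field names or a comma-separated string.
    if isinstance(keys, str):
        keys = [key.strip() for key in keys.split(",")]
    keys = set(keys)
    if mode == "blacklist":
        return {k: v for k, v in event.items() if k not in keys}
    if mode == "whitelist":
        return {k: v for k, v in event.items() if k in keys}
    raise ValueError(f"unknown mode: {mode!r}")


# Made-up event for illustration.
event = {"source.ip": "192.0.2.1", "source.account": "alice", "raw": "MTkyLjAuMi4x"}
reduced = reduce_fields(event, "blacklist", "raw")
print(reduced)
```

With `blacklist` and `raw`, every field except `raw` survives, which is the behaviour wanted here.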

Other solutions are the *Modify* bot and the *Sieve* bot. The last one is a good choice if you already use it and you only need to add the command:

    remove raw

#### In the database

In case you store data in the database and you want to keep its size small, you can (periodically) delete the raw data there.

To remove the raw data for an events table of a PostgreSQL database, you can use something like:

    UPDATE events SET raw = NULL WHERE "time.source" < '2018-07-01';

If the database is big, make sure to update only small parts of it at a time by using an appropriate `WHERE` clause. If you do not see any negative performance impact, you can increase the size of the chunks; otherwise the events in the output bot may queue up. The `id` column can also be used instead of the source's time.
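One way to update in small parts is to walk over ranges of the `id` column and commit after each range. The sketch below shows the idea in Python. To stay self-contained it uses an in-memory SQLite database as a stand-in for PostgreSQL; the function name `null_raw_in_chunks`, the table layout and the sample rows are assumptions made for this illustration. With a PostgreSQL driver such as psycopg2, the placeholders would be `%s` instead of `?`.

```python
import sqlite3


def null_raw_in_chunks(conn, max_id: int, chunk_size: int = 1000) -> int:
    """Set raw to NULL for ids 1..max_id, committing after each id range."""
    total = 0
    for low in range(1, max_id + 1, chunk_size):
        high = min(low + chunk_size, max_id + 1)
        cur = conn.execute(
            "UPDATE events SET raw = NULL "
            "WHERE id >= ? AND id < ? AND raw IS NOT NULL",
            (low, high),
        )
        conn.commit()  # keep each transaction small
        total += cur.rowcount
    return total


# Stand-in database with made-up rows, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE events (id INTEGER PRIMARY KEY, "time.source" TEXT, raw TEXT)'
)
conn.executemany(
    'INSERT INTO events ("time.source", raw) VALUES (?, ?)',
    [("2018-06-01", "cmF3")] * 25,
)
conn.commit()

updated = null_raw_in_chunks(conn, max_id=20, chunk_size=10)
print(f"rows updated: {updated}")
```

Start with a small `chunk_size` and raise it only while the output bot's queue stays short.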

## My bot(s) died on startup with no errors logged

Rather than starting your bot(s) with `intelmqctl start`, try `intelmqctl run [bot]`. This will provide valuable debug output you might not otherwise see, pointing to issues like configuration errors.
