redesign PCAP processing pipeline #80

Closed
mmguero opened this issue Nov 13, 2019 · 1 comment

@mmguero
Collaborator

mmguero commented Nov 13, 2019

Currently, PCAP files are processed first by moloch-capture and then by zeek. This is not very extensible. A more elegant approach would be to publish to a PCAP topic similar to this and just have these other processors subscribe to and pull from that topic.

I'll have to work out things like queue size, persistence, workers, etc., but it shouldn't be too complicated.
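A minimal sketch of the topic idea, assuming an in-process fan-out built on Python's standard `queue` and `threading` modules as a stand-in for a real message bus (the redesign below ended up using ØMQ). `PcapTopic`, `start_processor`, and the queue and worker counts are hypothetical, not Malcolm's code. Each subscriber gets its own bounded queue, which covers queue size and workers; persistence is not addressed here.

```python
import queue
import threading


class PcapTopic:
    """Hypothetical in-process stand-in for a PCAP pub/sub topic (not ØMQ).

    Every subscriber gets its own bounded queue, so each processor
    (e.g. one wrapping moloch-capture, another wrapping zeek) sees every PCAP.
    """

    def __init__(self, maxsize=100):
        self._maxsize = maxsize  # per-subscriber queue size (an assumption)
        self._subscribers = []
        self._lock = threading.Lock()

    def subscribe(self):
        q = queue.Queue(maxsize=self._maxsize)
        with self._lock:
            self._subscribers.append(q)
        return q

    def publish(self, pcap_path):
        with self._lock:
            subscribers = list(self._subscribers)
        for q in subscribers:
            q.put(pcap_path)  # blocks while this subscriber's queue is full

    def close(self):
        self.publish(None)  # None is the shutdown sentinel


def start_processor(name, q, results, results_lock, workers=2):
    """Start worker threads that pull PCAP paths from q until the sentinel arrives."""

    def worker():
        while True:
            path = q.get()
            if path is None:
                q.put(None)  # hand the sentinel on so sibling workers stop too
                return
            with results_lock:
                results.append((name, path))  # placeholder for real processing

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    return threads


if __name__ == "__main__":
    topic = PcapTopic(maxsize=10)
    results, results_lock = [], threading.Lock()
    threads = []
    for name in ("moloch", "zeek"):
        threads += start_processor(name, topic.subscribe(), results, results_lock)
    for path in ("/data/pcap/processed/a.pcap", "/data/pcap/processed/b.pcap"):
        topic.publish(path)
    topic.close()
    for t in threads:
        t.join()
    print(sorted(results))
```

Giving each subscriber its own queue is what makes this publish/subscribe rather than a plain work queue: every processor receives every PCAP, while the workers inside one processor share that processor's queue. A full queue blocks `publish`, which acts as a crude form of backpressure.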

@mmguero mmguero added the enhancement (New feature or request), moloch, zeek (Relating to Malcolm's use of Zeek), upload (Relating to PCAP and/or Zeek log ingestion), and capture (Relating to pcap-capture container) labels Nov 13, 2019
@mmguero mmguero self-assigned this Nov 13, 2019
mmguero added a commit that referenced this issue Nov 14, 2019
…with a more extensible one that is also more parallel (see issue #80 and issue #78)
@mmguero
Collaborator Author

mmguero commented Nov 14, 2019

Working on this in the topic/dynamic-pipelines branch.

mmguero added a commit that referenced this issue Nov 15, 2019
…with a more extensible one that is also more parallel (see issue #80 and issue #78)
mmguero added a commit that referenced this issue Nov 15, 2019
Handling issue #80 and issue #78.

* redesign PCAP processing pipeline so that there is [one service](/idaholab/Malcolm/tree/development/moloch/scripts/pcap_watcher.py) that watches the `/data/pcap/processed` directory and publishes to a ØMQ topic; [other services](/idaholab/Malcolm/tree/development/moloch/scripts/pcap_moloch_and_zeek_processor.py) can then subscribe to that topic and do what they want with the PCAP information they receive. This will make it much easier to add future PCAP processors, and it also increases the parallelism of the code.

* move common Logstash enrichments to a separate pipeline. I've made the [pipelines](/idaholab/Malcolm/tree/development/logstash/pipelines) used for processing Logstash events more modular, and I've also made them more extensible by having the [startup script](/idaholab/Malcolm/tree/development/logstash/scripts/logstash-start.sh) dynamically detect and configure new pipelines on the fly. This will make it easier to add new parsers in the future (I still need to document how to do that in the [readme](/idaholab/Malcolm/tree/development/README.md)).
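One way the dynamic pipeline detection could look, sketched in Python for consistency with the other examples even though the actual startup script is a shell script. It assumes each subdirectory of a pipelines directory holds one pipeline's `*.conf` files and emits one entry per subdirectory in Logstash's `pipelines.yml` format (`pipeline.id` plus `path.config`). The directory layout and `build_pipelines_yml` are hypothetical.

```python
import os


def build_pipelines_yml(pipelines_dir):
    """Return Logstash pipelines.yml text with one pipeline per subdirectory.

    Hypothetical sketch: it assumes each subdirectory of pipelines_dir holds
    the *.conf files of one pipeline; Malcolm's logstash-start.sh may differ.
    """
    entries = []
    for name in sorted(os.listdir(pipelines_dir)):
        path = os.path.join(pipelines_dir, name)
        if not os.path.isdir(path):
            continue  # stray files (READMEs, etc.) are not pipelines
        entries.append(
            f"- pipeline.id: {name}\n"
            f'  path.config: "{path}/*.conf"\n'
        )
    return "".join(entries)
```

Because the entries are generated from whatever directories exist at startup, adding a parser would amount to dropping a new directory of `.conf` files in place and restarting, with no hand-edited pipeline list.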
@mmguero mmguero closed this as completed Nov 18, 2019
mmguero added a commit that referenced this issue Nov 20, 2019
* Topic/dynamic pipelines (#81) (Handling issue #80 and issue #78)

* bump version for 1.7.1 release

* set opencontainers-compatible labels on docker containers

* fix path issue with fuser for the filebeat prune cronjob

* fix issue #82, OUI vendor names used by Logstash don't match those used by Moloch

* clean up unused code

* split pcap-monitor into its own image

* break moloch and zeek out into their own docker containers

* make sure things run as the right users in new containers

* fix issue with duplicate files not being detected by pcap_watcher.py

* documentation fix

* fix missing geoip section ids

* clean up dockerfiles

* decrease verbosity of moloch-capture since we're not seeing it anyway

* allow specifying `PCAP_PIPELINE_IGNORE_PREEXISTING` to control whether PCAP files that didn't finish processing before shutdown are checked and (if needed) reprocessed. The default, 'false', means to do the check; 'true' means to ignore anything already there before the container starts
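A minimal polling sketch of a watcher that publishes each PCAP once and honors `PCAP_PIPELINE_IGNORE_PREEXISTING`. It assumes files are identified by resolved path and recognized by a `.pcap`/`.pcapng` extension, and that polling (rather than filesystem notifications) is acceptable. `PcapWatcher` and `env_flag` are hypothetical names; the actual `pcap_watcher.py` may detect new and duplicate files differently.

```python
import os


def env_flag(name, default=False):
    """Interpret an environment variable such as PCAP_PIPELINE_IGNORE_PREEXISTING as a boolean."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")


class PcapWatcher:
    """Hypothetical polling watcher: publishes each PCAP in a directory at most once."""

    def __init__(self, directory, publish, ignore_preexisting=None):
        self.directory = directory
        self.publish = publish  # callback, e.g. a topic's publish method
        self.seen = set()
        if ignore_preexisting is None:
            # default 'false': files already present get checked and published
            ignore_preexisting = env_flag("PCAP_PIPELINE_IGNORE_PREEXISTING", False)
        if ignore_preexisting:
            # 'true': treat whatever is already there at startup as handled
            self.seen.update(self._list_pcaps())

    def _list_pcaps(self):
        return {
            os.path.realpath(entry.path)
            for entry in os.scandir(self.directory)
            if entry.is_file() and entry.name.endswith((".pcap", ".pcapng"))
        }

    def poll(self):
        """Publish any PCAP not seen before and return the newly published paths."""
        new = sorted(self._list_pcaps() - self.seen)
        for path in new:
            self.seen.add(path)  # remember it so a later poll won't publish a duplicate
            self.publish(path)
        return new
```

A caller would run `poll()` in a loop with a short sleep; keeping the `seen` set in memory means the duplicate check resets on restart, which is exactly the case the environment variable decides.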