Moloch FAQ

General

Why should I use Moloch?

If you want a standalone, open source, full packet capture (FPC) system with metadata parsing and searching, then Moloch may be your answer! Moloch gives you complete control of deployment and architecture. Other FPC systems are available.

Upgrading Moloch

Upgrading Moloch requires installing versions in order, as described in the chart below. If the version you are currently on isn't listed, upgrade to the next higher version in the chart; you can then install the major releases in order to catch up. New installs can start from the latest version.

| Moloch Version | ES Versions | Special Instructions | Notes |
|----------------|-------------|----------------------|-------|
| 1.6 | 6.x | | Must have upgraded to 1.5 first |
| 1.5 | 5.x or 6.x | ES 6 instructions | Must have finished the 1.x reindexing |
| 1.1.1 | 5.x or 6.x (new only) | Instructions | Must be on ES 5 already |
| 0.20.2 | 2.4, 5.x | ES 5 instructions | |

What OSes are supported?

We have RPMs/DEBs available at http://molo.ch/#downloads. Our deployment is on CentOS 6 and CentOS 7 with the elrepo 4.x kernel installed for packet performance increases and afpacket support. A large amount of development is done on Mac OS X 10.13 using MacPorts; however, it has never been tested in a production setting. :) Moloch is no longer supported on 32-bit machines.

Moloch requires gcc/g++ 4.8.4 or later to compile fully, because some Node.js packages require it.

The following OSes should work out of the box:

  • CentOS 7

  • Ubuntu 14.04, 16.04

  • FreeBSD 9

  • FreeBSD 10.0 (the wiseService doesn't work, but the wise plugin does); 10.3 is currently known NOT to compile

The following OSes should work after installing gcc/g++ 4.8 yourself:

Feel free to update this list with other known configurations!

Moloch is not working

Here is the common check list:

  1. Elasticsearch is running and green, curl http://localhost:9200/_cat/health

  2. The db has been initialized; check with /data/moloch/db/db.pl http://localhost:9200 info

  3. Viewer is reachable by visiting http://viewerhostname:8005 from your browser

  4. No errors in /data/moloch/logs/viewer.log

    1. Check that viewer is running with pgrep -lf viewer

  5. No errors in /data/moloch/logs/capture.log

    1. Check that capture is running with pgrep -lf capture

  6. Check that the stats page shows the capture nodes you are expecting by visiting http://viewerhostname:8005/stats from your browser

    1. Make sure the nodes are showing packets being received

    2. Make sure the timestamp for nodes is recent (within 5 seconds)

  7. Verify that there is no bpf= in config.ini that is dropping all the traffic

  8. If the stats tab shows captures, but the viewer says "Oh no, Moloch is empty! There is no data to search."

    1. Moloch only writes records when a session has ended, so it will take several minutes for sessions to show up after a fresh start; see config.ini to shorten the timeouts

    2. Verify your time frame for search covers the data (try switching to ALL)

    3. Check that you don’t have a view set

    4. Check that your user doesn’t have a forced expression set

  9. Restart moloch-capture with a --debug option added
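
Most of these checks can be scripted. A minimal sketch (adjust host names and paths to your install):

curl -s http://localhost:9200/_cat/health           # 1. ES is up and green
/data/moloch/db/db.pl http://localhost:9200 info    # 2. db has been initialized
pgrep -lf viewer && pgrep -lf capture               # 4/5. processes are running
tail -n 20 /data/moloch/logs/viewer.log /data/moloch/logs/capture.log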

How do I reset Moloch?

  1. Leave elasticsearch running

  2. Shutdown all running viewer or capture processes so no new data is recorded.

  3. To delete all the SPI data stored in elasticsearch, use the db.pl script with either the init or wipe commands. The only difference between the two commands is that wipe leaves the added users so they don’t need to be re-added.

/data/moloch/db/db.pl ESHOST:ESPORT wipe
  4. Delete the pcap files. The PCAP files are stored on the file system in raw format. You need to do this on all of the capture machines

/bin/rm -f /data/moloch/raw/*

Self Signed Certs

It is possible to get self signed certs to work both from the user to the Moloch viewer and from Moloch capture to Elasticsearch, however the core team does not support or recommend it. Use the money you are saving on a commercial product and go buy real certs. Wildcard certs are now under $500, or you can go with free Let's Encrypt certs. There may be folks on the Moloch slack workspace willing to help out.

How do I upgrade to Moloch 1.0

Moloch 1.0 has some large changes and updates that will require all session data to be reindexed. The reindexing is done in the background AFTER upgrading so there is little downtime. Large changes in 1.0 include:

  • All the field names have been renamed, and analyzed fields have been removed

  • Country codes are being changed from 3 character to 2 character

  • Tags will NOT be migrated if added before 0.14.1

  • The data for http.hasheader and email.hasheader will NOT migrate

  • IPv6 is fully supported and uses the Elasticsearch ip type

If you have any special parsers, tagger, plugins, or wise sources, you may have to change configurations.

  • All db fields will need -term removed; capture won't start and will warn you

To upgrade:

  • First make sure you are using ES 5.5.x (5.6 recommended) and Moloch 0.20.2 or 0.50.x before continuing. Upgrade to those versions first!

  • Download 1.1.1 from https://molo.ch/#downloads

  • Shutdown all capture, viewer and wise processes

  • Install Moloch 1.1.1

  • Run /data/moloch/bin/moloch_update_geo.sh on all capture nodes which will download the new mmdb style maxmind files

  • Run db.pl http://localhost:9200 upgrade once

  • Start wise, then capture, then viewers. Especially watch the capture.log file for any warnings/errors.

  • Verify that NEW data is being collected and showing up in viewer, all old data will NOT show up yet.

Once 1.1.1 is working, it's time to reindex the old session data:

  • Disable any db.pl expire or optimize jobs, or curator

  • Start screen or tmux since this will take several days

  • In the /data/moloch/viewer directory run /data/moloch/viewer/reindex2.js --slices X (see the example invocation after this list)

    • The number of slices should be between 2 and the number of shards each index has; the higher the number, the faster the reindex, but the more elasticsearch CPU will be used. We recommend half the number of shards.

    • You can optionally add an --index option if there are indices you need to reindex first; otherwise it works from newest to oldest

    • You can optionally add --deleteOnDone, which will delete indices as they are converted, but you may want to try a reindex on 1 index first to make sure it is working.

  • As the reindex runs, old data will show up in viewer

  • Delete ALL old indices with

    curl -XDELETE 'http://localhost:9200/sessions-*'
  • Once the reindex finishes, run the db.pl expire/optimize or curator job manually; this will take a while

  • Now you can re-enable any db.pl expire or optimize jobs, or curator. Do NOT re-enable the crons until you have let them run and finish manually.
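
For example, a hypothetical run against indices with 8 shards each (so 4 slices); the index name below is only an illustration:

cd /data/moloch/viewer
./reindex2.js --slices 4                                        # works from newest to oldest
./reindex2.js --slices 4 --index sessions-180101 --deleteOnDone # or test one index first, deleting it once converted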

Elasticsearch

How many elasticsearch nodes or machines do I need?

The answer, of course, is "it depends". Factors include:

  • How much memory each box has

  • How many days you want to store meta data (SPI data) for

  • How fast the disks are

  • What percentage of the traffic is HTTP

  • The average transfer rate of all the interfaces

  • If the sessions are long lived or short lived

  • How fast response times should be for operators

  • How many operators are querying at the same time

Some important things to remember when designing your cluster:

  • 1Gbps of network traffic requires ~300GB of disk a day. For example, to store 14 days of 2.5Gbps average traffic you need 14*2.5*300 or ~10TB of disk space.

  • SPI data is usually kept longer than PCAP data. For example, you may store pcap for a week but SPI data for a month.

  • Have memory equal to at least 3% of the disk space used. For example, if the cluster has 7TB of data, then 7TB * 3% or 210GB of memory is the minimum recommended. Note: the more days you store, the lower the memory ratio can be, down to 2% or even 1%.

  • Assign half the memory to elasticsearch (but no more than 30G per node) and leave half for disk cache.

  • Use version 2.4 of elasticsearch or later.

If you have large machines, you can run multiple Elasticsearch nodes per MACHINE. We have some estimators that may help.

The good news is that it is easy to add new nodes in the future, so feel free to start with fewer nodes. As a temporary fix to capacity problems, you can reduce the number of days of meta data that are stored. Just use the elasticsearch head interface to delete the oldest sessions-* index.
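
A back-of-envelope sketch of the sizing rules above (the traffic and retention figures are just example inputs):

# ~300GB of SPI disk per 1Gbps per day; keep memory at >= ~3% of that disk
awk -v gbps=2.5 -v days=14 'BEGIN {
    disk_gb = gbps * days * 300
    printf "SPI disk: %.0fGB (~%.1fTB), minimum ES memory: %.0fGB\n", disk_gb, disk_gb/1000, disk_gb*0.03
}'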

Data never gets deleted

The SPI data and the PCAP data are not deleted at the same time. The PCAP data is deleted as the disk fills up on the capture machines, more info. The SPI data is deleted when the ./db.pl expire command is run, usually from cron during off-peak hours. There is a sample in the daily.sh script.
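
For example, a nightly crontab entry; the retention type/count and ES host are placeholders, so match whatever your daily.sh already does:

# delete SPI data beyond 30 daily indices, run during off-peak hours
0 2 * * * /data/moloch/db/db.pl http://localhost:9200 expire daily 30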

So deleting a PCAP file will NOT delete the SPI data, and deleting the SPI data will not delete the PCAP data from disk.

The UI does have commands to delete and scrub individual sessions, but the user must have the Remove Data ability on the users tab. Usually this is used for things you don’t want operators to see, such as bad images.

ERROR - Dropping request /_bulk

This error almost always means that your Elasticsearch cluster can not keep up with the amount of sessions that the capture nodes are trying to send it. You may only see the error message on your busiest capture nodes since capture tries to buffer the requests.

Some things to check:

  • Make sure each Elasticsearch node has 30G of memory (no more) and 30G of disk cache (at least) available to it. So for example if you are on a 64G machine only run 1 Elasticsearch node on the machine

  • If you are running multiple Elasticsearch nodes make sure the disks can support the iops load

  • Make sure you are running the latest Elasticsearch that your version of Moloch supports, for example 5.6.7 if using Elasticsearch 5

  • If using replication on the sessions index, turn off replication for the current day and only replicate past days. Do this by first turning off replication in the sessions template with ./db.pl upgrade (without the --replicas option), and then adding --replicas 1 to your daily ./db.pl expire run

  • Make sure there is at most 1 shard of each sessions index per node; if there are more, run ./db.pl upgrade again

If these don't help, you need to add more Elasticsearch nodes or reduce the number of sessions being monitored. You can reduce the number of sessions with packet-drop-ips, bpf filters, or rules files, for example.

When do I add additional nodes? Why are queries slow?

If queries are too slow the easiest fix is to add additional elasticsearch nodes. Elasticsearch doesn’t do well if Java hits an OutOfMemory condition. If you ever have one, you should immediately delete the oldest sessions2-* index, update the daily.sh script to delete more often, and restart the elasticsearch cluster. Then you should order more machines. :)

Removing nodes

  • Go into the Moloch stats page and the ES Shards subtab

  • Click on the nodes you want to remove and exclude them

  • Wait for the shards to be moved off.

  • If no shards move, you may need to configure Elasticsearch to allow 2 shards per node, although a larger number may be required if removing many nodes.

    curl -XPUT 'localhost:9200/sessions*/_settings' -d '{
        "index.routing.allocation.total_shards_per_node": 2
    }'
  • If there are many shards that need to be redistributed, the defaults might take days, which is good for the cluster, but it might make you crazy. Increase the speed from the default of 3 streams at 20MB each (60MB/sec total) to something higher, like 6 streams at 50MB each (300MB/sec total). Obviously, adjust for the speed of the new nodes' disks and network.

    curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{
       "indices.recovery.concurrent_streams":6,
       "indices.recovery.max_bytes_per_sec":"50mb"}
    }'

How to enable elasticsearch replication

Turning on replication will consume twice the disk space on the nodes and increase the network bandwidth between nodes, so make sure you actually need replication.

To change future days (since 0.14.1):

db/db.pl <ESHOST:ESPORT> upgrade --replicas 1

To change past days, but not the current day (since 0.14.1):

db/db.pl <ESHOST:ESPORT> expire <type> <num> --replicas 1

We recommend the second solution since it allows current traffic to be written to ES once, and the previous day's traffic is replicated during off-peak hours.

How do I upgrade elasticsearch?

Rolling upgrades are supported for elasticsearch 2.x when doing a minor upgrade. In all other cases you must shut down the entire cluster and restart it with the new version.

  • Download latest elasticsearch from http://www.elasticsearch.org/downloads/

  • Uncompress the archive and move it into /data/moloch (if using the single-host config)

  • Edit the ES start script so it has the correct version, /data/moloch/bin/run_es.sh (if using the single-host config)

  • Shutdown current elasticsearch node:

    kill <pid(s)>
  • Start the new version back up:

    /data/moloch/bin/run_es.sh

How do I upgrade to ES 6.x

ES 6.x is supported by Moloch 1.x for NEW clusters and 1.5 for UPGRADING clusters.

NOTE - If upgrading, you must FIRST upgrade to Moloch 1.0 or 1.1 before upgrading to 1.5. All reindex operations also need to be finished.

We do NOT provide ES 6 startup scripts or configuration, so if upgrading please make sure you get startup scripts working on test machines before shutting down your current cluster.

Upgrading to ES 6 will REQUIRE two downtimes

First outage: If you are NOT using Moloch DB version 51 (or later), you must follow these steps while still using ES 5.x. To find what DB version you are using, either run db.pl localhost:9200 info or mouse over the (I) in Moloch.

  • Install Moloch 1.5.x

  • Shutdown capture

  • run "./db.pl http://ESHOST:9200 upgrade"

  • Restart capture

  • Verify everything is working

  • Make sure you delete the old indices that db.pl complained about

Second outage: Upgrade to ES 6

  • Make sure you delete the old indices that db.pl complained about

  • Shutdown everything

  • Upgrade ES to 6.x

  • WARNING - path.data will have to be updated to access your old data. If you had path.data: /data/foo you will probably need to change to /data/foo/<clustername>

  • Start ES cluster

  • Wait for the cluster to go GREEN; this will take LONGER than usual as ES upgrades things from 5.x to 6.x format. curl http://localhost:9200/_cat/health

  • Start viewers and captures

How do I upgrade to ES 5.x

ES 5.x is supported by Moloch 0.17.1 for NEW clusters and 0.18.1 for UPGRADING clusters.

ES 5.0.x, 5.1.x and 5.3.0 are NOT supported because of ES bugs/issues. We currently use 5.6.7.

WARNING - If you have sessions-* indices created with ES 1.x, you can NOT upgrade. Those indices will need to be deleted.

We do NOT provide ES 5 startup scripts, so if upgrading please make sure you get startup scripts working on test machines before shutting down your current cluster.

Upgrading to ES 5 may REQUIRE 2 downtime periods of about 5-15 minutes each.

First outage: If you are NOT using Moloch DB version 34 (or later), you must follow these steps while still using ES 2.4. To find what DB version you are using, either run db.pl localhost:9200 info or mouse over the (I) in Moloch.

  • Upgrade to ES 2.4.x

  • Check for GREEN ES cluster curl http://localhost:9200/_cat/health

  • Install Moloch 0.18.1 to 0.20.2

  • Shut down all capture nodes

  • Run "./db.pl http://ESHOST:9200 upgrade"

  • Start up captures and make sure everything works

  • You can remain on ES 2.4.x until you want to try ES 5

Second outage: Upgrade to ES 5

  • You MUST be on ES 2.4.x and Moloch DB version 34 (or later) before using ES 5 (see above)

  • Shutdown EVERYTHING (elasticsearch, viewer, capture)

  • Upgrade ES to 5.6.x

  • Start ES cluster

  • Wait for the cluster to go GREEN; this will take LONGER than usual as ES upgrades things from 2.x to 5.x format. curl http://localhost:9200/_cat/health

  • Start viewers and captures

How do I upgrade to ES 2.x from ES 1.x?

ES 2.x is only supported by Moloch 0.12.3 and later.

  • Make sure the cluster is GREEN and you are running at least ES 1.7.3, the upgrade has only been tested with 1.7.3. If not using 1.7.3 upgrade to that first.

  • Run the following command twice (just to make sure everything is upgraded :). This makes sure all ES files are using a recent version of the Lucene data format. It is only really needed if ES 0.x was ever used, but it doesn't hurt.

    curl -XPOST 'http://localhost:9200/_upgrade?pretty&human&only_ancient_segments'
  • Make sure the cluster is GREEN.

  • Upgrade the Moloch db scheme while still on the OLD version of ES.

    db.pl 127.0.0.1:9200 upgrade
  • Make sure the cluster is GREEN.

  • Completely shutdown everything (ES, viewer, cluster). Do NOT do a rolling ES restart.

  • YOU MUST SHUTDOWN EVERYTHING, even if you normally don’t, this time you MUST!

  • Upgrade ES to 2.x (see the instructions above).

  • Make sure that in elasticsearch.yml you’ve set "gateway.recover_after_nodes:" to number of ALL nodes in the cluster (data + master only).

  • Start just 1 or 2 nodes and make sure new ES actually starts. If it doesn’t, you can usually go back to previous version of ES since you haven’t fully started ES yet.

  • DO NOT start the full ES cluster until all capture/viewer process are shutdown AND you tried 1 ES node by itself.

  • Start full ES cluster.

  • Wait for cluster to go GREEN.

  • Start viewers and captures.

ES 2 won’t start - java.lang.IllegalStateException …​ was created before v0.90.0 and wasn’t upgraded

  • Go back to a working version of ES like 1.7.3.

  • Make sure you followed the ES2 instructions above.

  • Run _upgrade one more time.

  • If the index it is complaining about is on a non-master node and there is a _state directory at the top level of the index (not in a shard directory), delete it. See elasticsearch issue #16044.

ES 2 - org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large

This seems to happen with old indices that were created before ES 2 and then upgraded to ES 2, because old indices didn't use the doc_values setting. Increasing the size of the fielddata cache will work around the problem.

Either edit your elasticsearch.yml and add:

indices.breaker.fielddata.limit: 80%
or run the command:
curl -XPUT localhost:9200/_cluster/settings -d '{
  "persistent" : {
    "indices.breaker.fielddata.limit" : "80%"
  }
}'

If that doesn’t work, you can try increasing the heap size per node, but don’t go over 31G. For more information, read the heap sizing guide

Capture

What kind of capture machines should we buy?

The goal of Moloch is to use commodity hardware. If you start thinking about using SSDs or expensive NICs, research whether it would just be cheaper to buy one more box; that gains more retention and can bring the cost of each machine down.

Some things to remember when selecting a machine:

  • An average of 1Gbps of network traffic requires 11TB of disk a day. For example to store 7 days of 2.5Gbps average traffic you need 7*2.5*11 or 192.5TB of disk space.

  • The total bandwidth number must include both RX and TX bandwidth. For example a 10G link can really produce up to 20G of traffic to capture, 10G in each direction. Include both directions in your calculations.

  • Don’t overload network links to capture. Monitoring a 10G link with an average of 4Gbps RX AND 4Gbps TX should use two 10G moloch-capture links, since 8Gbps is close to the max.

  • Moloch requires all packets from the same 5-tuple to be processed by the same moloch-capture process.
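
A quick sketch of the disk rule above, using example traffic and retention numbers:

# pcap rule of thumb: ~11TB of disk per 1Gbps (RX+TX combined) per day
awk -v gbps=2.5 -v days=7 'BEGIN { printf "pcap disk needed: ~%.1fTB\n", gbps * days * 11 }'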

When selecting Moloch capture boxes, standard "Big Data" boxes might be the best bet. ($10k-$25k each) Look for:

  • CASE: There are many 4RU boxes out there. If space is an issue, there are more expensive 2RU that still hold over 20 drives (examples HPE Apollo 4200 or Supermicro 6028R-E1CR24L)

  • MEMORY: 64GB to 96GB

  • OS DISKS: We like RAID 1 small drives. SSDs are nice but not required.

  • CAPTURE DISKS: 20+ x 4TB or larger SATA drives. Don’t waste money on enterprise/SAS/15k drives.

  • RAID: A hardware RAID card with at least 1G cache (2G is better). We like RAID 5 with 1 hot spare or RAID 6 (with better cards)

  • NIC: We like newer Intel-based NICs, but most should work fine (you might want to get one compatible with PFRING).

  • CPU: At least 2 x 6 cores. The higher the average Gbps, the more speed/cores required.

We are big fans of using Network Packet Brokers ($6k+). They allow multiple taps/mirrors to be aggregated and load balanced across multiple moloch-capture machines. Read more below.

What kind of Network Packet Broker should we buy?

We are big fans of using Network Packet Brokers. If there is one piece of advice we can give medium or large Moloch deployments, it is this: use a NPB.

Main Advantages:

  • Easy horizontal scaling of moloch

  • Load balancing of traffic

  • Filtering of traffic before it hits the moloch boxes

  • Easier to add more moloch capacity or other security tools

  • Don’t have to worry as much about new links being added by network team

Features to look for

  • Load balancing

  • Consistent symmetric hashing (this means each direction of the flow goes out the same tool port)

  • MPLS/VLAN/VPN stripped (optional - some tools don’t like all the headers)

  • Tool link detection and fail over

  • Automation capability (can you use ansible/apis or are you stuck using a web ui)

  • Enough ports to support future tap and tool growth

  • Whether the desired features require an extra (expensive?) component and/or license

Just like Moloch on commodity hardware, you don't necessarily have to pay a lot of money for a good NPB. Some switch vendors have switches that can operate in switch mode or NPB mode, so you might already have gear lying around that you can use.

Sample vendors

What kind of packet capture speeds can moloch-capture handle?

Moloch allows multiple threads to be used to process the packets. On simple commodity hardware, it is easy to get 3Gbps or more, depending on the number of CPUs available to Moloch and what else the machine is doing. Many times the limiting factor can be the speed of the disks and RAID system. See Architecture and Multiple-Host-HOWTO for more information.

To test the RAID device use:

dd bs=256k count=50000 if=/dev/zero of=/THE_MOLOCH_PCAP_DIR/test oflag=direct
This is the MAX disk performance. If you don't want to drop any packets, you shouldn't average more than ~80% of the value returned. If using a RAID and you want to avoid dropping packets during a future rebuild, you probably shouldn't count on more than 60% of the value returned.

Prior to version 0.14 the recommended assumption was 1.5Gbps.

Moloch requires full packet captures error

When you get an error about the capture length not matching the packet length, it is NOT an issue with Moloch. The issue is with the network card settings.

By default, modern network cards offload work that the CPUs would otherwise need to do. They will defragment packets or reassemble TCP sessions and pass the results to the host. However, this is NOT what we want for packet capture; we want what is actually on the network. So you will need to configure the network card to turn off all the features that hide the real packets from Moloch.

The sample config files (/data/moloch/bin/moloch_config_interfaces.sh) turn off many common features but there are still some possible problems:

  1. If using a VM for Moloch, you need to turn off the features on the physical interface the vm interface is mapped to

  2. If using a fancy card there may be other features that need to be turned off.

    1. You can usually find them with ethtool -k INTERFACE | grep on; turn off anything that is still on and see if that fixes the problem

    2. For example ethtool -K INTERFACE tx off sg off gro off gso off lro off tso off

There are two workarounds:

  1. If you are reading from a file you can set readTruncatedPackets=true in the config file, this is the only solution for saved .pcap files

  2. You can increase the max packet length with snapLen=65536 in the config file, this is not recommended

Why am I dropping packets?

There are several different types of packet drops and reasons for packet drops.

Moloch Version

Please make sure you are using a recent version of Moloch. Constant improvements are made and it is hard for us to support older versions.

Kernel and TPACKET_V3 support

The most common cause of packet drops with Moloch is leaving the reader at the default of libpcap instead of switching to tpacketv3, pfring, or one of the other high performance packet readers. We strongly recommend tpacketv3, but it does require a newer kernel (3.2 or later). See the plugin settings for more information.

For those stuck on CentOS 6, use elrepo and install kernel-ml on the machines that will RUN moloch-capture, and kernel-ml-headers on the machines that will COMPILE moloch. Download the packages from http://elrepo.org/linux/kernel/el6/x86_64/RPMS/ . The prebuilt Moloch RPMs have already been compiled on a machine with a newer kernel.

Network Card Config

Make sure the network card is configured correctly by increasing the ring buffer to its max size and turning off most of the card's features. The features are not useful anyway, since we want to capture what is on the network instead of what the local OS sees. Example configuration:

# Set ring buffer size; see the max with ethtool -g eth0
ethtool -G eth0 rx 4096 tx 4096
# Turn off features; see available features with ethtool -k eth0
ethtool -K eth0 rx off tx off sg off tso off gso off

If Moloch was installed from the deb/rpm and the Configure script was used, this should already be done in /data/moloch/bin/moloch_config_interfaces.sh

packetThreads and the Packet Q is overflowing error

The packetThreads config option controls the number of threads processing the packets, not the number of threads reading the packets off the network card. You only need to change the value if you are getting the Packet Q is overflowing error. The packetThreads option is limited to 24 threads, but usually you only need a few.

To increase the number of threads the reader uses please see the documentation for the reader you are using on the settings page.
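
A config.ini sketch; the numbers are illustrative only, not recommendations:

# only raise packetThreads if capture.log shows the Packet Q is overflowing error (max 24)
packetThreads=4
# hypothetical queue depth; raising it trades memory for burst headroom
maxPacketsInQueue=300000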

Disk

Make sure swap has been disabled, or at the very least, isn’t writing to the disk being used for pcap.

Make sure the RAID isn’t in the middle of a rebuild or something worse. Most RAID cards will have a status of OPTIMAL when things are all good and DEGRADED or SUBOPTIMAL when things are bad.

To test the RAID device use:

dd bs=256k count=50000 if=/dev/zero of=/THE_MOLOCH_PCAP_DIR/test oflag=direct

If you are using xfs make sure you use mount options defaults,inode64,noatime

  • Don’t run capture and elasticsearch on the same machine.

  • Make sure you actually have enough disk write throughput and enough disks. For example, for a 1G link with RAID 5 you may need:

    • At least 4 spindles if using a RAID 5 card with write cache enabled.

    • At least 8 spindles (or more) if using a RAID 5 card with write cache disabled.

  • Make sure your RAID card can actually handle the write rate. Many onboard RAID 5 controllers can not handle sustained 1G write rates.

  • Switch to RAID 0 from RAID 5 if you can live with the TOTAL data loss on a single disk failure.

If using EMC for disks:

  • Make sure write cache is enabled for the LUNs.

  • If it is a CX with SATA drives, RAID-3 is optimized for large sequential I/O.

  • Monitor EMC lun queue depth, may be too many hosts sharing it.

To check your disk IO run iostat -xm 5 and look at the following:

  • wMB/s will give you the current write rate, does it match up with what you expect?

  • avgqu-sz should be near or less than 1, otherwise linux is queueing instead of doing

  • await should be near or less than 10, otherwise the IO system is slow, which will slow moloch-capture down.

Other things to do/check:

  • If using RAID 5 make sure you have write cache enabled on the RAID card.

    • Adaptec Example: arcconf SETCACHE 1 LOGICALDRIVE 1 WBB

    • HP Example: hpssacli ctrl slot=0 modify dwc=enable

  • Maybe use taskset to give moloch-capture its own CPU, although with the newer pcapWriteMethod thread or thread-direct settings this isn't needed and may hurt.

Other

  • There are conflicting reports on whether disabling irqbalance helps.

  • Check that the CPU you are giving moloch-capture isn’t handling lots of interrupts (cat /proc/interrupts).

  • Make sure other processes aren’t using the same CPU as moloch-capture.

WISE

  • Cyclical packet drops may be caused by bad connectivity to the wise server. Verify that the wiseService responds quickly with curl http://moloch-wise.hostname:8081/views on the moloch-capture host that is dropping packets.

High Performance Settings

See settings

How do I import existing PCAPs?

Think of the 'moloch-capture' binary much like you would 'tcpdump'. 'moloch-capture' can listen to live network interface(s), or read from historic packet capture files.

${moloch_dir}/bin/moloch-capture -c [config_file] -r [pcap_file]

For an entire directory, use -R [pcap directory]

See ${moloch_dir}/bin/moloch-capture --help for more info

How do I monitor multiple interfaces?

Versions since 0.14.2 support a semicolon ';' separated list of interfaces to listen on for live traffic.

Versions prior to 0.14.2 require multiple moloch-capture processes, since a capture process will only monitor a single interface. To monitor multiple interfaces on a single machine, you will need multiple capture processes.

  • Since, by default, moloch uses the unqualified hostname as the name of the moloch node, you’ll need to come up with a naming scheme. Appending a, b, c, …​ or the interface number to the hostname are possible methods.

  • Edit /data/moloch/etc/config.ini and create a section for each of the moloch nodes. Assuming the defaults are correct in the [default] section, the only thing that MUST be set is the interface item. It is also common to have each moloch node talk to a different elasticsearch node if running a cluster of elasticsearch nodes.

     [moloch-m01a]
     interface=eth2
     [moloch-m01b]
     interface=eth5
  • If hostname + domainname on the machine doesn't return a FQDN, you'll also need to set a viewUrl, or, more easily, use the --host option.

  • Create two start up scripts. You will now need to specify the moloch node name with the -n option and change the log files so they are separate.

     TDIR=/data/moloch
     cd ${TDIR}/bin
     /bin/rm -f ${TDIR}/capturea.log.old
     /bin/mv ${TDIR}/logs/capturea.log ${TDIR}/logs/capturea.log.old
     ${TDIR}/bin/moloch-capture -n moloch-m01a -c ${TDIR}/etc/config.ini > ${TDIR}/logs/capturea.log 2>&1

and

 TDIR=/data/moloch
 cd ${TDIR}/bin
 /bin/rm -f ${TDIR}/captureb.log.old
 /bin/mv ${TDIR}/logs/captureb.log ${TDIR}/logs/captureb.log.old
 ${TDIR}/bin/moloch-capture -n moloch-m01b -c ${TDIR}/etc/config.ini > ${TDIR}/logs/captureb.log 2>&1
  • Add both scripts to inittab or upstart

You only need to run one viewer on the machine. Unless it is started with the -n option, it will still use the hostname as its node name, so any special settings need to be set in that node's section (although the default is usually good enough).

Moloch capture crashes

Please file a ticket on github with the stack trace.

  • You’ll need to allow suid or user changing programs to save core dumps. Use sysctl to change until the next reboot. Setting it to 0 will change it back to the default.

    sysctl -w fs.suid_dumpable=2
  • The user that moloch switches to must be able to write to the directory that moloch-capture is running in.

  • Run moloch-capture and get it to crash.

  • Look for the most recent core file.

  • Run gdb (you may need to install the gdb package first)

    gdb /data/moloch/bin/moloch-capture corefilename
  • Get the back trace using the bt command

If it is easy to reproduce, sometimes it’s easier to just run gdb as root:

  • Run gdb moloch-capture as root.

  • Start moloch in gdb with run ALL_THE_ARGS_USED_FOR_MOLOCH-CAPTURE_GO_HERE.

  • Wait for crash.

  • Get the backtrace using bt command.

  • Sometimes, you need to put a break point in g_log b g_log

ERROR - pcap open failed

Usually moloch-capture is started as root so that it can open the interfaces, and then it immediately drops privileges to dropUser and dropGroup, which default to nobody:daemon. This means that all parent directories need to be either owned by or at least executable by nobody:daemon, and that the pcapDir itself must be writable.
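
A quick way to check and fix the permissions, assuming the default dropUser/dropGroup of nobody:daemon and a pcapDir of /data/moloch/raw:

# list permissions on every component of the path; each directory needs execute for the drop user
namei -l /data/moloch/raw
# make the pcap directory itself owned and writable by the drop user
chown nobody:daemon /data/moloch/raw
chmod 750 /data/moloch/raw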

How to reduce amount of traffic/pcap?

Listed in order from highest to lowest benefit to Moloch

  1. Setting the bpf= filter will stop Moloch from seeing the traffic.

  2. Adding CIDRs to the packet-drop-ips section will stop Moloch from adding packets to the PacketQ

  3. Using Rules it is possible to control if the packets are written to disk or the SPI data is sent to Elasticsearch
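
A config.ini sketch combining the first two approaches (the host and CIDR are placeholders, and the packet-drop-ips syntax shown is only a sketch; check the settings page for your version):

[default]
# 1. bpf= keeps the traffic from ever reaching Moloch
bpf=not host 10.10.10.10

# 2. packet-drop-ips drops matching packets before they are added to the packet queues
[packet-drop-ips]
10.20.0.0/16=drop

Rules files (option 3) are covered in the Life of a packet section below.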

Life of a packet

Moloch capture supports many options for controlling which packets are captured, processed, and saved to disk.

  • The first gatekeeper, and the most important one, is the bpf filter bpf=. This filter can be implemented in the kernel, the network card, libpcap, or the network drivers. It is a single filter and it controls what moloch-capture "sees" or doesn't "see". Any packet that is dropped because of the bpf filter is usually not counted in ANY Moloch stats, although some implementations do expose it.

  • Moloch does a high level decode of the ethernet, IP, and IP protocol information and sees if it understands it. If it doesn't support it, moloch will discard the packet.

  • Moloch checks the packet-drop-ips section to see if the IPs involved are marked to be discarded. If there are only a few IPs to drop then bpf= should be used; otherwise this is much more efficient than a huge bpf.

  • Moloch picks a packet queue to send the packet to; if the packet queue is too busy, it will drop the packet. Potentially increase packetThreads or maxPacketsInQueue if too many packets are being dropped here.

  • A packet queue will start processing a packet and update all the stats and basic information for the session the packet is associated with

  • Since 0.19 it will execute sessionSetup rules for first packets in a session.

  • If this is the first packet of the session the packet queue will then check all the dontSaveBPFs, and if one matches it will save off the max number of packets to save for the session.

  • If this is the first packet of the session AND no dontSaveBPFs matched, the packet queue will then check all the minPacketsSaveBPFs and save off a min number of packets that must be received.

  • Finally moloch goes to save the packet; if it has already saved the max number of packets for the session, OR if another method (plugin) said to stop saving packets for the session, the packet won't be saved.

  • If the number of packets for the session is greater than maxPackets, the session will be saved. Since 0.19 the beforeMiddleSave and beforeBothSave rules will be executed.

  • The packet queue sends the packet off to the various classifiers and parsers to gather more meta data. Since 0.19 the afterClassify rules will be executed, and if any fields are set during this processing the fieldSet rules will be executed.

  • At some point in the future the session will hit one of the timeouts, and the session will be saved if enough packets have been received to meet the session's minimum-packets setting (defaults to 0). Since 0.19 the beforeFinalSave and beforeBothSave rules will be executed.
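
The rule hooks mentioned above (sessionSetup, afterClassify, fieldSet, and the save hooks) are defined in a rules file referenced by the rulesFiles= setting in config.ini. A minimal, hedged sketch of what such a file can look like (the field values and ops are illustrative only):

---
version: 1
rules:
  - name: "Only keep a few packets of TLS sessions"
    when: "fieldSet"
    fields:
      protocols:
        - tls
    ops:
      _maxPacketsToSave: 12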

PCAP Deletion

PCAP deletion is actually handled by the viewer process, so make sure the viewer process is running on all capture boxes. The viewer process checks on startup, and then every minute, to see how much space is available; if it is below freeSpaceG, it will start deleting the oldest files. Note that freeSpaceG can also be a percentage; newer versions of moloch default to freeSpaceG=5%. The viewer process will always leave at least 10 PCAP files on the disk, so make sure there is room for at least maxFileSizeG * 10 of capture files on disk, or 120G by default.
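
The two settings involved look like this in config.ini (the values mirror the defaults described above):

# start deleting the oldest pcap files when less than 5% of the partition is free
freeSpaceG=5%
# each pcap file grows to ~12G, and at least 10 files are always kept (~120G floor)
maxFileSizeG=12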

If still having pcap delete issues:

  1. Make sure freeSpaceG is set correctly for the environment

  2. Make sure there is free space where viewer is writing its logs

  3. Make sure viewer can reach elasticsearch

  4. Make sure that dropUser or dropGroup can actually delete files in the pcap directory and has read/execute permissions in all parent directories

  5. Make sure the pcap directory is on a filesystem with at least maxFileSizeG * 10 space available

  6. Make sure the files you think should be deleted show up on the files tab, if not use the db.pl sync-files command.

  7. Make sure the files in the files tab don't have locked set; viewer won't delete locked files

  8. Try restarting viewer

dontSaveBPFs doesn’t work

Turns out BPF filters are tricky. :) When the network is using VLANs, BPFs need to know that fact at compile time. So instead of a nice simple dontSaveBPFs=tcp port 443:10, use something like dontSaveBPFs=tcp port 443 or (vlan and tcp port 443):10. Basically, FILTER or (vlan and FILTER). Information from here.

Zero byte pcap files

Moloch buffers writes to disk, which is great for high bandwidth networks but bad for low bandwidth networks. How much data is buffered is controlled with pcapWriteSize, which defaults to 262144 bytes. An important thing to remember is that the buffer is per packet thread, so set packetThreads to 1 on low bandwidth networks. Before Moloch 1.0 there is no time limit on the buffer, so you must wait until the entire buffer is full before data is written to disk.
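
A low bandwidth sketch for config.ini (illustrative values):

# smaller per-thread write buffer, and a single packet thread so only one buffer has to fill
pcapWriteSize=65536
packetThreads=1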

Starting with Moloch 1.0, we now write a portion of what is buffered after 10 seconds of no writes. However it will still buffer the last pagesize bytes, usually 4096 bytes.

You can also end up with many zero byte pcap files if the disk is full, see PCAP Deletion.

Can I virtualize Moloch with KVM using OpenVswitch?

In small environments with low amounts of traffic this is possible. With Openvswitch you can create a mirror port from a physical or virtual adapter and send the data to another virtual NIC as the listening interface. In KVM, one issue is that it isn't possible to increase the buffer size past 256 on an adapter using the Virtio network driver (mentioned in another part of the FAQ), and without that Moloch capture will continuously crash. To solve this in KVM, use the E1000 adapter, configure the buffer size accordingly, and set up the SPAN port on Openvswitch to send traffic to it: https://www.rivy.org/2013/03/configure-a-mirror-port-on-open-vswitch/.

Viewer

Where do I learn more about the expressions available

Click on the owl

Exported pcap files are corrupt, sometimes session detail fails

  • The most common cause of this problem is that the timestamps between the moloch machines are different. Make sure ntp is running everywhere, or that the time stamps are in sync.

Map counts are wrong

  • The source and destination ips are each counted, so the map should total twice the number of sessions.

  • Currently elasticsearch only has accurate counts up to 2 billion uniques.

  • Some countries aren’t shown, but can still be searched using their ISO-3 (< 1.0) or ISO-2 (>= 1.0).

What browsers are supported?

Recent versions of Chrome, Firefox, and Safari should all work fairly equally. Development is done mostly with Chrome on a Mac, so it gets the most attention.

  • Chrome 53+ (All development is done with Chrome Stable)

  • Firefox 54+

  • Opera 40+

  • Safari 10+

  • Edge 14+

  • IE is not supported

Viewer doesn’t run after upgrading Node.js

The packages that viewer depends on must be reinstalled after upgrading Node.js on a machine.

cd /data/moloch/viewer
mv node_modules node_modules.save
npm cache clean
npm install

Error: getaddrinfo EADDRINFO

This seems to be caused when proxying requests from one viewer node to another and the machines don’t use FQDNs for their hostnames and the short hostnames are not resolvable by DNS. You can check if your machine uses FQDNs by running the hostname command. There are several options to resolve the error:

  1. Use the --host option on capture

  2. Configure the OS to use FQDNs.

  3. Make it so DNS can resolve the shortnames or add the shortnames to the hosts file.

  4. Edit config.ini and add a viewUrl for each node. This part of the config file must be the same on all machines. (We recommend you just use the same config file everywhere). Example:

    [node1_eth0]
    interface=eth0
    viewUrl=http://node1.fqdn
    [node1_eth1]
    interface=eth1
    viewUrl=http://node1.fqdn
    [node2]
    interface=eth1
    viewUrl=http://node2.fqdn

How do I proxy Moloch using Apache

Apache, and other web servers, can be used to provide authentication or other services for Moloch when set up as a reverse proxy. When a reverse proxy is used for authentication it must be inline, and authentication in moloch will not be used; however, moloch will still do the authorization. Moloch will use a username that the reverse proxy passes to moloch in an HTTP header for settings and authorization. See the architecture page for diagrams. While operators will use the proxy to reach the moloch viewer, the viewer processes still need direct access to each other.

  • Install apache, turn on the auth method of your choice. This example also uses https from apache to Moloch, but if on localhost that isn’t required. Configure it to set a special header for Moloch to check:

    AuthType your_auth_method
    Require valid-user
    RequestHeader set MOLOCH_USER %{your_auth_method_concept_of_username_variable}e
  • Make sure mod_ssl is loaded, and set up a SSL proxy:

    SSLProxyEngine On
    #ProxyRequests On # You probably don't want this line
    ProxyPass        /moloch/ https://localhost:8005/ retry=0
    ProxyPassReverse /moloch/ https://localhost:8005/
  • Restart apache.

  • Using the Moloch UI (by going directly to a non proxy moloch) make sure the "Web Auth Header" is checked for the users.

  • Edit Moloch’s config.ini

    • Create a new section for the moloch proxy.

    • Set userNameHeader to the lower case version of the header apache is setting.

    • Set the webBasePath to the ProxyPass path used above (/moloch/ in this example). All other sections should NOT have a webBasePath.

    • Add a viewHost, so externals can’t just set the userNameHeader and access moloch with no auth:

      [moloch-proxy]
      userNameHeader=moloch_user
      webBasePath = /moloch/
      viewPort = 8005
      viewHost = localhost
  • Start the moloch-proxy viewer.

  • To prevent the users from going directly to moloch in the future, scramble their passwords. You might want to leave an admin user that doesn’t use the apache auth. Or you can temporarily add one with the addUser.js script.

How do I search multiple Moloch clusters

It is possible to search multiple moloch clusters by setting up a special multi-cluster moloch viewer and a special multies process. Multies is similar to Elasticsearch tribe nodes, except it was created before tribe nodes and can deal with multiple indices having the same name. One big limitation currently is that all the moloch clusters have to use the same rotateIndex setting.

To use, create another config.ini file or section in a shared config file. Both multies.js and the special "all" viewer can use the same node name.

# viewer/multies node name (-n allnode)
[allnode]
# The host and port multies is running on, set with multiESHost:multiESPort usually just run on the same host
elasticsearch=127.0.0.1:8200
# This is a special multiple moloch cluster viewer
multiES=true
# Port the multies.js program is listening on, elasticsearch= must match
multiESPort = 8200
# Host the multies.js program is listening on, elasticsearch= must match
multiESHost = localhost
# Semicolon list of elasticsearch instances, one per moloch cluster.  The first one listed will be used for settings
multiESNodes = es-cluster1.example.com:9200;es-cluster2.example.com:9200

Now you need to start up both the multies.js program and viewer.js with the config file. All other viewer settings, including webBasePath can still be used.

By default, the users table comes from the first cluster listed in multiESNodes. This can be overridden by setting usersElasticsearch and optionally usersPrefix in the multi viewer config file.

How do I reset my password?

An admin can change anyone’s password on the Users tab by clicking the Settings link in the Actions column next to the user.

A password can also be changed by using the addUser script, which will replace the entire account if the same userid is used. All preferences and views will be cleared, so creating a secondary admin account may be a better option if you need to change an admin user's password. After creating the secondary admin account, change the user's password and then delete the secondary admin account.

node addUser -c <configfilepath> <user id> <user friendly name> <password> [--admin]

Error: Couldn’t connect to remote viewer, only displaying SPI data

Viewers have the ability to proxy traffic for each other. This ability relies on moloch node names being mapped to hostnames. Common problems occur when systems don't use FQDNs or when certs don't match.

How do viewers find each other

First the SPI records are created on the moloch-capture side.

  1. Each moloch-capture gets a nodename, either by the -n command line option or everything in front of the first period of the hostname

  2. Each moloch-capture writes a stats record every few seconds that has the mapping from the nodename to the FQDN

  3. Each SPI record has a nodename in it.

When pcap is retrieved from the viewer it looks up the nodename associated with the record.

  1. Each moloch-viewer process gets a nodename, either by the -n command line option or everything in front of the first period of the hostname.

  2. If the SPI nodename is the same as the moloch-viewer nodename it can be processed locally, STOP HERE. This is the common case with one capture process per capture node.

  3. If the stats[nodename].hostname is the same as the moloch-viewer’s hostname (exact match) then it can be processed locally, STOP HERE. Remember this is written by capture above. This is the common case with multiple capture processes per capture node.

  4. If we make it here, the pcap data isn’t local and it must be proxied

  5. If --host was used on the capture node use that

  6. If there is a viewUrl set in the [nodename] section use that

  7. If there is a viewUrl set in the [default] section use that

  8. Use stats[nodename].hostname:[nodename section - viewPort setting]

  9. Use stats[nodename].hostname:[default section - viewPort setting]

  10. Use stats[nodename].hostname:8005

Possible fixes

First, look at viewer.log on both the viewer machine and the remote machine and see if there are any obvious errors. The most common problems are:

  1. Not using the same config.ini on all nodes can make things a pain to debug and sometimes not even work. It is best to use the same config with different sections for each node name [nodename]

  2. The remote machine doesn't return a FQDN from the hostname command AND the viewer machine can't resolve just the hostname. Before version 0.14.1 the domainname command was ignored. To fix this, do ONE of the following:

    1. Make it so the remote machines returns a FQDN (hostname "fullname" as root and edit /etc/sysconfig/network)

    2. Set a viewUrl in each node section of the config.ini. If you don’t have a node section for each host, you’ll need to create one.

    3. Edit /etc/resolv.conf and add search foo.example.com, where foo.example.com is the subdomain of the hosts. Basically, you want it so "telnet shortname 8005" works on the viewer machine to the remote machine.

  3. The remote machine’s FQDN doesn’t match the CN or SANs in the cert it is presenting. The fixes are the same as #2 above.

  4. The remote machine is using a self signed cert. To fix this, either turn off https or see the certificate answer above.

  5. The remote machine can’t open the pcap. Make sure the dropUser user can read the pcap files. Check the directories in the path too.

  6. Make sure all viewers are either using https or not using https, if only some are using https then you need to set viewUrl for each node.

    1. When trouble shooting this issue, it is sometimes easier to disable https everywhere

  7. If you want to change the hostname of a capture node

    1. Change your mind :)

    2. Reuse the same node name as previously with a -n option

    3. Use the viewUrl for that old nodename that points to the new host.

compiled against a different Node.js version error

Moloch uses Node.js for the viewer component and requires many packages to work fully. These packages must be compiled with, and run using, the same version of Node.js. An error like "… was compiled against a different Node.js version using NODE_MODULE_VERSION 48. This version of Node.js requires NODE_MODULE_VERSION 57." means that the version of Node.js used to install the packages differs from the version used to run them.

This shouldn’t happen when using the prebuilt Moloch releases. If it does, then double check that /data/moloch/bin/node is being used to run viewer.

If you built Moloch yourself, this usually happens if you have a different version of node in your path. You will need to rebuild moloch and either

  • Remove the OS version of node

  • Make sure /data/moloch/bin is in your path before the OS version of node

  • Use the --install option to easybutton which will add to the path for you

Parliament

Sample Apache Config

Parliament is designed to run behind a reverse proxy such as apache. Basically you just need to tell apache to send all root requests and any /parliament requests to the parliament server.

ProxyPassMatch   ^/$ http://localhost:8008/parliament retry=0
ProxyPass        /parliament/ http://localhost:8008/parliament/ retry=0