docs: add performance debugging section #4525
Performance Debugging
=====================

There are many possible reasons for performance issues. In this section we
will guide you through some options.

General
-------

First of all, check all the log files, with a focus on stats.log and
suricata.log, for any obvious issues. There are several tools that can help
to find the root cause.

A first step is to run a tool like **htop** to get an overview of the system
load and to see whether there is a bottleneck in the traffic distribution.
For example, if only a small number of CPU cores hit 100% all the time while
others don't, it could be related to bad traffic distribution or to elephant
flows. In the first case, try to improve the configuration; in the second
case, try to filter or shunt those big flows with either a bpf filter, bypass
rules, or eBPF/XDP.

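If **htop** is not available, the per-core load can also be checked with
**mpstat** (part of the sysstat package; the package name may differ between
distributions):

::

  # print the utilization of every CPU core once per second
  mpstat -P ALL 1
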
Another helpful tool is **perf**, which helps to spot performance issues.
Make sure you have it installed, along with the debug symbols for Suricata,
or the output won't be very helpful. This output is also helpful when you
report performance issues, as the Suricata development team can narrow down
possible issues with it.

::

  sudo perf top -p $(pidof suricata)

If you see specific function calls at the top in red, it's a hint that those
are the bottlenecks. For example, if you see **IPOnlyMatchPacket**, it can be
either a result of high drop rates or of incomplete flows, which result in
decreased performance.

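It can also help to narrow **perf top** down to a specific CPU core or to a
specific thread, for example a worker thread whose TID you spotted in
**htop** (the core number and ``$TID`` below are placeholders):

::

  # only profile CPU core 3
  sudo perf top -C 3

  # only profile a single thread
  sudo perf top -t $TID
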
Another recommendation is to run Suricata without any rules to see if the
issue is mainly related to the traffic. It can also be helpful to use
rule-profiling and/or packet-profiling at this step.

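Note that rule-profiling and packet-profiling are only available if Suricata
was built with profiling support, so a rebuild might be necessary:

::

  ./configure --enable-profiling
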
Traffic
-------

In most cases where the hardware is fast enough to handle the traffic but the
drop rate is still high, the cause lies in specific traffic issues.

First steps to check are:

- Check if the traffic is bidirectional; if it's mostly unidirectional,
  you're missing relevant parts of the flow (see the **tshark** example at
  the bottom).
- Check for encapsulated traffic; while GRE, MPLS etc. are supported, they
  can also lead to performance issues, especially if there are several layers
  of encapsulation.
- Use tools like **iftop** to spot elephant flows. Flows with a rate of over
  1Gbit/s for a long time can keep one CPU core at 100% all the time and
  increase the drop rate, while it often doesn't make sense to dig deep into
  this traffic.
- If VLAN is used, it might help to disable **vlan.use-for-tracking**,
  especially in scenarios where only one direction of the flow has the VLAN
  tag.
- If VLAN QinQ (IEEE 802.1ad) is used, be very cautious if you use
  **cluster_qm** in combination with Intel drivers. While the RFC expects
  ethertype 0x8100 and 0x88A8 in this case (see
  https://en.wikipedia.org/wiki/IEEE_802.1ad), most implementations only add
  0x8100 on each layer. If the outer layer has the same VLAN tag but the
  inner layers have different VLAN tags, the flows will still end up in the
  same queue in **cluster_qm** mode.
- Check for other unusual or complex protocols that aren't supported very
  well. In several cases we've seen that Cisco Fabric Path (ethertype 0x8903)
  causes performance issues. It's recommended to filter it; one option is a
  bpf filter with **not ether proto 0x8903**.

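A big discrepancy between SYN and SYN-ACK (or RST) counters in the stats/eve
logs can also be a hint that you only see one side of the flows. A quick
sketch to look at those counters (the counter names are assumed to match your
Suricata version):

::

  grep -E "tcp\.(syn|synack|rst)" stats.log | tail
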
Suricata also provides several specific traffic-related signatures in the
rules folder that can be enabled for testing to spot specific traffic issues.

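A bpf filter like the one mentioned above can be passed to Suricata directly
on the command line; the interface name here is just an example:

::

  sudo suricata -i eth0 not ether proto 0x8903
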
If you want to use **tshark** to get an overview of the traffic direction,
use this command:

::

  sudo tshark -i $INTERFACE -q -z conv,ip -a duration:10

The output will show all flows seen within 10 seconds. If you see 0 for one
direction, you have unidirectional traffic and are thus missing, for example,
the ACK packets.

@@ -13,3 +13,4 @@ Performance
   packet-profiling
   rule-profiling
   tcmalloc
   debug