
Performance impact question #80

Closed
jcdaniel14 opened this issue Jun 19, 2020 · 11 comments
@jcdaniel14

Hi there folks, here's the picture: I'm receiving flows from different routers. For some reason I receive a "copy" of the flows sent to another collector (I'm configured as just another exporter destination on IOS-XR), so I get data from ports that I don't really need, and I can't change the network configuration.

In order to save disk space, I decided to hardcode these ports into goflow (specifically in the SendKafkaFlowMessage func), so that when a flow comes from port "X" it is not sent to Kafka; the message is simply ignored.
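For reference, the drop-before-publish check can be sketched like this (a standalone sketch; the FlowMessage struct and shouldPublish name are illustrative only, not goflow's actual types or API):

```go
package main

import "fmt"

// ignoredPorts holds the interface/port indices whose flows we drop.
// The values here are illustrative.
var ignoredPorts = map[uint32]bool{
	17: true,
	42: true,
}

// FlowMessage is a stand-in for the decoded flow record; goflow's real
// message type has many more fields.
type FlowMessage struct {
	InIf uint32 // input interface index reported by the router
}

// shouldPublish reports whether a decoded flow should be forwarded to
// Kafka. Dropping here is a single map lookup per flow, so the cost is
// negligible next to packet decoding.
func shouldPublish(msg *FlowMessage) bool {
	return !ignoredPorts[msg.InIf]
}

func main() {
	fmt.Println(shouldPublish(&FlowMessage{InIf: 17})) // false (dropped)
	fmt.Println(shouldPublish(&FlowMessage{InIf: 99})) // true (kept)
}
```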

It does the job, but I'm not quite sure whether I may still be overloading the server. In networking, discarding a packet before processing it is not a performance issue for the router; would that be the case here too? Perhaps the impact is negligible and I should not worry at all?

My NetFlow traffic is considerable and continuously growing, so I can't tell whether the server is overloaded due to traffic peaks or because I introduced a bit more processing overhead.

I would appreciate some insight into this. Thank you.

@lspgn
Contributor

lspgn commented Jun 19, 2020

Hi @jcdaniel14
What metrics are you looking at that make you say your server is overloaded?
Most of GoFlow's processing happens during decoding. Filtering on the interface should be negligible.

@jcdaniel14
Author

Hi, I have a bare-metal server: 128 GB RAM, 24-core AMD Opteron 6174. CPU utilization seems to be around 50%, with spikes up to 70%.
I also run Kafka and the ELK Stack on this server, so it is difficult to tell whether I'm adding pressure with these lines of code inside goflow methods.

@lspgn
Contributor

lspgn commented Jun 22, 2020

Do you have per-process monitoring? Something like prometheus-node-exporter plus process-exporter/cadvisor?
Goflow alone should be able to handle thousands of flows per second.

If you want to dive into the effect of adding this function, I would suggest using pprof.

@mirsblog

@lspgn FWIW, I ran some tests on a VM with 4 vCPUs (2.3 GHz) and 32 GB RAM and compared it with nfacct. The maximum rate at which I could decode IPFIX and publish to Kafka without dropped packets was 15,000 packets/second; anything more caused significant packet loss. nfacct, in comparison, could easily scale up to 60,000 packets/second.

I ran the tests with -kafka=false and it seemed to have no effect. I also increased the number of workers, but did not see any marked difference in packet drops between 1 and 100 workers.

Test setup:
Host type: VM
CPU: 4 vCPU Intel Xeon E312xx
Memory: 32GB

I used an IPFIX PCAP with tcpreplay to send packets from one host to another.
$ sudo tcpreplay -i ens3 -K --loop=50000 -p 15000 ipfix.pcap

I monitored packet drops via /proc/net/udp6.

@lspgn
Contributor

lspgn commented Jun 23, 2020

@mirsblog interesting, thanks for the insights.
Does it use all the processors?

@mirsblog

> @mirsblog interesting, thanks for the insights.
> Does it use all the processors?

I assume so, given that runtime.GOMAXPROCS(runtime.NumCPU()) is set in goflow.go. Would that be a correct assumption?

@lspgn
Contributor

lspgn commented Jun 23, 2020

Yes, it should use all processors; I was just curious whether the load distribution would look the same in htop. Did you compile GoFlow or did you get a specific binary?

@mirsblog

mirsblog commented Jun 23, 2020

> Yes it should use all processors, was just curious if the load distribution would be the same when looking at htop. Did you compile GoFlow or did you get a specific binary?

GoFlow: v3.4.2
Go: 1.14
I built the Alpine image using the Dockerfile found in v3.4.2 and ran it using the instructions from the README.

Edit: tested just now and confirmed in htop that the CPU load distribution is even with workers=4.

@lspgn
Contributor

lspgn commented Jun 23, 2020

The first thing I can think of that would affect NetFlow decoding performance is the shared template cache.
Protobuf encoding (plus its memory allocations) may also be the cause. One would need to compile a specific version that bypasses it, and profile with pprof.

@mirsblog

Ok. I will consider that and start another thread when I have more to share.

@jcdaniel14
Author

> Do you have per-process monitoring? Something like prometheus-node-exporter and process-exporter/cadvisor?
> Goflow alone should be able to handle thousands of flows per second.
>
> If you want to dive into the effect of adding this function, I would suggest using pprof.

I don't really have per-process monitoring, but I will dive into it. The server is processing 25k flows/sec at the moment according to Logstash, and it hasn't been noticeably affected by the changes I made. I was just worried that, since it handles flows, Kafka, Elasticsearch, and Logstash at the same time, I could add stress by adapting the code the way I did. Thanks for clarifying and for the good-practices advice.
