Sending through Logstash shows corrupted data #43

Closed
jcdaniel14 opened this issue Oct 17, 2019 · 2 comments

Comments

@jcdaniel14

Hi, I'm not sure whether I'm misunderstanding how to use goflow properly.
I ran it with the -kafka=false flag and it shows the flows correctly:

Type:NETFLOW_V9 TimeReceived:1571325708 SequenceNum:82208684 SamplingRate:0 SamplerAddress:10.101.11.211 TimeFlowStart:1571325692 TimeFlowEnd:1571325692 Bytes:72 Packets:1 SrcAddr:2607:f8b0:4008:80f::200a DstAddr:2800:bf0:2a7:1122:14bd:7ee3:e8a8:c58d Etype:34525 Proto:6 SrcPort:443 DstPort:59056 SrcIf:216 DstIf:219 SrcMac:00:00:00:00:00:00 DstMac:00:00:00:00:00:00 SrcVlan:0 DstVlan:0 VlanId:0 IngressVrfID:1610612739 EgressVrfID:1610612736 IPTos:0 ForwardingStatus:64 IPTTL:0 TCPFlags:17 IcmpType:0 IcmpCode:0 IPv6FlowLabel:592069 FragmentId:0 FragmentOffset:0 BiFlowDirection:0 SrcAS:0 DstAS:0 NextHop:10.101.21.192 NextHopAS:0 SrcNet:48 DstNet:48

Then I enabled Kafka, with Logstash sending the data on to Elasticsearch for further analysis. I don't filter the data inside Logstash, but stdout shows something like this:

{ "@timestamp" => 2019-10-17T15:19:50.121Z, "message" => "\b\u0003\u0010����\u0005 ��'(��\u00052\u0004\u001F\rC\u0014:\u0004-�4\nH�\u0016P\u0002Z\u0004\ne\v�b\u0004\ne5��\u0001\u0018�\u0001\u0018�\u0001�\u0001�\u0001�\u0001�\u0001\u0006�\u0001�\u0003�\u0001��\u0002�\u0001@�\u0001\u0010�\u0001�\u0010�\u0002�\u0005�\u0002����\u0006�\u0002����\u0006", "@version" => "1" } { "@timestamp" => 2019-10-17T15:19:50.121Z, "message" => "\b\u0003\u0010����\u0005 ��'(�\u00052\u0004\u0011���:\u0004��1�H�\vP\u0001Z\u0004\ne\v�b\u0004\u0000\u0000\u0000\u0000�\u0001\u0018�\u0001�\u0001�\u0001\u0006�\u0001�\u0003�\u0001��\u0002�\u0001\u0002�\u0001@�\u0001\u0010�\u0001�\u0010�\u0002�\u0005�\u0002����\u0006", "@version" => "1" }

I want to send the data to Elasticsearch so I can build graphs and query ES for data about traffic and outgoing interfaces.

I'm pretty much a noob when it comes to Go and Kafka, so I'll look forward to any suggestions made in this thread.

@jcdaniel14
Author

For anyone falling into this: after reading 300 articles about Kafka and Go and taking a peek at the code, I realized the flows were being encoded with something called Protocol Buffers, which I didn't know about until yesterday; in simple terms, it's JSON on steroids. Luckily there is a Logstash plugin for deserializing protocol buffers, so I tried it, but it didn't quite work. The Logstash forums say their protobuf support is a bit outdated. Logstash console below.

{ "ForwardingStatus" => 64, "TimeReceived" => 1571337358, "IcmpType" => 0, "TCPFlags" => 16, "DstVlan" => 0, "FragmentId" => 0, "FlowDirection" => 0, "SrcVlan" => 0, "SequenceNum" => 82825472, "SrcAS" => 0, "SrcAddr" => "4S#:", "Packets" => 1, "DstIf" => 219, "SrcPort" => 443, "DstAddr" => "-?=+-?=-?=", "IPTTL" => 0, "Type" => "NETFLOW_V9", "NextHopAS" => 0, "SamplerAddress" => "\ne\v-?=", "Etype" => 2048, "DstPort" => 62650, "IngressVrfID" => 1610612739, "EgressVrfID" => 1610612736, "SamplingRate" => 0, "NextHop" => "\nek-?=",

I decided to write my own Go function to send the flows directly to Elasticsearch, and it works, but I'm not sure whether it will have a performance impact, given that I know nothing about how Go implements concurrency. I'm expecting around 15k flows/sec, and hardware is not an issue, so I'll keep improving as I learn Go.
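For reference, a minimal sketch of that kind of direct push: batching documents through Elasticsearch's _bulk endpoint over plain net/http. The ES address and the "flows" index name are assumptions.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// bulkIndex sends a batch of documents to Elasticsearch's _bulk endpoint.
func bulkIndex(docs []map[string]interface{}) error {
	var buf bytes.Buffer
	for _, doc := range docs {
		// Bulk NDJSON format: an action line, then the document, each newline-terminated.
		buf.WriteString(`{"index":{"_index":"flows"}}` + "\n")
		line, err := json.Marshal(doc)
		if err != nil {
			return err
		}
		buf.Write(line)
		buf.WriteByte('\n')
	}
	resp, err := http.Post("http://localhost:9200/_bulk", "application/x-ndjson", &buf)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("bulk request failed: %s", resp.Status)
	}
	return nil
}

func main() {
	doc := map[string]interface{}{"SrcPort": 443, "Bytes": 72} // hypothetical flow document
	if err := bulkIndex([]map[string]interface{}{doc}); err != nil {
		panic(err)
	}
}
```

Batching is what matters at this rate: one HTTP request per flow won't keep up at 15k flows/sec, while bulk requests of a few thousand documents usually will.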

@lspgn
Contributor

lspgn commented Oct 18, 2019

Hi @jcdaniel14,
Yes, at the moment it's Protobuf only. In the future, other serialization formats may be enabled.

There is a derived technique another user showed me: plug the logging output into Logstash: #8

I usually advise using Kafka, as it provides automatic message distribution and I am more confident in its performance. Nonetheless, you can use a different "transport" to format into JSON (introduced here and here).

Creating your own function in Go to send to Elasticsearch is one way to do it. Kafka helps manage the flow of incoming messages by partitioning them and provides a buffer in case of delays. At 15k flows/sec you should be fine either way.
If it's in a custom GoFlow version, the risk is losing UDP packets while the CPU is busy pushing the current ones to ES. If it's an external program plugged into Kafka, it's fine; see the sketch below.
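To illustrate the point, a sketch (not GoFlow's actual code; all names are hypothetical): if the ES push happens on the same path that reads the UDP socket, a slow push stalls the receiver and the kernel drops packets. A buffered channel between receiver and exporter makes the trade-off explicit.

```go
package main

import "log"

// Flow stands in for a decoded flow record.
type Flow struct{}

func readPacket() Flow { return Flow{} } // hypothetical stand-in for UDP receive + decode
func pushToES(f Flow)  {}                // hypothetical stand-in for a slow ES request

func receiveLoop(flows chan<- Flow) {
	for {
		f := readPacket()
		select {
		case flows <- f: // hand off without blocking the receive path
		default:
			// Buffer full: drop the flow rather than stall the UDP reader.
			log.Println("buffer full, dropping flow")
		}
	}
}

func exportLoop(flows <-chan Flow) {
	for f := range flows {
		pushToES(f) // slow network call happens off the receive path
	}
}

func main() {
	flows := make(chan Flow, 10000) // buffer absorbs short ES slowdowns
	go exportLoop(flows)
	receiveLoop(flows)
}
```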

lspgn closed this as completed Dec 23, 2019