Improved NetFlowV9 support #21

Merged
merged 19 commits into from Aug 25, 2017

@kroepke
Member

kroepke commented Aug 21, 2017

This PR changes the way templates are handled.

In NetFlow v9, templates are not sent with every packet, so the implementation must buffer data packets until it has received the templates it needs to parse them. The same is true for option templates.

This implementation moves the buffering and template aggregation into a custom codec aggregator, so that the codec itself, which runs after the message has been journalled, can assume that it has all the templates it needs to parse a packet successfully. This is even more important when processing a journal after a restart.
The RFC requires that templates not be written to disk (or otherwise stored independently of the data flows), because they might change at any time. We therefore colocate them with the data itself, taking the performance hit of writing more bytes to disk in exchange for a safer implementation.
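
To illustrate the colocation idea, here is a minimal sketch of a journal entry that carries the templates next to the untouched packet bytes. It is not the plugin's actual protobuf format: the class `V9JournalEntrySketch`, its fields, and the length-prefixed layout are all assumptions made up for this example.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical envelope (NOT the plugin's protobuf): raw template records are
// stored together with the complete packet, so an entry read back from the
// journal can be decoded without any external template state.
final class V9JournalEntrySketch {
    final Map<Integer, byte[]> templates; // template id -> raw template record
    final byte[] packet;                  // complete, unmodified NetFlow v9 packet

    V9JournalEntrySketch(Map<Integer, byte[]> templates, byte[] packet) {
        this.templates = templates;
        this.packet = packet;
    }

    // Layout: count, then (id, length, bytes) per template, then (length, bytes) of the packet.
    byte[] toBytes() {
        int size = 4 + 4 + packet.length;
        for (byte[] raw : templates.values()) {
            size += 8 + raw.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(templates.size());
        for (Map.Entry<Integer, byte[]> e : templates.entrySet()) {
            buf.putInt(e.getKey()).putInt(e.getValue().length).put(e.getValue());
        }
        buf.putInt(packet.length).put(packet);
        return buf.array();
    }

    static V9JournalEntrySketch fromBytes(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int count = buf.getInt();
        Map<Integer, byte[]> templates = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            int id = buf.getInt();
            byte[] raw = new byte[buf.getInt()];
            buf.get(raw);
            templates.put(id, raw);
        }
        byte[] packet = new byte[buf.getInt()];
        buf.get(packet);
        return new V9JournalEntrySketch(templates, packet);
    }
}
```

The cost mentioned above is visible here: every entry repeats the template bytes, which is the price for entries that remain parseable on their own.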

As a result, this implementation does not lose data when it doesn't have the templates yet: the exporter resends them regularly for each observation domain.

To be compatible with Graylog 2.3, this change comes with a custom codec aggregator; for 3.0 we can migrate the code back into the server.

fixes #18
fixes #19
fixes #20

kroepke added some commits Aug 15, 2017

wip
fix v9 parsing by preserving the complete packets during buffering
aggregating the data flowsets does not work, because all records are based on the packet's timestamp
to simplify parsing in the codec, the aggregator now collects all templates/option templates into a protobuf
and adds received and buffered data flows to be parsed.
the netflow packets are preserved completely and can also contain templates. the codec will not use them, though, only the aggregator does

@kroepke kroepke added this to the 2.3.1 milestone Aug 21, 2017

}
}
private void queueBufferedPackets(Set<TemplateKey> templates, Set<ChannelBuffer> packetsToSend, TemplateKey templateKey) {

@kroepke

kroepke Aug 21, 2017

Member

This actually requires checking all templates used by a packet, so this code will change soon and is not the final version.

@kroepke kroepke requested a review from bernd Aug 21, 2017

kroepke added some commits Aug 22, 2017

fix handler setup
the codec-aggregator is null in super class, so that the put call added it after the raw-message handler
thus the code never ran and screwed up parsing

@kroepke kroepke changed the title from [WIP] V9 fixes to Improved NetFlowV9 support Aug 22, 2017

kroepke added some commits Aug 22, 2017

change how the packet cache works
the previous implementation only checked for a single template id to be present for each packet, which in general is wrong if not all templates arrive at the same time (which might happen for large numbers of active templates)
the new implementation manually checks each packet's template requirements against the ids of received templates, for the current remote address/source id combination
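
The check described in this commit message could be sketched roughly as follows. This is not the code from the PR: `PacketCacheSketch`, `Pending`, and the string key built from remote address and source id are hypothetical names, and a real implementation would also need eviction and thread safety.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical packet cache: a buffered packet is released only once ALL
// template ids it references are known for its exporter.
final class PacketCacheSketch {
    // Stand-in for a buffered packet and the template ids its data flowsets use.
    static final class Pending {
        final String name;
        final Set<Integer> requiredTemplateIds;

        Pending(String name, Set<Integer> requiredTemplateIds) {
            this.name = name;
            this.requiredTemplateIds = requiredTemplateIds;
        }
    }

    private final Map<String, Set<Integer>> knownTemplates = new HashMap<>();
    private final Map<String, List<Pending>> buffered = new HashMap<>();

    // Templates are scoped to one remote address / source id combination.
    static String key(String remoteAddress, long sourceId) {
        return remoteAddress + "/" + sourceId;
    }

    // Record newly received template ids, then return packets that became parseable.
    List<Pending> onTemplates(String exporterKey, Set<Integer> templateIds) {
        knownTemplates.computeIfAbsent(exporterKey, k -> new HashSet<>()).addAll(templateIds);
        return drain(exporterKey);
    }

    // Buffer a data packet, then return whatever is parseable right now.
    List<Pending> onData(String exporterKey, Pending packet) {
        buffered.computeIfAbsent(exporterKey, k -> new ArrayList<>()).add(packet);
        return drain(exporterKey);
    }

    private List<Pending> drain(String exporterKey) {
        Set<Integer> known = knownTemplates.getOrDefault(exporterKey, new HashSet<>());
        List<Pending> ready = new ArrayList<>();
        Iterator<Pending> it = buffered.getOrDefault(exporterKey, new ArrayList<>()).iterator();
        while (it.hasNext()) {
            Pending p = it.next();
            // Checking a single id is not enough: every required template must be present.
            if (known.containsAll(p.requiredTemplateIds)) {
                ready.add(p);
                it.remove();
            }
        }
        return ready;
    }
}
```

With this shape, a packet that needs templates 256 and 257 stays buffered after 256 arrives and is released only when 257 is also known for the same exporter key.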

@bernd bernd self-assigned this Aug 24, 2017

@bernd

I am seeing a field nf_nf_field_153 when ingesting v9 via pmacctd. Is this a dynamically generated field? Also, the nf prefix is duplicated.

@kroepke

Member

kroepke commented Aug 24, 2017

@bernd

Member

bernd commented Aug 25, 2017

I also saw this. Can this still happen? I thought we are waiting until we get a template before we pass on the packet.

2017-08-24 20:24:01,298 ERROR: org.graylog.plugins.netflow.codecs.NetFlowCodec - Error parsing NetFlow packet <66e509d0-88f9-11e7-8379-024278d7ba18> received from <127.0.0.1:42934>
org.graylog.plugins.netflow.flows.EmptyTemplateException: Unable to parse NetFlow 9 records without template. Discarding packet.
	at org.graylog.plugins.netflow.v9.NetFlowV9Parser.parsePacket(NetFlowV9Parser.java:65) ~[classes/:?]
	at org.graylog.plugins.netflow.codecs.NetFlowCodec.lambda$decodeV9Packets$7(NetFlowCodec.java:186) ~[classes/:?]
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_144]
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_144]
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) ~[?:1.8.0_144]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[?:1.8.0_144]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[?:1.8.0_144]
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[?:1.8.0_144]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_144]
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) ~[?:1.8.0_144]
	at org.graylog.plugins.netflow.codecs.NetFlowCodec.decodeV9Packets(NetFlowCodec.java:187) ~[classes/:?]
	at org.graylog.plugins.netflow.codecs.NetFlowCodec.decodeV9(NetFlowCodec.java:156) ~[classes/:?]
	at org.graylog.plugins.netflow.codecs.NetFlowCodec.decodeMessages(NetFlowCodec.java:134) [classes/:?]
	at org.graylog2.shared.buffers.processors.DecodingProcessor.processMessage(DecodingProcessor.java:144) [classes/:?]
	at org.graylog2.shared.buffers.processors.DecodingProcessor.onEvent(DecodingProcessor.java:87) [classes/:?]
	at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:74) [classes/:?]
	at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:42) [classes/:?]
	at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:143) [disruptor-3.3.6.jar:?]
	at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66) [metrics-core-3.2.2.jar:3.2.2]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
@kroepke

Member

kroepke commented Aug 25, 2017

That's weird. Do you have a config that causes this?
If nothing else, I'll try to get better logging in there (e.g. printing the packet structure when that error happens).

@kroepke

Member

kroepke commented Aug 25, 2017

Field 153 shows up under a generated name because our default field definition list is missing that type. It is flowEndMilliseconds in the IANA IPFIX registry: https://www.iana.org/assignments/ipfix/ipfix.xhtml
We should probably ship with a better default list, taking the one from IPFIX.

@joschi

Contributor

joschi commented Aug 25, 2017

> We should probably ship with a better default list, taking the one from IPFIX.

I would strongly advise against this, as IPFIX ("NetFlow version 10") has some fields that are incompatible with NetFlow version 9.

For example, id 1 is "octetDeltaCount" in IPFIX (8 bytes), while it's the number of incoming bytes in NetFlow 9 (4 bytes) (see http://netflow.caligare.com/netflow_v9.htm).

@kroepke

Member

kroepke commented Aug 25, 2017

@joschi Fair point. The names sometimes differ, but semantically IPFIX is the same as NetFlow v9 for the 127 fields that v9 defined (as stated in RFC 5102).

At the very least the definition list we ship should be consistent in the IPFIX extensions, while retaining the original v9 fields.
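
A rough sketch of what such a list and the naming around it could look like: the original v9 entry for id 1 is kept, the IPFIX timestamp fields 152 and 153 are added, unknown ids fall back to a generated name, and the `nf_` prefix is applied in exactly one place. The class `FieldNamesSketch` and the lower-case field names are assumptions for illustration, not the names the plugin ships.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical field-name lookup, not the plugin's definition list.
final class FieldNamesSketch {
    private static final String PREFIX = "nf_";
    private final Map<Integer, String> definitions = new HashMap<>();

    FieldNamesSketch() {
        // Original NetFlow v9 field (IN_BYTES), kept instead of the IPFIX name.
        definitions.put(1, "in_bytes");
        // IPFIX extensions from the IANA registry (names spelled illustratively).
        definitions.put(152, "flow_start_milliseconds");
        definitions.put(153, "flow_end_milliseconds");
    }

    // Name without prefix; unknown types get a generated name such as "field_999".
    String rawName(int fieldType) {
        String name = definitions.get(fieldType);
        return name != null ? name : "field_" + fieldType;
    }

    // The prefix is added only here. If rawName() also prefixed its fallback,
    // unknown fields would end up as "nf_nf_field_<id>".
    String messageFieldName(int fieldType) {
        return PREFIX + rawName(fieldType);
    }
}
```

Keeping the prefix in one method is the design point: the fallback path and the known-field path cannot disagree about it.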

kroepke added some commits Aug 25, 2017

@bernd

bernd approved these changes Aug 25, 2017

LGTM 👍

@bernd bernd merged commit 4e9d68f into master Aug 25, 2017

3 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
continuous-integration/travis-ci/push The Travis CI build passed
license/cla Contributor License Agreement is signed.

@bernd bernd deleted the v9-fixes branch Aug 25, 2017

bernd added a commit that referenced this pull request Aug 25, 2017

Improved NetFlowV9 support (#21)
* move template caching and flow buffering in custom message aggregator

much of this is still wip

* remove template cache

* wip for codec aggregation and parsing custom format of v9

currently fixing tests

* wip

* migrate tests

* fix v9 parsing by preserving the complete packets during buffering

aggregating the data flowsets does not work, because all records are based on the packet's timestamp
to simplify parsing in the codec, the aggregator now collects all templates/option templates into a protobuf
and adds received and buffered data flows to be parsed.
the netflow packets are preserved completely and can also contain templates. the codec will not use them, though, only the aggregator does

* remove duplicated license header

* update protobuf comment

* update comment

* tweak license header to avoid diff

* fix guice setup for transport

* fix handler setup

the codec-aggregator is null in super class, so that the put call added it after the raw-message handler
thus the code never ran and screwed up parsing

* change how the packet cache works

the previous implementation only checked for a single template id to be present for each packet, which in general is wrong if not all templates arrive at the same time (which might happen for large numbers of active templates)
the new implementation manually checks each packet's template requirements against the ids of received templates, for the current remote address/source id combination

* prefix netflow v9 fields with nf_

* remove unused optional template atomic ref

* don't prefix unknown fields names, that is done centrally now

fixes double prefixing of fields

* don't forget to fix test

* add flow timestamp fields from ipfix

* fix test after field definition list update

(cherry picked from commit 4e9d68f)
@michelealbrigo

michelealbrigo commented Aug 28, 2017

I can confirm that #20 looks solved now: I am running rc5 and it has been handling our full NetFlow configuration for several hours without any apparent problem.

@moadiv

moadiv commented Sep 5, 2017

Hi everybody, I'm having trouble with an InvalidFlowVersionException (Invalid NetFlow version 0) when trying to log the flows of a Netgear switch.
This is my environment:

Graylog v2.3.1+9f2c6ef
graylog-plugin-netflow-2.3.0-rc.5.jar
Netgear M5300-28G3 switch, sFlow v5

And this is the server.log result:

2017-09-05T12:02:13.080-04:00 ERROR [NetFlowCodec] Error parsing NetFlow packet <94943050-9253-11e7-9ea1-000c29f4e228> received from <192.168.128.254:6343>
org.graylog.plugins.netflow.flows.InvalidFlowVersionException: Invalid NetFlow version 0
        at org.graylog.plugins.netflow.v5.NetFlowV5Parser.parseHeader(NetFlowV5Parser.java:67) ~[graylog-plugin-netflow-2.3.0-rc.5.tmp:?]
        at org.graylog.plugins.netflow.v5.NetFlowV5Parser.parsePacket(NetFlowV5Parser.java:33) ~[graylog-plugin-netflow-2.3.0-rc.5.tmp:?]
        at org.graylog.plugins.netflow.codecs.NetFlowCodec.decodeMessages(NetFlowCodec.java:127) [graylog-plugin-netflow-2.3.0-rc.5.tmp:?]
        at org.graylog2.shared.buffers.processors.DecodingProcessor.processMessage(DecodingProcessor.java:144) [graylog.jar:?]
        at org.graylog2.shared.buffers.processors.DecodingProcessor.onEvent(DecodingProcessor.java:87) [graylog.jar:?]
        at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:74) [graylog.jar:?]
        at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:42) [graylog.jar:?]
        at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:143) [graylog.jar:?]
        at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66) [graylog.jar:?]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]

Thanks in advance !!!!

@joschi

Contributor

joschi commented Sep 5, 2017

@moadiv This plugin currently doesn't support sFlow, see #3.
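
A likely explanation for the "version 0" in the error: NetFlow v5 and v9 headers start with a 16-bit version field, while an sFlow v5 datagram starts with a 32-bit version field holding 5, so its first two bytes are zero. The sketch below only illustrates that mismatch. `FlowVersionSketch` is a hypothetical helper, not the plugin's parser, and the set of accepted versions (5 and 9) is an assumption based on the parsers visible in the stack traces.

```java
import java.nio.ByteBuffer;

// Hypothetical version check, for illustration only.
final class FlowVersionSketch {
    // Reads the first two bytes as an unsigned big-endian 16-bit version number.
    static int netflowVersion(byte[] datagram) {
        if (datagram.length < 2) {
            throw new IllegalArgumentException("datagram too short for a version field");
        }
        return ByteBuffer.wrap(datagram).getShort(0) & 0xFFFF;
    }

    // Assumption: only NetFlow v5 and v9 are handled.
    static boolean isSupportedNetFlow(byte[] datagram) {
        int version = netflowVersion(datagram);
        return version == 5 || version == 9;
    }
}
```

For example, a datagram beginning with the bytes `00 00 00 05` (as an sFlow v5 datagram does) yields version 0 here, whereas one beginning with `00 09` yields 9.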

Please post questions about this plugin to our discussion forum or join the #graylog channel on freenode IRC.

Thank you!
