Support TCP transport for ELK flow collector #2387

Merged
1 commit merged into antrea-io:main on Jul 16, 2021

Conversation

@zyiou (Contributor) commented Jul 12, 2021

Previously we did not support TCP transport for the ELK flow collector, because template expiration is enabled by default for TCP transport, which does not align with the IANA standard.
Now we set the expiration for templates received over TCP to 365 days as a workaround, to provide support for TCP transport.
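
For illustration only, a minimal Go sketch of the idea behind the workaround (the actual change is in the ELK/Logstash manifest, not in Go): a template cache whose TTL depends on the transport, so templates learned over TCP effectively never expire. The type and field names and the UDP TTL value are made up for the example.

```go
package main

import (
	"fmt"
	"time"
)

type templateEntry struct {
	fieldCount int
	expiresAt  time.Time
}

type templateCache struct {
	udpTTL    time.Duration
	tcpTTL    time.Duration
	templates map[uint16]templateEntry
}

func newTemplateCache() *templateCache {
	return &templateCache{
		udpTTL:    30 * time.Minute,     // hypothetical UDP template TTL
		tcpTTL:    365 * 24 * time.Hour, // the 365-day workaround for TCP
		templates: make(map[uint16]templateEntry),
	}
}

// add records a template; the TTL depends on the transport it was learned over.
func (c *templateCache) add(id uint16, fieldCount int, transport string) {
	ttl := c.udpTTL
	if transport == "tcp" {
		ttl = c.tcpTTL
	}
	c.templates[id] = templateEntry{fieldCount: fieldCount, expiresAt: time.Now().Add(ttl)}
}

// lookup returns the template only if it has not expired yet.
func (c *templateCache) lookup(id uint16) (templateEntry, bool) {
	e, ok := c.templates[id]
	if !ok || time.Now().After(e.expiresAt) {
		return templateEntry{}, false
	}
	return e, true
}

func main() {
	cache := newTemplateCache()
	cache.add(256, 20, "tcp")
	if e, ok := cache.lookup(256); ok {
		fmt.Printf("template 256: %d fields, expires %s\n", e.fieldCount, e.expiresAt.Format(time.RFC3339))
	}
}
```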

@zyiou added labels on Jul 12, 2021: area/flow-visibility (Issues or PRs related to flow visibility support in Antrea), area/flow-visibility/elk (Issues or PRs related to the reference ELK configuration for flow visualization)

@srikartati (Member) left a comment

Thanks for adding TCP support as an alternative to UDP.

@@ -264,7 +264,7 @@ then please use the address:
   `<Ipfix-Collector Cluster IP>:<port>:<TCP|UDP>`
 * If you have deployed the [ELK
   flow collector](#deployment-steps-1), then please use the address:
-  `<Logstash Cluster IP>:4739:UDP`
+  `<Logstash Cluster IP>:4739:UDP` or `<Logstash Cluster IP>:4738:TCP`

Member

Why a different port for TCP?

@zyiou (Contributor Author)

I tried using the same port for TCP and UDP and found there were decoding issues in Logstash, which may need further investigation, so I separated it into two different ports.

@zyiou (Contributor Author)

I cleaned up my local settings and tried again. It is working now with one port.

srikartati previously approved these changes Jul 13, 2021

@srikartati (Member) left a comment

LGTM.
Maybe we can support template refresh in go-ipfix for the TCP exporting process too, like the UDP exporting process, just for the sake of the ELK IPFIX flow collector, even though it is not part of the IANA standard. We can open an issue in the go-ipfix repo for this purpose.
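
A minimal Go sketch of what such a transport-independent template refresh could look like, assuming a periodic timer that simply re-sends the template set; sendTemplate is a hypothetical stand-in for the actual serialization and write, not a go-ipfix API.

```go
package main

import (
	"fmt"
	"time"
)

// sendTemplate is a hypothetical stand-in for serializing the IPFIX template
// set and writing it to the collector connection.
func sendTemplate(templateID uint16) {
	fmt.Printf("re-sending template %d at %s\n", templateID, time.Now().Format(time.RFC3339))
}

// refreshTemplates re-sends the template on every tick, regardless of whether
// the transport is TCP or UDP, until stop is closed.
func refreshTemplates(templateID uint16, interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			sendTemplate(templateID)
		case <-stop:
			return
		}
	}
}

func main() {
	stop := make(chan struct{})
	go refreshTemplates(256, 2*time.Second, stop)
	time.Sleep(5 * time.Second)
	close(stop)
}
```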

antoninbas previously approved these changes Jul 13, 2021

@antoninbas (Contributor)

@zyiou do we need to run any Jenkins tests for this?

@zyiou (Contributor Author) commented Jul 14, 2021

> @zyiou do we need to run any Jenkins tests for this?

I don't think we need to, since the changes are only in the manifest YAMLs.

Previously we did not support TCP transport for the ELK flow collector because template expiration is enabled by default for TCP transport, which does not align with the IANA standard. Now we set the expiration for templates received over TCP to 365 days as a workaround to provide support for TCP transport.

Signed-off-by: zyiou <zyiou@vmware.com>

@zyiou (Contributor Author) commented Jul 16, 2021

@antoninbas can you approve and merge this PR? Thanks!

@antoninbas (Contributor)

/skip-all

@antoninbas antoninbas merged commit e4f11f1 into antrea-io:main Jul 16, 2021

@srikartati (Member) commented Jul 21, 2021

I nominate this as a candidate for the v1.2 patch release. This patch is required in setups where the UDP transport is broken because of unconventionally large metadata.

@antoninbas (Contributor)

> I nominate this as a candidate for the v1.2 patch release. This patch is required in setups where the UDP transport is broken because of unconventionally large metadata.

Yes, this can be backported to release-1.2, but what do you mean by "unconventionally large metadata"?

@srikartati (Member) commented Jul 22, 2021

> I nominate this as a candidate for the v1.2 patch release. This patch is required in setups where the UDP transport is broken because of unconventionally large metadata.

> Yes, this can be backported to release-1.2, but what do you mean by "unconventionally large metadata"?

If there are many labels on the Pods (more than two), then we go over the minimum MTU of 512 bytes in the case of UDP.
We are planning to move away from the default 512B MTU by using the interface MTU instead.
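
A minimal Go sketch of that plan: derive the message size limit from the egress interface MTU when it can be discovered, and fall back to the RFC 7011 default of 512 octets otherwise. The interface name and header overheads are placeholders, not the actual Flow Aggregator implementation.

```go
package main

import (
	"fmt"
	"net"
)

const (
	defaultMaxMsgSize = 512    // RFC 7011: use 512 octets when the PMTU is unknown
	ipv4UDPOverhead   = 20 + 8 // IPv4 header + UDP header
)

// maxIPFIXMessageSize derives the message size limit from the interface MTU,
// falling back to the 512-octet default when the interface cannot be found.
func maxIPFIXMessageSize(ifaceName string) int {
	iface, err := net.InterfaceByName(ifaceName)
	if err != nil || iface.MTU <= ipv4UDPOverhead {
		return defaultMaxMsgSize
	}
	return iface.MTU - ipv4UDPOverhead
}

func main() {
	// "eth0" is a placeholder for the Flow Aggregator Pod's interface name.
	fmt.Println("max IPFIX message size:", maxIPFIXMessageSize("eth0"))
}
```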

@antoninbas (Contributor)

I think you mean the maximum MTU of 512B when the actual path MTU is not known.

I can see the following in RFC 7011:

> The maximum size of exported messages MUST be configured such that the total packet size does not exceed the PMTU. If the PMTU is unknown, a maximum packet size of 512 octets SHOULD be used.

Do you interpret this as "the exporting process should not rely on IP fragmentation by the OS / network"? If yes, then I am not sure why this is the case. This seems like a hard restriction, especially when the message size is not fixed (variable-length IEs, which we have for Antrea).

@srikartati (Member)

> I think you mean the maximum MTU of 512B when the actual path MTU is not known.

Yes, I meant maximum. My bad. When the path MTU is not known, we set it to the default.

> The maximum size of exported messages MUST be configured such that the total packet size does not exceed the PMTU. If the PMTU is unknown, a maximum packet size of 512 octets SHOULD be used.

> Do you interpret this as "the exporting process should not rely on IP fragmentation by the OS / network"? If yes, then I am not sure why this is the case.

Yes, from the RFC excerpt on the MTU, I interpreted that IP fragmentation is not recommended for UDP transport in the IPFIX protocol. This may have nothing to do with IPFIX specifically: with UDP and IP fragmentation, we may see reordering issues and packet loss. One straightforward approach is to ignore flow records that arrive in malformed packets after reassembly.
However, there is no discussion in the RFC on how to deal with this situation in the collector. As we have to support third-party IPFIX collectors, I am not sure we can go against the standard and send UDP packets of any size.

For in-cluster traffic, relying on the Flow Aggregator Pod's interface MTU will probably be OK. We are doing the same for the Flow Exporter to Flow Aggregator IPFIX channel.

> This seems like a hard restriction, especially when the message size is not fixed (variable-length IEs, which we have for Antrea).

Yes, we were thinking of managing this with errors in the code and documentation on when to move to TCP transport. Do you have suggestions in mind?

@antoninbas (Contributor)

Do you think it makes sense to keep supporting UDP? Maybe we should just remove UDP support altogether and focus on TCP (and possibly SCTP) testing.

@srikartati (Member)

> Do you think it makes sense to keep supporting UDP? Maybe we should just remove UDP support altogether and focus on TCP (and possibly SCTP) testing.

Before making this decision, I think it is better to test Antrea with the (third-party) ELK flow collector using larger UDP packets, say 9000 bytes, on a ~1500 MTU interface. In this scenario, we can see how Logstash receives the flow records and whether there is any packet loss leading to missing records at the collector.

What do you think of this experiment?

@antoninbas (Contributor)

> Do you think it makes sense to keep supporting UDP? Maybe we should just remove UDP support altogether and focus on TCP (and possibly SCTP) testing.

> Before making this decision, I think it is better to test Antrea with the (third-party) ELK flow collector using larger UDP packets, say 9000 bytes, on a ~1500 MTU interface. In this scenario, we can see how Logstash receives the flow records and whether there is any packet loss leading to missing records at the collector.
>
> What do you think of this experiment?

I'm not sure I understand. "Larger UDP packets, say 9000 bytes, on a ~1500 MTU interface" would mean that you are fragmenting the packet, and it seems this is strongly discouraged by the RFC.

If the following are true:

* we don't want to fragment IPFIX UDP packets (IP fragmentation)
* IPFIX records can easily be larger than 1500B or so with the addition of labels, etc.

then I think we should drop UDP support altogether.

If we want to fragment the UDP packets, then we just need to remove the check in the ipfix code that drops large UDP packets.
If we don't want to fragment the UDP packets but we expect IPFIX records to stay smaller than 1500B most of the time, we should log an error when we drop a record because of its large size.

In any case, we should probably default to TCP for transport.
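
A minimal Go sketch of the "log an error when we drop a record" option described above; the limit constant and function names are made up for the example and are not the actual ipfix code.

```go
package main

import (
	"fmt"
	"log"
)

// maxUDPMessageSize is a placeholder limit (512 octets, or the interface MTU
// as discussed above).
const maxUDPMessageSize = 512

// appendRecord adds an encoded record to the pending message if it fits;
// otherwise it logs an error so the drop is visible instead of silent.
func appendRecord(msg, record []byte) ([]byte, bool) {
	if len(msg)+len(record) > maxUDPMessageSize {
		log.Printf("dropping flow record of %d bytes: message would exceed %d bytes over UDP",
			len(record), maxUDPMessageSize)
		return msg, false
	}
	return append(msg, record...), true
}

func main() {
	msg := make([]byte, 0, maxUDPMessageSize)
	msg, ok := appendRecord(msg, make([]byte, 100))
	fmt.Println("small record appended:", ok, "message size:", len(msg))
	_, ok = appendRecord(msg, make([]byte, 600))
	fmt.Println("large record appended:", ok)
}
```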

@srikartati (Member) commented Aug 5, 2021

Sorry for the late response. I planned to do some experiments and hence the delay.

> I'm not sure I understand. "Larger UDP packets, say 9000 bytes, on a ~1500 MTU interface" would mean that you are fragmenting the packet, and it seems this is strongly discouraged by the RFC.

Yes, I wanted to go against the IPFIX RFC's recommendation on the path MTU and see what happens with the ELK flow collector running inside the cluster. I tried to send large IPFIX packets by bloating up the Pod labels in the workload traffic from the Flow Aggregator, with a 1450 MTU, and watched how the ELK flow collector responds. I see that flow records are received properly at the ELK collector even after fragmentation and reassembly; there is no loss of records over a time frame of 30 minutes for 15 workload flows.
I am not sure what will happen with a burst of records, or in the scenario where the ELK collector is outside the cluster with connectivity over a public network.

Maybe my interpretation of the IPFIX RFC's recommendation on using the path MTU is not entirely correct. Do you have an alternative interpretation?
Considering that fragmentation is working fine with the ELK flow collector, I would like to go with the route where we remove the check for large UDP packets and let fragmentation take care of it. What is your opinion?
Defaulting to TCP is fine by me.

[EDIT] In addition, going forward we want to restrict the number of fields in the IPFIX packet to keep the size within acceptable limits. This will help with the processing of records at the collector as well.

Labels

* area/flow-visibility: Issues or PRs related to flow visibility support in Antrea
* area/flow-visibility/elk: Issues or PRs related to the reference ELK configuration for flow visualization