Enhance Egress support in Traceflow #6125

Atish-iaf · 2024-03-20T05:53:35Z

Add EgressNodeIP field in Traceflow observations.
Add EgressNode field in observations from Egress Node as well when Egress Node is different from source Node. Previously, EgressNode field was available only in observations from source Node.

tnqn · 2024-03-25T13:41:41Z

pkg/agent/controller/traceflow/packetin.go

-	ob.EgressNode = egressNode
+	ob.EgressNode = egressNodeName
+	ob.EgressNodeIP = egressNodeIP
+	ob.SrcPodIP = srcPodIP


If there is a new field named SrcPodIP, it doesn't make sense to only set it for Egress observation as the name is very generic.

Yes, the new field name is generic and we can use it for other observations as well. I didn't do it in this PR because this PR is specific to Egress observations.
When implementing SrcPodIP for other observations, we can also discuss related additions, such as a DstPodIP field.

The partial support would look like a bug and confuse users who see it in one scenario but don't see it in another scenario when the field applies to both. We need to think about the whole when adding something generic.

I can remove the SrcPodIP field from this PR and implement it in another one, together with a DstPodIP field and support for other observations. We can discuss that further in a separate issue if required.

Hi @tnqn
I have removed the SrcPodIP field from this PR, and I plan to create another PR that adds SrcPodIP for all types of observations after this PR is merged.
PTAL, thanks

tnqn

cc @jianjuns and @antoninbas to check their opinions on this new field as well.

tnqn · 2024-04-02T14:03:45Z

pkg/agent/controller/traceflow/packetin.go

 				}
-				obEgress := getEgressObservation(true, egressIP, egressName, egressNode)
+				obEgress := getEgressObservation(true, egressIP, egressName, egressNodeName, c.nodeConfig.NodeIPv4Addr.IP.String())


It will panic in an IPv6 cluster.

antoninbas · 2024-04-02T18:01:05Z

cc @jianjuns and @antoninbas to check their opinions on this new field as well.

I don't really understand the value of this field. The Node is already identified by its name, which we include in the observation.
A Node also can have multiple IPs. Here the IP being used is the "management IP" used to connect to the K8s control plane, which I guess is fine, but it's not really relevant in the context of Egress?

rajnkamr · 2024-04-03T05:00:52Z

cc @jianjuns and @antoninbas to check their opinions on this new field as well.

I don't really understand the value of this field. The Node is already identified by its name, which we include in the observation. A Node also can have multiple IPs. Here the IP being used is the "management IP" used to connect to the K8s control plane, which I guess is fine, but it's not really relevant in the context of Egress?

The egress node IP refers to the IP address of the node through which the egress traffic is routed. Unlike the egress IP, which can be dynamically allocated from a pool, the egress node IP typically corresponds to the IP address of the specific node that handles the egress traffic. In this case, Traceflow will be performed from the platform managing the cluster; more info in #6099.

jianjuns · 2024-04-03T05:17:47Z

I still did not get why egress Node IP is useful in the Traceflow results. Node name is not enough for user to identify the Node?

Atish-iaf · 2024-04-03T05:41:34Z

I still did not get why egress Node IP is useful in the Traceflow results. Node name is not enough for user to identify the Node?

The Node name is enough for the user to identify the Node.
egressIP can sometimes be equal to egressNodeIP (the static Egress case), and in other cases egressIP may differ from egressNodeIP. We cannot get this information from the Node name, but we can get it if both egressIP and egressNodeIP are visible in the Traceflow results. #6099 (comment)

jianjuns · 2024-04-03T16:57:25Z

So you mean the intention is for users to know whether the Egress IP is the Node IP or not? Why is that useful?
We have the Egress name in the Traceflow results too, so users can already tell which Egress is applied.

antoninbas · 2024-04-03T18:03:45Z

@rajnkamr

the egress node IP refers to the IP address of the node through which the egress traffic is routed

The Node can have many IP addresses. As I pointed out, this is just one IP address reported by K8s. I would call this IP the management IP, but it can be different from the transport IP, etc. This IP address is never used by the Egress traffic at any point, which is why it doesn't really seem related to the Egress feature in any way?

rajnkamr · 2024-04-04T05:22:26Z

As we know, Egress IP addresses are used to ensure that traffic from Pods to external destinations has a consistent source IP (preferably static).
However, many external devices and software use IP-based access control lists to restrict incoming traffic for security reasons. These access control lists outside the K8s cluster will block packets, which causes a connectivity issue, and in this case the only solution is to configure a static Egress IP.
Especially in the above case, the main motivation while debugging is to let the user know that the Egress IP and the Egress Node IP are different, which could help the user identify the issue and possibly switch back to a static Egress IP.

tnqn · 2024-04-04T13:00:21Z

As we know, Egress IP addresses are used to ensure that traffic from Pods to external destinations has a consistent source IP (preferably static).
However, many external devices and software use IP-based access control lists to restrict incoming traffic for security reasons.

This is the motivation of the Egress feature, not why Egress Node IP needs to be in Traceflow result.

These access control lists outside the K8s cluster will block packets, which causes a connectivity issue, and in this case the only solution is to configure a static Egress IP.

This is not correct. Static egress IP is not the only solution, any type of egress IP can be the solution. The only difference between HA egress and static egress is how Egress IP is assigned to a Node, by Antrea or by users. In production we always recommend the former as it provides HA.

Especially in the above case, the main motivation while debugging is to let the user know that the Egress IP and the Egress Node IP are different, which could help the user identify the issue and possibly switch back to a static Egress IP.

I don't quite get what the explanation means. The point is, the Egress Node IP plays no role in the whole trace or the datapath of such a scenario, regardless of whether it's the same as the Egress IP or not. If users encounter an external connectivity issue and they trace the packet, they should check whether the Egress IP is whitelisted, and never need to know the Egress Node IP.

rajnkamr · 2024-04-05T05:38:47Z

This is the motivation of the Egress feature, not why Egress Node IP needs to be in Traceflow result.

I explained it in the Egress context because we are running a Traceflow and the actual packet will egress using the Egress Node IP.
Mainly, the Egress Node IP could be helpful when the Egress Node IP and the Egress IP are different.
For example, in the Traceflow case, actual traffic exits from the Egress Node IP, so if a Traceflow to a destination via the Egress IP is problematic, letting the user know the Egress Node IP can be helpful.

This is not correct. Static egress IP is not the only solution, any type of egress IP can be the solution. The only difference between HA egress and static egress is how Egress IP is assigned to a Node, by Antrea or by users. In production we always recommend the former as it provides HA.

The Egress IP can be assigned to a dummy interface, whereas the Egress Node IP will always belong to an actual interface of the Node, so the information can be helpful in the above context. Also, in the HA case, where there could be multiple Nodes and multiple interfaces (transport and management), egress node ip will be the interface ip where traffic is actually exiting.

I don't quite get what the explanation means. The point is, the Egress Node IP plays no role in the whole trace or the datapath of such a scenario, regardless of whether it's the same as the Egress IP or not. If users encounter an external connectivity issue and they trace the packet, they should check whether the Egress IP is whitelisted, and never need to know the Egress Node IP.

When running a Traceflow with an Egress IP where the Node interface is not reachable, the Traceflow to a destination could still work, but actual traffic will be blocked because the Egress Node IP interface is not reachable.

antoninbas · 2024-04-06T00:01:02Z

Egress ip can be assigned to a dummy interface, wherein egress node ip will always be the actual interface of node so the information can be helpful in above context. Also in HA case wherein there could be multiple node interfaces(transport and management), in that case egress node ip will be the interface ip where traffic is actually existing.

I think you mean exiting and not existing?

This is not what Quan was referring to when he was talking about Egress HA. Egress HA is the ability to fail over an Egress IP to a different Node if the first Node fails. This requires using ExternalIPPools (as opposed to static Egress IPs).

But saying "egress node ip will be the interface ip where traffic is actually exiting" is not correct. The current implementation uses c.nodeConfig.NodeIPv4Addr. As I pointed out before, this is the "management" IP of the Node in the context of K8s. There is no guarantee that the Egress traffic will exit the Node through the interface to which this IP is assigned. This is determined by host routing on the Node. Based on what the destination IP is, traffic can exit the Node through multiple possible interfaces, and these interfaces will have different IPs (which are different from the Egress IP and potentially different from c.nodeConfig.NodeIPv4Addr). Antrea doesn't even know which interface it will be.
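The host-routing point can be illustrated with a small sketch that asks the kernel which local address it would select for a given destination. This is only an illustration of route-based selection on a host: `localSourceIP` is a hypothetical helper, the destinations are placeholders, and the sketch does not model Antrea's Egress SNAT, which rewrites the source to the Egress IP.

```go
package main

import (
	"fmt"
	"net"
)

// localSourceIP returns the local address that the host's routing table
// selects for dst (in "host:port" form). Connecting a UDP socket sends no
// packets; it only makes the kernel choose a route and a source address.
func localSourceIP(dst string) (net.IP, error) {
	conn, err := net.Dial("udp", dst)
	if err != nil {
		return nil, err
	}
	defer conn.Close()

	addr, ok := conn.LocalAddr().(*net.UDPAddr)
	if !ok {
		return nil, fmt.Errorf("unexpected local address type %T", conn.LocalAddr())
	}
	return addr.IP, nil
}

func main() {
	// Placeholder destinations: different destinations may map to different
	// interfaces, and therefore to different local addresses.
	for _, dst := range []string{"127.0.0.1:9", "192.0.2.1:9"} {
		ip, err := localSourceIP(dst)
		if err != nil {
			fmt.Printf("%s: no usable route (%v)\n", dst, err)
			continue
		}
		fmt.Printf("%s -> local address %s\n", dst, ip)
	}
}
```

On a multi-homed Node, running this against destinations behind different routes would typically print different local addresses, none of which needs to match the IP reported by K8s for the Node.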

It may be easier to discuss this in person at the next Antrea community meeting if there is confusion.

rajnkamr · 2024-04-08T05:56:04Z

@antoninbas,
Most of the time in actual deployments, it is highly likely that the Egress Node IP is the management IP of the cluster!

The Egress Node IP can be used as the management IP address of the cluster (not the Node)!
Although there are other places in the platform software where we can get the Node IP of each K8s Node, the Egress Node IP will always be the management IP of the cluster on the Node from which Egress traffic is exiting, and the management IP could be the only IP exposed externally for managing the cluster. It might make sense to keep it in that context!
We might want to discuss it during the community meeting.
+@tschwaller

- Add "EgressNodeIP" field in Traceflow observations.
- Add "EgressNode" field in observations from Egress Node as well when Egress Node is different from source Node. Previously, "EgressNode" field was available only in observations from source Node.

Fixes antrea-io#6099

Signed-off-by: Kumar Atish <kumar.atish@broadcom.com>

luolanzone · 2024-04-22T07:04:52Z

Synced with @tnqn: we may need another way to support this kind of feature, so I am moving it to the next release.

rajnkamr · 2024-04-22T07:27:50Z

@luolanzone, the Egress Node IP is relevant with respect to the management IP of the cluster; however, we would like to see the actual traffic path in Traceflow.

Atish-iaf added the area/ops/traceflow label Mar 20, 2024

Atish-iaf force-pushed the enhnace-egress-in-traceflow branch from 70107f9 to 5762c45 Compare March 20, 2024 06:59

Atish-iaf marked this pull request as ready for review March 20, 2024 07:01

Atish-iaf requested review from rajnkamr and tnqn March 20, 2024 07:01

rajnkamr added this to the Antrea v2.0 release milestone Mar 21, 2024

tnqn reviewed Mar 26, 2024

Atish-iaf force-pushed the enhnace-egress-in-traceflow branch from 5762c45 to 2e2c579 Compare March 27, 2024 16:12

Atish-iaf requested a review from tnqn March 27, 2024 16:21

tnqn added the api-review label Apr 2, 2024

tnqn reviewed Apr 2, 2024

rajnkamr added the action/release-note label Apr 16, 2024

Atish-iaf force-pushed the enhnace-egress-in-traceflow branch from 2e2c579 to 624173c Compare April 17, 2024 09:28

luolanzone modified the milestones: Antrea v2.0 release, Antrea v2.1 release Apr 22, 2024

rajnkamr removed this from the Antrea v2.1 release milestone May 3, 2024

rajnkamr removed the action/release-note and api-review labels May 7, 2024
