-
Notifications
You must be signed in to change notification settings - Fork 369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix NodePort traffic bypassing ingress NetworkPolicy on Linux #1816
Conversation
@jianjuns @antoninbas I haven't added e2e test, want to check with you if this approach/code look good to you overall first. |
295510b
to
910e4c0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few nits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the flow in this doc need to be updated: https://github.com/vmware-tanzu/antrea/blob/main/docs/design/ovs-pipeline.md#ingressruletable-90, along with the description of the flow potentially
I also think it would make sense to document somewhere the behavior for NodePort traffic. I know you said it is common to most CNIs, but that still may come as a surprise to some of our users.
pkg/agent/route/route_linux.go
Outdated
// that will be sent to OVS so we can identify them later in the OVS pipeline. | ||
// It must match source address because kube-proxy ipvs mode will redirect ingress packets to output chain, and they | ||
// will have non local source addresses. | ||
writeLine(iptablesData, []string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that will be before SNAT right?
I noticed that you are adding this to the raw table, and not in the mangle table, as you described in the issue (#280 (comment)). Intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, SNAT is performend in POSTROUTING, we mark it in OUTPUT so it's before it.
Both raw and mangle can work, moved it to mangle in latest patch as raw is used mainly for configuring exemptions from connection tracking in combination with the NOTRACK target, while mangle is used for specialized packet alteration and we already have a rule in mangle to mark packets.
f8692b0
to
3c2143c
Compare
Codecov Report
@@ Coverage Diff @@
## main #1816 +/- ##
=======================================
Coverage ? 42.26%
=======================================
Files ? 196
Lines ? 16751
Branches ? 0
=======================================
Hits ? 7079
Misses ? 8673
Partials ? 999
Flags with carried forward coverage won't be shown. Click here to find out more. |
Updated pipeline.md.
I haven't found a good place to add in existing docs. Where would you recommend? The content will probably be something like a note that NodePort traffic with ExternalTrafficPolicy set to Cluster will be SNATed and NetworkPolicies are enforced after that. If people wants to limit ingress access to Pods from external while exposing them via a NodePort service, they may use Local ExternalTrafficPolicy to preserve client IPs. |
/test-all |
I found the e2e test doesn't work on kind clusters, suspecting there is an issue in OVS netdev datapath that pkt_mark doesn't take effect, will figure out the reason before merging this. |
But do you have another solution in mind? Not just this one, but also for SNAT policy, which we discussed to pass SNAT IP info to iptables using packet mark. Just for this one, I think it should be ok, if with netdev datapath we do not fix the NP for NodePort. |
We can change to mark traffic not generated from the Node, then at least liveness/readiness probe will not be affected with netdev DP? |
|
Thanks for the input @jianjuns @antoninbas.
Do you mean we enforce networkpolicy if it has mark because we mark non local traffic in iptables? Then there is another issue that external traffic will bypass policies with netdev datapath? I will make netdev datapath work using previous approach and report to OVS community after I confirm it's indeed the case. |
Yes, that is what I mean. It is at least better than breaking kubelet traffic for netdev DP. |
521d305
to
60872df
Compare
Then it seems the original way of matching source IP is still better as it can enforce networkpolicies for all traffic except NodePort's with netdev datapath. |
60872df
to
6739ded
Compare
/test-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works for me.
To allow kubelet probe liveness and readiness of local Pods, a local probe flow with higher priority than ingress flows is configured. However, previously it identifies the locally generated traffic by matching source IP, which caused the issue that accessing a Pod via its NodePort service can bypass ingress NetworkPolicies too as the packet will be SNATed to local gateway IP. This patch fixes it by using fwmark to identify the locally generated traffic in the iptables output chain, so that external traffic forwarded by iptables/ipvs won't have the mark. Then the local probe flow uses pkt_mark to identify the eligible traffic. This approach works for Linux only and kernel datapath only. For Windows and netdev datapath, we may fix it after using AntreaProxy to serve NodePort service.
6739ded
to
83b406b
Compare
/test-all |
/test-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving again, after resolving merge conflict
libOpenflow panics if a packet-in message is sent by OVS with a NXM_NX_PKT_MARK match field. Since antrea-io#1816, it is a possible situation: Traceflow requests for the Node IP can lead to reply traffic with the packet mark set, which are sent to the Antrea Agent as a PacketIn message. To resolve this issue, we first switch temporarily to a patched libOpenflow version without this issue. When the patch is ported in upstream libOpenflow, we can remove the replace directive from go.mod. However, it should be noted that reply packets for Traceflow requests are generally meaningless and should be ignored. In encapMode, The Traceflow implementation should also not timeout when a Traceflow request leaves the overlay: as soon as the request is forwarded through the gateway port, we should consider the request complete, and ignore any potential reply packet. So we include the following changes: * add a new "ForwardedOutOfOverlay" Traceflow action when a request is forwarded out of the network managed by Antrea in encapMode. The Controller can then mark the request as "succeeded". In theory, something similar could be done for other traffic modes, but it would be much more complex. * add support for Traceflow requests for which the destination is the gateway's IP, by reporting a "Delivered" action. * add an OVS flow in charge of dropping reply traffic for Traceflow requests (using the conntrack state to match this traffic), thus ensuring it is not set to the Agent. In our testing, this is especially useful when the destination IP is the local Node's IP, as the IP ToS field seems to be preseved in that case, causing the reply packet to be treated as a Traceflow request. We add end-to-end tests for both cases (external destination IP and Antrea gateway destination IP). Fixes antrea-io#1878
libOpenflow panics if a packet-in message is sent by OVS with a NXM_NX_PKT_MARK match field. Since antrea-io#1816, it is a possible situation: Traceflow requests for the Node IP can lead to reply traffic with the packet mark set, which are sent to the Antrea Agent as a PacketIn message. To resolve this issue, we first switch temporarily to a patched libOpenflow version without this issue. When the patch is ported in upstream libOpenflow, we can remove the replace directive from go.mod. Fixes antrea-io#1878
libOpenflow panics if a packet-in message is sent by OVS with a NXM_NX_PKT_MARK match field. Since #1816, it is a possible situation: Traceflow requests for the Node IP can lead to reply traffic with the packet mark set, which are sent to the Antrea Agent as a PacketIn message. To resolve this issue, we first switch temporarily to a patched libOpenflow version without this issue. When the patch is ported in upstream libOpenflow, we can remove the replace directive from go.mod. Fixes #1878
…-io#1816) To allow kubelet probe liveness and readiness of local Pods, a local probe flow with higher priority than ingress flows is configured. However, previously it identifies the locally generated traffic by matching source IP, which caused the issue that accessing a Pod via its NodePort service can bypass ingress NetworkPolicies too as the packet will be SNATed to local gateway IP. This patch fixes it by using fwmark to identify the locally generated traffic in the iptables output chain, so that external traffic forwarded by iptables/ipvs won't have the mark. Then the local probe flow uses pkt_mark to identify the eligible traffic. This approach works for Linux only and kernel datapath only. For Windows and netdev datapath, we may fix it after using AntreaProxy to serve NodePort service.
…-io#1816) To allow kubelet probe liveness and readiness of local Pods, a local probe flow with higher priority than ingress flows is configured. However, previously it identifies the locally generated traffic by matching source IP, which caused the issue that accessing a Pod via its NodePort service can bypass ingress NetworkPolicies too as the packet will be SNATed to local gateway IP. This patch fixes it by using fwmark to identify the locally generated traffic in the iptables output chain, so that external traffic forwarded by iptables/ipvs won't have the mark. Then the local probe flow uses pkt_mark to identify the eligible traffic. This approach works for Linux only and kernel datapath only. For Windows and netdev datapath, we may fix it after using AntreaProxy to serve NodePort service.
libOpenflow panics if a packet-in message is sent by OVS with a NXM_NX_PKT_MARK match field. Since antrea-io#1816, it is a possible situation: Traceflow requests for the Node IP can lead to reply traffic with the packet mark set, which are sent to the Antrea Agent as a PacketIn message. To resolve this issue, we first switch temporarily to a patched libOpenflow version without this issue. When the patch is ported in upstream libOpenflow, we can remove the replace directive from go.mod. Fixes antrea-io#1878
…-io#1816) To allow kubelet probe liveness and readiness of local Pods, a local probe flow with higher priority than ingress flows is configured. However, previously it identifies the locally generated traffic by matching source IP, which caused the issue that accessing a Pod via its NodePort service can bypass ingress NetworkPolicies too as the packet will be SNATed to local gateway IP. This patch fixes it by using fwmark to identify the locally generated traffic in the iptables output chain, so that external traffic forwarded by iptables/ipvs won't have the mark. Then the local probe flow uses pkt_mark to identify the eligible traffic. This approach works for Linux only and kernel datapath only. For Windows and netdev datapath, we may fix it after using AntreaProxy to serve NodePort service.
libOpenflow panics if a packet-in message is sent by OVS with a NXM_NX_PKT_MARK match field. Since antrea-io#1816, it is a possible situation: Traceflow requests for the Node IP can lead to reply traffic with the packet mark set, which are sent to the Antrea Agent as a PacketIn message. To resolve this issue, we first switch temporarily to a patched libOpenflow version without this issue. When the patch is ported in upstream libOpenflow, we can remove the replace directive from go.mod. Fixes antrea-io#1878
To allow kubelet probe liveness and readiness of local Pods, a local probe flow with higher priority than ingress flows is configured. However, previously it identifies the locally generated traffic by matching source IP, which caused the issue that accessing a Pod via its NodePort service can bypass ingress NetworkPolicies too as the packet will be SNATed to local gateway IP. This patch fixes it by using fwmark to identify the locally generated traffic in the iptables output chain, so that external traffic forwarded by iptables/ipvs won't have the mark. Then the local probe flow uses pkt_mark to identify the eligible traffic. This approach works for Linux only and kernel datapath only. For Windows and netdev datapath, we may fix it after using AntreaProxy to serve NodePort service.
libOpenflow panics if a packet-in message is sent by OVS with a NXM_NX_PKT_MARK match field. Since #1816, it is a possible situation: Traceflow requests for the Node IP can lead to reply traffic with the packet mark set, which are sent to the Antrea Agent as a PacketIn message. To resolve this issue, we first switch temporarily to a patched libOpenflow version without this issue. When the patch is ported in upstream libOpenflow, we can remove the replace directive from go.mod. Fixes #1878
To allow kubelet probe liveness and readiness of local Pods, a local probe flow with higher priority than ingress flows is configured. However, previously it identifies the locally generated traffic by matching source IP, which caused the issue that accessing a Pod via its NodePort service can bypass ingress NetworkPolicies too as the packet will be SNATed to local gateway IP. This patch fixes it by using fwmark to identify the locally generated traffic in the iptables output chain, so that external traffic forwarded by iptables/ipvs won't have the mark. Then the local probe flow uses pkt_mark to identify the eligible traffic. This approach works for Linux only and kernel datapath only. For Windows and netdev datapath, we may fix it after using AntreaProxy to serve NodePort service.
libOpenflow panics if a packet-in message is sent by OVS with a NXM_NX_PKT_MARK match field. Since #1816, it is a possible situation: Traceflow requests for the Node IP can lead to reply traffic with the packet mark set, which are sent to the Antrea Agent as a PacketIn message. To resolve this issue, we first switch temporarily to a patched libOpenflow version without this issue. When the patch is ported in upstream libOpenflow, we can remove the replace directive from go.mod. Fixes #1878
libOpenflow panics if a packet-in message is sent by OVS with a NXM_NX_PKT_MARK match field. Since antrea-io#1816, it is a possible situation: Traceflow requests for the Node IP can lead to reply traffic with the packet mark set, which are sent to the Antrea Agent as a PacketIn message. To resolve this issue, we first switch temporarily to a patched libOpenflow version without this issue. When the patch is ported in upstream libOpenflow, we can remove the replace directive from go.mod. Fixes antrea-io#1878
libOpenflow panics if a packet-in message is sent by OVS with a NXM_NX_PKT_MARK match field. Since #1816, it is a possible situation: Traceflow requests for the Node IP can lead to reply traffic with the packet mark set, which are sent to the Antrea Agent as a PacketIn message. To resolve this issue, we first switch temporarily to a patched libOpenflow version without this issue. When the patch is ported in upstream libOpenflow, we can remove the replace directive from go.mod. Fixes #1878
To allow kubelet probe liveness and readiness of local Pods, a local
probe flow with higher priority than ingress flows is configured.
However, previously it identifies the locally generated traffic by
matching source IP, which caused the issue that accessing a Pod via its
NodePort service can bypass ingress NetworkPolicies too as the packet
will be SNATed to local gateway IP.
This patch fixes it by using fwmark to identify the locally generated
traffic in the iptables output chain, so that external traffic forwarded
by iptables/ipvs won't have the mark. Then the local probe flow uses
pkt_mark to identify the eligible traffic. This approach works for Linux
only. For Windows, we can only fix it after using AntreaProxy to serve
NodePort service.
For #280
TODO: