Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix NodePort traffic bypassing ingress NetworkPolicy on Linux #1816

Merged
merged 2 commits into from
Feb 11, 2021

Conversation

tnqn
Copy link
Member

@tnqn tnqn commented Feb 4, 2021

To allow kubelet probe liveness and readiness of local Pods, a local
probe flow with higher priority than ingress flows is configured.
However, previously it identifies the locally generated traffic by
matching source IP, which caused the issue that accessing a Pod via its
NodePort service can bypass ingress NetworkPolicies too as the packet
will be SNATed to local gateway IP.

This patch fixes it by using fwmark to identify the locally generated
traffic in the iptables output chain, so that external traffic forwarded
by iptables/ipvs won't have the mark. Then the local probe flow uses
pkt_mark to identify the eligible traffic. This approach works for Linux
only. For Windows, we can only fix it after using AntreaProxy to serve
NodePort service.

For #280

TODO:

  • Replace ofnet once it's merged upstream
  • Add e2e test

@tnqn
Copy link
Member Author

tnqn commented Feb 4, 2021

@jianjuns @antoninbas I haven't added e2e test, want to check with you if this approach/code look good to you overall first.

@tnqn tnqn force-pushed the bypass-networkpolicy branch 2 times, most recently from 295510b to 910e4c0 Compare February 4, 2021 16:41
jianjuns
jianjuns previously approved these changes Feb 4, 2021
Copy link
Contributor

@jianjuns jianjuns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few nits.

pkg/agent/config/node_config.go Outdated Show resolved Hide resolved
pkg/agent/config/node_config.go Outdated Show resolved Hide resolved
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the flow in this doc need to be updated: https://github.com/vmware-tanzu/antrea/blob/main/docs/design/ovs-pipeline.md#ingressruletable-90, along with the description of the flow potentially

I also think it would make sense to document somewhere the behavior for NodePort traffic. I know you said it is common to most CNIs, but that still may come as a surprise to some of our users.

// that will be sent to OVS so we can identify them later in the OVS pipeline.
// It must match source address because kube-proxy ipvs mode will redirect ingress packets to output chain, and they
// will have non local source addresses.
writeLine(iptablesData, []string{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that will be before SNAT right?

I noticed that you are adding this to the raw table, and not in the mangle table, as you described in the issue (#280 (comment)). Intentional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, SNAT is performend in POSTROUTING, we mark it in OUTPUT so it's before it.
Both raw and mangle can work, moved it to mangle in latest patch as raw is used mainly for configuring exemptions from connection tracking in combination with the NOTRACK target, while mangle is used for specialized packet alteration and we already have a rule in mangle to mark packets.

@tnqn tnqn force-pushed the bypass-networkpolicy branch 2 times, most recently from f8692b0 to 3c2143c Compare February 7, 2021 10:01
@codecov-io
Copy link

codecov-io commented Feb 7, 2021

Codecov Report

❗ No coverage uploaded for pull request base (main@44b884f). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1816   +/-   ##
=======================================
  Coverage        ?   42.26%           
=======================================
  Files           ?      196           
  Lines           ?    16751           
  Branches        ?        0           
=======================================
  Hits            ?     7079           
  Misses          ?     8673           
  Partials        ?      999           
Flag Coverage Δ
kind-e2e-tests 42.26% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

@tnqn
Copy link
Member Author

tnqn commented Feb 7, 2021

I think the flow in this doc need to be updated: https://github.com/vmware-tanzu/antrea/blob/main/docs/design/ovs-pipeline.md#ingressruletable-90, along with the description of the flow potentially

Updated pipeline.md.

I also think it would make sense to document somewhere the behavior for NodePort traffic. I know you said it is common to most CNIs, but that still may come as a surprise to some of our users.

I haven't found a good place to add in existing docs. Where would you recommend? The content will probably be something like a note that NodePort traffic with ExternalTrafficPolicy set to Cluster will be SNATed and NetworkPolicies are enforced after that. If people wants to limit ingress access to Pods from external while exposing them via a NodePort service, they may use Local ExternalTrafficPolicy to preserve client IPs.

jianjuns
jianjuns previously approved these changes Feb 7, 2021
@tnqn
Copy link
Member Author

tnqn commented Feb 8, 2021

/test-all

@tnqn
Copy link
Member Author

tnqn commented Feb 8, 2021

I found the e2e test doesn't work on kind clusters, suspecting there is an issue in OVS netdev datapath that pkt_mark doesn't take effect, will figure out the reason before merging this.
If it's the case, do you think it's risky to use this approach as liveness/readiness probe may not work if pods are isolated when deploying antrea with OVS netdev datapath? @jianjuns @antoninbas

@jianjuns
Copy link
Contributor

jianjuns commented Feb 8, 2021

But do you have another solution in mind? Not just this one, but also for SNAT policy, which we discussed to pass SNAT IP info to iptables using packet mark.

Just for this one, I think it should be ok, if with netdev datapath we do not fix the NP for NodePort.

@jianjuns
Copy link
Contributor

jianjuns commented Feb 8, 2021

We can change to mark traffic not generated from the Node, then at least liveness/readiness probe will not be affected with netdev DP?

@antoninbas
Copy link
Contributor

  • Netdev datapath: if Jianjun's proposal is easy to implement (switch from marking traffic originating from the Node to marking masqueraded traffic) then we can go with that. Otherwise, if the issue is indeed in the OVS netdev datapath I suggest we keep the original bypass flow for the netdev datapath only. My reasoning being that some folks have started investigating using the netdev datapath (with DPDK) for the primary network interface. And in that context - and for folks using Kind to test Antrea too - having host probes work is more important that NP enforcement on NodePort traffic. We can just skip the new e2e test on Kind. And we should report the issue to the OVS team after we confirm it.
  • Documentation: let's just mention the caveat in the OVS pipeline documentation along with your current changes. It's not really a user-facing document but we can start there. If users start asking about this behavior, we will start a new document to explain it.

@tnqn
Copy link
Member Author

tnqn commented Feb 9, 2021

Thanks for the input @jianjuns @antoninbas.

We can change to mark traffic not generated from the Node, then at least liveness/readiness probe will not be affected with netdev DP?

Do you mean we enforce networkpolicy if it has mark because we mark non local traffic in iptables? Then there is another issue that external traffic will bypass policies with netdev datapath?

I will make netdev datapath work using previous approach and report to OVS community after I confirm it's indeed the case.

@jianjuns
Copy link
Contributor

jianjuns commented Feb 9, 2021

Do you mean we enforce networkpolicy if it has mark because we mark non local traffic in iptables? Then there is another issue that external traffic will bypass policies with netdev datapath?

Yes, that is what I mean. It is at least better than breaking kubelet traffic for netdev DP.

@tnqn tnqn force-pushed the bypass-networkpolicy branch 2 times, most recently from 521d305 to 60872df Compare February 10, 2021 04:55
@tnqn
Copy link
Member Author

tnqn commented Feb 10, 2021

Do you mean we enforce networkpolicy if it has mark because we mark non local traffic in iptables? Then there is another issue that external traffic will bypass policies with netdev datapath?

Yes, that is what I mean. It is at least better than breaking kubelet traffic for netdev DP.

Then it seems the original way of matching source IP is still better as it can enforce networkpolicies for all traffic except NodePort's with netdev datapath.
I only applied the new approach for Linux+kernel datapath scenario in latest patch. PTAL.

@tnqn
Copy link
Member Author

tnqn commented Feb 10, 2021

/test-all

jianjuns
jianjuns previously approved these changes Feb 10, 2021
Copy link
Contributor

@jianjuns jianjuns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works for me.

pkg/agent/openflow/pipeline.go Outdated Show resolved Hide resolved
pkg/agent/openflow/pipeline.go Outdated Show resolved Hide resolved
pkg/agent/openflow/pipeline.go Outdated Show resolved Hide resolved
To allow kubelet probe liveness and readiness of local Pods, a local
probe flow with higher priority than ingress flows is configured.
However, previously it identifies the locally generated traffic by
matching source IP, which caused the issue that accessing a Pod via its
NodePort service can bypass ingress NetworkPolicies too as the packet
will be SNATed to local gateway IP.

This patch fixes it by using fwmark to identify the locally generated
traffic in the iptables output chain, so that external traffic forwarded
by iptables/ipvs won't have the mark. Then the local probe flow uses
pkt_mark to identify the eligible traffic. This approach works for Linux
only and kernel datapath only. For Windows and netdev datapath, we may
fix it after using AntreaProxy to serve NodePort service.
jianjuns
jianjuns previously approved these changes Feb 10, 2021
@tnqn
Copy link
Member Author

tnqn commented Feb 10, 2021

/test-all

antoninbas
antoninbas previously approved these changes Feb 10, 2021
@antoninbas antoninbas dismissed stale reviews from jianjuns and themself via bb44bfd February 10, 2021 21:12
@antoninbas
Copy link
Contributor

/test-all

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving again, after resolving merge conflict

@antoninbas antoninbas merged commit 51002a0 into antrea-io:main Feb 11, 2021
@antoninbas antoninbas added this to the Antrea v0.13.0 release milestone Feb 11, 2021
antoninbas added a commit to antoninbas/antrea that referenced this pull request Feb 18, 2021
libOpenflow panics if a packet-in message is sent by OVS with a
NXM_NX_PKT_MARK match field. Since antrea-io#1816, it is a possible situation:
Traceflow requests for the Node IP can lead to reply traffic with the
packet mark set, which are sent to the Antrea Agent as a PacketIn
message. To resolve this issue, we first switch temporarily to a patched
libOpenflow version without this issue. When the patch is ported in
upstream libOpenflow, we can remove the replace directive from go.mod.

However, it should be noted that reply packets for Traceflow requests
are generally meaningless and should be ignored. In encapMode, The
Traceflow implementation should also not timeout when a Traceflow
request leaves the overlay: as soon as the request is forwarded through
the gateway port, we should consider the request complete, and ignore
any potential reply packet. So we include the following changes:

* add a new "ForwardedOutOfOverlay" Traceflow action when a request is
  forwarded out of the network managed by Antrea in encapMode. The
  Controller can then mark the request as "succeeded". In theory,
  something similar could be done for other traffic modes, but it would
  be much more complex.
* add support for Traceflow requests for which the destination is the
  gateway's IP, by reporting a "Delivered" action.
* add an OVS flow in charge of dropping reply traffic for Traceflow
  requests (using the conntrack state to match this traffic), thus
  ensuring it is not set to the Agent. In our testing, this is
  especially useful when the destination IP is the local Node's IP, as
  the IP ToS field seems to be preseved in that case, causing the reply
  packet to be treated as a Traceflow request.

We add end-to-end tests for both cases (external destination IP and
Antrea gateway destination IP).

Fixes antrea-io#1878
antoninbas added a commit to antoninbas/antrea that referenced this pull request Feb 18, 2021
libOpenflow panics if a packet-in message is sent by OVS with a
NXM_NX_PKT_MARK match field. Since antrea-io#1816, it is a possible situation:
Traceflow requests for the Node IP can lead to reply traffic with the
packet mark set, which are sent to the Antrea Agent as a PacketIn
message. To resolve this issue, we first switch temporarily to a patched
libOpenflow version without this issue. When the patch is ported in
upstream libOpenflow, we can remove the replace directive from go.mod.

Fixes antrea-io#1878
antoninbas added a commit that referenced this pull request Feb 18, 2021
libOpenflow panics if a packet-in message is sent by OVS with a
NXM_NX_PKT_MARK match field. Since #1816, it is a possible situation:
Traceflow requests for the Node IP can lead to reply traffic with the
packet mark set, which are sent to the Antrea Agent as a PacketIn
message. To resolve this issue, we first switch temporarily to a patched
libOpenflow version without this issue. When the patch is ported in
upstream libOpenflow, we can remove the replace directive from go.mod.

Fixes #1878
antoninbas pushed a commit to antoninbas/antrea that referenced this pull request Feb 25, 2021
…-io#1816)

To allow kubelet probe liveness and readiness of local Pods, a local
probe flow with higher priority than ingress flows is configured.
However, previously it identifies the locally generated traffic by
matching source IP, which caused the issue that accessing a Pod via its
NodePort service can bypass ingress NetworkPolicies too as the packet
will be SNATed to local gateway IP.

This patch fixes it by using fwmark to identify the locally generated
traffic in the iptables output chain, so that external traffic forwarded
by iptables/ipvs won't have the mark. Then the local probe flow uses
pkt_mark to identify the eligible traffic. This approach works for Linux
only and kernel datapath only. For Windows and netdev datapath, we may
fix it after using AntreaProxy to serve NodePort service.
antoninbas pushed a commit to antoninbas/antrea that referenced this pull request Feb 25, 2021
…-io#1816)

To allow kubelet probe liveness and readiness of local Pods, a local
probe flow with higher priority than ingress flows is configured.
However, previously it identifies the locally generated traffic by
matching source IP, which caused the issue that accessing a Pod via its
NodePort service can bypass ingress NetworkPolicies too as the packet
will be SNATed to local gateway IP.

This patch fixes it by using fwmark to identify the locally generated
traffic in the iptables output chain, so that external traffic forwarded
by iptables/ipvs won't have the mark. Then the local probe flow uses
pkt_mark to identify the eligible traffic. This approach works for Linux
only and kernel datapath only. For Windows and netdev datapath, we may
fix it after using AntreaProxy to serve NodePort service.
antoninbas added a commit to antoninbas/antrea that referenced this pull request Feb 26, 2021
libOpenflow panics if a packet-in message is sent by OVS with a
NXM_NX_PKT_MARK match field. Since antrea-io#1816, it is a possible situation:
Traceflow requests for the Node IP can lead to reply traffic with the
packet mark set, which are sent to the Antrea Agent as a PacketIn
message. To resolve this issue, we first switch temporarily to a patched
libOpenflow version without this issue. When the patch is ported in
upstream libOpenflow, we can remove the replace directive from go.mod.

Fixes antrea-io#1878
@antoninbas antoninbas mentioned this pull request Feb 26, 2021
antoninbas pushed a commit to antoninbas/antrea that referenced this pull request Feb 26, 2021
…-io#1816)

To allow kubelet probe liveness and readiness of local Pods, a local
probe flow with higher priority than ingress flows is configured.
However, previously it identifies the locally generated traffic by
matching source IP, which caused the issue that accessing a Pod via its
NodePort service can bypass ingress NetworkPolicies too as the packet
will be SNATed to local gateway IP.

This patch fixes it by using fwmark to identify the locally generated
traffic in the iptables output chain, so that external traffic forwarded
by iptables/ipvs won't have the mark. Then the local probe flow uses
pkt_mark to identify the eligible traffic. This approach works for Linux
only and kernel datapath only. For Windows and netdev datapath, we may
fix it after using AntreaProxy to serve NodePort service.
antoninbas added a commit to antoninbas/antrea that referenced this pull request Feb 26, 2021
libOpenflow panics if a packet-in message is sent by OVS with a
NXM_NX_PKT_MARK match field. Since antrea-io#1816, it is a possible situation:
Traceflow requests for the Node IP can lead to reply traffic with the
packet mark set, which are sent to the Antrea Agent as a PacketIn
message. To resolve this issue, we first switch temporarily to a patched
libOpenflow version without this issue. When the patch is ported in
upstream libOpenflow, we can remove the replace directive from go.mod.

Fixes antrea-io#1878
antoninbas pushed a commit that referenced this pull request Feb 26, 2021
To allow kubelet probe liveness and readiness of local Pods, a local
probe flow with higher priority than ingress flows is configured.
However, previously it identifies the locally generated traffic by
matching source IP, which caused the issue that accessing a Pod via its
NodePort service can bypass ingress NetworkPolicies too as the packet
will be SNATed to local gateway IP.

This patch fixes it by using fwmark to identify the locally generated
traffic in the iptables output chain, so that external traffic forwarded
by iptables/ipvs won't have the mark. Then the local probe flow uses
pkt_mark to identify the eligible traffic. This approach works for Linux
only and kernel datapath only. For Windows and netdev datapath, we may
fix it after using AntreaProxy to serve NodePort service.
antoninbas added a commit that referenced this pull request Feb 26, 2021
libOpenflow panics if a packet-in message is sent by OVS with a
NXM_NX_PKT_MARK match field. Since #1816, it is a possible situation:
Traceflow requests for the Node IP can lead to reply traffic with the
packet mark set, which are sent to the Antrea Agent as a PacketIn
message. To resolve this issue, we first switch temporarily to a patched
libOpenflow version without this issue. When the patch is ported in
upstream libOpenflow, we can remove the replace directive from go.mod.

Fixes #1878
antoninbas pushed a commit that referenced this pull request Feb 26, 2021
To allow kubelet probe liveness and readiness of local Pods, a local
probe flow with higher priority than ingress flows is configured.
However, previously it identifies the locally generated traffic by
matching source IP, which caused the issue that accessing a Pod via its
NodePort service can bypass ingress NetworkPolicies too as the packet
will be SNATed to local gateway IP.

This patch fixes it by using fwmark to identify the locally generated
traffic in the iptables output chain, so that external traffic forwarded
by iptables/ipvs won't have the mark. Then the local probe flow uses
pkt_mark to identify the eligible traffic. This approach works for Linux
only and kernel datapath only. For Windows and netdev datapath, we may
fix it after using AntreaProxy to serve NodePort service.
antoninbas added a commit that referenced this pull request Feb 26, 2021
libOpenflow panics if a packet-in message is sent by OVS with a
NXM_NX_PKT_MARK match field. Since #1816, it is a possible situation:
Traceflow requests for the Node IP can lead to reply traffic with the
packet mark set, which are sent to the Antrea Agent as a PacketIn
message. To resolve this issue, we first switch temporarily to a patched
libOpenflow version without this issue. When the patch is ported in
upstream libOpenflow, we can remove the replace directive from go.mod.

Fixes #1878
antoninbas added a commit to antoninbas/antrea that referenced this pull request Mar 11, 2021
libOpenflow panics if a packet-in message is sent by OVS with a
NXM_NX_PKT_MARK match field. Since antrea-io#1816, it is a possible situation:
Traceflow requests for the Node IP can lead to reply traffic with the
packet mark set, which are sent to the Antrea Agent as a PacketIn
message. To resolve this issue, we first switch temporarily to a patched
libOpenflow version without this issue. When the patch is ported in
upstream libOpenflow, we can remove the replace directive from go.mod.

Fixes antrea-io#1878
antoninbas added a commit that referenced this pull request Mar 12, 2021
libOpenflow panics if a packet-in message is sent by OVS with a
NXM_NX_PKT_MARK match field. Since #1816, it is a possible situation:
Traceflow requests for the Node IP can lead to reply traffic with the
packet mark set, which are sent to the Antrea Agent as a PacketIn
message. To resolve this issue, we first switch temporarily to a patched
libOpenflow version without this issue. When the patch is ported in
upstream libOpenflow, we can remove the replace directive from go.mod.

Fixes #1878
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants