[WireGuard] Fix service traffic requiring SNAT #2697

Merged
tnqn merged 1 commit into antrea-io:main from wireguard-masquerade on Sep 2, 2021

@tnqn tnqn commented Sep 1, 2021

Service traffic requiring SNAT couldn't be delivered to the peer Node
when the endpoint Pod was on another Node. This was because we didn't
set any address on the WireGuard device antrea-wg0. Therefore, when the
iptables MASQUERADE action took effect, it chose an IP from other
interfaces, which might not be the gateway address on antrea-gw0. This
caused two problems:

  1. The peer's WireGuard didn't accept the packet, as its source address
    was not in the peer's "allowed ips".

  2. The peer Node wouldn't route the response back via the encrypted
    tunnel, as the destination IP was not in its "allowed ips".

This patch fixes it by assigning the gateway IPs to the WireGuard
device. It uses a "/32" mask for the IPv4 address and a "/128" mask for
the IPv6 address to avoid impacting routes on the Antrea gateway.

Signed-off-by: Quan Tian <qtian@vmware.com>

Fixes #2696

Note: we cannot just add a rule to the "ANTREA-POSTROUTING" chain that SNATs the traffic to the gateway's IP, because the KUBE-SERVICE chain currently sits above ANTREA-POSTROUTING and has already applied MASQUERADE before the packet hits Antrea's rule.

@jianjuns
Contributor

jianjuns commented Sep 1, 2021

Do you mean NodePort and external IP?

@tnqn
Member Author

tnqn commented Sep 1, 2021

> Do you mean NodePort and external IP?

Not only NodePort and external IP traffic. Any Service traffic requiring SNAT was affected.
Even for localhost to ClusterIP, the host uses the Node's IP by default to access the ClusterIP. If we don't have any IP on antrea-wg0, MASQUERADE will choose an IP from other devices, which is still the Node's IP in my case.

@tnqn
Member Author

tnqn commented Sep 1, 2021

Localhost to ClusterIP requires SNAT too, determined by this rule:

-A KUBE-SVC-BOBCSCHPXQRHDLL7 ! -s 192.168.0.0/16 -d 10.99.9.45/32 -p tcp -m comment --comment "dev1/nginx cluster IP" -m tcp --dport 80 -j KUBE-MARK-MASQ

@tnqn
Member Author

tnqn commented Sep 1, 2021

/test-all

Contributor

@jianjuns jianjuns left a comment

Ok. Thanks for the clarification. Two minor comments.

@@ -94,6 +97,27 @@ func (client *client) Init() error {
	if err := netlink.LinkSetUp(link); err != nil {
		return err
	}
	// Configure the IP addresses same as Antrea gateway so iptables MASQUERADE target will select it as source address.
	// It's necessary to make service traffic requiring SNAT accepted by peer Node and to make their response routed
Contributor

service -> Service

Contributor

Could we add the types of Service traffic here too? It helps me to understand.

Member Author

Does "service traffic requiring SNAT" explain it? If listing types, I would need to list "host-to-ClusterIP, host-to-NodePort, external-to-NodePort, host-to-externalIP, external-to-externalIP", and maybe more.

Contributor

How about "external IP", "NodePort", and "host to ClusterIP"? Or just "Service traffic from host network" (I know there is externalTrafficPolicy, but there is probably no need to go into all the details).

Member Author

Done, thanks

@codecov-commenter

codecov-commenter commented Sep 1, 2021

Codecov Report

Merging #2697 (ce98a90) into main (bfa1bc4) will increase coverage by 5.02%.
The diff coverage is 38.88%.

@@            Coverage Diff             @@
##             main    #2697      +/-   ##
==========================================
+ Coverage   60.66%   65.69%   +5.02%     
==========================================
  Files         285      285              
  Lines       23006    26367    +3361     
==========================================
+ Hits        13957    17321    +3364     
+ Misses       7550     7428     -122     
- Partials     1499     1618     +119     

| Flag | Coverage Δ |
| --- | --- |
| e2e-tests | 57.16% <38.88%> (?) |
| kind-e2e-tests | 48.40% <38.88%> (+0.02%) ⬆️ |
| unit-tests | 41.04% <0.00%> (-0.01%) ⬇️ |

| Impacted Files | Coverage Δ |
| --- | --- |
| pkg/agent/agent.go | 59.09% <0.00%> (+8.17%) ⬆️ |
| pkg/agent/wireguard/client_linux.go | 65.07% <50.00%> (-2.27%) ⬇️ |
| pkg/controller/egress/ipallocator/allocator.go | 65.00% <0.00%> (-15.42%) ⬇️ |
| pkg/controller/networkpolicy/endpoint_querier.go | 77.64% <0.00%> (-13.79%) ⬇️ |
| pkg/legacyapis/core/v1alpha2/register.go | 69.23% <0.00%> (-10.77%) ⬇️ |
| pkg/controller/egress/controller.go | 74.55% <0.00%> (-10.59%) ⬇️ |
| pkg/apis/stats/register.go | 71.42% <0.00%> (-10.39%) ⬇️ |
| pkg/legacyapis/stats/register.go | 71.42% <0.00%> (-10.39%) ⬇️ |
| pkg/ovs/openflow/ofctrl_meter.go | 33.84% <0.00%> (-10.16%) ⬇️ |
| pkg/legacyapis/security/v1alpha1/register.go | 73.33% <0.00%> (-10.00%) ⬇️ |

... and 271 more

@tnqn
Member Author

tnqn commented Sep 1, 2021

/test-all

Contributor

@antoninbas antoninbas left a comment

This LGTM, but I just want to confirm that all ARP requests received by the host which are targeting the antrea-gw0 IP will be replied to on antrea-gw0, and not on antrea-wg0 (as the IP is also assigned to this interface). I don't remember if this is guaranteed by the ARP protocol implementation or if this is subject to the arp_announce sysctl parameter.

@tnqn
Member Author

tnqn commented Sep 2, 2021

> This LGTM, but I just want to confirm that all ARP requests received by the host which are targeting the antrea-gw0 IP will be replied to on antrea-gw0, and not on antrea-wg0 (as the IP is also assigned to this interface). I don't remember if this is guaranteed by the ARP protocol implementation or if this is subject to the arp_announce sysctl parameter.

This is guaranteed by the ARP protocol implementation: https://github.com/torvalds/linux/blob/7c636d4d20f8c5acfbfbc60f326fddb0e1cf5daa/net/ipv4/arp.c#L824-L836
None of the ARP-related sysctl parameters affect which interface the reply is sent on; it's hardcoded to the interface on which the request was received.
arp_filter and arp_ignore may affect whether the interface will reply to the request.
arp_announce affects which source IP is used for ARP requests.
Besides, antrea-wg0 disables ARP and doesn't have a MAC address:

337: antrea-wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default
    link/none
    inet 192.168.0.1/32 scope global antrea-wg0
       valid_lft forever preferred_lft forever
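
For anyone who wants to check this programmatically, here is a small sketch that reads the flag list out of the first line of the `ip addr` output above. `linkFlags` and `hasFlag` are hypothetical helpers that work on the text output because, as far as I know, Go's `net.Flags` has no NOARP bit to test.

```go
package main

import (
	"fmt"
	"strings"
)

// linkFlags extracts the flag names between '<' and '>' in the first line
// of an `ip addr` entry. It returns nil if no flag list is found.
func linkFlags(line string) []string {
	start := strings.Index(line, "<")
	end := strings.Index(line, ">")
	if start < 0 || end < start {
		return nil
	}
	return strings.Split(line[start+1:end], ",")
}

// hasFlag reports whether the given flag name appears in the line's flag list.
func hasFlag(line, flag string) bool {
	for _, f := range linkFlags(line) {
		if f == flag {
			return true
		}
	}
	return false
}

func main() {
	line := "337: antrea-wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default"
	fmt.Println(hasFlag(line, "NOARP"))     // true
	fmt.Println(hasFlag(line, "BROADCAST")) // false
}
```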

@tnqn
Member Author

tnqn commented Sep 2, 2021

/test-e2e

@tnqn tnqn merged commit dded211 into antrea-io:main Sep 2, 2021
@tnqn tnqn deleted the wireguard-masquerade branch September 2, 2021 04:33
Development

Successfully merging this pull request may close these issues.

Service traffic requiring SNAT doesn't work with WireGuard
4 participants