-
Notifications
You must be signed in to change notification settings - Fork 346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WireGuard] Fix service traffic requiring SNAT #2697
Conversation
b75336e
to
7f3873e
Compare
Do you mean NodePort and external IP? |
7f3873e
to
44db2d8
Compare
Not only NodePort and external IP traffic. Any service traffic requiring SNAT were affected. |
Localhost to ClusterIP requires SNAT too, determined by this rule:
|
/test-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Thanks for the clarification. Two minor comments.
pkg/agent/wireguard/client_linux.go
Outdated
@@ -94,6 +97,27 @@ func (client *client) Init() error { | |||
if err := netlink.LinkSetUp(link); err != nil { | |||
return err | |||
} | |||
// Configure the IP addresses same as Antrea gateway so iptables MASQUERADE target will select it as source address. | |||
// It's necessary to make service traffic requiring SNAT accepted by peer Node and to make their response routed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
service -> Service
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add the types of Service traffic here too? It helps me to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does "service traffic requiring SNAT" explain it? If listing types, I need to list "host-to-ClusterIP, host-to-NodePort, external-to-NodePort, host-to-externalIP, external-to-externalIP", maybe more..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about "external IP", "NodePort", and "host to ClusterIP"? Or just "Service traffic from host network" (I know there is externalTrafficPolicy but probably no need to go all details).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks
Codecov Report
@@ Coverage Diff @@
## main #2697 +/- ##
==========================================
+ Coverage 60.66% 65.69% +5.02%
==========================================
Files 285 285
Lines 23006 26367 +3361
==========================================
+ Hits 13957 17321 +3364
+ Misses 7550 7428 -122
- Partials 1499 1618 +119
Flags with carried forward coverage won't be shown. Click here to find out more.
|
The service traffic requiring SNAT couldn't be transferred to peer Node when the endpoint Pod is on another Node. This was because we didn't set any address on WireGuard device antrea-wg0. Therefore, when iptables MASQUERADE action took effect, it chose one IP from other interfaces, which might not be the gateway address on antrea-gw0. This caused two problems: 1. Peer wireguard didn't accept the packet as its source address was not in its "allowed ips" 2. Peer Node wouldn't route the response back via the encrypted tunnel as the destination IP was not in its "allowed ips" This patch fixes it by assigning the gateway IPs on the WireGuard device. But it uses "/32" mask for IPv4 address and "/128" mask for IPv6 address to avoid impacting routes on Antrea gateway. Signed-off-by: Quan Tian <qtian@vmware.com>
44db2d8
to
ce98a90
Compare
/test-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, but I just want to confirm that all ARP requests received by the host which are targeting the antrea-gw0 IP will be replied to on antrea-gw0, and not on antrea-wg0 (as the IP is also assigned to this interface). I don't remember if this guaranteed by the ARP protocol implementation or if this is subject to the arp_announce
sysctl parameter.
This is guaranteed by the ARP protocol implementation: https://github.com/torvalds/linux/blob/7c636d4d20f8c5acfbfbc60f326fddb0e1cf5daa/net/ipv4/arp.c#L824-L836
|
/test-e2e |
The service traffic requiring SNAT couldn't be transferred to peer Node
when the endpoint Pod is on another Node. This was because we didn't
set any address on WireGuard device antrea-wg0. Therefore, when
iptables MASQUERADE action took effect, it chose one IP from other
interfaces, which might not be the gateway address on antrea-gw0. This
caused two problems:
Peer wireguard didn't accept the packet as its source address was not
in its "allowed ips"
Peer Node wouldn't route the response back via the encrypted tunnel
as the destination IP was not in its "allowed ips"
This patch fixes it by assigning the gateway IPs on the WireGuard
device. But it uses "/32" mask for IPv4 address and "/128" mask for
IPv6 address to avoid impacting routes on Antrea gateway.
Signed-off-by: Quan Tian qtian@vmware.com
Fixes #2696
Note: we cannot just add a rule to "ANTREA-POSTROUTING" chain which SNATs the traffic to the gateway's IP because currently KUBE-SERVICE chain is above ANTREA-POSTROUTING, it has applied MASQUERADE before it hits Antrea's rule.