TestWireGuard/testServiceConnectivity failing in IPv6 cluster #3437

Closed
antoninbas opened this issue Mar 11, 2022 · 1 comment · Fixed by #3490
Labels
area/transit/encryption Issues or PRs related to transit encryption (IPSec, SSL). area/transit/ipv6 Issues or PRs related to IPv6. kind/bug Categorizes issue or PR as related to a bug. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.

Comments

@antoninbas
Contributor

Describe the bug
After merging #3336, the e2e test TestWireGuard/testServiceConnectivity is failing for the jenkins-ipv6-only-e2e CI job.

It seems that the new IPv6 route reconciliation logic prevents correct gateway route configuration on the Node.

If we deploy Antrea without WireGuard, the gateway routes look like this:

fd02::1 dev antrea-gw0 metric 1024 pref medium # peer Pod gateway
fd02::/64 via fd02::1 dev antrea-gw0 metric 1024 pref medium # peer Pod CIDR
fd02:0:0:1::/64 dev antrea-gw0 proto kernel metric 256 pref medium # local Pod CIDR

If we deploy Antrea with WireGuard, the routes (antrea-gw0 + antrea-wg0) look like this:

fd02::/64 dev antrea-wg0 src fd02:0:0:1::1 metric 1024 pref medium
fd02:0:0:1::1 dev antrea-wg0 proto kernel metric 256 pref medium
fd02:0:0:1::/64 dev antrea-gw0 proto kernel metric 256 pref medium

However, if we first deploy Antrea without WireGuard, then re-deploy it (apply the YAML again) with WireGuard enabled (which is what the e2e test case does), we get the following routes:

fd02::1 dev antrea-gw0 metric 1024 pref medium
fd02::/64 dev antrea-wg0 src fd02:0:0:1::1 metric 1024 pref medium
fd02:0:0:1::1 dev antrea-wg0 proto kernel metric 256 pref medium
fd02:0:0:1::/64 dev antrea-gw0 proto kernel metric 256 pref medium

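Notice how there is an "extra" route (the first one, fd02::1 dev antrea-gw0). It matches the peer Pod gateway route from the deployment without WireGuard and is still present after the re-deployment. This could explain why the test is failing.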
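To make the comparison above mechanical, here is a minimal Go sketch that diffs two route listings and reports entries that exist on the Node but are not expected. The helper names `routeKey` and `staleRoutes` are hypothetical and not part of Antrea. Routes are compared only by destination and device, which is a simplifying assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// routeKey reduces an `ip -6 route` line to "<destination> dev <device>",
// ignoring attributes such as via, src, proto, metric and pref.
// Hypothetical helper for illustration only.
func routeKey(line string) string {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return ""
	}
	dst, dev := fields[0], ""
	for i := 1; i+1 < len(fields); i++ {
		if fields[i] == "dev" {
			dev = fields[i+1]
			break
		}
	}
	return dst + " dev " + dev
}

// staleRoutes returns the keys of routes that appear in observed
// but not in expected.
func staleRoutes(expected, observed []string) []string {
	want := make(map[string]bool, len(expected))
	for _, line := range expected {
		want[routeKey(line)] = true
	}
	var stale []string
	for _, line := range observed {
		if key := routeKey(line); !want[key] {
			stale = append(stale, key)
		}
	}
	return stale
}

func main() {
	// Routes seen on a fresh deployment with WireGuard enabled.
	expected := []string{
		"fd02::/64 dev antrea-wg0 src fd02:0:0:1::1 metric 1024 pref medium",
		"fd02:0:0:1::1 dev antrea-wg0 proto kernel metric 256 pref medium",
		"fd02:0:0:1::/64 dev antrea-gw0 proto kernel metric 256 pref medium",
	}
	// Routes seen after switching an existing deployment to WireGuard.
	observed := append(
		[]string{"fd02::1 dev antrea-gw0 metric 1024 pref medium"},
		expected...,
	)
	fmt.Println(staleRoutes(expected, observed)) // prints: [fd02::1 dev antrea-gw0]
}
```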

Versions:
Antrea: top of tree

antoninbas added the area/transit/encryption, kind/bug, kind/failing-test, and area/transit/ipv6 labels Mar 11, 2022
@antoninbas
Contributor Author

Test failure:

=== RUN   TestWireGuard/testServiceConnectivity
    service_test.go:183: Created service Pod IPs [fd74:ca9b:172:16:1::1545]
    wireguard_test.go:150: 
        	Error Trace:	wireguard_test.go:150
        	            				wireguard_test.go:75
        	Error:      	Received unexpected error:
        	            	nc stdout: <>, stderr: <nc: fd74:ca9b:172:17::e92d ([fd74:ca9b:172:17::e92d]:80): Connection timed out
        	            	nc: fd74:ca9b:172:17::e92d ([fd74:ca9b:172:17::e92d]:80): Connection timed out
        	            	nc: fd74:ca9b:172:17::e92d ([fd74:ca9b:172:17::e92d]:80): Connection timed out
        	            	nc: fd74:ca9b:172:17::e92d ([fd74:ca9b:172:17::e92d]:80): Connection timed out
        	            	nc: fd74:ca9b:172:17::e92d ([fd74:ca9b:172:17::e92d]:80): Connection timed out
        	            	>, err: <command terminated with exit code 1>
        	Test:       	TestWireGuard/testServiceConnectivity
        	Messages:   	Pod hostnetwork-pod should be able to connect the service's ClusterIP [fd74:ca9b:172:17::e92d]:80, but was not able to connect
=== CONT  TestWireGuard
    fixtures.go:269: Exporting test logs to '/var/lib/jenkins/workspace/antrea-ipv6-only-e2e-for-pull-request/antrea-test-logs/TestWireGuard/beforeTeardown.Mar10-20-48-30'
    fixtures.go:375: Error when exporting kubelet logs: error when running journalctl on Node 'antrea-ipv6-2-0', is it available? Error: <nil>
    fixtures.go:396: Deleting 'antrea-test' K8s Namespace
I0310 20:48:45.848635    6521 framework.go:630] Deleting Namespace antrea-test took 6.005761736s
--- FAIL: TestWireGuard (137.90s)
    --- PASS: TestWireGuard/testPodConnectivity (15.19s)
    --- FAIL: TestWireGuard/testServiceConnectivity (58.18s)
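For reproducing the timeout outside the e2e framework, a small Go probe along these lines may help. It is only a sketch of what the `nc` check does, not the test's actual code: `serviceAddr` and `probe` are hypothetical names, the 2-second timeout is an arbitrary choice, and the ClusterIP is the one from the log above.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"time"
)

// serviceAddr builds a dialable address. net.JoinHostPort adds the
// square brackets that IPv6 literals require.
func serviceAddr(ip string, port int) string {
	return net.JoinHostPort(ip, strconv.Itoa(port))
}

// probe attempts a single TCP connection and closes it on success.
func probe(addr string, timeout time.Duration) error {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return err
	}
	return conn.Close()
}

func main() {
	addr := serviceAddr("fd74:ca9b:172:17::e92d", 80)
	if err := probe(addr, 2*time.Second); err != nil {
		fmt.Printf("cannot connect to %s: %v\n", addr, err)
		return
	}
	fmt.Printf("connected to %s\n", addr)
}
```

Running it from the host network namespace of the Node mirrors the hostnetwork-pod case in the failing test.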

antoninbas added this to the Antrea v1.6 release milestone Mar 11, 2022
antoninbas added the priority/critical-urgent label Mar 18, 2022
xliuxu self-assigned this Mar 19, 2022
xliuxu added a commit to xliuxu/antrea that referenced this issue Mar 21, 2022
IPv6 routes to peer gateway should be deleted in IPv6 mode since it
needs to be routed through the tunnel.

Fixes antrea-io#3437

Signed-off-by: Xu Liu <xliu2@vmware.com>
xliuxu added a commit to xliuxu/antrea that referenced this issue Mar 21, 2022
IPv6 routes to peer gateway should be deleted since it needs
to be routed through the tunnel.

Fixes antrea-io#3437

Signed-off-by: Xu Liu <xliu2@vmware.com>
xliuxu added a commit to xliuxu/antrea that referenced this issue Mar 21, 2022
IPv6 routes to the peer gateway should be deleted if WireGuard is
enabled since it needs to be routed through the tunnel.

Fixes antrea-io#3437

Signed-off-by: Xu Liu <xliu2@vmware.com>
xliuxu added a commit to xliuxu/antrea that referenced this issue Mar 21, 2022
Check whether the route to peer gateway is replaced or not when
adding routes and delete the route if it is no longer required. This
can happen when the traffic encryption mode or traffic encapsulation
mode changes.

Fixes antrea-io#3437

Signed-off-by: Xu Liu <xliu2@vmware.com>
xliuxu added a commit to xliuxu/antrea that referenced this issue Mar 22, 2022
Check whether the route to peer gateway is replaced or not when
adding routes and delete the route if it is no longer required. This
can happen when the traffic encryption mode or traffic encapsulation
mode changes.

Fixes antrea-io#3437

Signed-off-by: Xu Liu <xliu2@vmware.com>
xliuxu added a commit to xliuxu/antrea that referenced this issue Mar 25, 2022
Check whether the route and neigh to peer gateway is needed and
delete the route if it is no longer required. This can happen when
the traffic encryption mode or traffic encapsulation mode changes.

Fixes antrea-io#3437

Signed-off-by: Xu Liu <xliu2@vmware.com>
xliuxu added a commit to xliuxu/antrea that referenced this issue Mar 25, 2022
Check whether the route and neigh to peer gateway are still needed
and delete the route and neigh if necessary. This can happen when
the traffic encryption mode or traffic encapsulation mode changes.

Fixes antrea-io#3437

Signed-off-by: Xu Liu <xliu2@vmware.com>
tnqn pushed a commit that referenced this issue Mar 25, 2022
Check whether the route and neigh to peer gateway are still needed
and delete the route and neigh if necessary. This can happen when
the traffic encryption mode or traffic encapsulation mode changes.

Fixes #3437

Signed-off-by: Xu Liu <xliu2@vmware.com>
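
The idea described in this commit message can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the code from the actual fix: the `EncryptionMode` and `Route` types and both functions are hypothetical, and the rule "no peer gateway route on the gateway device when WireGuard is enabled" is inferred from the route listings in the issue description.

```go
package main

import "fmt"

// EncryptionMode is a hypothetical stand-in for the agent's traffic
// encryption setting.
type EncryptionMode int

const (
	EncryptionNone EncryptionMode = iota
	EncryptionWireGuard
)

// Route is a simplified route entry: destination plus output device.
type Route struct {
	Dst string
	Dev string
}

// peerGatewayRouteNeeded reports whether the host route to the peer
// gateway should exist. Assumption: with WireGuard, peer traffic goes
// through the WireGuard device, so the route is not needed.
func peerGatewayRouteNeeded(mode EncryptionMode) bool {
	return mode != EncryptionWireGuard
}

// reconcilePeerGatewayRoute returns a copy of table in which the peer
// gateway route gw is present only if the current mode requires it.
func reconcilePeerGatewayRoute(table []Route, gw Route, mode EncryptionMode) []Route {
	needed := peerGatewayRouteNeeded(mode)
	out := make([]Route, 0, len(table)+1)
	present := false
	for _, r := range table {
		if r == gw {
			if !needed {
				continue // stale entry left over from the previous mode
			}
			present = true
		}
		out = append(out, r)
	}
	if needed && !present {
		out = append(out, gw)
	}
	return out
}

func main() {
	gw := Route{Dst: "fd02::1", Dev: "antrea-gw0"}
	// Table after re-deploying with WireGuard, as reported in the issue.
	table := []Route{
		gw,
		{Dst: "fd02::/64", Dev: "antrea-wg0"},
		{Dst: "fd02:0:0:1::1", Dev: "antrea-wg0"},
		{Dst: "fd02:0:0:1::/64", Dev: "antrea-gw0"},
	}
	for _, r := range reconcilePeerGatewayRoute(table, gw, EncryptionWireGuard) {
		fmt.Printf("%s dev %s\n", r.Dst, r.Dev)
	}
}
```

The point of the sketch is that reconciliation has to consider routes that should no longer exist, not only routes that should be added, whenever the mode changes between deployments.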