
Routing issue in IPv6-Only cluster #1663

Closed
YannikSc opened this issue May 6, 2024 · 6 comments


YannikSc commented May 6, 2024

What happened?

Packets bounce around between nodes and the router

What did you expect to happen?

The packets arrive at the desired LoadBalancer.

Another thing that strikes me as strange, in my novice eyes, is that every node announces all LoadBalancer IPs, even when no pods behind the LoadBalancer are scheduled on that node (for example, the control-plane node also announces the route for the echo-server LoadBalancer).

How can we reproduce the behavior you experienced?

Steps to reproduce the behavior:

  1. Create an IPv6-only cluster using kubeadm, following the provided documentation
  2. Follow the documentation to set up kube-router
    • An IPv6-only cluster requires setting --enable-ipv4=false and --enable-ipv6=true
  3. Setup BGP
    • Have a router which can do BGP and ECMP (I use bird for it)
    • Update the config again with --peer-router-ips=[ROUTER_IP] and --peer-router-asns=[ASNS] etc.
    • Add a --loadbalancer-ip-range=[YOUR_RANGE] and --advertise-loadbalancer-ip=true
  4. Assuming that BGP sessions come up and routes arrive at the router, set up an application such as the echo server example
    • Change the Service.type to LoadBalancer
  5. Routing to the loadbalancer will result in a loop
    • an mtr -T -P 80 [LB_IP] will show the loop
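
A minimal bird 2.x fragment of the kind used in step 3 might look like this (a sketch, not the verbatim config from this cluster; the addresses and ASN follow the parameters listed further down, and the protocol name is illustrative):

```
# Sketch of a bird 2.x router config for ECMP toward kube-router peers.
# Addresses/ASN are taken from the issue's parameters; names are illustrative.
protocol kernel {
    ipv6 { export all; };
    merge paths yes;        # install multiple next hops (ECMP) per route
}

protocol bgp node01 {
    local 2001:470:504f:103::1 as 65000;
    neighbor 2001:470:504f:103::5 as 65000;  # kube-router on node 01
    ipv6 { import all; export none; };
}
```

One `protocol bgp` block per node; `merge paths yes` is what produces the multi-nexthop routes shown in the routing table below.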

Screenshots / Architecture Diagrams / Network Topologies

              ┌────────┐
     ┌────────┤ Router ├────────────┐
     │        └────┬───┘            │
     │             │                │
┌────┴────┐   ┌────┴────┐   ┌───────┴───────┐
│ Node 01 │   │ Node 02 │   │ Control Plane │
└─────────┘   └─────────┘   └───────────────┘

System Information (please complete the following information):

  • Kube-Router Version (kube-router --version): v2.1.1

  • Kube-Router Parameters:

    • --run-router=true
    • --run-firewall=true
    • --run-service-proxy=false
    • --bgp-graceful-restart=true
    • --enable-ipv4=false
    • --enable-ipv6=true
    • --enable-overlay=false
    • --nodes-full-mesh=false
    • --advertise-external-ip=true
    • --advertise-cluster-ip=true
    • --router-id=generate
    • --kubeconfig=/var/lib/kube-router/kubeconfig
    • --service-cluster-ip-range=2001:470:504f:f200::/112
    • --peer-router-ips=2001:470:504f:103::1
    • --peer-router-asns=65000
    • --advertise-loadbalancer-ip=true
    • --run-loadbalancer=true
    • --loadbalancer-ip-range=2001:470:504f:f300::/64
  • Kubernetes Version (kubectl version) : v1.29.4 (Client & Server)

  • Cloud Type: on premise

  • Kubernetes Deployment Type: kubeadm

  • Kube-Router Deployment Type: DaemonSet

  • Cluster Size: 1xControl, 2xNode

Logs, other output, metrics

Router routing table ip -6 r show proto bird

2001:470:504f:f100::/64 via 2001:470:504f:103::5 dev eth0.103 metric 32 pref medium
2001:470:504f:f101::/64 via 2001:470:504f:103::6 dev eth0.103 metric 32 pref medium
2001:470:504f:f102::/64 via 2001:470:504f:103::7 dev eth0.103 metric 32 pref medium
2001:470:504f:f200::1 metric 32 pref medium
        nexthop via 2001:470:504f:103::5 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::6 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::7 dev eth0.103 weight 1 
2001:470:504f:f200::a metric 32 pref medium
        nexthop via 2001:470:504f:103::5 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::6 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::7 dev eth0.103 weight 1 
2001:470:504f:f200::f0e metric 32 pref medium
        nexthop via 2001:470:504f:103::5 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::6 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::7 dev eth0.103 weight 1 
2001:470:504f:f200::78b5 metric 32 pref medium
        nexthop via 2001:470:504f:103::5 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::6 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::7 dev eth0.103 weight 1 
2001:470:504f:f200::9964 metric 32 pref medium
        nexthop via 2001:470:504f:103::5 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::6 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::7 dev eth0.103 weight 1 
2001:470:504f:f200::a917 metric 32 pref medium
        nexthop via 2001:470:504f:103::5 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::6 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::7 dev eth0.103 weight 1 
2001:470:504f:f300:: metric 32 pref medium
        nexthop via 2001:470:504f:103::5 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::6 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::7 dev eth0.103 weight 1 
2001:470:504f:f300::1 metric 32 pref medium
        nexthop via 2001:470:504f:103::5 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::6 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::7 dev eth0.103 weight 1 

Routing table on nodes (they all look the same except for the pod network on the kube-bridge)

yannik@kubernetes-02 ~ % ip -6 r
2001:470:504f:103::/64 dev eth0 proto ra metric 1002 pref high
2001:470:504f:f102::/64 dev kube-bridge proto kernel metric 256 pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
fe80::/64 dev kube-bridge proto kernel metric 256 pref medium
fe80::/64 dev veth53f3d5d4 proto kernel metric 256 pref medium
fe80::/64 dev veth165b0ac5 proto kernel metric 256 pref medium
fe80::/64 dev veth1d668e87 proto kernel metric 256 pref medium
default via fe80::b62e:99ff:fea9:599c dev eth0 proto ra metric 1002 pref high

To demonstrate the routing cycle, I issued an mtr --report command against the echo LoadBalancer:

% mtr -T -P 443 2001:470:504f:f300::1 -m 10 -c 1 --report
Start: 2024-05-06T22:23:18+0200
HOST: hypervisor                  Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 2001:470:504f:101::1       0.0%     1    0.2   0.2   0.2   0.2   0.0
  2.|-- 2001:470:504f:103::7       0.0%     1    0.4   0.4   0.4   0.4   0.0
  3.|-- 2001:470:504f:103::1       0.0%     1    0.4   0.4   0.4   0.4   0.0
  4.|-- 2001:470:504f:103::6       0.0%     1    0.7   0.7   0.7   0.7   0.0
  5.|-- 2001:470:504f:103::1       0.0%     1    0.5   0.5   0.5   0.5   0.0
  6.|-- 2001:470:504f:103::7       0.0%     1    0.7   0.7   0.7   0.7   0.0
  7.|-- 2001:470:504f:103::1       0.0%     1    0.7   0.7   0.7   0.7   0.0
  8.|-- 2001:470:504f:103::5       0.0%     1    0.9   0.9   0.9   0.9   0.0
  9.|-- 2001:470:504f:103::1       0.0%     1    0.9   0.9   0.9   0.9   0.0
 10.|-- 2001:470:504f:103::6       0.0%     1    1.0   1.0   1.0   1.0   0.0
@YannikSc YannikSc added the bug label May 6, 2024

aauren commented May 7, 2024

Unfortunately, I'm not able to reproduce this result on any of my clusters. I have tested with a cluster peering with a Juniper router that controls the next-hop routing. I have also tested with a cluster that uses FRR to peer with kube-router and control the Linux IP routing table, much the way that bird is doing for you. Neither of them exhibits this issue.

If I had to guess, I would say that it is most likely an artifact of how bird is configured. If you can dig into it more and provide additional information, update this issue and I'll take a look again.


aauren commented May 7, 2024

Routing table on FRR node:

# ip -6 r show 
::1 dev lo proto kernel metric 256 pref medium
2001:db8:42:1000::/64 nhid 16 via 2600:1f18:477d:6900:e20c:7eee:81e1:9014 dev ens5 proto bgp metric 20 onlink pref medium
2001:db8:42:1001::/64 nhid 18 via 2600:1f18:477d:6900:e5e9:d1cd:2e60:6d11 dev ens5 proto bgp metric 20 onlink pref medium
2001:db8:42:1200:: nhid 32 proto bgp metric 20 pref medium
        nexthop via 2600:1f18:477d:6900:e20c:7eee:81e1:9014 dev ens5 weight 1 onlink 
        nexthop via 2600:1f18:477d:6900:e5e9:d1cd:2e60:6d11 dev ens5 weight 1 onlink 
2600:1f18:477d:6900::/64 dev ens5 proto ra metric 100 pref medium
fe80::/64 dev ens5 proto kernel metric 256 pref medium
default via fe80::8ff:f7ff:fe8a:e375 dev ens5 proto ra metric 100 expires 1798sec pref medium

mtr results:

# mtr -T -P 5000 2001:db8:42:1200:: -m 10 -c 1 --report
Start: 2024-05-07T15:30:13+0000
HOST: aws-bgp                     Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 2001:db8:42:1200::         0.0%     1    0.4   0.4   0.4   0.4   0.0

Relevant FRR details:

# vtysh -c "show bgp all"                              
...
   Network          Next Hop            Metric LocPrf Weight Path
*>i2001:db8:42:1000::/64
                    2600:1f18:477d:6900:e20c:7eee:81e1:9014
                                            10    100      0 i
*>i2001:db8:42:1001::/64
                    2600:1f18:477d:6900:e5e9:d1cd:2e60:6d11
                                            10    100      0 i
*>i2001:db8:42:1200::/128
                    2600:1f18:477d:6900:e20c:7eee:81e1:9014
                                            10    100      0 i
*=i                 2600:1f18:477d:6900:e5e9:d1cd:2e60:6d11
                                            10    100      0 i

Displayed  3 routes and 4 total paths


YannikSc commented May 7, 2024

Hey, thanks for your effort. I just tried Quagga BGP instead of Bird to rule out a Bird-specific issue, but I was unable to get a session established between kube-router and Quagga, so I would rather stay with Bird.

I don't really know where to investigate further.
As I see it, everyone is sending packets to the router (as expected), the router routes the packets correctly to the nodes, and the nodes don't know what to do with them, so they throw them down the default route, back to the router.
So in my eyes the issue is simply that the nodes do not have routes set up to handle the LoadBalancer/Service traffic. This is, at least in my head, completely unrelated to BGP.

So maybe you can give me a hint about which direction to look in now.


aauren commented May 7, 2024

I noticed that you're running --run-service-proxy=false. It is normally this functionality in kube-router that is responsible for routing packets for services. I would assume that if you're running that way, you have something like kube-proxy in the loop that handles routing service traffic? Maybe something is wrong with kube-proxy, or with whatever is handling the service proxy for you?
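
Concretely, if kube-router is meant to fully replace kube-proxy, the DaemonSet container args would need roughly this (a sketch based on the flags listed in the issue description; only the service-proxy flag changes):

```
# Sketch of the kube-router container args with the service proxy enabled.
# All other flags stay as listed in the issue description above.
- --run-router=true
- --run-firewall=true
- --run-service-proxy=true   # was false; kube-router now programs Service VIPs itself
```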


YannikSc commented May 7, 2024

Ohh no. I had used the non-service-proxy config in combination with removing kube-proxy.
I enabled the service proxy, and while I still had issues out of the box with the firewall blocking everything, after adding a rule to accept all incoming traffic from the server's internal IP, everything seems to work.

So, I'm sorry for causing such trouble, and big thanks for your help!
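
For anyone hitting the same firewall symptom, the accept rule described above would be of roughly this shape (an illustrative sketch; the source range is hypothetical and must be replaced with the nodes' actual internal subnet):

```
# Illustrative ip6tables rule of the shape described above.
# 2001:470:504f:103::/64 is a placeholder for the nodes' internal subnet.
ip6tables -I INPUT -s 2001:470:504f:103::/64 -j ACCEPT
```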

@YannikSc YannikSc closed this as completed May 7, 2024

aauren commented May 7, 2024

No worries. By the way, regarding your other observation that workers without the workload were advertising VIPs: I would recommend that you look into Kubernetes Traffic Policies: https://kubernetes.io/docs/reference/networking/virtual-ips/#traffic-policies

There is also a service.local annotation that kube-router provides for controlling this as well (https://www.kube-router.io/docs/user-guide/#controlling-service-locality-traffic-policies), but I think it's better and more portable to use the upstream service traffic policy definition.

Either mechanism will allow you to control how kube-router advertises BGP VIPs.
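
As an illustration, the upstream traffic policy is set on the Service itself (a sketch; the service name, selector, and ports are placeholders):

```
# Sketch of a LoadBalancer Service using the upstream traffic policy.
# With externalTrafficPolicy: Local, only nodes that actually have ready
# endpoints for the service advertise/accept the VIP.
apiVersion: v1
kind: Service
metadata:
  name: echo                       # placeholder name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    app: echo                      # placeholder selector
  ports:
  - port: 80
    targetPort: 8080
```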
