
Routing issue in IPv6-Only cluster #1663

Closed
YannikSc opened this issue May 6, 2024 · 6 comments


YannikSc commented May 6, 2024

What happened?

Packets bounce around between nodes and the router

What did you expect to happen?

The packets arrive at the desired LoadBalancer.

Another thing that strikes me as strange, in my novice eyes, is that every node announces all LoadBalancer IPs, even when no pods behind the LoadBalancer are scheduled on that node (for example, the control-plane node also announces the route for the echo-server LoadBalancer).

How can we reproduce the behavior you experienced?

Steps to reproduce the behavior:

  1. Create an IPv6-only cluster using kubeadm, following the provided documentation
  2. Follow the documentation to set up kube-router
    • An IPv6-only cluster requires setting --enable-ipv4=false and --enable-ipv6=true
  3. Setup BGP
    • Have a router which can do BGP and ECMP (I use bird for it)
    • Update the config again with --peer-router-ips=[ROUTER_IP] and --peer-router-asns=[ASNS] etc.
    • Add a --loadbalancer-ip-range=[YOUR_RANGE] and --advertise-loadbalancer-ip=true
  4. Assuming that BGP sessions come up and routes arrive at the router, set up an application such as the echo server example
    • Change the Service.type to LoadBalancer
  5. Routing to the loadbalancer will result in a loop
    • an mtr -T -P 80 [LB_IP] will show the loop
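
A minimal bird 2.x fragment of the kind used in step 3 might look like this (a sketch, not the verbatim config from this cluster; the addresses and ASN follow the parameters listed further down, and the protocol name is illustrative):

```
# Sketch of a bird 2.x router config for ECMP toward kube-router peers.
# Addresses/ASN are taken from the issue's parameters; names are illustrative.
protocol kernel {
    ipv6 { export all; };
    merge paths yes;        # install multiple next hops (ECMP) per route
}

protocol bgp node01 {
    local 2001:470:504f:103::1 as 65000;
    neighbor 2001:470:504f:103::5 as 65000;  # kube-router on node 01
    ipv6 { import all; export none; };
}
```

One `protocol bgp` block per node; `merge paths yes` is what produces the multi-nexthop routes shown in the routing table below.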

Screenshots / Architecture Diagrams / Network Topologies

              ┌────────┐
     ┌────────┤ Router ├────────────┐
     │        └────┬───┘            │
     │             │                │
┌────┴────┐   ┌────┴────┐   ┌───────┴───────┐
│ Node 01 │   │ Node 02 │   │ Control Plane │
└─────────┘   └─────────┘   └───────────────┘

System Information (please complete the following information):

  • Kube-Router Version (kube-router --version): v2.1.1

  • Kube-Router Parameters:

    • --run-router=true
    • --run-firewall=true
    • --run-service-proxy=false
    • --bgp-graceful-restart=true
    • --enable-ipv4=false
    • --enable-ipv6=true
    • --enable-overlay=false
    • --nodes-full-mesh=false
    • --advertise-external-ip=true
    • --advertise-cluster-ip=true
    • --router-id=generate
    • --kubeconfig=/var/lib/kube-router/kubeconfig
    • --service-cluster-ip-range=2001:470:504f:f200::/112
    • --peer-router-ips=2001:470:504f:103::1
    • --peer-router-asns=65000
    • --advertise-loadbalancer-ip=true
    • --run-loadbalancer=true
    • --loadbalancer-ip-range=2001:470:504f:f300::/64
  • Kubernetes Version (kubectl version) : v1.29.4 (Client & Server)

  • Cloud Type: on premise

  • Kubernetes Deployment Type: kubeadm

  • Kube-Router Deployment Type: DaemonSet

  • Cluster Size: 1xControl, 2xNode

Logs, other output, metrics

Router routing table ip -6 r show proto bird

2001:470:504f:f100::/64 via 2001:470:504f:103::5 dev eth0.103 metric 32 pref medium
2001:470:504f:f101::/64 via 2001:470:504f:103::6 dev eth0.103 metric 32 pref medium
2001:470:504f:f102::/64 via 2001:470:504f:103::7 dev eth0.103 metric 32 pref medium
2001:470:504f:f200::1 metric 32 pref medium
        nexthop via 2001:470:504f:103::5 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::6 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::7 dev eth0.103 weight 1 
2001:470:504f:f200::a metric 32 pref medium
        nexthop via 2001:470:504f:103::5 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::6 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::7 dev eth0.103 weight 1 
2001:470:504f:f200::f0e metric 32 pref medium
        nexthop via 2001:470:504f:103::5 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::6 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::7 dev eth0.103 weight 1 
2001:470:504f:f200::78b5 metric 32 pref medium
        nexthop via 2001:470:504f:103::5 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::6 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::7 dev eth0.103 weight 1 
2001:470:504f:f200::9964 metric 32 pref medium
        nexthop via 2001:470:504f:103::5 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::6 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::7 dev eth0.103 weight 1 
2001:470:504f:f200::a917 metric 32 pref medium
        nexthop via 2001:470:504f:103::5 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::6 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::7 dev eth0.103 weight 1 
2001:470:504f:f300:: metric 32 pref medium
        nexthop via 2001:470:504f:103::5 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::6 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::7 dev eth0.103 weight 1 
2001:470:504f:f300::1 metric 32 pref medium
        nexthop via 2001:470:504f:103::5 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::6 dev eth0.103 weight 1 
        nexthop via 2001:470:504f:103::7 dev eth0.103 weight 1 

Routing table on nodes (they all look the same except for the pod network on the kube-bridge)

yannik@kubernetes-02 ~ % ip -6 r
2001:470:504f:103::/64 dev eth0 proto ra metric 1002 pref high
2001:470:504f:f102::/64 dev kube-bridge proto kernel metric 256 pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
fe80::/64 dev kube-bridge proto kernel metric 256 pref medium
fe80::/64 dev veth53f3d5d4 proto kernel metric 256 pref medium
fe80::/64 dev veth165b0ac5 proto kernel metric 256 pref medium
fe80::/64 dev veth1d668e87 proto kernel metric 256 pref medium
default via fe80::b62e:99ff:fea9:599c dev eth0 proto ra metric 1002 pref high

To demonstrate the routing cycle, I issued an mtr --report command against the echo LoadBalancer:

% mtr -T -P 443 2001:470:504f:f300::1 -m 10 -c 1 --report
Start: 2024-05-06T22:23:18+0200
HOST: hypervisor                  Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 2001:470:504f:101::1       0.0%     1    0.2   0.2   0.2   0.2   0.0
  2.|-- 2001:470:504f:103::7       0.0%     1    0.4   0.4   0.4   0.4   0.0
  3.|-- 2001:470:504f:103::1       0.0%     1    0.4   0.4   0.4   0.4   0.0
  4.|-- 2001:470:504f:103::6       0.0%     1    0.7   0.7   0.7   0.7   0.0
  5.|-- 2001:470:504f:103::1       0.0%     1    0.5   0.5   0.5   0.5   0.0
  6.|-- 2001:470:504f:103::7       0.0%     1    0.7   0.7   0.7   0.7   0.0
  7.|-- 2001:470:504f:103::1       0.0%     1    0.7   0.7   0.7   0.7   0.0
  8.|-- 2001:470:504f:103::5       0.0%     1    0.9   0.9   0.9   0.9   0.0
  9.|-- 2001:470:504f:103::1       0.0%     1    0.9   0.9   0.9   0.9   0.0
 10.|-- 2001:470:504f:103::6       0.0%     1    1.0   1.0   1.0   1.0   0.0
@YannikSc YannikSc added the bug label May 6, 2024

aauren commented May 7, 2024

Unfortunately, I'm not able to reproduce this result on any of my clusters. I have tested with a cluster peering with a Juniper router that controls the next-hop routing. I have also tested with a cluster that uses FRR to peer with kube-router and control the Linux IP routing table, much the way that bird is doing for you. Neither of them exhibits this issue.

If I had to guess, I would say that it is most likely an artifact of how bird is configured. If you can dig into it more and provide additional information, update this issue and I'll take a look again.


aauren commented May 7, 2024

Routing table on FRR node:

# ip -6 r show 
::1 dev lo proto kernel metric 256 pref medium
2001:db8:42:1000::/64 nhid 16 via 2600:1f18:477d:6900:e20c:7eee:81e1:9014 dev ens5 proto bgp metric 20 onlink pref medium
2001:db8:42:1001::/64 nhid 18 via 2600:1f18:477d:6900:e5e9:d1cd:2e60:6d11 dev ens5 proto bgp metric 20 onlink pref medium
2001:db8:42:1200:: nhid 32 proto bgp metric 20 pref medium
        nexthop via 2600:1f18:477d:6900:e20c:7eee:81e1:9014 dev ens5 weight 1 onlink 
        nexthop via 2600:1f18:477d:6900:e5e9:d1cd:2e60:6d11 dev ens5 weight 1 onlink 
2600:1f18:477d:6900::/64 dev ens5 proto ra metric 100 pref medium
fe80::/64 dev ens5 proto kernel metric 256 pref medium
default via fe80::8ff:f7ff:fe8a:e375 dev ens5 proto ra metric 100 expires 1798sec pref medium

mtr results:

# mtr -T -P 5000 2001:db8:42:1200:: -m 10 -c 1 --report
Start: 2024-05-07T15:30:13+0000
HOST: aws-bgp                     Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 2001:db8:42:1200::         0.0%     1    0.4   0.4   0.4   0.4   0.0

Relevant FRR details:

# vtysh -c "show bgp all"                              
...
   Network          Next Hop            Metric LocPrf Weight Path
*>i2001:db8:42:1000::/64
                    2600:1f18:477d:6900:e20c:7eee:81e1:9014
                                            10    100      0 i
*>i2001:db8:42:1001::/64
                    2600:1f18:477d:6900:e5e9:d1cd:2e60:6d11
                                            10    100      0 i
*>i2001:db8:42:1200::/128
                    2600:1f18:477d:6900:e20c:7eee:81e1:9014
                                            10    100      0 i
*=i                 2600:1f18:477d:6900:e5e9:d1cd:2e60:6d11
                                            10    100      0 i

Displayed  3 routes and 4 total paths


YannikSc commented May 7, 2024

Hey, thanks for your effort. I just tried Quagga BGP instead of Bird to rule out a Bird-specific issue, but I was unable to get a session established between kube-router and Quagga, so I would rather stay with Bird.

I don't really know where to investigate further.
As I see it, everyone is sending packets to the router (as expected), the router routes the packets correctly to the nodes, and the nodes don't know what to do with them, so they throw them down the default route, back to the router.
So in my eyes the issue is simply that the nodes do not have routes set up to handle the LoadBalancer/Service traffic. This is, at least in my head, completely unrelated to BGP.

So maybe you can give me a hint about which direction to look in now.


aauren commented May 7, 2024

I noticed that you're running --run-service-proxy=false. It is normally this functionality in kube-router that is responsible for routing packets for services. I would assume that if you're running that way, you have something like kube-proxy in the loop that handles routing service traffic? Maybe something is wrong with kube-proxy, or with whatever is handling the service proxy for you?
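
Concretely, if kube-router is meant to fully replace kube-proxy, the DaemonSet container args would need roughly this (a sketch based on the flags listed in the issue description; only the service-proxy flag changes):

```
# Sketch of the kube-router container args with the service proxy enabled.
# All other flags stay as listed in the issue description above.
- --run-router=true
- --run-firewall=true
- --run-service-proxy=true   # was false; kube-router now programs Service VIPs itself
```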


YannikSc commented May 7, 2024

Ohh no. I had used the non-service-proxy config in combination with removing kube-proxy.
I enabled the service proxy, and while I still had issues out of the box with the firewall blocking everything, after adding a rule to accept all incoming traffic from the server's internal IP, everything seems to work.

So, I'm sorry for causing such trouble, and big thanks for your help!
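
For anyone hitting the same firewall symptom, the accept rule described above would be of roughly this shape (an illustrative sketch; the source range is hypothetical and must be replaced with the nodes' actual internal subnet):

```
# Illustrative ip6tables rule of the shape described above.
# 2001:470:504f:103::/64 is a placeholder for the nodes' internal subnet.
ip6tables -I INPUT -s 2001:470:504f:103::/64 -j ACCEPT
```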

@YannikSc YannikSc closed this as completed May 7, 2024

aauren commented May 7, 2024

No worries. By the way, regarding your other observation that workers without the workload were advertising VIPs: I would recommend that you look into Kubernetes Traffic Policies: https://kubernetes.io/docs/reference/networking/virtual-ips/#traffic-policies

There is also a service.local annotation that kube-router provides for controlling this as well (https://www.kube-router.io/docs/user-guide/#controlling-service-locality-traffic-policies), but I think it's better and more portable to use the upstream service traffic policy definition.

Either mechanism will allow you to control how kube-router advertises BGP VIPs.
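
As an illustration, the upstream traffic policy is set on the Service itself (a sketch; the service name, selector, and ports are placeholders):

```
# Sketch of a LoadBalancer Service using the upstream traffic policy.
# With externalTrafficPolicy: Local, only nodes that actually have ready
# endpoints for the service advertise/accept the VIP.
apiVersion: v1
kind: Service
metadata:
  name: echo                       # placeholder name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    app: echo                      # placeholder selector
  ports:
  - port: 80
    targetPort: 8080
```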
