BGP Route Reflector Feature #192

Closed
ut0mt8 opened this Issue Oct 11, 2017 · 26 comments

Contributor

ut0mt8 commented Oct 11, 2017

First, a big thanks for this very promising project. This is what I expect to be the future of Kubernetes networking :)

kube-router can be enabled to peer with an external router. This is fine.
But in my understanding, the external router would have to peer with all node/kube-router instances. For a small cluster this is OK; but for a bigger one (say 100 nodes) it's not really doable.

First problem: it doesn't scale. A full mesh of 100 nodes plus one external router is a big waste, and will certainly cause some convergence issues.
Second problem: from the external router's point of view, adding a node will require a reconfiguration of its BGP configuration (on a Linux box with quagga/bird this can be scripted, but on real network equipment it cannot).

So it would be really cool if an instance of kube-router could be specialised as a Route Reflector, both for BGP scaling within the Kubernetes network and for allowing simple peering with external routers.

Does it make sense?

@ut0mt8 ut0mt8 changed the title from BGP RR configuration to BGP Route Reflector Feature Oct 11, 2017

@bzub bzub added the enhancement label Oct 11, 2017


Member

murali-reddy commented Oct 12, 2017

@ut0mt8 thanks for your comments. Adding a route reflector certainly makes sense. Kube-router will have to support route reflectors at some point, as adoption increases and users need large cluster deployments. I will see how much effort it takes and prioritize this feature.

@bzub bzub self-assigned this Oct 15, 2017


Contributor

ut0mt8 commented Nov 17, 2017

I am waiting for this :) Currently I think I can use two RRs with two external BGP peers, disabling meshing.
The only problem is that adding a new node needs a manual reconfiguration of the RRs. (I can surely automate this by watching the API server.)


Member

murali-reddy commented Nov 18, 2017

Sorry, this request has not made any progress. There are two critical issues (#202 slow convergence and #220 routes getting deleted during rolling upgrades) that are affecting users. Once they are addressed, I am hoping kube-router's routing will be on solid ground, on which more advanced features like route reflectors can be added.


Contributor

ut0mt8 commented Nov 18, 2017

#202 could be handled with RRs :) Doing a full mesh of >50 BGP speakers is a bad idea in every context (I was a network engineer in my previous job).

Contributor

SEJeff commented Nov 20, 2017

@ut0mt8 you can now do a single-AS-per-rack model with the current kube-router, which is an alternative to using reflectors, depending on your topology.

We specify this using node annotations such as:

net.kuberouter.node.bgppeer.address=10.2.3.4
net.kuberouter.node.bgppeer.asn=654321
net.kuberouter.nodeasn=654322

Your kube-router daemonset can have args such as:

    spec:
      containers:
      - args:
        - --run-router=true
        - --run-firewall=false
        - --run-service-proxy=true
        - --cluster-cidr=10.2.3.0/18
        - --nodes-full-mesh=false
        - --enable-overlay=false
        - --kubeconfig=/etc/kubernetes/kubeconfig
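If you are scripting this per node, the three annotations above can be assembled with a small helper; a sketch in Go (the helper itself is illustrative, not kube-router API — only the annotation keys come from the comment above):

```go
package main

import "fmt"

// perRackAnnotations builds the node annotations for the single-AS-per-rack
// model: the rack's upstream peer address/ASN plus the ASN this node itself
// uses. Illustrative helper only; the keys are the ones quoted above.
func perRackAnnotations(peerAddr string, peerASN, nodeASN uint32) map[string]string {
	return map[string]string{
		"net.kuberouter.node.bgppeer.address": peerAddr,
		"net.kuberouter.node.bgppeer.asn":     fmt.Sprintf("%d", peerASN),
		"net.kuberouter.nodeasn":              fmt.Sprintf("%d", nodeASN),
	}
}

func main() {
	// Emit the kubectl commands that would apply these annotations to a node.
	for k, v := range perRackAnnotations("10.2.3.4", 654321, 654322) {
		fmt.Printf("kubectl annotate node $NODE %s=%s\n", k, v)
	}
}
```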

Contributor

cjbrigato commented Nov 30, 2017

@SEJeff, just wanted to warn that those aren't private 16-bit ASNs you've put here; they are public 32-bit ones ;)

Adding to the issue, I tried such a configuration but nothing worked at all:

[root@prdinfk8s501-adm ~]# kubectl describe node prdinfk8s501.dedale.tf1.fr|grep nodeasn
Annotations:        net.kuberouter.nodeasn=65432
[root@prdinfk8s501-adm ~]# kubectl  logs kube-router-4b4bd -n kube-system|tail -n1
E1130 16:51:10.462062       1 network_routes_controller.go:172] Failed to start node BGP server: Could not find ASN number for the node. Node needs to be annotated with ASN number details to start BGP server.

Any insight?
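For reference, the private-use ranges are defined in RFC 6996: 64512–65534 (16-bit) and 4200000000–4294967294 (32-bit); anything outside them is a public ASN. A quick standalone check (sketch, not kube-router code):

```go
package main

import "fmt"

// isPrivateASN reports whether asn falls in one of the private-use ranges
// reserved by RFC 6996: 64512-65534 (16-bit) or 4200000000-4294967294 (32-bit).
func isPrivateASN(asn uint32) bool {
	return (asn >= 64512 && asn <= 65534) ||
		(asn >= 4200000000 && asn <= 4294967294)
}

func main() {
	// 65432 is in the 16-bit private range; 654321/654322 are public 32-bit ASNs.
	for _, asn := range []uint32{65432, 654321, 654322} {
		fmt.Printf("ASN %d private: %v\n", asn, isPrivateASN(asn))
	}
}
```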


Contributor

SEJeff commented Nov 30, 2017

Thanks @cjbrigato. Those are just made-up bogus numbers and IP addresses I put in there as an example.


Contributor

SEJeff commented Nov 30, 2017

@jdconti might have some pointers for you.

What other annotations do you have on this node for kube-router? What topology are you using?

Member

murali-reddy commented Nov 30, 2017

Contributor

cjbrigato commented Nov 30, 2017

Haaa, the k8s ecosystem is always moving quite fast. I always try to monitor everything I can, even little PRs and issues, but I definitely didn't see this one. Thanks a lot!

@murali-reddy I spotted an interesting one. Please tell me if it should be redirected to another issue or a new one, since I'm getting quite "off-topic" here.

Following the streamlining of annotations, here is a buggy and, I guess, unintended behavior:

time="2017-11-30T19:59:18Z" level=warning msg="sent notification" Data="as number mismatch expected 64512, received 65432" Key=10.247.33.255 Topic=Peer

It happens when you have both the new annotations AND the old ones, with mismatching ASNs.
Shouldn't the old annotations no longer have any involvement in kube-router's behavior?

Annotations:        kube-router.io/node.asn=65432
                    net.kuberouter.nodeasn=64512

Member

murali-reddy commented Nov 30, 2017

@cjbrigato yes, net.kuberouter.nodeasn should not have any involvement. Try deleting the kube-router pods, so that on the new pods GoBGP starts from scratch with the correct ASN.

If you are still running into issues, please open a separate issue.


Contributor

cjbrigato commented Nov 30, 2017

You're right. Those were leftovers from the old configuration, not the annotations at all. Everything is working as expected now, thanks again, and I'll stop off-topic'ing this issue.


h3po commented Dec 7, 2017

@ut0mt8

> Second problem : for the external router point adding a node will require a reconfiguration of the bgp configuration (on a linux box with quagga/bird, this can be scripted, but on a real network equipment no).

Actually, I picked kube-router for my first cluster because I discovered that bgpd in the quagga distribution by Cumulus Networks can be configured to peer by subnet instead of by a list of IPs. So no scripting is required and no IPs are hardcoded anywhere.
I'm running the cumulusnetworks/quagga image; here is my Jinja2 template for bgpd.conf:

    hostname {{inventory_hostname}}
    router bgp 64513
    bgp router-id {{int_ip}}
    neighbor dynamicnodes peer-group
    neighbor dynamicnodes remote-as 64512
    bgp listen range {{int_cidr}} peer-group dynamicnodes

where {{int_cidr}} is the subnet that my DHCP is serving to nodes.
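The effect of `bgp listen range` is that any node whose IP falls inside the configured subnet is accepted as a dynamic peer; a minimal sketch of that acceptance check (illustrative only, not quagga code):

```go
package main

import (
	"fmt"
	"net"
)

// inListenRange mimics the effect of "bgp listen range <cidr> peer-group ...":
// a new node is accepted as a dynamic peer if its IP falls inside the
// configured subnet. Sketch of the idea only.
func inListenRange(cidr, nodeIP string) bool {
	_, subnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return false // malformed listen range: accept nothing
	}
	ip := net.ParseIP(nodeIP)
	return ip != nil && subnet.Contains(ip)
}

func main() {
	// A node that DHCP placed inside the listen range peers automatically.
	fmt.Println(inListenRange("10.2.0.0/18", "10.2.3.4")) // true
	// One outside the range is ignored.
	fmt.Println(inListenRange("10.2.0.0/18", "192.168.1.1")) // false
}
```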


Contributor

ut0mt8 commented Dec 9, 2017

Interesting. This reminds me of BGP unnumbered.
But it doesn't apply in my case, as an RR is by nature a 1-to-N operation.
Anyway, I will look at the kube-router code to see whether it would be difficult to implement.
On paper it is only a matter of GoBGP automation, which kube-router already does for the most part.


Matty9191 commented Feb 21, 2018

+1 to adding a highly available set of route reflectors.


Contributor

ut0mt8 commented Feb 21, 2018

Well, I've begun writing something, but I've figured out it is not as easy as I thought at the start.


Matty9191 commented Feb 21, 2018

Once it is complete, it will be an incredible feature. The issue @ut0mt8 mentioned, having to define all your neighbors upstream, is a deal-killer for us. In the interim I am going to look into @SEJeff's single-AS-per-rack solution.


Contributor

ut0mt8 commented Feb 21, 2018

When I think about the implementation, there are a couple of questions/issues for which I have no definitive answer.

First, which node/worker/running kube-router instance should be chosen as the RR server, and by what means? A flag? Or, more elegantly, an annotation? I suppose choosing the kube-router running on a controller makes sense? Anyway, it could be configurable...

Second, and related: how do the other nodes/instances know they are RR clients, and how do they find their RR server? By reading annotations?

I was thinking of annotating the RR server with kube-router.io/rr.server and RR clients with kube-router.io/rr.client.
It should work like this: each kube-router RR server should peer with the other RR servers (without reflection) and peer with all RR clients (with reflection enabled).
Conversely, a kube-router RR client should only peer with RR servers.

One thing I find annoying is that kube-router is not really made to work outside of a pod (honestly I hate this; why is launching a control-plane component of a k8s cluster the good old way not possible?).
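The peering rules above can be condensed into a small decision function; a sketch (the `role`/`shouldPeer` names are mine, illustrative pseudologic rather than kube-router code):

```go
package main

import "fmt"

// role captures the two proposed annotations on a node.
type role struct{ server, client bool }

// shouldPeer applies the rules described above: an RR client peers only with
// RR servers; an RR server peers with other servers (plain iBGP, no
// reflection) and with clients (reflection enabled).
func shouldPeer(self, other role) (peer, reflect bool) {
	switch {
	case self.client:
		return other.server, false // clients talk only to servers
	case self.server:
		if other.server || other.client {
			return true, other.client // reflect only toward clients
		}
		return false, false
	default:
		return false, false // un-annotated nodes are out of scope here
	}
}

func main() {
	p, r := shouldPeer(role{server: true}, role{client: true})
	fmt.Println(p, r) // server peers with client, reflection enabled
}
```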

Contributor

ut0mt8 commented Feb 21, 2018

Pseudocode included (I think I'm not too far off):

var rrClient bool = false
var rrServer bool = false
var clusterID string

// find the annotations on this node
for _, node := range nodes.Items {
  nodeIP, _ := utils.GetNodeIP(&node)
  if nodeIP.String() == nrc.nodeIP.String() {
    if id, ok := node.ObjectMeta.Annotations["kube-router.io/rr-server"]; ok {
      rrServer = true
      clusterID = id
    }
    if id, ok := node.ObjectMeta.Annotations["kube-router.io/rr-client"]; ok {
      rrClient = true
      clusterID = id
    }
  }
}
for _, node := range nodes.Items {

  ...

  // we are an rr-client: peer only with rr-servers
  if rrClient {
    if _, ok := node.ObjectMeta.Annotations["kube-router.io/rr-server"]; !ok {
      continue
    }
  }

  ...

  // we are an rr-server: peer with rr-clients with reflection enabled
  if rrServer {
    if _, ok := node.ObjectMeta.Annotations["kube-router.io/rr-client"]; ok {
      // add RR options with clusterID
      n.RouteReflector = config.RouteReflector{
        Config: config.RouteReflectorConfig{
          RouteReflectorClient:    true,
          RouteReflectorClusterId: config.RrClusterIdType(clusterID),
        },
        State: config.RouteReflectorState{
          RouteReflectorClient:    true,
          RouteReflectorClusterId: config.RrClusterIdType(clusterID),
        },
      }
    }
  }
}

Contributor

ut0mt8 commented Feb 25, 2018

I'm almost done; I just have to deal with network policies.


Member

murali-reddy commented Feb 26, 2018

@ut0mt8 Awesome!! Let me know when you are ready with a private image. I am happy to test.


Contributor

ut0mt8 commented Feb 26, 2018

@murali-reddy: I think I'm good.

We can test with the ut0mt8/kube-router:rr4 image.
The code is here:
https://github.com/ut0mt8/kube-router/blob/feature_rr/app/controllers/network_routes_controller.go

My first tests are positive. Any comments or feedback are welcome.


Contributor

ut0mt8 commented Feb 26, 2018

Tests with dual RRs for redundancy also look good. I encourage everyone interested to test and review my patch.

Ah, and a little howto :)

So, it works with annotations:

kube-router.io/rr.server=NUMBER
kube-router.io/rr.client=NUMBER

Obviously the number should be the same for the adjacency to form.

Note: for an optimal setup we should review the default internal BGP timers, which are way too high.
Note 2: I think we should also review how all the options, like ibgp-full-mesh or per-rack ASN, interact; I don't know if they all work well together. It can certainly produce crazy setups, however.


Member

murali-reddy commented Feb 27, 2018

@ut0mt8 thanks! I will try to set it up and test it out. Would you mind raising a PR?


Contributor

ut0mt8 commented Feb 27, 2018

@murali-reddy done! I'm open to any review/remarks :)



Member

murali-reddy commented Mar 1, 2018

Fixed as part of #325.

If there are any possible improvements, we will open a separate issue.
