Add BGP support to the Antrea Agent #5948

Open
antoninbas opened this issue Jan 31, 2024 · 19 comments
Labels
area/transit/bgp Issues or PRs related to BGP support. area/transit/routing Issues or PRs related to routing. kind/feature Categorizes issue or PR as related to a new feature.

@antoninbas
Contributor

antoninbas commented Jan 31, 2024

Describe the problem/challenge you have
Over the years we have had a few requests to add BGP speaker capabilities to the Antrea Agent. The purpose of this issue is to collect the use cases that we would like to cover with this capability.

Note that while it is possible to meet some of these use cases by deploying kube-router in "BGP mode" alongside Antrea, having this capability available OOTB means potentially a better integration with Antrea features, and doesn't require users to deploy yet another DaemonSet in their cluster.

I believe that there are 3 main use cases for BGP in K8s with Antrea:

  1. Intra-cluster Pod-to-Pod routing using iBGP (full-mesh or with route reflectors for scaling)
  2. Peering with BGP peers outside of the cluster to advertise Pod IPs / Egress IPs / Service IPs
  3. Multi-cluster routing, in particular across gateways

1 & 3 are not very interesting IMO, because they just provide alternative implementations to what we already support, and there is no clear benefit. However, we can add some value with 2 for on-prem users who want to make K8s endpoints routable by their BGP fabric.

As a side note, Calico and kube-router support both 1 & 2, while Cilium has added support for 2.

Describe the solution you'd like
I believe that our support should focus on use case 2:

Peering with BGP peers outside of the cluster to advertise Pod IPs / Egress IPs / Service IPs

Each Antrea Agent should run a BGP speaker and advertise local IPs to a list of configured BGP peers. The AS number (ASN) for the Antrea Agent should be configurable (all Agents may use the same local ASN or not). The list of advertised local IPs should be configurable from this list:

  • local Pod IPs / local Pod CIDR
  • local (i.e., currently assigned to the Node by the Egress feature) Egress IPs - with this capability, it will be possible for routes to be automatically configured in the physical network for "return" Egress traffic
  • local (i.e., currently assigned to the Node by the ServiceExternalIP feature) LoadBalancer Service IPs - on-prem users with a BGP fabric will be able to easily expose K8s Services to the rest of their network. At the moment, the ServiceExternalIP feature requires LoadBalancer IPs to be allocated from the Node network (or requires adding static routes to the physical network).
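For illustration only, here is a minimal Go sketch of how a per-Agent configuration like the one described above might be modeled. The exact API is not decided, so every name here (`SpeakerConfig`, `AdvertisementType`, `LocalState`, `RoutesToAdvertise`) is hypothetical, and the addresses and ASNs are made up. It only shows turning a configurable list of categories into a set of prefixes to advertise; it does not run a BGP speaker.

```go
package main

import (
	"fmt"
	"net/netip"
)

// AdvertisementType names one category of local IPs an Agent may advertise.
// Every type and field name in this sketch is hypothetical, not an Antrea API.
type AdvertisementType string

const (
	AdvertisePodCIDR        AdvertisementType = "PodCIDR"
	AdvertiseEgressIP       AdvertisementType = "EgressIP"
	AdvertiseLoadBalancerIP AdvertisementType = "LoadBalancerIP"
)

// Peer identifies one configured BGP peer outside the cluster.
type Peer struct {
	Address netip.Addr
	ASN     uint32
}

// SpeakerConfig is a hypothetical per-Agent BGP configuration.
type SpeakerConfig struct {
	LocalASN  uint32              // may be shared by all Agents or differ per Node
	Peers     []Peer              // peers that receive the advertisements
	Advertise []AdvertisementType // which categories of local IPs to advertise
}

// LocalState holds the IPs that are currently local to this Node.
type LocalState struct {
	PodCIDRs        []netip.Prefix
	EgressIPs       []netip.Addr
	LoadBalancerIPs []netip.Addr
}

// hostPrefix turns a single address into a /32 (IPv4) or /128 (IPv6) prefix.
func hostPrefix(a netip.Addr) netip.Prefix {
	return netip.PrefixFrom(a, a.BitLen())
}

// RoutesToAdvertise returns the prefixes selected by cfg.Advertise,
// in the order the categories are listed.
func RoutesToAdvertise(cfg SpeakerConfig, st LocalState) []netip.Prefix {
	var routes []netip.Prefix
	for _, t := range cfg.Advertise {
		switch t {
		case AdvertisePodCIDR:
			routes = append(routes, st.PodCIDRs...)
		case AdvertiseEgressIP:
			for _, ip := range st.EgressIPs {
				routes = append(routes, hostPrefix(ip))
			}
		case AdvertiseLoadBalancerIP:
			for _, ip := range st.LoadBalancerIPs {
				routes = append(routes, hostPrefix(ip))
			}
		}
	}
	return routes
}

// exampleState returns made-up addresses used only for this example.
func exampleState() LocalState {
	return LocalState{
		PodCIDRs:        []netip.Prefix{netip.MustParsePrefix("10.10.1.0/24")},
		EgressIPs:       []netip.Addr{netip.MustParseAddr("192.0.2.10")},
		LoadBalancerIPs: []netip.Addr{netip.MustParseAddr("198.51.100.20")},
	}
}

func main() {
	cfg := SpeakerConfig{
		LocalASN:  64512, // private-use ASN, chosen only for the example
		Peers:     []Peer{{Address: netip.MustParseAddr("192.0.2.1"), ASN: 64513}},
		Advertise: []AdvertisementType{AdvertiseEgressIP, AdvertiseLoadBalancerIP},
	}
	for _, r := range RoutesToAdvertise(cfg, exampleState()) {
		fmt.Println(r)
	}
}
```

Leaving `AdvertisePodCIDR` out of `Advertise`, as in `main`, is how a user would disable Pod CIDR advertisement in this model.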

A note on Egress IP advertisement:

  • Egress IPs not limited to the Node network
  • Inter-Node Egress traffic may need to go through BGP router
  • Similar to the EgressSeparateSubnet feature but L3 / BGP approach vs L2 approach?

Anything else you would like to add?
While the exact API is yet to be decided, BGP peering should ideally be configurable using CRD(s).

cc @jianjuns @tnqn

@antoninbas antoninbas added kind/feature Categorizes issue or PR as related to a new feature. area/transit/routing Issues or PRs related to routing. labels Jan 31, 2024
@jianjuns
Contributor

I feel no need to restrict Pod IP advertisement to noEncap only. We can support encap mode too?

For LoadBalancer IP we should be able to enable ECMP too.

@antoninbas
Contributor Author

I feel no need to restrict Pod IP advertisement to noEncap only. We can support encap mode too?

I edited the issue to remove the reference to noEncap. It was left over from a previous draft I was working on.

For LoadBalancer IP we should be able to enable ECMP too.

Yes that's a good point. I guess in this case all cluster Nodes advertise the LoadBalancer IP (or at least all cluster Nodes running at least one backend Pod for the Service) with the same "cost", with kube-proxy / AntreaProxy being responsible for the last traffic hop. This would be quite different from how ServiceExternalIP works today, in L2 mode.
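To make the two options concrete, here is a rough Go sketch of the per-Node decision. It assumes a simplified `Endpoint` type with a `Ready` flag (my assumption, the comment above only talks about running a backend Pod), and the function names are hypothetical. Every Node in the returned set would advertise the LoadBalancer IP with the same cost, so the BGP fabric could spread traffic across them.

```go
package main

import "fmt"

// Endpoint is a simplified Service backend. The Ready field is an assumption
// of this sketch; the discussion only mentions Nodes running a backend Pod.
type Endpoint struct {
	NodeName string
	Ready    bool
}

// ShouldAdvertise reports whether node should advertise a Service's
// LoadBalancer IP. With requireLocalBackend set, only Nodes with at least one
// ready local backend advertise; otherwise every Node does.
func ShouldAdvertise(node string, endpoints []Endpoint, requireLocalBackend bool) bool {
	if !requireLocalBackend {
		return true
	}
	for _, ep := range endpoints {
		if ep.Ready && ep.NodeName == node {
			return true
		}
	}
	return false
}

// ECMPSet returns the Nodes that would advertise the IP with the same cost,
// i.e. the next hops the BGP fabric could load-balance across.
func ECMPSet(nodes []string, endpoints []Endpoint, requireLocalBackend bool) []string {
	var set []string
	for _, n := range nodes {
		if ShouldAdvertise(n, endpoints, requireLocalBackend) {
			set = append(set, n)
		}
	}
	return set
}

func main() {
	nodes := []string{"node-a", "node-b", "node-c"}
	endpoints := []Endpoint{
		{NodeName: "node-a", Ready: true},
		{NodeName: "node-b", Ready: false},
	}
	fmt.Println(ECMPSet(nodes, endpoints, true))  // only Nodes with a ready backend
	fmt.Println(ECMPSet(nodes, endpoints, false)) // all Nodes
}
```

In the "all Nodes" variant, kube-proxy / AntreaProxy would still be responsible for the last hop to a backend, which may be on another Node.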

@vrabbi
Contributor

vrabbi commented Jan 31, 2024

Use case 2 would be extremely beneficial for some use cases we have. We are really interested in this for Pod IPs/CIDRs and also for Egress. We don't use Antrea for LB, but that would be a good option to have as well.

@hongliangl hongliangl modified the milestone: Antrea v1.16 release Feb 1, 2024
@hongliangl hongliangl self-assigned this Feb 1, 2024
@ColonelBundy

Use case 2 is very interesting for our setup, however that is limited to service ips. Could there be a feature switch/selector to enable/disable what and when you should advertise? For instance I may want to advertise some service ips for some namespaces or all but no pod ip's etc?

@antoninbas
Contributor Author

@ColonelBundy I definitely wanted to have the ability to disable advertising Pod CIDRs.
Having selectors to provide more granularity for Service IP advertisement also sounds good, we will keep that in mind.

@rajnkamr
Contributor

rajnkamr commented Feb 8, 2024

Pod CIDRs can be allocated from multiple non-overlapping IPPools (IPAM, noEncap). It seems that routes should be advertised only when the Pod CIDR is allocated from an IPPool; otherwise advertisement might not be required. We might want to include support for multiple IPPools in BGP.

@hongliangl
Contributor

hongliangl commented Feb 8, 2024

@ColonelBundy Question: I understand that you want to advertise the Pod CIDR or Service IP to another AS. Do you want the routes advertised from another AS to be distributed and installed on K8s Nodes?

@ColonelBundy

@ColonelBundy Question: I understand that you want to advertise the Pod CIDR or Service IP to another AS. Do you want the routes advertised from another AS to be distributed and installed on K8s Nodes?

For our use case we only want to advertise service ips. To put it simply, we're looking to not have to use metallb for l3 external ips.

Having the option to select which ippool to advertise and to which peer would be a killer feature.
Hope that clarifies.

@hongliangl
Contributor

hongliangl commented Feb 8, 2024

@ColonelBundy Question: I understand that you want to advertise the Pod CIDR or Service IP to another AS. Do you want the routes advertised from another AS to be distributed and installed on K8s Nodes?

For our use case we only want to advertise service ips. To put it simply, we're looking to not have to use metallb for l3 external ips.

Having the option to select which ippool to advertise and to which peer would be a killer feature. Hope that clarifies.

Got that. If so, I think that a client in another AS should be reachable via the default route of your cluster Nodes, so that reply packets for a connection that originates in another AS and is destined to a Service in the cluster can be forwarded back to where the connection originated. Is that your setup? @ColonelBundy

@ColonelBundy

@ColonelBundy Question: I understand that you want to advertise the Pod CIDR or Service IP to another AS. Do you want the routes advertised from another AS to be distributed and installed on K8s Nodes?

For our use case we only want to advertise service ips. To put it simply, we're looking to not have to use metallb for l3 external ips.
Having the option to select which ippool to advertise and to which peer would be a killer feature. Hope that clarifies.

Got that. If so, I think that a client in another AS should be reachable via the default route of your cluster Nodes, so that reply packets for a connection that originates in another AS and is destined to a Service in the cluster can be forwarded back to where the connection originated. Is that your setup? @ColonelBundy

Yea that sounds good

@andreasm80

It would be a very powerful feature if we could solve use case 2 with something out of the box, as it would add great flexibility when using Egress and ServiceExternalIP. As mentioned, there are ways to solve it with static routes etc., but maintaining static routes is troublesome when nodes are decommissioned, new ones are provisioned, and the interfaces are moved between nodes. Using BGP would solve this and dynamically update the routes on demand. Last year I wrote a post about using a DaemonSet to install and configure FRR to get past this. It is probably not the prettiest way, but it gave me what I wanted: https://blog.andreasm.io/2023/02/20/antrea-egress/.

@antoninbas
Contributor Author

@andreasm80 that's a nice blogpost, is it ok if I link to it from the https://antrea.io website?

@andreasm80

Thanks @antoninbas. Yes, that is ok by me.

@hongliangl
Contributor

hongliangl commented Feb 28, 2024

Use case 2 is very interesting for our setup, however that is limited to service ips. Could there be a feature switch/selector to enable/disable what and when you should advertise? For instance I may want to advertise some service ips for some namespaces or all but no pod ip's etc?


@ColonelBundy Hello, I saw the case you mentioned above (advertising Service IPs for some Namespaces, or for all of them, but no Pod IPs), and I wonder whether your use cases look like this:

  • Advertise some ClusterIPs of some Services in a Namespace to a BGP peer. The Services could be selected by a serviceSelector and namespaceSelector.
  • Advertise some LoadBalancer ingress IPs of some Services in another Namespace to another BGP peer. The Services could be selected by a serviceSelector and namespaceSelector.
  • Advertise the Pod CIDR of a Node to another BGP peer.

Thanks
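As a rough illustration of the first two bullets above, the per-peer selection could be sketched in Go as follows. This is not an Antrea API: `Rule`, `Selector`, `Service` and `SelectIPs` are hypothetical names, the selector is a simplified equality-only match (no `matchExpressions`), and all labels and IPs are made up. Pod CIDR advertisement (the third bullet) is not covered.

```go
package main

import "fmt"

// Selector is a simplified equality-based label selector (matchLabels only).
// An empty or nil Selector matches every label set.
type Selector map[string]string

// Matches reports whether labels contains every key/value pair in s.
func (s Selector) Matches(labels map[string]string) bool {
	for k, want := range s {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// IPKind says which Service IPs a rule advertises.
type IPKind string

const (
	KindClusterIP      IPKind = "ClusterIP"
	KindLoadBalancerIP IPKind = "LoadBalancerIP"
)

// Service is a pared-down stand-in for a K8s Service.
type Service struct {
	Namespace       string
	Name            string
	Labels          map[string]string
	ClusterIP       string
	LoadBalancerIPs []string
}

// Rule is a hypothetical "advertise these Service IPs to this peer" entry.
type Rule struct {
	Peer              string
	Kind              IPKind
	NamespaceSelector Selector
	ServiceSelector   Selector
}

// SelectIPs returns the IPs that rule would advertise to rule.Peer.
// nsLabels maps a Namespace name to that Namespace's labels.
func SelectIPs(rule Rule, services []Service, nsLabels map[string]map[string]string) []string {
	var ips []string
	for _, svc := range services {
		if !rule.NamespaceSelector.Matches(nsLabels[svc.Namespace]) ||
			!rule.ServiceSelector.Matches(svc.Labels) {
			continue
		}
		switch rule.Kind {
		case KindClusterIP:
			if svc.ClusterIP != "" {
				ips = append(ips, svc.ClusterIP)
			}
		case KindLoadBalancerIP:
			ips = append(ips, svc.LoadBalancerIPs...)
		}
	}
	return ips
}

// exampleData returns made-up Services and Namespace labels.
func exampleData() ([]Service, map[string]map[string]string) {
	services := []Service{
		{Namespace: "web", Name: "frontend", Labels: map[string]string{"expose": "bgp"},
			ClusterIP: "10.96.0.10", LoadBalancerIPs: []string{"198.51.100.20"}},
		{Namespace: "web", Name: "internal", Labels: map[string]string{},
			ClusterIP: "10.96.0.11"},
		{Namespace: "db", Name: "postgres", Labels: map[string]string{"expose": "bgp"},
			ClusterIP: "10.96.0.12"},
	}
	nsLabels := map[string]map[string]string{
		"web": {"tier": "edge"},
		"db":  {"tier": "data"},
	}
	return services, nsLabels
}

func main() {
	services, nsLabels := exampleData()
	rules := []Rule{
		{Peer: "192.0.2.1", Kind: KindLoadBalancerIP,
			NamespaceSelector: Selector{"tier": "edge"}, ServiceSelector: Selector{"expose": "bgp"}},
		{Peer: "192.0.2.2", Kind: KindClusterIP,
			NamespaceSelector: Selector{"tier": "data"}},
	}
	for _, r := range rules {
		fmt.Println(r.Peer, r.Kind, SelectIPs(r, services, nsLabels))
	}
}
```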

@ColonelBundy

ColonelBundy commented Feb 28, 2024

Use case 2 is very interesting for our setup, however that is limited to service ips. Could there be a feature switch/selector to enable/disable what and when you should advertise? For instance I may want to advertise some service ips for some namespaces or all but no pod ip's etc?


@ColonelBundy Hello, I saw the case you mentioned above (advertising Service IPs for some Namespaces, or for all of them, but no Pod IPs), and I wonder whether your use cases look like this:

  • Advertise some ClusterIPs of some Services in a Namespace to a BGP peer. The Services could be selected by a serviceSelector and namespaceSelector.
  • Advertise some LoadBalancer ingress IPs of some Services in another Namespace to another BGP peer. The Services could be selected by a serviceSelector and namespaceSelector.
  • Advertise the Pod CIDR of a Node to another BGP peer.

Thanks

Pretty spot on, except we currently have no use case to advertise pod cidrs to an external peer right now. But that may change with time. And also to clarify, a selector for which pod cidrs to advertise would also be very handy.

@hongliangl
Contributor

And also to clarify, a selector for which pod cidrs to advertise would also be very handy.

Do you mean using a selector to select target K8s Nodes and advertising their Pod CIDRs?

@ColonelBundy

And also to clarify, a selector for which pod cidrs to advertise would also be very handy.

Do you mean using a selector to select target K8s Nodes and advertising their Pod CIDRs?

More along the lines of which pods. I might wish to advertise some pods in some namespaces to some peers.
It's very granular but hopefully that can cover any potential use case.

@hongliangl
Contributor

And also to clarify, a selector for which pod cidrs to advertise would also be very handy.

Do you mean using a selector to select target K8s Nodes and advertising their Pod CIDRs?

More along the lines of which pods. I might wish to advertise some pods in some namespaces to some peers. It's very granular but hopefully that can cover any potential use case.

Thanks for the suggestion. We will keep that in mind. Could you tell me why you would advertise Pods in some Namespaces directly instead of using Service IPs within those Namespaces? Has such a case been used in a production environment?

@ColonelBundy

And also to clarify, a selector for which pod cidrs to advertise would also be very handy.

Do you mean using a selector to select target K8s Nodes and advertising their Pod CIDRs?

More along the lines of which pods. I might wish to advertise some pods in some namespaces to some peers. It's very granular but hopefully that can cover any potential use case.

Thanks for the suggestion. We will keep that in mind. Could you tell me why you would advertise Pods in some Namespaces directly instead of using Service IPs within those Namespaces? Has such a case been used in a production environment?

We don't have such a case at the moment. And I do agree that advertising service ips should be the priority if you ever want to advertise individual services.
But I'm thinking more along the lines that it could be useful in scenarios where you cannot use ipsec/wireguard for multi cluster, or where it's simply not desired. It would also probably let you skip the multi cluster gateway and give you granular control over which pods are shared in the multi cluster environment.

But then again, we don't have this specific use case as of this moment, so feel free to dismiss this idea if it's not within scope.

@rajnkamr rajnkamr added the area/transit/bgp Issues or PRs related to BGP support. label Apr 11, 2024
8 participants