Skip to content

Conversation

@aauren
Copy link
Collaborator

@aauren aauren commented Apr 25, 2019

@murali-reddy I finally got around to making good on an implementation for #633 based on your recommendation.

The main goal of this request is to add a BGP import policy that rejects cluster VIPs so that they don't get added to the local routing table and instead are routed via pure ECMP.

In the process of doing this, I found that there was quite a bit of logic crossover between the export policies and the import policies (all of the prefixset and neighborset stuff). So rather than duplicate the logic or make the function larger than it was already I broke addImportPolicies and addExportPolicies into their own functions. This makes it easier to read and identify them as separate functionality. They are now called from a new AddPolicies function that first sets up the common prefix and neighbor sets and then calls the import/export functions.

Everything that used to call addExportPolicies has been converted to run AddPolicies.

@aauren aauren force-pushed the add_import_policy_for_service_vips branch from 83c935e to b89a4b2 Compare April 25, 2019 23:17
@aauren aauren force-pushed the add_import_policy_for_service_vips branch from b89a4b2 to acf35bb Compare April 26, 2019 00:26
@aauren aauren force-pushed the add_import_policy_for_service_vips branch from acf35bb to cee75a7 Compare April 26, 2019 00:28
@murali-reddy
Copy link
Member

Awesome @aauren I will test it out and will revert back

@murali-reddy murali-reddy merged commit 8fe9f70 into cloudnativelabs:master May 26, 2019
@murali-reddy
Copy link
Member

LGTM

~ # gobgp policy prefix
NAME                PREFIX
clusteripprefixset  
podcidrprefixset    10.1.1.0/24 24..24
~ # gobgp policy
Name kube_router_export:
    StatementName kube_router_export_stmt0:
      Conditions:
        PrefixSet: any podcidrprefixset
        NeighborSet: any iBGPpeerset
      Actions:
         accept
Name kube_router_import:
    StatementName kube_router_import_stmt0:
      Conditions:
        PrefixSet: any clusteripprefixset
      Actions:
         reject

@ticpu
Copy link
Contributor

ticpu commented Jun 20, 2019

Since the commit 8fe9f70 "Add Import Policy for Service VIPs (#721)", BGP external IPs routes are exported on startup and then are removed after the --routes-sync-period interval which seems to break external IPs exports.

@aauren
Copy link
Collaborator Author

aauren commented Jun 24, 2019

@ticpu Can you give a little more information? While some of the export policy logic got shuffled around so that combined logic happened at a higher level, the actual logic execution for the export policy side of things should be identical and the only added thing would be the addition of the import policies.

Additionally, the function that is run on initial setup is the same function that is being called on the --routes-sync-period interval, so if they were ever exported, they should always be exported.

Can you show maybe an abbreviated version of gobgp global rib directly after startup and then again after the routes-sync-period?

@ticpu
Copy link
Contributor

ticpu commented Jun 30, 2019

@aauren here are the tests results you requested, thanks to @danboucher78 who did the tests themselves.

We can see there's a BGP route that disappear when the commit is included. It also disappears on the (Juniper) router side.

Startup params for both test

- --run-router=true
- --run-firewall=true
- --run-service-proxy=true
- --kubeconfig=/var/lib/kube-router/kubeconfig
- --master=https://10.10.37.10:6443
- --network-policy-handler=nftables
- --network-policy-default-action=allow
- --v=6
- --advertise-external-ip=true
- --cluster-asn=65520
- --peer-router-asns=65521
- --peer-router-ips=10.10.37.1
- --routes-sync-period=1m

Version 0.3.1-19-g972340e5

Before sync

$ gobgp global rib

   Network              Next Hop             AS_PATH              Age        Attrs
*> 10.10.37.100/32      10.10.37.20                               00:00:27   [{Origin: i}]
*> 10.13.0.0/24         10.10.37.10                               00:00:15   [{Origin: i} {LocalPref: 100}]
*> 10.13.2.0/24         10.10.37.21                               00:00:14   [{Origin: i} {LocalPref: 100}]
*> 10.13.3.0/24         10.10.37.22                               00:00:13   [{Origin: i} {LocalPref: 100}]
*> 10.13.5.0/24         10.10.37.20                               00:00:27   [{Origin: i}]

Sync logs

[kube-router-cqljq kube-router] I0630 01:01:16.276120       1 network_routes_controller.go:279] Performing periodic sync of service VIP routes
[kube-router-cqljq kube-router] I0630 01:01:16.276441       1 ecmp_vip.go:25] Advertising route: '10.10.37.100/32 via 10.10.37.20' to peers
[kube-router-cqljq kube-router] I0630 01:01:16.277292       1 network_routes_controller.go:283] Performing periodic sync of pod CIDR routes
[kube-router-cqljq kube-router] I0630 01:01:16.277681       1 network_routes_controller.go:346] Processing bgp route advertisement from peer
[kube-router-cqljq kube-router] I0630 01:01:16.281643       1 round_trippers.go:436] GET https://10.10.37.10:6443/api/v1/nodes/megak8snode1 200 OK in 4 milliseconds
[kube-router-cqljq kube-router] I0630 01:01:16.282362       1 network_routes_controller.go:398] Advertising route: '10.13.5.0/24 via 10.10.37.20' to peers
[kube-router-cqljq kube-router] I0630 01:01:16.282861       1 network_routes_controller.go:346] Processing bgp route advertisement from peer
[kube-router-cqljq kube-router] I0630 01:01:16.285451       1 round_trippers.go:436] GET https://10.10.37.10:6443/api/v1/nodes/megak8snode1 200 OK in 2 milliseconds
[kube-router-cqljq kube-router] I0630 01:01:16.288225       1 round_trippers.go:436] GET https://10.10.37.10:6443/api/v1/nodes 200 OK in 2 milliseconds
[kube-router-cqljq kube-router] I0630 01:01:16.288999       1 bgp_peers.go:36] Syncing BGP peers for the node took 3.12447ms

After sync

$ gobgp global rib

   Network              Next Hop             AS_PATH              Age        Attrs
*> 10.13.0.0/24         10.10.37.10                               00:00:53   [{Origin: i} {LocalPref: 100}]
*> 10.13.2.0/24         10.10.37.21                               00:00:52   [{Origin: i} {LocalPref: 100}]
*> 10.13.3.0/24         10.10.37.22                               00:00:51   [{Origin: i} {LocalPref: 100}]
*> 10.13.5.0/24         10.10.37.20                               00:00:05   [{Origin: i}]

Without commit 8fe9f70

Before sync

$ gobgp global rib

   Network              Next Hop             AS_PATH              Age        Attrs
*> 10.10.37.100/32      10.10.37.20                               00:00:12   [{Origin: i}]
*> 10.13.0.0/24         10.10.37.10                               00:00:01   [{Origin: i} {LocalPref: 100}]
*> 10.13.2.0/24         10.10.37.21                               00:00:00   [{Origin: i} {LocalPref: 100}]
*> 10.13.3.0/24         10.10.37.22                               00:00:04   [{Origin: i} {LocalPref: 100}]
*> 10.13.5.0/24         10.10.37.20                               00:00:12   [{Origin: i}]

Sync logs

[kube-router-76cth kube-router] I0630 01:06:26.761960       1 network_routes_controller.go:279] Performing periodic sync of service VIP routes
[kube-router-76cth kube-router] I0630 01:06:26.762006       1 ecmp_vip.go:25] Advertising route: '10.10.37.100/32 via 10.10.37.20' to peers
[kube-router-76cth kube-router] I0630 01:06:26.762214       1 network_routes_controller.go:346] Processing bgp route advertisement from peer
[kube-router-76cth kube-router] I0630 01:06:26.762575       1 network_routes_controller.go:283] Performing periodic sync of pod CIDR routes
[kube-router-76cth kube-router] I0630 01:06:26.764981       1 round_trippers.go:436] GET https://10.10.37.10:6443/api/v1/nodes/megak8snode1 200 OK in 2 milliseconds
[kube-router-76cth kube-router] I0630 01:06:26.765300       1 network_routes_controller.go:398] Advertising route: '10.13.5.0/24 via 10.10.37.20' to peers
[kube-router-76cth kube-router] I0630 01:06:26.765597       1 network_routes_controller.go:346] Processing bgp route advertisement from peer
[kube-router-76cth kube-router] I0630 01:06:26.768159       1 round_trippers.go:436] GET https://10.10.37.10:6443/api/v1/nodes/megak8snode1 200 OK in 2 milliseconds
[kube-router-76cth kube-router] I0630 01:06:26.771513       1 round_trippers.go:436] GET https://10.10.37.10:6443/api/v1/nodes 200 OK in 2 milliseconds
[kube-router-76cth kube-router] I0630 01:06:26.772145       1 bgp_peers.go:36] Syncing BGP peers for the node took 3.129482ms

After sync

$ gobgp global rib

   Network              Next Hop             AS_PATH              Age        Attrs
*> 10.10.37.100/32      10.10.37.20                               00:00:06   [{Origin: i}]
*> 10.13.0.0/24         10.10.37.10                               00:00:54   [{Origin: i} {LocalPref: 100}]
*> 10.13.2.0/24         10.10.37.21                               00:00:53   [{Origin: i} {LocalPref: 100}]
*> 10.13.3.0/24         10.10.37.22                               00:00:57   [{Origin: i} {LocalPref: 100}]
*> 10.13.5.0/24         10.10.37.20                               00:00:06   [{Origin: i}]

@aauren
Copy link
Collaborator Author

aauren commented Jun 30, 2019

Thanks @ticpu!

I'll go over this early next week and try to see what's happening!

@murali-reddy
Copy link
Member

@ticpu thanks for reporting.

I will test it out as well. Somehow I could not catch regression when I tested the PR

@aauren
Copy link
Collaborator Author

aauren commented Jul 2, 2019

@ticpu - I was able to reproduce what you mention. I think the reason why we didn't encounter this in our original testing was that we use a longer sync interval than you guys do in our configurations.

I believe the reason why you only encounter this problem on sync interval is because in the logic of the network_routes_controller (NRC) it processes advertisements before it processes policy. The net effect of this is, that policy isn't evaluated when kube-router first starts up (FYI @murali-reddy if this is true, we may want to revisit the ordering of the logic there in the NRC: https://github.com/cloudnativelabs/kube-router/blob/master/pkg/controllers/routing/network_routes_controller.go#L250-L303).

However, after a complete loop of the NRC, the policy is in place and ready to be enforced. On sync interval the NRC does a re-processing of advertisements and withdrawals. During this time, since the import policy now exists, GoBGP evaluates the externalIPs and clusterIPs against the import policy. They fail the import policy and the routes are not placed in the RIB. Because the routes are not placed in the RIB they are also not advertised upstream to route-reflectors or routers causing the route to be withdrawn. The logic of GoBGP in this area is explained here: https://github.com/osrg/gobgp/blob/master/docs/sources/policy.md#route-server-policy-model

My faulty assumption here was that import policy would only be applied on incoming BGP advertisements that GoBGP received from other nodes in the cluster. However, I believe that GoBGP treats both local kube-router advertisements and remote advertisements the same with regards to the enforcement of import policy.

My recommendation is that we revert this PR and enforce the import policy within the kube-router logic before we inject local routes into the kernel routing table in the NRC in this function: https://github.com/cloudnativelabs/kube-router/blob/master/pkg/controllers/routing/network_routes_controller.go#L408

I let @murali-reddy know about my findings and my recommendation in Slack. He is going to see if there are any other better options for either saving this patch or doing something different.

@aauren
Copy link
Collaborator Author

aauren commented Aug 19, 2019

@ticpu This issue appears to be fixed by @tamihiro's change on #771

ticpu pushed a commit to ticpu/kube-router that referenced this pull request Dec 10, 2019
* rename export policies to make it direction independent

* split creating neighborsets and prefixsets from applying export policy

* add bgp import policy to deny service VIPs

* add tests for addition of import policy
mk01 pushed a commit to mk01/kube-router-ipv6 that referenced this pull request Mar 12, 2020
* rename export policies to make it direction independent

* split creating neighborsets and prefixsets from applying export policy

* add bgp import policy to deny service VIPs

* add tests for addition of import policy

(cherry picked from commit 8fe9f70)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants