Specs to address an issue with RD Collision

In large-scale Contrail deployments, compute node IP addresses are often reused across data centers. This poses an RD collision problem when these data centers are interconnected through a hierarchical BGP topology. This spec describes an implementation plan to address this problem. The proposal will still generate a Type 1 RD.

Partial-Bug: #1679198
Change-Id: I23b381b0f09bb9b7a3493dff6fd7f9dd3f5c7edb
Juniper · Jul 11, 2017 · 052f9b2
1 parent cff205b
Showing 4 changed files with 199 additions and 0 deletions.
diff --git a/specs/images/Proposed-RD.PNG b/specs/images/Proposed-RD.PNG
diff --git a/specs/images/RD-Prefix.JPG b/specs/images/RD-Prefix.JPG
diff --git a/specs/images/schematics-topology.PNG b/specs/images/schematics-topology.PNG
diff --git a/specs/rd_collision.md b/specs/rd_collision.md
@@ -0,0 +1,199 @@
+#1. Introduction
+In large scale OpenContrail deployments, compute nodes are typically assigned  
+IP addresses from a private range (e.g. 172.16/12). The same IP range is often  
+reused across data centers. This poses an RD collision problem when these data  
+centers are inter-connected through a hierarchical BGP topology. This blueprint  
+describes an implementation plan to address this problem.
+
+#2. Problem statement
+
+Consider the scenario below, where data centers are depicted as sites and the  
+sites are interconnected.
+![](images/schematics-topology.PNG)
+The vrouter-agent module in a compute node advertises its local prefixes using  
+XMPP to the control node (synonymous with a BGP RR). The control node announces  
+these routes to the gateway and other control nodes using BGP as VPNv4 AF  
+NLRIs [RFC4364].  
+The RD for each such prefix is formatted as follows [contrail_controller_src]:
+
+![](images/RD-Prefix.JPG)
+
+The VRF ID is a dynamically allocated number, per routing instance created at  
+the control node. This works fine within the data center, but may collide with  
+RD values generated at other centers when the routing information is exchanged  
+with other control nodes, gateways, and sites. Examples of RD collision:
+
+2.1. When “control node1” advertises to “control node2” within the site
+
+2.2. On the external RR, when gateways from different sites advertise routing  
+information with the same RD. This leads to information hiding, as the RR  
+advertises only the best path for the RD:ip NLRI.
+
+2.3. On the gateway, when it receives a prefix with the same RD both internally  
+and from the other site (through the RR). Depending on the BGP implementation  
+at the gateway routers, this may lead to a few issues:
+
+a. Suppose the same prefix belonging to the same VPN is advertised from both  
+sites. Each gateway will now have two identical VPNv4 NLRIs, one local to the  
+site and one pointing to the other site. This may be an anycast prefix. For  
+the most part, this should work fine as the prefix local to the site is  
+preferred. Regardless, there is a lack of path diversity as the remote site  
+prefix will not be advertised internally (unless add-path is supported).
+
+b. Suppose the same (or different) prefix belonging to different VPNs is  
+advertised from the sites. Most implementations will handle this correctly  
+by importing into the VRFs as guided by the associated RT values. Regardless,  
+it creates confusion and is error-prone.
+
+#3. Proposed solution
+The proposal is to implement a configurable seeding parameter at the control  
+nodes to make the generated RDs unique. More specifically, the operator  
+configures a 2-byte RDClusterSeed per Contrail cluster that is guaranteed to be  
+unique across all the managed clusters. Contrail BGP uses the seed to come up  
+with the RD value as follows:
+![](images/Proposed-RD.PNG)
+
+The default behavior remains unchanged, that is, the generated RD is based on  
+the compute node’s IP address as described in the Problem statement section.  
+This proposal will still generate a Type 1 RD.
+
+##3.1 Alternatives considered
+
+3.1.1 Using ASN as RDClusterSeed
+
+An easy, automated approach would have been to use the same schema as above,  
+but use the cluster’s ASN as the seed. The advantage is that no configuration  
+is needed. It, however, does not eliminate collision completely, as the same  
+ASN is used across sites.
+
+3.1.2 Using control node’s routerID as the RD
+
+This is how the control node generates RDs for statically configured prefixes. The  
+downside is that if multiple compute nodes advertise the same prefix belonging  
+to the same VPN (e.g. an anycast prefix), the control node will announce one  
+path only.
+
+3.1.3 A new RD type (=4)
+
+The RD as defined above (cluster id, 2-bytes of compute node id, and the VRF id)  
+can be encoded with a new RD Type. This would have the following benefits:
+1. Avoid possibility of conflict/overlap with RDs generated by regular PEs  
+from their IP address/router-id
+2. RDs would not be generated from private IPs (discouraged by RFC4364)
+3. Explicit indication of the scope of the RD, which can be used to filter VPN
+routes at well-defined boundaries
+
+It, however, has some disadvantages:
+
+1. May lead to undefined behavior in some implementations.
+
+2. A few BGP implementations support policy matches on RD. Those will not
+be able to parse the new RD type and will drop prefixes.
+
+3. RD won’t be pretty-printed on the routers.
+
+3.1.4 A pairing function to generate unique RD
+
+One can use a pairing function (such as Cantor’s) from the compute node IP and  
+control node routerID to come up with a unique 32-bit number that can be put in  
+the administrator subfield. This is somewhat compute-intensive.
+##3.2 API schema changes
+Not contemplated at this time.
+
+##3.3 User workflow impact
+None
+
+##3.4 UI changes
+Not contemplated at this time.
+
+##3.5 Notification impact
+Not contemplated at this time.
+
+#4. Implementation
+
+4.1 bgp_schema.xsd:
+
+Add a new XSD element called “route_distinguisher_cluster_seed” in  
+global-system-config:
+
+    <xsd:element name='route_distinguisher_cluster_seed' type='xsd:integer'  
+    required='optional' operations='CRUD' description='Only for contrail  
+    control nodes, a 16 bit seed per cluster that is used to create unique  
+    RD values'/>
+
+4.2 bgp_config.h:
+
+Add ‘uint32_t route_distinguisher_cluster_seed_’ to BgpProtocolConfig class  
+along with getter/setter methods.
+
+4.3 bgp_server.cc:
+
+Handle changes in route_distinguisher_cluster_seed_ in  
+ProcessGlobalSystemConfig() function.
+
+4.4 bgp_server.h:
+
+Add 'uint32_t route_distinguisher_cluster_seed_' to BgpServer class along with  
+getter/setter methods.
+
+4.5 bgp_xmpp_channel.cc:
+
+Modify ProcessItem() function to compute RD value based on  
+route_distinguisher_cluster_seed (if set in bgp_server_) instead of the current  
+algorithm that is based on next-hop IP.
+
+4.6 Clear all XMPP sessions when route_distinguisher_cluster_seed value changes.
+
+4.7 Sample configuration file
+
+    <?xml version="1.0" encoding="utf-8"?>
+    <config>
+      <global-system-config>
+        <route_distinguisher_cluster_seed>100</route_distinguisher_cluster_seed>
+      </global-system-config>
+      <routing-instance name='default-domain:default-project:ip-fabric:__default__'>
+        <bgp-router name='local'>
+          <address>127.0.0.1</address>
+          <autonomous-system>65511</autonomous-system>
+        </bgp-router>
+        <bgp-router name='remote'>
+          <address>127.0.0.100</address>
+          <autonomous-system>1</autonomous-system>
+        </bgp-router>
+      </routing-instance>
+    </config>
+
+#5. Performance and scaling impact
+We do not expect any impact to performance and scaling with the proposed  
+solution.
+
+#6. Upgrade
+No impact on upgrade, as the solution only adds a new, optional configuration parameter.
+
+#7. Deprecations
+This feature does not deprecate any existing functionality.
+
+#8. Dependencies
+No dependencies.
+
+#9. Testing
+On a basic OpenContrail sandbox, configure RDClusterSeed and check RD  
+generation behavior. Unconfigure and check again.
+
+On a complex OpenContrail setup that includes multiple clusters and compute  
+nodes with the same IP addresses, check RD generation behavior with and without  
+the RDClusterSeed knob.
+
+Change RDClusterSeed and check that routes are re-advertised.
+
+#10. Documentation Impact
+Potentially Configuration Requirements and Release Notes
+
+#11. References
+[bgp_design] http://juniper.github.io/contrail-vnc/bgp_design.html
+
+[adding_bgp_knob_to_opencontrail]  
+http://www.opencontrail.org/addingbgp-knob-to-opencontrail/
+
+[contrail_controller_src] https://github.com/Juniper/contrail-controller  
+[RFC4364] https://www.ietf.org/rfc/rfc4364.txt
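
The RD construction described in sections 2 and 3 of the spec can be illustrated with a minimal Python sketch. It is not the control node's C++ code. The exact bit layout lives in the referenced images, so the layout here is an assumption inferred from section 3.1.3 ("cluster id, 2-bytes of compute node id, and the VRF id"): the seed fills the upper two bytes of the Type 1 administrator subfield and the low-order two bytes of the compute node IP fill the rest. The names `make_rd` and `format_rd`, and treating a seed of 0 as "not configured", are hypothetical choices made for this example.

```python
import ipaddress
import struct

RD_TYPE_IP = 1  # Type 1 RD (RFC 4364): 4-byte administrator + 2-byte assigned number


def make_rd(compute_ip: str, vrf_id: int, cluster_seed: int = 0) -> bytes:
    """Build an 8-byte Type 1 RD for a prefix learned from a compute node.

    Assumption: cluster_seed == 0 means "not configured", in which case the
    administrator subfield is the full compute node IP (current behavior).
    """
    if not 0 <= vrf_id <= 0xFFFF:
        raise ValueError("vrf_id must fit in 16 bits")
    if not 0 <= cluster_seed <= 0xFFFF:
        raise ValueError("cluster_seed must fit in 16 bits")

    admin = int(ipaddress.IPv4Address(compute_ip))
    if cluster_seed:
        # Assumed layout: seed in the upper 16 bits, low 16 bits of the IP below it.
        admin = (cluster_seed << 16) | (admin & 0xFFFF)

    # Network byte order: 2-byte type, 4-byte administrator, 2-byte assigned number.
    return struct.pack("!HIH", RD_TYPE_IP, admin, vrf_id)


def format_rd(rd: bytes) -> str:
    """Render a Type 1 RD the way routers usually print it: a.b.c.d:n."""
    _rd_type, admin, assigned = struct.unpack("!HIH", rd)
    return f"{ipaddress.IPv4Address(admin)}:{assigned}"


# Two sites reuse 172.16.0.5 and happen to allocate the same VRF ID.
site_a = make_rd("172.16.0.5", 7)
site_b = make_rd("172.16.0.5", 7)
print(format_rd(site_a), format_rd(site_b), site_a == site_b)

# With distinct per-cluster seeds the RDs no longer collide.
seeded_a = make_rd("172.16.0.5", 7, cluster_seed=100)
seeded_b = make_rd("172.16.0.5", 7, cluster_seed=200)
print(format_rd(seeded_a), format_rd(seeded_b), seeded_a == seeded_b)
```

Because the result is still encoded as a Type 1 RD, routers would print the seeded value as an IP-style administrator followed by the VRF ID. Under this assumed layout, two compute nodes in the same cluster whose addresses share the same low-order 16 bits would still need distinct VRF IDs to avoid a collision.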