Specs to address an issue with RD Collision
In large-scale Contrail deployments, compute IP addresses are often reused across data centers. This poses an RD collision problem when these data centers are interconnected through a hierarchical BGP topology. This spec describes an implementation plan to address this problem. The proposal still generates a Type 1 RD.
Partial-Bug: #1679198
Change-Id: I23b381b0f09bb9b7a3493dff6fd7f9dd3f5c7edb
Commit 052f9b2 (parent cff205b): 4 changed files, 199 additions, 0 deletions.
# 1. Introduction
In large-scale OpenContrail deployments, compute nodes are typically assigned
IP addresses from a private range (e.g. 172.16.0.0/12). The same IP range is
often reused across data centers. This poses an RD collision problem when these
data centers are interconnected through a hierarchical BGP topology. This
blueprint describes an implementation plan to address this problem.

# 2. Problem statement

Consider the scenario depicted below, where data centers are represented as
sites and the sites are interconnected.
![](images/schematics-topology.png)
The vrouter-agent module in a compute node advertises its local prefixes to the
control node (synonymous with a BGP RR) using XMPP. The control node announces
these routes to the gateway and to other control nodes using BGP, as VPNv4 AF
NLRIs [RFC4364]. The RD for each such prefix is formatted as follows
[contrail_controller_src]:

![](images/RD-Prefix.jpg)
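
As a minimal sketch of the default format, assuming the figure shows a standard Type 1 RD (RFC 4364, Section 4.2) whose 4-byte Administrator subfield carries the compute node's IPv4 address and whose 2-byte Assigned Number subfield carries the VRF ID, the C++ below shows how two sites that reuse the same compute IP and VRF ID end up with identical RDs. The names `RouteDistinguisher`, `Ipv4`, and `DefaultRd` are hypothetical and are not taken from the Contrail source.

```cpp
#include <cstdint>

// Hypothetical model of a Type 1 RD (RFC 4364, Section 4.2):
// 2-byte type, 4-byte IPv4 Administrator, 2-byte Assigned Number.
struct RouteDistinguisher {
    uint16_t type;
    uint32_t administrator;  // compute node IPv4 address (host byte order)
    uint16_t assigned;       // VRF ID allocated by the control node

    constexpr bool operator==(const RouteDistinguisher& other) const {
        return type == other.type && administrator == other.administrator &&
               assigned == other.assigned;
    }
};

// Packs a dotted-quad IPv4 address into a 32-bit integer.
constexpr uint32_t Ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return (uint32_t{a} << 24) | (uint32_t{b} << 16) | (uint32_t{c} << 8) | d;
}

// Assumed default behavior: RD = <compute node IP>:<VRF ID>.
constexpr RouteDistinguisher DefaultRd(uint32_t compute_ip, uint16_t vrf_id) {
    return RouteDistinguisher{1, compute_ip, vrf_id};
}

// Two sites reusing 172.16.0.0/12: same compute IP, same VRF ID, same RD.
constexpr RouteDistinguisher kSiteA = DefaultRd(Ipv4(172, 16, 1, 10), 5);
constexpr RouteDistinguisher kSiteB = DefaultRd(Ipv4(172, 16, 1, 10), 5);
static_assert(kSiteA == kSiteB, "identical RDs across sites: an RD collision");
```

Nothing in the RD itself records which cluster produced it, so once routes from both sites meet at a common BGP speaker they are indistinguishable by RD.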

The VRF ID is a dynamically allocated number, assigned per routing instance
created at the control node. This works fine within the data center, but the
resulting RD may collide with RD values generated in other data centers when
routing information is exchanged with other control nodes, gateways, and
sites. Examples of RD collision:

2.1. When “control node1” advertises routes to “control node2” within the same
site.

2.2. On the external RR, when gateways from different sites advertise routing
information with the same RD. This leads to information hiding, as the RR
advertises only the best path for each RD:prefix NLRI.
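
The information hiding in 2.2 follows from how an RR without add-path behaves: it reflects one best path per NLRI, and for VPNv4 the NLRI key is the RD plus the prefix. The sketch below uses hypothetical types rather than any real BGP implementation and simply counts how many paths such an RR would reflect: two gateways advertising the same RD and prefix collapse into one path.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical VPNv4 NLRI key: 8-byte RD (treated as opaque) plus IPv4 prefix.
struct VpnNlri {
    uint64_t rd;
    uint32_t prefix;
    uint8_t length;
};

// A path as advertised by a gateway toward the RR.
struct Path {
    VpnNlri nlri;
    uint32_t next_hop;  // gateway address
};

constexpr bool SameNlri(const VpnNlri& a, const VpnNlri& b) {
    return a.rd == b.rd && a.prefix == b.prefix && a.length == b.length;
}

// An RR without add-path reflects one best path per distinct NLRI,
// so the number of reflected paths equals the number of distinct keys.
template <std::size_t N>
constexpr std::size_t ReflectedPaths(const std::array<Path, N>& paths) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < N; ++i) {
        bool seen = false;
        for (std::size_t j = 0; j < i; ++j) {
            if (SameNlri(paths[i].nlri, paths[j].nlri)) {
                seen = true;
                break;
            }
        }
        if (!seen) {
            ++count;
        }
    }
    return count;
}

// RD 172.16.1.10:5 (type 1) for 10.0.0.0/24 from two sites' gateways
// (next hops 192.0.2.1 and 198.51.100.1): only one path survives at the RR.
constexpr std::array<Path, 2> kSameRd{{
    Path{VpnNlri{0x0001AC10010A0005ull, 0x0A000000u, 24}, 0xC0000201u},
    Path{VpnNlri{0x0001AC10010A0005ull, 0x0A000000u, 24}, 0xC6336401u},
}};
static_assert(ReflectedPaths(kSameRd) == 1, "second site's path is hidden");
```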

2.3. On the gateway, when it receives a prefix with the same RD both from
within its own site and from the other site (through the RR). Depending on the
BGP implementation of the gateway routers, this may lead to a few issues:

a. Suppose the same prefix belonging to the same VPN (for example, an anycast
prefix) is advertised from both sites. Each gateway now has two identical
VPNv4 NLRIs, one local to the site and one pointing to the other site. For the
most part, this works because the prefix local to the site is preferred.
However, there is a lack of path diversity, as the remote-site path is not
advertised internally (unless add-path is supported).

b. Suppose the same (or different) prefix belonging to different VPNs is
advertised from the sites. Most implementations handle this correctly by
importing the routes into the VRFs according to the associated RT values.
Regardless, it creates confusion and is error-prone.

# 3. Proposed solution
The proposal is to implement a configurable seeding parameter at the control
nodes to make the generated RDs unique. More specifically, the operator
configures a 2-byte RDClusterSeed per Contrail cluster and must ensure that it
is unique across all the managed clusters. Contrail BGP uses the seed to derive
the RD value as follows:
![](images/Proposed-RD.png)

When no seed is configured, the default behavior remains unchanged: the
generated RD is based on the compute node’s IP address, as described in the
Problem statement section. This proposal still generates a Type 1 RD.
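
A minimal sketch of the seeded RD, under two assumptions that the text does not spell out: the 4-byte Administrator subfield holds the 16-bit RDClusterSeed in its upper half and the low-order 16 bits of the compute node's IP in its lower half (matching the "cluster id, 2-bytes of compute node id, and the VRF id" description in 3.1.3), and a seed of 0 means "not configured", so the default compute-IP-based RD is kept. `ComputeRd` and the other names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical Type 1 RD value (the type field is always 1 here).
struct Type1Rd {
    uint32_t administrator;
    uint16_t assigned;  // VRF ID
};

constexpr uint32_t Ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return (uint32_t{a} << 24) | (uint32_t{b} << 16) | (uint32_t{c} << 8) | d;
}

constexpr bool SameRd(Type1Rd x, Type1Rd y) {
    return x.administrator == y.administrator && x.assigned == y.assigned;
}

// Assumed layout: Administrator = <seed:16><low 16 bits of compute IP:16>.
// A seed of 0 is treated as "not configured" and keeps the default RD.
constexpr Type1Rd ComputeRd(uint16_t seed, uint32_t compute_ip, uint16_t vrf_id) {
    return seed == 0
               ? Type1Rd{compute_ip, vrf_id}
               : Type1Rd{(uint32_t{seed} << 16) | (compute_ip & 0xFFFFu), vrf_id};
}

// Same compute IP and VRF ID in two clusters with different seeds.
constexpr Type1Rd kClusterA = ComputeRd(100, Ipv4(172, 16, 1, 10), 5);
constexpr Type1Rd kClusterB = ComputeRd(200, Ipv4(172, 16, 1, 10), 5);
static_assert(!SameRd(kClusterA, kClusterB), "distinct seeds: no collision");
```

Read as an IPv4 Administrator, the seed-100 RD above prints as 0.100.1.10:5, so it remains a well-formed Type 1 RD. Under this assumed layout, uniqueness inside a cluster also relies on compute nodes differing in their low-order 16 bits: 172.16.1.10 and 172.17.1.10 would produce the same RD for the same VRF ID.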

## 3.1 Alternatives considered

3.1.1 Using ASN as RDClusterSeed

An easy, automated approach would have been to use the same schema as above,
but with the cluster’s ASN as the seed. The advantage is that no configuration
is needed. However, it does not eliminate collisions completely, as the same
ASN is used across sites.

3.1.2 Using the control node’s router ID as the RD

This is how the control node generates RDs for statically configured prefixes.
The downside is that if multiple compute nodes advertise the same prefix
belonging to the same VPN (e.g. an anycast prefix), the control node announces
only one path.

3.1.3 A new RD type (=4)

The RD as defined above (cluster ID, 2 bytes of compute node ID, and the VRF
ID) could instead be encoded with a new RD type. This would have the following
benefits:
1. It avoids the possibility of conflict/overlap with RDs generated by regular
   PEs from their IP address/router ID.
2. RDs would not be generated from private IPs (discouraged by RFC 4364).
3. It explicitly indicates the scope of the RD, which can be used to filter VPN
   routes at well-defined boundaries.

However, it has some disadvantages:

1. It may lead to undefined behavior in some implementations.

2. Some BGP implementations support policy matches on RD. Those will not be
   able to parse the new RD type and will drop the prefixes.

3. The RD won’t be pretty-printed on the routers.
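
To make this alternative concrete, the sketch below packs the same three fields into the 8-byte RD wire format with a type value of 4. RFC 4364 defines only RD types 0, 1, and 2, so the helper `IsRfc4364Type` illustrates the second disadvantage: a receiver that recognizes only those types cannot interpret this RD. The field order and all names here are hypothetical.

```cpp
#include <array>
#include <cstdint>

// 8-byte RD in network byte order: 2-byte type followed by a 6-byte value.
using RdBytes = std::array<uint8_t, 8>;

constexpr uint8_t Hi(uint16_t v) { return static_cast<uint8_t>(v >> 8); }
constexpr uint8_t Lo(uint16_t v) { return static_cast<uint8_t>(v & 0xFF); }

// Hypothetical type 4 layout: <cluster id:2><compute node id:2><VRF id:2>.
constexpr RdBytes EncodeType4Rd(uint16_t cluster_id, uint16_t compute_id, uint16_t vrf_id) {
    return RdBytes{{0x00, 0x04,
                    Hi(cluster_id), Lo(cluster_id),
                    Hi(compute_id), Lo(compute_id),
                    Hi(vrf_id), Lo(vrf_id)}};
}

// Reads the 2-byte type field.
constexpr uint16_t RdType(const RdBytes& rd) {
    return static_cast<uint16_t>((rd[0] << 8) | rd[1]);
}

// RFC 4364 defines RD types 0, 1 and 2 only.
constexpr bool IsRfc4364Type(const RdBytes& rd) { return RdType(rd) <= 2; }

constexpr RdBytes kExample = EncodeType4Rd(100, 0x010A, 5);
static_assert(RdType(kExample) == 4, "new RD type");
static_assert(!IsRfc4364Type(kExample), "unknown to RFC 4364-only parsers");
```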

3.1.4 A pairing function to generate a unique RD

One can apply a pairing function (such as Cantor’s) to the compute node IP and
the control node router ID to come up with a unique 32-bit number that can be
put in the Administrator subfield. This is somewhat compute-intensive.
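
For reference, Cantor's pairing function is π(a, b) = (a + b)(a + b + 1)/2 + b, a bijection from pairs of non-negative integers to non-negative integers. The sketch below (a hypothetical helper, not Contrail code) evaluates it on 16-bit inputs and shows a size limitation worth weighing alongside the compute cost: the output grows roughly with the square of the inputs, so even two 16-bit values can exceed 32 bits. More generally, no function, Cantor's included, can map every pair of 32-bit values (a full compute node IP and a full router ID) to a distinct 32-bit number.

```cpp
#include <cstdint>

// Cantor pairing: pi(a, b) = (a + b)(a + b + 1) / 2 + b.
// Inputs are widened to 64 bits first so the product cannot overflow
// for 16-bit arguments; (a + b)(a + b + 1) is always even, so /2 is exact.
constexpr uint64_t CantorPair(uint16_t a, uint16_t b) {
    return (uint64_t{a} + b) * (uint64_t{a} + b + 1) / 2 + b;
}

// Even 16-bit inputs can overflow a 32-bit Administrator subfield.
static_assert(CantorPair(65535, 65535) > UINT32_MAX,
              "pairing output does not fit in 32 bits");
```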

## 3.2 API schema changes
Not contemplated at this time.

## 3.3 User workflow impact
None.

## 3.4 UI changes
Not contemplated at this time.

## 3.5 Notification impact
Not contemplated at this time.

# 4. Implementation

4.1 bgp_schema.xsd:

Add a new XSD element called “route_distinguisher_cluster_seed” to
global-system-config:

    <xsd:element name='route_distinguisher_cluster_seed' type='xsd:integer'
        required='optional' operations='CRUD' description='Only for contrail
        control nodes, a 16 bit seed per cluster that is used to create unique
        RD values'/>

4.2 bgp_config.h:

Add ‘uint32_t route_distinguisher_cluster_seed_’ to the BgpProtocolConfig class
along with getter/setter methods.

4.3 bgp_server.cc:

Handle changes in route_distinguisher_cluster_seed_ in the
ProcessGlobalSystemConfig() function.

4.4 bgp_server.h:

Add ‘uint32_t route_distinguisher_cluster_seed_’ to the BgpServer class along
with getter/setter methods.

4.5 bgp_xmpp_channel.cc:

Modify the ProcessItem() function to compute the RD value from
route_distinguisher_cluster_seed (if set in bgp_server_) instead of the current
algorithm, which is based on the next-hop IP.

4.6 Clear all XMPP sessions when the route_distinguisher_cluster_seed value changes.
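
The change handling in 4.2 through 4.4 and 4.6 can be sketched as follows, assuming (hypothetically) that the setter reports whether the stored value actually changed, so that its caller, for example the handler in ProcessGlobalSystemConfig(), clears XMPP sessions only on a real change. `RdSeedConfig` and `CountSessionResets` are illustrative names, not the actual BgpProtocolConfig or BgpServer interfaces.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical holder for the seed; 0 means "not configured".
class RdSeedConfig {
public:
    constexpr uint32_t route_distinguisher_cluster_seed() const { return seed_; }

    // Returns true only when the value changes, i.e. when all XMPP
    // sessions should be cleared so that routes are re-advertised
    // with RDs computed from the new seed.
    constexpr bool set_route_distinguisher_cluster_seed(uint32_t seed) {
        if (seed == seed_) {
            return false;
        }
        seed_ = seed;
        return true;
    }

private:
    uint32_t seed_ = 0;
};

// Applies a sequence of configuration updates and counts how many of
// them would trigger an XMPP session reset.
template <std::size_t N>
constexpr int CountSessionResets(const std::array<uint32_t, N>& updates) {
    RdSeedConfig config{};
    int resets = 0;
    for (std::size_t i = 0; i < N; ++i) {
        if (config.set_route_distinguisher_cluster_seed(updates[i])) {
            ++resets;
        }
    }
    return resets;
}
```

Returning a flag instead of clearing sessions inside the setter keeps the configuration object free of XMPP dependencies; whether the real code is structured this way is an open implementation choice.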

4.7 Sample configuration file

    <?xml version="1.0" encoding="utf-8"?>
    <config>
        <global-system-config>
            <route_distinguisher_cluster_seed>100</route_distinguisher_cluster_seed>
        </global-system-config>
        <routing-instance name='default-domain:default-project:ip-fabric:__default__'>
            <bgp-router name='local'>
                <address>127.0.0.1</address>
                <autonomous-system>65511</autonomous-system>
            </bgp-router>
            <bgp-router name='remote'>
                <address>127.0.0.100</address>
                <autonomous-system>1</autonomous-system>
            </bgp-router>
        </routing-instance>
    </config>

# 5. Performance and scaling impact
We do not expect any impact on performance or scaling from the proposed
solution.

# 6. Upgrade
No impact on upgrade, as the solution introduces a new, optional configuration
parameter.

# 7. Deprecations
This feature does not deprecate any existing functionality.

# 8. Dependencies
No dependencies.

# 9. Testing
On a basic OpenContrail sandbox, configure RDClusterSeed and check the RD
generation behavior. Unconfigure it and check again.

On a complex OpenContrail setup that includes multiple clusters and compute
nodes with the same IP addresses, check the RD generation behavior with and
without the RDClusterSeed knob.

Change RDClusterSeed and check that routes are re-advertised.

# 10. Documentation Impact
Potentially the Configuration Requirements and Release Notes.

# 11. References
[bgp_design] http://juniper.github.io/contrail-vnc/bgp_design.html

[adding_bgp_knob_to_opencontrail] http://www.opencontrail.org/addingbgp-knob-to-opencontrail/

[contrail_controller_src] https://github.com/Juniper/contrail-controller

[RFC4364] https://www.ietf.org/rfc/rfc4364.txt