Specs to address an issue with RD Collision
In large scale Contrail deployments, compute IP addresses are often
reused across data centers. This poses an RD collision problem when these
data centers are inter-connected through a hierarchical BGP topology.
This spec describes an implementation plan to address this problem. The
proposal will still generate a Type 1 RD.

Partial-Bug: #1679198

Change-Id: I23b381b0f09bb9b7a3493dff6fd7f9dd3f5c7edb
sandeepopenstack committed Jul 11, 2017
1 parent cff205b commit 052f9b2
Showing 4 changed files with 199 additions and 0 deletions.
Binary file added specs/images/Proposed-RD.PNG
Binary file added specs/images/RD-Prefix.JPG
Binary file added specs/images/schematics-topology.PNG
199 changes: 199 additions & 0 deletions specs/rd_collision.md
# 1. Introduction
In large scale OpenContrail deployments, compute nodes are typically assigned
IP addresses from a private range (e.g. 172.16/12). The same IP range is often
reused across data centers. This poses an RD collision problem when these data
centers are inter-connected through a hierarchical BGP topology. This blueprint
describes an implementation plan to address this problem.

# 2. Problem statement

Consider the scenario depicted below, where data centers are modeled as sites
and the sites are inter-connected.

![](images/schematics-topology.PNG)

The vrouter-agent module on a compute node advertises its local prefixes over
XMPP to the control node (synonymous with a BGP RR). The control node announces
these routes to the gateway and other control nodes using BGP as VPNv4 AF
NLRIs [rfc4364].
The RD for each such prefix is formatted as follows [contrail_controller_src]:

![](images/RD-Prefix.JPG)

The VRF ID is a number dynamically allocated per routing instance created at
the control node. This works fine within a data center, but may collide with
RD values generated at other data centers when the routing information is
exchanged with other control nodes, gateways, and sites. Examples of RD
collision:

2.1. When “control node1” advertises to “control node2” within the site

2.2. On the external RR, when gateways from different sites advertise routing
information with the same RD. This leads to information hiding, as the RR
advertises only the best path for the RD:ip NLRI.

2.3. On the gateway, when it receives the prefix with the same RD both
internally and from the other site (through the RR). Depending on the BGP
implementation at the gateway routers, this may lead to a few issues:

a. Suppose the same prefix belonging to the same VPN is advertised from both
sites. Each gateway will now have two identical VPNv4 NLRIs, one local to the
site and one pointing to the other site. This may be an anycast prefix. For
the most part, this should work fine as the prefix local to the site is
preferred. Regardless, there is a lack of path diversity as the remote site
prefix will not be advertised internally (unless add-path is supported).

b. Suppose the same (or different) prefix belonging to different VPNs is
advertised from the sites. Most implementations will handle this correctly
by importing into the VRFs guided by the associated RT values. Regardless,
it creates confusion and is error-prone.
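To make the collision concrete, here is a minimal sketch of the current Type 1
RD construction from the compute node IP and VRF ID; the function name is
illustrative, not the actual contrail-controller code:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Sketch of the current scheme: a Type 1 RD whose administrator subfield
// is the compute node's IPv4 address and whose assigned-number subfield
// is the per-instance VRF ID.
std::string MakeCurrentRd(uint32_t compute_ip, uint16_t vrf_id) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%u.%u.%u.%u:%u",
                  static_cast<unsigned>(compute_ip >> 24),
                  static_cast<unsigned>((compute_ip >> 16) & 0xffu),
                  static_cast<unsigned>((compute_ip >> 8) & 0xffu),
                  static_cast<unsigned>(compute_ip & 0xffu),
                  static_cast<unsigned>(vrf_id));
    return std::string(buf);
}
```

Two sites that both assign 172.16.0.5 to a compute node and happen to allocate
VRF ID 3 generate the identical RD "172.16.0.5:3", which triggers the
collisions listed above.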

# 3. Proposed solution
The proposal is to implement a configurable seeding parameter at the control
nodes to make the generated RDs unique. More specifically, the operator
configures a 2-byte RDClusterSeed per Contrail cluster that is guaranteed to be
unique across all the managed clusters. Contrail BGP uses the seed to compute
the RD value as follows:

![](images/Proposed-RD.PNG)

The default behavior remains unchanged; that is, the generated RD is based on
the compute node's IP address as described in the Introduction. This
proposal still generates a Type 1 RD.
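A minimal sketch of one plausible packing of the seeded RD follows; the exact
field layout is an assumption read off the figure (cluster seed, 2 bytes of
compute node ID, VRF ID), and the names are illustrative:

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the proposed Type 1 RD: the 2-byte RDClusterSeed and the low
// 2 bytes of the compute node IP form the 4-byte administrator subfield;
// the VRF ID is the 2-byte assigned number.
uint64_t MakeSeededRd(uint16_t cluster_seed, uint32_t compute_ip,
                      uint16_t vrf_id) {
    const uint64_t kType1 = 1;  // still a Type 1 RD
    const uint32_t admin = (static_cast<uint32_t>(cluster_seed) << 16) |
                           (compute_ip & 0xffffu);
    return (kType1 << 48) | (static_cast<uint64_t>(admin) << 16) | vrf_id;
}
```

With distinct seeds, two clusters that both host 172.16.0.5 now generate
distinct RDs for the same VRF ID.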

## 3.1 Alternatives considered

3.1.1 Using ASN as RDClusterSeed

An easy, automated approach would have been to use the same schema as above,
but with the cluster's ASN as the seed. The advantage is that no configuration
is needed. It does not, however, eliminate collisions completely, as the same
ASN is used across sites.

3.1.2 Using the control node's routerID as the RD

This is how the control node generates RDs for statically configured prefixes.
The downside is that if multiple compute nodes advertise the same prefix
belonging to the same VPN (e.g. an anycast prefix), the control node will
announce only one path.

3.1.3 A new RD type (=4)

The RD as defined above (cluster ID, 2 bytes of compute node ID, and the VRF
ID) can be encoded with a new RD type. This would have the following benefits:
1. Avoids the possibility of conflict/overlap with RDs generated by regular
PEs from their IP address/router-id.
2. RDs would not be generated from private IPs (discouraged by RFC4364).
3. Explicit indication of the scope of the RD, which can be used to filter VPN
routes at well-defined boundaries.
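A hypothetical wire encoding for this alternative is sketched below; the field
order is an assumption, and no such RD type is registered, which is exactly
what drives the interoperability concerns:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-byte RD with a new type value (4) carrying the cluster
// ID, 2 bytes of compute node ID, and the VRF ID as explicit fields,
// each big-endian on the wire.
void EncodeType4Rd(uint16_t cluster_id, uint16_t compute_id,
                   uint16_t vrf_id, uint8_t out[8]) {
    const uint16_t fields[4] = {4, cluster_id, compute_id, vrf_id};
    for (int i = 0; i < 4; ++i) {
        out[2 * i] = static_cast<uint8_t>(fields[i] >> 8);
        out[2 * i + 1] = static_cast<uint8_t>(fields[i] & 0xffu);
    }
}
```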

It, however, has some disadvantages:

1. May lead to undefined behavior in some implementations.

2. Few BGP implementations support policy matches on RD. Those will not
be able to parse the new RD type and will drop prefixes.

3. The RD won't be pretty-printed on routers.

3.1.4 A pairing function to generate unique RD

One can use a pairing function (such as Cantor’s) from the compute node IP and
control node routerID to come up with a unique 32-bit number that can be put in
the administrator subfield. This is somewhat compute-intensive.
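For reference, Cantor's pairing function can be sketched as follows. Note that
pairing two full 32-bit inputs needs up to ~66 bits, so squeezing the result
into the 32-bit administrator subfield would require restricting the input
ranges (e.g. to the host portions of the addresses):

```cpp
#include <cassert>
#include <cstdint>

// Cantor's pairing function pi(a, b) = (a + b)(a + b + 1)/2 + b is a
// bijection on pairs of naturals, so distinct (compute IP, routerID)
// pairs map to distinct values.
uint64_t CantorPair(uint64_t a, uint64_t b) {
    const uint64_t s = a + b;
    return s * (s + 1) / 2 + b;
}
```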

## 3.2 API schema changes
Not contemplated at this time.

## 3.3 User workflow impact
None.

## 3.4 UI changes
Not contemplated at this time.

## 3.5 Notification impact
Not contemplated at this time.

# 4. Implementation

4.1 bgp_schema.xsd:

Add a new XSD element called `route_distinguisher_cluster_seed` in
global-system-config:

```xml
<xsd:element name='route_distinguisher_cluster_seed' type='xsd:integer'
    required='optional' operations='CRUD' description='Only for contrail
    control nodes, a 16 bit seed per cluster that is used to create unique
    RD values'/>
```

4.2 bgp_config.h:

Add `uint32_t route_distinguisher_cluster_seed_` to the BgpProtocolConfig
class along with getter/setter methods.

4.3 bgp_server.cc:

Handle changes to `route_distinguisher_cluster_seed_` in the
`ProcessGlobalSystemConfig()` function.

4.4 bgp_server.h:

Add `uint32_t route_distinguisher_cluster_seed_` to the BgpServer class along
with getter/setter methods.
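The shape of the new member and accessors can be sketched as below; these are
illustrative signatures, not the actual contrail-controller declarations, and
BgpProtocolConfig would get the same treatment:

```cpp
#include <cassert>
#include <cstdint>

// Sketched BgpServer fragment holding the new knob.
class BgpServer {
public:
    uint32_t route_distinguisher_cluster_seed() const {
        return route_distinguisher_cluster_seed_;
    }
    void set_route_distinguisher_cluster_seed(uint32_t seed) {
        route_distinguisher_cluster_seed_ = seed;
    }
private:
    uint32_t route_distinguisher_cluster_seed_ = 0;  // 0 = knob unset
};
```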

4.5 bgp_xmpp_channel.cc:

Modify the `ProcessItem()` function to compute the RD value based on
`route_distinguisher_cluster_seed` (if set in `bgp_server_`) instead of the
current algorithm based on the next-hop IP.
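The selection logic can be sketched as follows; the helper name and the
convention that a zero seed means "unset" are assumptions for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Pick the RD administrator subfield: seeded scheme when the knob is
// set, otherwise the current next-hop-IP based value.
uint32_t RdAdministrator(uint32_t cluster_seed, uint32_t nexthop_ip) {
    if (cluster_seed != 0) {
        // Seed in the top 2 bytes, low 2 bytes of the next-hop below it.
        return (cluster_seed << 16) | (nexthop_ip & 0xffffu);
    }
    return nexthop_ip;  // current behavior: full next-hop IPv4 address
}
```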

4.6 Clear all XMPP sessions when the `route_distinguisher_cluster_seed` value
changes.

4.7 Sample configuration file

```xml
<?xml version="1.0" encoding="utf-8"?>
<config>
    <global-system-config>
        <route_distinguisher_cluster_seed>100</route_distinguisher_cluster_seed>
    </global-system-config>
    <routing-instance name='default-domain:default-project:ip-fabric:__default__'>
        <bgp-router name='local'>
            <address>127.0.0.1</address>
            <autonomous-system>65511</autonomous-system>
        </bgp-router>
        <bgp-router name='remote'>
            <address>127.0.0.100</address>
            <autonomous-system>1</autonomous-system>
        </bgp-router>
    </routing-instance>
</config>
```

# 5. Performance and scaling impact
We do not expect any impact to performance and scaling with the proposed
solution.

# 6. Upgrade
No impact to upgrade, as the solution proposes a new configuration parameter.

# 7. Deprecations
This feature does not deprecate any existing functionality.

# 8. Dependencies
No dependencies.

# 9. Testing
On a basic OpenContrail sandbox, configure RDClusterSeed and check the RD
generation behavior. Unconfigure it and check again.

On a complex OpenContrail setup that includes multiple clusters and compute
nodes with the same IP addresses, check the RD generation behavior with and
without the RDClusterSeed knob.

Change RDClusterSeed and check that routes are re-advertised.

# 10. Documentation Impact
Potentially Configuration Requirements and Release Notes.

# 11. References

[bgp_design] http://juniper.github.io/contrail-vnc/bgp_design.html

[adding_bgp_knob_to_opencontrail] http://www.opencontrail.org/addingbgp-knob-to-opencontrail/

[contrail_controller_src] https://github.com/Juniper/contrail-controller

[RFC4364] https://www.ietf.org/rfc/rfc4364.txt
