Announce route support for T2 chassis topology #3115

Merged
yxieca merged 4 commits into sonic-net:master from oxygen980:announce on Apr 2, 2021

Conversation

@oxygen980
Contributor

Description of PR

Summary: Add support for announce_route to work against a T2 topology
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Approach

What is the motivation for this PR?

Need to advertise routes to the T2 VoQ chassis from the T1 and T3 to simulate a typical deployment of T2 chassis in a data center.

The requirements for the proposed T2 topology in ansible/vars/topo_t2.yml:

  • Advertise a total of 12K routes to the chassis
    • ~100 routes from the downstream T1s
    • 12K routes from upstream T3
  • The upstream routes should have a mix of 8, 16 and 24 ECMP paths to the T3 VMs
  • The downstream routes should have a mix of 16, 32, and 48 ECMP paths across 2 linecards to the T1 VMs

How did you do it?

For T2, we have 3 sets of routes that we are going to advertise

  • The 1st set (1/3 of the routes) is advertised by the first 1/3 of the VMs
  • The 2nd set (1/3 of the routes) is advertised by the remaining 2/3 of the VMs
  • The 3rd set (1/3 of the routes) is advertised by all the VMs

Also, the T1 VMs are distributed over 2 linecards (asics). The same set of routes should be sent by
the corresponding set of VMs on both linecards. So, if linecard1 and linecard2 have T1 VMs connected, then

  • 1st set of routes should be advertised by the first 1/3 of the VMs on linecard1 and also by the first 1/3 of the VMs on linecard2.
  • 2nd set of routes should be advertised by the remaining 2/3 of the VMs on linecard1 and also by the remaining 2/3 of the VMs on linecard2.
  • 3rd set of routes should be advertised by all VMs on linecard1 and also by all VMs on linecard2
    It is assumed that the number of T1 VMs on both linecards is the same.
    If we don't have 2 linecards with T1 VMs, then the routes above would be advertised only by the first linecard that has T1 VMs.
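The split described above can be sketched as follows. This is illustrative, not code from the actual announce_routes.py; `vms` is a hypothetical ordered list of the T1 VM names on one linecard:

```python
def route_set_advertisers(vms):
    """Given the ordered list of T1 VMs on one linecard, return which
    VMs advertise each of the three route sets.

    Set 1: first 1/3 of the VMs; set 2: remaining 2/3; set 3: all VMs.
    The same slicing is applied to the corresponding VMs on the other
    linecard, so the ECMP widths double across two linecards.
    """
    third = len(vms) // 3
    return {
        "set1": vms[:third],   # first 1/3 of the VMs
        "set2": vms[third:],   # remaining 2/3 of the VMs
        "set3": vms[:],        # all VMs
    }

# With 24 T1 VMs on a linecard (VM25..VM48), the sets are advertised by
# 8, 16, and 24 VMs, giving 16/32/48 ECMP paths across two linecards.
vms = ["VM%d" % i for i in range(25, 49)]
sets = route_set_advertisers(vms)
```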

The total number of routes is controlled by podset_number, tor_number, and tor_subnet_number from the topology file.
With the proposed T2 topology of 400 podsets and 32 routes per podset, we would have ~12K routes.
In this topology, we have 24 VMs on each linecard (24 T3 VMs and 48 T1 VMs over 2 linecards).
We would have the following distribution:

  • T1 Routes:
    • 192.168.xx.xx (32 routes) from the first 8 T1 VMs on linecard2 and linecard3 (VM25-VM32, and VM49-VM56)
    • 192.169.xx.xx (32 routes) from the remaining 16 T1 VMs on linecard2 and linecard3 (VM33-VM48, and VM57-VM72)
    • 192.170.xx.xx (32 routes) from all T1 VMs on linecard2 and linecard3 (VM25-VM48, and VM49-VM72)
  • T2 Routes:
    • 192.171.xx.xx to 193.45.xx.xx (4K routes) from the first 8 T3 VMs on linecard1 (VM1-VM8)
    • 193.46.xx.xx to 193.176.xx.xx (4K routes) from the remaining 16 T3 VMs on linecard1 (VM9-VM24)
    • 193.177.xx.xx to 194.55.xx.xx (4K routes) from all 24 T3 VMs on linecard1 (VM1-VM24)
    • default route from all 24 T3 VMs on linecard1 (VM1-VM24)
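As a rough sanity check of the counts above (the exact per-prefix layout is generated by the script; this is just the arithmetic):

```python
# Route-count arithmetic for the proposed T2 topology.
podsets = 400
routes_per_podset = 32

total_routes = podsets * routes_per_podset  # 12800, i.e. ~12K routes
per_set = total_routes // 3                 # each of the 3 sets carries ~4K routes
```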

Other changes:

  • testbed.py:

    • Added 't2' as the topology type.
  • topo_t2.yml:

    • Updated topo_t2.yml to reflect the route changes (podset_number etc.) and changed the ASNs to align with a typical T2 deployment
  • Also added some missing template files for ceos dockers when doing add-topo for a T2 topology.

How did you verify/test it?

Ran announce_routes against a T2 chassis

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@oxygen980 oxygen980 requested a review from a team as a code owner March 9, 2021 16:11
@anshuv-mfst

@yxieca, @wangxin, @rlhui - could you please review.

@sanmalho-git
Contributor

@yxieca @rita can you please review the changes

@shubav
Contributor

shubav commented Apr 1, 2021

@yxieca, can we get this merged? All comments have been incorporated.

falodiya added 4 commits April 1, 2021 16:49
…sed on router type

This was requested in the review comments.
@yxieca yxieca merged commit dcbbc99 into sonic-net:master Apr 2, 2021
nirmalya-keysight pushed a commit to nirmalya-keysight/sonic-mgmt that referenced this pull request Apr 5, 2021
yxieca pushed a commit that referenced this pull request Apr 15, 2021
Approach
What is the motivation for this PR?
Need to add support for a virtual T2 VoQ chassis using KVMs. This will enable running the sonic-mgmt tests in a virtual environment.

How did you do it?
Detailed explanation of the solution is at docs/testbed/README.testbed.vsChassis.md

A KVM-based virtual T2 chassis is a multi-DUT setup with 3 single-asic KVMs (2 linecards and 1 supervisor card).
Each linecard has 2 eBGP peers: 1 over a 2-port LAG and 1 over a single port, as shown below:

          VM0100                                      VM0101
            ||                                          |
  +---------||------------------------------------------|-------------------+
  |         ||                                          |                   |
  |   +----------------------------------------------------------------+    |
  |   |                         Linecard1                              |    |
  |   +----------------------------------------------------------------+    |
  |                                   |                      |              |
  |   +------------+      +--------------------+     +-----------------+    |
  |   | Supervisor |------|  ovs br-T2Midplane |     | ovs br-T2Inband |    |
  |   +------------+      +--------------------+     +-----------------+    |
  |                                   |                      |              |
  |   +----------------------------------------------------------------+    |
  |   |                          Linecard2                             |    |
  |   +----------------------------------------------------------------+    |
  |           ||                                         |                  |
  +-----------||-----------------------------------------|------------------+
              ||                                         |
	VM0102                                     VM0103
For a T2 chassis, we require the following:

  • midplane connectivity between all the cards, for the VoQ control path
  • inband connectivity between the linecards, to forward traffic ingressing on one linecard out of another linecard

We add the info above into the topology file (topo_t2-vs.yml) under a new vs_chassis section under the DUT.

For the inband and midplane connectivity, we are going to use ovs bridges and add the frontpanel ports
defined for the respective functionality to the bridges.
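The bridge plumbing amounts to a couple of ovs-vsctl calls per bridge. A minimal sketch, assuming hypothetical host-side port names (the helper itself is illustrative, not code from this change):

```python
import subprocess

def make_bridge(bridge, ports, dry_run=True):
    """Create an OVS bridge and attach the given host-side ports to it.

    With dry_run=True this only returns the commands it would run,
    which is what the check below exercises; on a real host you would
    call it with dry_run=False.
    """
    cmds = [["ovs-vsctl", "--may-exist", "add-br", bridge]]
    for port in ports:
        cmds.append(["ovs-vsctl", "--may-exist", "add-port", bridge, port])
    if not dry_run:
        for cmd in cmds:
            subprocess.check_call(cmd)
    return cmds

# Midplane bridge gets eth32 of every KVM; inband bridge gets eth31 of
# the linecard KVMs. Port names here are hypothetical placeholders.
midplane = make_bridge("br-T2Midplane", ["VM0100-t2-eth32", "VM0101-t2-eth32"])
```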
The orchestration of the above topology is done as follows:

  • Bring up the 3 KVMs as pizza boxes with the topology file defined in topo_t2-vs.yml, including deploy-mg. As part of the bringup:
    • For midplane connectivity, we create br-T2Midplane and add port eth32 (Ethernet124) of all the KVMs
    • For inband connectivity, we create br-T2Inband and add port eth31 (Ethernet120) of the linecard KVMs
  • We then run test_vs_chassis_setup, which:
    • Configures the midplane address on each linecard
      • This needs to happen even before config_db is loaded. We do so by appending the Linux commands for this to the end of /etc/rc.local, before the final 'exit 0' line.
    • Adds 'chassisdb.conf' to the /usr/share/sonic/device/x86_64-kvm_x86_64-r0/ directory
      • On the linecards, its contents are just the chassis_db_address
      • On the supervisor card, its contents also include start_chassis_db
    • Copies a hard-coded config_db.json from tests/vs_voq_cfgs/ to /etc/sonic/config_db.json on each card, since we don't have gen-mg/deploy-mg functionality for a T2 VoQ chassis
    • Reboots all the cards
    • Tests midplane connectivity by pinging the chassis_db_address of all the linecards
    • Tests chassis-db connectivity by fetching the SYSTEM_INTERFACE table using 'redis-cli'
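Based on the description above, the chassisdb.conf contents would look roughly like this (the address is a hypothetical example; consult the SONiC VoQ chassis documentation for the exact keys and values):

```
# On a linecard: just point at the chassis DB
chassis_db_address=10.8.1.200

# On the supervisor card: additionally start the chassis DB
start_chassis_db=1
chassis_db_address=10.8.1.200
```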
To run the vs chassis, we added a 't2' test group to kvmtest.sh. In this group, we:

  • skip_sanity and disable_loganalyzer, until all the design PRs are merged:
    • the iBGP peers on the inband port don't come up, so sanity fails
    • some log messages show up as errors as well
  • Call run_test.sh with test_vs_chassis_setup.py, using the '-E' option to exit if setting up the vs chassis fails.
  • As a sample test, call run_test.sh with test_voq_init.py. The suite has 6 tests: 5 pass and 1 is skipped, as the inband iBGP peers are not established because of missing design code.

Some other enhancements:

  • reboot.py:
    • Added the hostname to all log/error messages. This was required because we reboot all the DUTs in parallel.
  • announce_routes.py:
    • For an unsupported topology, don't fail; just log that route announcement is not supported on that topology. This is required until PR #3115 is merged.
    • Changed the 'is_supervisor_card' check to use 'card_type' as the inventory variable instead of 'type'. For vs, the 'type' field holds the type of VM, such as 'kvm', 'ceos', or 'veos'.
How did you verify/test it?
Ran the following

 ./kvmtest.sh -T t2 vms-kvm-t2 vlab-t2-01
Also added routes to be announced to the linecards from the eBGP peers.

  • ~1K routes to linecard1 from the T3 VMs (VM0100 and VM0101)
    • 1/3 from the LAG eBGP peer: 192.171.x.x - 192.184.x.x
    • 1/3 from the single-port eBGP peer: 192.185.x.x - 192.199.x.x
    • 1/3 from both the LAG and single-port eBGP peers: 192.200.x.x - 192.217.x.x
  • ~100 routes to linecard2 from the T1 VMs (VM0102 and VM0103)
    • 1/3 from the LAG eBGP peer: 192.168.x.x - 192.168.240.x
    • 1/3 from the single-port eBGP peer: 192.169.x.x - 192.169.240.x
    • 1/3 from both the LAG and single-port eBGP peers: 192.170.x.x - 192.170.240.x
Also, when we start the sonic_vm, we delete the ARP entry for that VM's mgmt_ip on the server host.
This is required because otherwise, when we run kvmtest.sh twice, the MAC address of the VM's mgmt_ip on the
server changes between runs, and sometimes in the second run the entry is left in the 'incomplete' state.

sudo arp -n | grep 10.250.0.12
10.250.0.120                     (incomplete)                              br2
10.250.0.121                     (incomplete)                              br2
10.250.0.122             ether   52:54:00:c4:a2:f1   C                     br2
This leaves the VM mgmt_ip unpingable from the server and thus gives an 'AnsibleHostUnreachable' error.
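The stale-entry condition shown in the arp output above can be detected by scanning for 'incomplete' entries; a small illustrative parser (not code from this change):

```python
def incomplete_entries(arp_output):
    """Return the IPs whose ARP entries are in the 'incomplete' state,
    given the text output of 'arp -n'."""
    ips = []
    for line in arp_output.splitlines():
        fields = line.split()
        if fields and "(incomplete)" in fields:
            ips.append(fields[0])
    return ips

# Sample taken from the 'sudo arp -n | grep 10.250.0.12' output above.
sample = """\
10.250.0.120                     (incomplete)                              br2
10.250.0.121                     (incomplete)                              br2
10.250.0.122             ether   52:54:00:c4:a2:f1   C                     br2"""
stale = incomplete_entries(sample)
```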
saravanansv pushed a commit to saravanansv/sonic-mgmt that referenced this pull request May 6, 2021
saravanansv pushed a commit to saravanansv/sonic-mgmt that referenced this pull request May 6, 2021
vmittal-msft pushed a commit to vmittal-msft/sonic-mgmt that referenced this pull request Sep 28, 2021
vmittal-msft pushed a commit to vmittal-msft/sonic-mgmt that referenced this pull request Sep 28, 2021
kazinator-arista pushed a commit to kazinator-arista/sonic-mgmt that referenced this pull request Mar 4, 2026
…atically (sonic-net#17868)

#### Why I did it
src/sonic-utilities
```
* 83a548de - (HEAD -> 202305, origin/202305) Disable Key Validation feature during sonic-installation for Cisco Platforms (sonic-net#3115) (22 hours ago) [selvipal]
```
#### How I did it
#### How to verify it
#### Description for the changelog
5 participants