
LAG Feature Test Suite

Xin Liu edited this page Oct 25, 2017 · 2 revisions

Related documents Teamd documentation N/A

Overview

This LAG feature test suite tests basic LAG functionality on SONiC. In our testbed, 't0', 't1-lag' and 't0-64' have LAG configurations. The suite includes several basic tests: FIB test, min-link test, and LACP test. Each test covers one basic function of the LAG feature and ensures the switch works as expected under production scenarios.

Testbed


The test targets a running SONIC system with a fully functioning configuration. The purpose is not to test a specific SAI API, but to functionally test LAG on a SONIC system, making sure that traffic flows correctly according to the BGP routes advertised by the BGP peers of the SONIC switch and according to the LAG configuration.

NOTE: Test will be able to run only in the testbed specifically created for LAG.

###Scale / Performance

###Related DUT CLI commands

DUT configuration is done via minigraph. See more information below

###Related SAI APIs

DUT Requirement


#Test structure

##Setup configuration

  • 8 LAGs per switch from the DUT to 8 EOS devices.
  • Each LAG contains 2 members and its min-links is set to 2.
  • BGP sessions:
    • 16 front panel ports north bound towards spine devices
    • 16 front panel ports, combined in pairs to form 8 LAGs, south bound towards spines.
  • All TORs advertise 6 routes and all spine routers advertise 6402 routes. It is similar to the test environment set up for leaf devices without lags.

Ansible infrastructure changes

Minigraph file

sonic-mgmt uses minigraph files to describe the VM set (Arista vEOS devices) and the DUT switch. We will need to create a new minigraph file, describing the VMs and the switch, with LAGs. The switch file will be named switch_lag.yml, and located at https://github.com/Azure/sonic-mgmt/blob/master/ansible/minigraph.

LAG related minigraph data

The LAG related data will be built from minigraph XML (minigraph_facts), namely from the PortChannelInterfaces section.

Sample:

<PortChannelInterfaces>
    <PortChannel>
        <ElementType>PortChannelInterface</ElementType>
        <Name>PortChannel0</Name>
        <AttachTo>fortyGigE0/0;fortyGigE0/4</AttachTo>
        <SubInterface/>
    </PortChannel>
</PortChannelInterfaces>

This information will be consumed by the teamd.j2 template introduced in the LAG testbed Pull Request (see the port_channel variable) to produce the LAG json configuration for teamd. See Setup of DUT switch for details on the teamd json.
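As an illustration of what the template has to extract, here is a minimal Python sketch that reads the sample fragment above and returns the member ports of each port channel. It is not the teamd.j2 template or the minigraph_facts module; the function name `parse_port_channels` is hypothetical, and the sketch assumes the fragment carries no XML namespaces (a full minigraph file may).

```python
import xml.etree.ElementTree as ET

# The sample fragment from this page, without XML namespaces (assumption).
SAMPLE = """
<PortChannelInterfaces>
    <PortChannel>
        <ElementType>PortChannelInterface</ElementType>
        <Name>PortChannel0</Name>
        <AttachTo>fortyGigE0/0;fortyGigE0/4</AttachTo>
        <SubInterface/>
    </PortChannel>
</PortChannelInterfaces>
"""


def parse_port_channels(xml_text):
    """Return {port_channel_name: [member_port, ...]} (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    port_channels = {}
    for pc in root.iter("PortChannel"):
        name = pc.findtext("Name")
        attach_to = pc.findtext("AttachTo", default="")
        # Member ports are separated by ';' inside <AttachTo>.
        members = [m.strip() for m in attach_to.split(";") if m.strip()]
        port_channels[name] = members
    return port_channels


if __name__ == "__main__":
    print(parse_port_channels(SAMPLE))
```

The resulting dictionary has the same shape a template would need in order to emit one teamd json file per port channel.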

####LAG information file

To properly validate traffic flow, the PTF test will need to know:

  • route ip prefix
  • list of LAGs which are members of ECMP group for given route
  • member ports of a LAG

#####route_info.txt example

1.1.1.1 [0,1],[2,3] // ECMP members are LAG0, LAG1. LAG0 with members p0,p1, LAG1 with members p2,p3
2.2.2.2 [2,3],[4,5] // ECMP members are LAG1, LAG2. LAG1 with members p2,p3, LAG2 with members p4,p5

Ansible test setup script for LAG will generate route_info.txt file containing information mentioned above. When invoking LAG PTF test, Ansible script will pass this file to the test.
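A minimal sketch of how the PTF side could read this file, assuming exactly the line format shown in the example (prefix, then comma-separated bracketed port lists, then an optional `//` comment). The function name `parse_route_info` is hypothetical and not part of the existing test code.

```python
import re


def parse_route_info(lines):
    """Map each route prefix to its ECMP group.

    The ECMP group is a list of LAGs; each LAG is a list of physical
    port indices, e.g. {"1.1.1.1": [[0, 1], [2, 3]]}.
    """
    routes = {}
    for raw in lines:
        line = raw.split("//", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        prefix, _, groups = line.partition(" ")
        # Every [...] group is one LAG; the numbers inside are its member ports.
        lags = [
            [int(p) for p in group.split(",") if p.strip()]
            for group in re.findall(r"\[([^\]]*)\]", groups)
        ]
        routes[prefix] = lags
    return routes


if __name__ == "__main__":
    sample = ["1.1.1.1 [0,1],[2,3] // ECMP members are LAG0, LAG1"]
    print(parse_route_info(sample))
```

Keeping the LAG membership nested under each route lets the test derive both the per-LAG and the per-port view from one structure.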

Scripts for generating LAG configuration on SONIC

There will be a lag.j2 template which will iterate over minigraph_facts and generate the route_info.txt described above. lag.j2 will be invoked by the Ansible playbook which sets up and starts the LAG PTF test.

Ansible scripts to setup and run LAG test

lag.common.yml

We'll introduce lag.common.yml ansible playbook to setup and run LAG test case.

At a high level, the script will perform the following steps:

  1. Run loganalyzer 'init' phase
  2. Run LAGTest and pass to it route_info.txt
  3. Run loganalyzer 'analyze' phase

lag.yml

We'll introduce lag.yml playbook, which will perform setups specific for each test case, and invoke the test.

lag.yml will:

  • Generate BGP route_info.txt file with information about all BGP routes - see /test/files/template/fib.j2
  • Generate route_info.txt once and pass to each test run.
  • perform test-case-specific setup

[LG] Not sure I totally understand how the common part will be used. Will you forward it the testcase itself and there will be such a switch case on this input value? [answer] you're right, inside will have a switch.

Setup of DUT switch

Setup of SONIC DUT will be done by Ansible scripts. During setup Ansible will push json file containing configuration for LAG. Data will be consumed by teamd.

Sample:

{
     "device": "team0",
     "runner": {
         "name": "lacp",
         "active": true,
         "min_ports": 2,
         "tx_hash": ["eth", "ipv4", "ipv6"]
     },
     "link_watch": {
         "name": "ethtool"
     },
     "ports": {
         "Ethernet0": {},
         "Ethernet4": {}
     }
}

Here team0 is the name of LAG port. Ethernet0, Ethernet4 are the front panel ports-members of the team0 lag port.

NOTE

  • According to current implementation in SONIC, there will be 1 json file for each LAG port.
  • For each LAG a separate teamd process will be started with its own json file.

J2 template to generate LAG configuration on SONIC

LAG testbed ansible playbooks use j2 templates to generate the JSON content that defines the LAG structure for the DUT. For each LAG port on the DUT, 1 json file is generated. So, in a setup with 8 LAGs, there will be 8 json files generated, each describing a LAG and its member ports. A dedicated instance of teamd will be started with each json file.

An ansible playbook will push these files to the DUT during testbed setup.

JSON file settings:

  • min-link will be set == 2, which means if 1 or both of ports become nonoperational - the whole LAG will be non-operational as well.
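To make the one-file-per-LAG layout concrete, here is a small Python sketch that builds the same JSON structure as the sample in Setup of DUT switch, with min_ports set to 2 and fast_rate set to false. It only illustrates the expected output and is not the j2 template itself; the helper names and the `<device>.json` file naming are assumptions.

```python
import json


def teamd_config(device, members, min_ports=2, fast_rate=False):
    """Build the teamd configuration for one LAG as a dict (hypothetical helper)."""
    return {
        "device": device,
        "runner": {
            "name": "lacp",
            "active": True,
            "min_ports": min_ports,   # LAG goes down below this many active members
            "fast_rate": fast_rate,   # false selects the slow LACP rate
            "tx_hash": ["eth", "ipv4", "ipv6"],
        },
        "link_watch": {"name": "ethtool"},
        "ports": {port: {} for port in members},
    }


def render_all(lags):
    """Return {file_name: json_text}, one entry per LAG (file naming is assumed)."""
    return {
        f"{device}.json": json.dumps(teamd_config(device, members), indent=4)
        for device, members in lags.items()
    }


if __name__ == "__main__":
    files = render_all({"team0": ["Ethernet0", "Ethernet4"]})
    print(files["team0.json"])
```

With 8 LAGs in the input dictionary, `render_all` returns 8 entries, matching the 8 teamd instances described above.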
  • fast_rate will be set to false always.

Setup of fanout switch

Important: No changes need to be done on the fanout switch. The fanout switch is set up once and its setup does not change across testbed setups.

TODO: Clarify with MS team details of fanout switch setup, place link to describing document here.

Setup of VMs


vEOS VMs will be setup during testbed setup with proper LAG layout.

#PTF Test

Input files for PTF test


The PTF test will be provided with text input files describing the LAG layout and BGP routes on the DUT switch, route_info.txt and route_info.txt. Please see the Ansible infrastructure changes section for a description of both files.

Data in the files will be used to

  • generate traffic (using route_info.txt)
  • properly validate that traffic is passing through the physical ports of valid LAGs (using route_info.txt)

##Validation of traffic

Each LAG will be mentioned in route_info.txt as a member of an ECMP group. For each port we'll have a port-to-LAG mapping; see the route_info.txt example.

Traffic validation

For each route (src_ip):

  • validate that the physical port through which the packet was received belongs to one of the LAG ports mentioned as ECMP members in route_info.txt for the given src_ip.
    1. using data from route_info.txt find the mapping from physical port index to LAG index
    2. using route_info.txt check that the LAG index is in the ECMP group for the src_ip
  • traffic distributed evenly between LAG member ports - by keeping packet counters in PTF test for each port
  • traffic is distributed evenly between LAGs - by keeping counters in PTF for each LAG

The PTF test will keep per-port counter variables for counting packets arriving on different ports. The counters for LAG member ports will be used to check for even traffic distribution. SONIC doesn't have LAG counters.
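The checks above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the ECMP group is given as a list of LAGs, each a list of physical port indices (as in the route_info.txt example), the LAG index is taken as the LAG's position in that list, and the 25% tolerance for "evenly" is an arbitrary placeholder. All function names are hypothetical.

```python
from collections import Counter


def port_to_lag(ecmp_group):
    """Map physical port index -> LAG position within the ECMP group."""
    return {port: lag for lag, members in enumerate(ecmp_group) for port in members}


def is_even(counts, tolerance=0.25):
    """True if every counter is within `tolerance` (a fraction) of the mean."""
    if not counts:
        return True
    mean = sum(counts) / len(counts)
    return all(abs(c - mean) <= tolerance * mean for c in counts)


def validate_route(ecmp_group, rx_ports, tolerance=0.25):
    """Validate the list of ports on which packets for one route were received."""
    mapping = port_to_lag(ecmp_group)
    # Every packet must arrive on a member port of one of the route's LAGs.
    if any(port not in mapping for port in rx_ports):
        return False
    per_port = Counter(rx_ports)                      # per-port counters
    per_lag = Counter(mapping[p] for p in rx_ports)   # derived per-LAG counters
    # Traffic must be even between the member ports of each LAG...
    for members in ecmp_group:
        if not is_even([per_port[p] for p in members], tolerance):
            return False
    # ...and even between the LAGs of the ECMP group.
    return is_even([per_lag[i] for i in range(len(ecmp_group))], tolerance)


if __name__ == "__main__":
    print(validate_route([[0, 1], [2, 3]], [0, 1, 2, 3] * 10))
```

Because SONIC has no LAG counters, the per-LAG numbers are derived here from the per-port counters through the port-to-LAG mapping.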

Address types to validate

PTF test will send traffic for both IPV4 and IPV6 routes.

##Test cases

The test assumes there is a mechanism to validate logs on the DUT, where we should be able to analyze /var/log/syslog for error and warning messages related to the current test. If such messages are detected, the test is considered failed. See the loganalyzer related comments in the lag.common.yml section.

Test case #1 - Verify TCP traffic

Test objective

Verify that traffic is evenly distributed between LAGs. Traffic is forwarded by the SONIC DUT.

  • PTF host will send packets according to the route_info.txt - will create packets with dst_ip according to route prefixes.
  • When a packet reaches the SONIC DUT, the DUT will route it according to BGP routes and send it to one of the vEOS BGP peers.
  • PTF test will receive a copy of the packet and perform validations described in Validation of Traffic
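As a sketch of the first step, the snippet below picks a destination address covered by a route prefix using only the Python standard library. Building and sending the actual packet is left to the PTF test; the function name `pick_dst_ip` is hypothetical, and treating an entry without a prefix length (such as `1.1.1.1`) as a host route is an assumption.

```python
import ipaddress
import random


def pick_dst_ip(prefix, rng=random):
    """Pick a destination address covered by `prefix` (IPv4 or IPv6)."""
    net = ipaddress.ip_network(prefix, strict=False)
    if net.num_addresses == 1:
        # Host route (or an entry given without a prefix length).
        return str(net.network_address)
    # Skip the first address, and the last one when the network is large enough.
    offset = rng.randrange(1, max(2, net.num_addresses - 1))
    return str(net[offset])


if __name__ == "__main__":
    for prefix in ("1.1.1.1", "10.0.0.0/24", "2001:db8::/64"):
        print(prefix, "->", pick_dst_ip(prefix))
```

Choosing a different address on each call also varies the hash input, which helps exercise all ECMP and LAG members.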

NOTE: We are not targeting testing traffic coming into the DUT from BGP peers.

Test case #2 - min-link verification

Test objective

Verify min-link functionality.

Test configurations

[TODO][WORKITEM][clarify] - vEOS VM port shutdown/UP. Should be done using Arista command, from Ansible.

  • Arista command details, for both shutdown/up.
  • ssh possible from PTF host to the VMs?

####Test description

Admin down one interface at a time on the DUT switch and verify that the LAG interface is down on both ends.

For each LAG interface on DUT:

  1. Bring down LAG port on DUT
    • lag.yml uses ifconfig command to put 1 member port of the LAG to 'down' state on DUT.
    • ansible will invoke interface_facts.py to validate corresponding LAG on DUT went down.
    • lag.yml passes --lag-down-index to PTF lag_test.py indicating which LAG has member port shutdown on DUT.
    • lag_test.py will send packets to all other LAGs (except lag-down-index) and validate
      • packets arrive on those LAGs
      • packets are spread evenly between LAGs
      • packets are spread evenly between port-members of the LAGs.
  2. Bring back 'up' the LAG port on DUT and verify that the LAG interface is back to normal.
    • ansible will call ifconfig to bring port 'up' on DUT.
    • ansible will invoke interface_facts.py to validate corresponding LAG on DUT changed to UP.
    • PTF test sends packets for routes which have given LAG as their ECMP member and validates packets are arriving.

Shutdown each LAG interface on fanout and VM.

To simulate the real production environment, we'll shut down the EOS interface, which simulates the halt of LACP packets on one of the LAG members, and then shut down the fanout switch interface, which simulates a carrier-down event. Combined, the two steps better simulate the production scenario in which the neighbor device runs the 'shutdown' command.

Following steps will be performed for each LAG port:

######Shutdown LAG from fanout switch and VMs and validate no traffic flows.

  1. Shutdown the EOS interface
    • [clarify][how] lag.yml runs commands on VM to shutdown port.
      • [clarify] What is the mapping between LAG-port on DUT and EOS (VM)? Which port(s) to shutdown on VM for given LAG port on DUT?
  2. Shutdown the fanout switch interface
    • [clarify][how] lag.yml runs commands on fanout switch to shutdown port.
      • [clarify] What is the mapping between LAG-port on DUT and port(s) on fanout switch?
  3. Verify the LAG port is down on DUT
    • lag.yml invokes interface_facts.py to check LAG went down on DUT.
  4. Verify that the traffic will be distributed to the rest of the 'up' LAGs
    • PTF test will run traffic, count received packets and perform comparisons on counters to check traffic evenly distributed among LAGs which are 'up'

######Bring the LAG 'up' again and validate the traffic flows through it.

  1. Bring 'up' the fanout switch interface
    • [clarify][how] lag.yml runs commands on fanout switch to bring 'up' the port.
  2. Bring 'up' the VM interface
    • [clarify][how] lag.yml runs commands on VM to bring the port up.
  3. Verify LAG interface is 'up' on DUT
    • lag.yml invokes interface_facts.py to check LAG came back up on DUT.
  4. Verify traffic flows evenly through all LAGs, including the LAG which was made 'up'
    • PTF test will run traffic, count received packets and perform comparisons on counters to check traffic evenly distributed among LAGs which are 'up'
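The expectation the PTF test has to encode for this test case can be sketched as below: with min-links set to 2, a LAG that loses any member is considered down, and traffic for the route should then arrive only on the member ports of the remaining LAGs. The helper names are hypothetical; `port_up` is an assumed input that maps a physical port index to its state, with missing ports treated as up.

```python
def lag_is_up(member_states, min_links=2):
    """A LAG is operational only while at least `min_links` members are up."""
    return sum(1 for up in member_states.values() if up) >= min_links


def expected_rx_ports(ecmp_group, port_up, min_links=2):
    """Ports expected to carry traffic for a route: members of LAGs still up."""
    ports = []
    for members in ecmp_group:
        states = {p: port_up.get(p, True) for p in members}
        if lag_is_up(states, min_links):
            ports.extend(p for p in members if states[p])
    return ports


if __name__ == "__main__":
    # Port 1 is shut down, so the LAG [0, 1] falls below min-links.
    print(expected_rx_ports([[0, 1], [2, 3]], {1: False}))
```

The same function covers the recovery step: once the port is back up, the full member list of every LAG is expected again.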

Test case #3 - LACP verification

Test objective

Verify LACP packet slow mode.

In Slow mode, the rate is - 1 packet/30 seconds.

####Test description

The purpose of this test is to make sure that the LACP rate is correctly negotiated and set on both ends of the LAG.

Following are pre-conditions for the test case:

  • The DUT switch is always started with the LACP rate set to 'slow'.
  • The VMs are always set with rate 'fast' on startup.
  • After startup all VMs will negotiate LACP rate with DUT and set their own rate to 'slow'.

The validation will be implemented as Ansible instructions in lag.yml, without invoking PTF:

  1. Ansible playbook connects to each VM.
  2. Ansible playbook validates that each VM has LACP rate set to 'slow'.
  3. Ansible connects to DUT and validates LACP rate is set to 'slow'.

[workitem] Clarify what are the commands to access the VMs and check for LACP rate setting.

Open Questions

Tests affected by LAG testbed setup

FIB Test

FIB test needs modification to be able to run in the LAG testbed setup.

[LG] The test should be agnostic to the setup. The test expects route info, so it is not clear to me why the test needs to be changed. Please elaborate.

[answer] In the LAG testbed the route information will contain LAG-port indices inside the ECMP group. However, PTF is only aware of physical ports. We need to introduce a mapping of physical port to LAG port to validate packets properly. The FIB test needs to change to use this mapping. Another option is to have 2 different versions of fib.j2: one for the FIB testbed, another for the LAG testbed.

It's required to run in this setup, so we could validate the BGP routes setup.

Other tests

TBD

FAQ

Investigating issues in VM

To ssh into VM peers:

  1. To find out peers, run on DUT
    • docker exec -it bgp bash
    • vtysh -c 'sh ip bgp summ' - will print IPs of peers
  2. ssh into the needed peer IP with credentials.