Platform Monitor design for Multi-Asic platforms

In the multi-asic architecture, a global database docker container runs on the Linux host, and multiple database docker containers run in Linux network namespaces. The DB instances in the Redis server of the global database container hold the system-wide config attributes such as syslog and AAA, while the DB instances in the namespace database containers hold ASIC-resource-related data.

Design Approach

For multi-asic platforms, the Platform Monitor (PMON) service will continue to run as a single docker instance on the Linux host.

The following tables are used or updated by the daemons in the PMON docker:

  • PSU daemon: CHASSIS_INFO, PSU_INFO tables in state_db
  • Syseepromd daemon: EEPROM_INFO in state_db
  • Thermalctld daemon: FAN_INFO, TEMPERATURE_INFO in state_db
  • Ledd daemon: PORT_TABLE in app_db
  • Xcvrd daemon: TRANSCEIVER_INFO, TRANSCEIVER_DOM_SENSOR, TRANSCEIVER_STATUS in state_db; PORT_TABLE in app_db

The design approach taken is as follows:

  • Interface-related platform tables such as TRANSCEIVER_INFO and TRANSCEIVER_STATUS will be stored in the STATE_DB instance of the ASIC database (the database docker running in the ASIC's network namespace).

  • In the multi-asic design, namespaces are created for both front-end and back-end ASICs. Since PMON is interested only in the front-panel interfaces (not in the backplane interfaces between ASICs), it works only with the namespaces of the front-end ASICs.

  • System-wide platform tables such as PSU_INFO, FAN_INFO and EEPROM_INFO will be kept in the STATE_DB instance of the global database (the database docker running on the Linux host).

  • The platform plugins are generally agnostic of whether the platform is single-asic or multi-asic. They interact with the platform drivers in the Linux kernel and with the sys/proc filesystems. (Mellanox platforms differ: events such as transceiver plug in/out are exposed by the mlnx SDK, which resides in the syncd container. With multi-asic there is one syncd per namespace, so the plugins would need a change for Mellanox multi-asic platforms.)

  • In a multi-asic platform there is one port_config.ini file per ASIC. Each file is located in a directory named after the asic_index under the device/platform/hwsku directory. These files are parsed to create the interface to asic_id mapping.
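
The table placement rule above can be summarized in a short sketch. The table names come from this document; the function name `db_scope_for_table` and the scope labels `"global"` and `"asic"` are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: which STATE_DB instance holds a given PMON table.
# Table names are taken from this document; everything else is illustrative.

# System-wide tables live in the global database (Linux host).
GLOBAL_TABLES = {
    "CHASSIS_INFO", "PSU_INFO", "EEPROM_INFO", "FAN_INFO", "TEMPERATURE_INFO",
}

# Interface-related tables live in the database of the owning ASIC's namespace.
ASIC_TABLES = {
    "TRANSCEIVER_INFO", "TRANSCEIVER_DOM_SENSOR", "TRANSCEIVER_STATUS",
}


def db_scope_for_table(table_name):
    """Return "global" or "asic" for a known table, else raise KeyError."""
    if table_name in GLOBAL_TABLES:
        return "global"
    if table_name in ASIC_TABLES:
        return "asic"
    raise KeyError("unknown PMON table: " + table_name)
```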

Design changes for multi-asic Architecture

This section details the changes planned for the various platform classes and daemons.

DaemonBase

  • Introduce a namespace parameter to db_connect to connect to the DB in a given namespace.
  • Additional APIs are needed to:
    • Check whether the platform is multi-asic.
    • Get the number of ASICs in the device.
    • Get the namespaces mapped to the front-end ASICs in the device.
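
As a rough sketch of what these helper APIs might look like, the functions below take the ASIC count and a role-per-ASIC mapping as plain arguments. The function names, the role labels, and the `asic<N>` namespace naming are all assumptions made for illustration; the real DaemonBase would obtain this information from the platform.

```python
# Hypothetical helpers; names, role labels and the "asic<N>" namespace
# naming are assumptions for illustration, not the DaemonBase API.
FRONT_END = "FrontEnd"
BACK_END = "BackEnd"


def is_multi_asic(num_asics):
    """A platform is treated as multi-asic when it has more than one ASIC."""
    return num_asics > 1


def get_front_end_namespaces(asic_roles):
    """Return namespaces of front-end ASICs.

    asic_roles maps an ASIC index to its role (FRONT_END or BACK_END).
    """
    return [
        "asic{}".format(index)
        for index, role in sorted(asic_roles.items())
        if role == FRONT_END
    ]
```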

SfpUtilHelper/ SfpUtilBase

This platform_sfputil class parses the port_config.ini file to create the port list. For multi-asic platform support, its functionality needs to be extended to:

  • Parse multiple port_config.ini files and create a single port list, irrespective of which ASIC/namespace the ports belong to.
  • Add only the front-panel interfaces to the port list.
  • Introduce a new map structure (populated while parsing each port_config.ini file) to store the interface to asic_id relation. This makes it possible to get the asic_index directly from the interface name.
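
The parsing steps above can be sketched as follows. This is a minimal illustration, not the SfpUtilHelper code: it assumes the interface name is the first whitespace-separated column of each line, that `#` starts a comment line, and that backplane interfaces can be recognized by a `-BP` marker in their name. The function names and the sample file contents are hypothetical.

```python
import os
import tempfile


def parse_port_configs(hwsku_dir, num_asics):
    """Build one port list and an interface -> asic_id map.

    Assumes one file per ASIC at <hwsku_dir>/<asic_index>/port_config.ini.
    """
    port_list = []
    port_to_asic = {}
    for asic_id in range(num_asics):
        path = os.path.join(hwsku_dir, str(asic_id), "port_config.ini")
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                name = line.split()[0]
                # Assumption: backplane interfaces carry "-BP" in their name.
                if "-BP" in name:
                    continue
                port_list.append(name)
                port_to_asic[name] = asic_id
    return port_list, port_to_asic


def demo():
    """Write two tiny sample files to a temporary tree and parse them."""
    samples = {
        0: "# name lanes alias index\n"
           "Ethernet0 33,34 Eth1 1\n"
           "Ethernet-BP0 13,14 Eth-BP0 0\n",
        1: "# name lanes alias index\n"
           "Ethernet16 33,34 Eth2 2\n"
           "Ethernet-BP4 13,14 Eth-BP4 0\n",
    }
    with tempfile.TemporaryDirectory() as root:
        for asic_id, text in samples.items():
            asic_dir = os.path.join(root, str(asic_id))
            os.makedirs(asic_dir)
            with open(os.path.join(asic_dir, "port_config.ini"), "w") as f:
                f.write(text)
        return parse_port_configs(root, num_asics=2)
```

Calling `demo()` yields a single port list containing only the two front-panel interfaces, plus a map from each of them to the index of the ASIC whose file listed it.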

Platform monitor daemons

This section details the changes that would be needed in various PMON daemons for multi-asic architecture.

Psud

The power supply unit daemon connects to STATE_DB and updates the CHASSIS_INFO and PSU_INFO tables. In the multi-asic architecture, this daemon continues to update the STATE_DB in the global database instance running on the host. No change is needed here.

Syseepromd:

Syseepromd connects to STATE_DB and updates the EEPROM_INFO table. In the multi-asic architecture, this daemon continues to update the STATE_DB in the global database instance running on the host. No change is needed here.

Thermalctld:

Thermalctld connects to STATE_DB and updates the FAN_INFO and TEMPERATURE_INFO tables. In the multi-asic architecture, this daemon continues to update the STATE_DB in the global database instance running on the host. No change is needed here.

Ledd:

The LED daemon updates the port LEDs based on port state change events from PORT_TABLE. It currently does the following:

  • Connect to the APPL_DB.
  • Subscribe to port state change events from PORT_TABLE.
  • Call the platform plugin APIs to update the port LED in the device.
      # Open a handle to the Application database
      # Subscribe to PORT table notifications in the Application DB
      ……
      while True:
         (state, c) = sel.select(SELECT_TIMEOUT)
         (key, op, fvp) = sst.pop()
         ……
         led_control.port_link_state_change(key, fvp_dict["oper_status"])

The design change for multi-asic platforms is to subscribe to port state change events from the PORT_TABLE in the APP_DB instance of each front-end namespace.

  • Connect to the APPL_DBs in all front-end namespaces and get the db_connectors.
  • Create multiple subscribers, one per namespace, for PORT_TABLE in APP_DB and add them to the select object. With this, the daemon receives the port state change events from the PORT_TABLE in each namespace.
     for namespace in namespaces:
         # Open a handle to the Application database, in all namespaces
         appl_db[namespace] = daemon_base.db_connect("APPL_DB", namespace=namespace)
         sst[namespace] = swsscommon.SubscriberStateTable(appl_db[namespace], ..)
         sel.addSelectable(sst[namespace])
  • In the select while True loop, check whether there is data to be retrieved from any of the selectable objects created earlier per namespace, and pop it. If the event is for a backplane interface, skip processing it.
   while True:
     (state, c) = sel.select(SELECT_TIMEOUT)
     # Get the namespace from the selectable object and use it to index the SubscriberStateTable handle.
     ns = c.getDbNamespace()
     (key, op, fvp) = sst[ns].pop()
     ……
     led_control.port_link_state_change(key, fvp_dict["oper_status"])
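
The pattern of one select object serving one subscriber per namespace can be tried out without swsscommon. The sketch below is a stand-in built on the Python standard library's `selectors` module and socket pairs, not the Ledd implementation: each socket pair plays the role of a per-namespace subscriber, the namespace is attached to the registration so it can be recovered when the selectable becomes ready, and events for backplane interfaces are skipped. The `-BP` naming check, the line-based event format, and the function names are assumptions.

```python
import selectors
import socket


def make_subscribers(namespaces):
    """Register one readable endpoint per namespace with a single selector."""
    sel = selectors.DefaultSelector()
    writers = {}
    for ns in namespaces:
        reader, writer = socket.socketpair()
        reader.setblocking(False)
        # The namespace travels with the registration as its data field.
        sel.register(reader, selectors.EVENT_READ, data=ns)
        writers[ns] = writer
    return sel, writers


def drain_events(sel, timeout=0.1):
    """Return (namespace, port, oper_status) tuples for front-panel ports."""
    events = []
    for key, _mask in sel.select(timeout):
        ns = key.data  # recover the namespace from the ready selectable
        payload = key.fileobj.recv(4096).decode()
        for line in payload.splitlines():
            port, oper_status = line.split()
            # Assumption: backplane interfaces carry "-BP" in their name.
            if "-BP" in port:
                continue
            events.append((ns, port, oper_status))
    return events
```

Writing `b"Ethernet0 up\n"` to the `asic0` writer and then calling `drain_events(sel)` returns that event tagged with `asic0`, while any `Ethernet-BP…` lines are dropped.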

Alternative approach for Ledd daemon:

In the Ledd process we could spawn multiple threads, one per ASIC, each handling events for the interfaces that its ASIC owns. Each thread would subscribe to the events from PORT_TABLE in the APP_DB of the namespace mapped to its ASIC. This approach was not chosen because the number of threads would grow with the number of ASICs.

Xcvrd:

Xcvrd currently spawns two threads:

  1. A thread that waits for SFP plug in/out events and, when an event is received, updates the DB entries accordingly.
  2. A timer that is started to periodically refresh the DOM sensor information.

These threads update the transceiver information in the TRANSCEIVER_INFO, TRANSCEIVER_DOM_SENSOR and TRANSCEIVER_STATUS tables.

In the multi-asic architecture, the interfaces are spread among databases in different namespaces, so this daemon needs logic to update the STATE_DB instance in the correct namespace.

Approach 1

In this approach:

  • Continue the current Xcvrd process architecture (one xcvrd daemon as the parent process plus two threads to update the SFP and DOM status).
  • The Xcvrd daemon waits for "PortInitDone" in the PORT_TABLE in the APP_DB of all the front-end namespaces.
  • The xcvrd threads poll/monitor the platform plugins and update the transceiver tables in the STATE_DB instance of the namespace to which the interface belongs.

The changes with this approach will be:

  1. Use the SfpUtil helper routines to parse multiple port_config.ini files. This creates the interface to asic_id mapping for the front-panel interfaces.
  2. Connect to the APP_DB and STATE_DB in the various namespaces and store the db_connector/table handle objects indexed by the asic_id.
  3. Use the interface to asic_id mapping populated by platform_sfputil to fetch the correct db_connector/table handle created in step 2.
  4. In the Xcvrd APIs, pass the correct db_connector/table handle based on the interface name.
def post_port_sfp_info_to_db(logical_port_name, table, ..) 
def post_port_dom_threshold_info_to_db(logical_port_name, table,..)
def post_port_dom_info_to_db(logical_port_name, table,..)
def update_port_transceiver_status_table(logical_port_name, status_tbl,..)
def delete_port_from_status_table(logical_port_name, status_tbl)
def detect_port_in_error_status(logical_port_name, status_tbl)
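
Steps 2 to 4 amount to a two-level lookup: interface name to asic_id, then asic_id to table handle. The sketch below illustrates this with an in-memory stand-in for the table handle. `FakeTable`, `build_table_handles` and `post_port_sfp_info` are hypothetical names, and a real daemon would use swsscommon table objects instead.

```python
class FakeTable:
    """In-memory stand-in for a per-namespace STATE_DB table handle."""

    def __init__(self):
        self.rows = {}

    def set(self, key, fields):
        self.rows[key] = dict(fields)


def build_table_handles(asic_ids):
    """Step 2: one table handle per ASIC, indexed by asic_id."""
    return {asic_id: FakeTable() for asic_id in asic_ids}


def post_port_sfp_info(logical_port_name, sfp_info, port_to_asic, tables):
    """Steps 3 and 4: pick the handle for the port's ASIC and write to it."""
    asic_id = port_to_asic[logical_port_name]
    tables[asic_id].set(logical_port_name, sfp_info)
    return asic_id
```

With `port_to_asic = {"Ethernet0": 0, "Ethernet16": 1}`, posting information for `Ethernet16` lands only in the table handle stored under index 1.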

Approach 2

In this approach we could spawn a set of XCVR threads per asic/namespace.

  1. In each thread, parse the port_config.ini file for its ASIC using the sfphelper util routines and create the port_list.
  2. Currently Xcvrd has a main daemon process plus two additional Python processes/threads for the SFP-monitor and DOM-update tasks. In this approach these three threads/processes would have to be spawned for every front-end namespace.
  3. Pass the asic_id as an argument when creating the thread, or set it as a thread class attribute. Each thread uses this attribute to connect to the DB instance in its namespace.
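
For comparison, a minimal sketch of the per-namespace threading model is shown below. It demonstrates only step 3, carrying the asic_id as a thread attribute; the class name `XcvrTask`, the `asic<N>` namespace naming, and the placeholder body of `run` are assumptions.

```python
import threading


class XcvrTask(threading.Thread):
    """One task per ASIC; the asic_id is kept as a thread attribute."""

    def __init__(self, asic_id, results, lock):
        super().__init__(name="xcvr-asic{}".format(asic_id))
        self.asic_id = asic_id
        # Assumption: the namespace name is derived from the ASIC index.
        self.namespace = "asic{}".format(asic_id)
        self.results = results
        self.lock = lock

    def run(self):
        # A real task would connect to the DBs of self.namespace here and
        # then monitor only the ports owned by this ASIC.
        with self.lock:
            self.results[self.asic_id] = self.namespace


def run_tasks(asic_ids):
    """Start one task per ASIC, wait for all, and return what each recorded."""
    results = {}
    lock = threading.Lock()
    tasks = [XcvrTask(asic_id, results, lock) for asic_id in asic_ids]
    for task in tasks:
        task.start()
    for task in tasks:
        task.join()
    return results
```

The thread count here grows linearly with the number of ASICs, which is the scaling concern weighed in the comparison of the two approaches.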

Pros and Cons

Here are the pros and cons of the two approaches:

  • Approach 1 takes less CPU/memory than Approach 2 because it does not spawn additional threads. Spawning threads in proportion to the ASIC count might not scale as the number of ASICs grows.
  • Approach 2 might need fewer code changes in the xcvrd daemon, since most of the code stays the same and only a different asic_id is passed when creating each process/thread. Each thread/process talks only to the DBs of one namespace.

Conclusion

Approach 1 was chosen because there is no real need to increase the number of Python threads, as there are currently at most 64 interfaces. The problem to solve is posting the data into DBs in different namespaces, which can be easily achieved with the interface to asic_id mapping table.

Namespace support in swss-common DBConnector

The DBConnector class in swss-common needs to be enhanced to use the "namespace" context information to connect to the DB instance in that namespace.

The DB connector classes will be able to parse the new "database_global.json" file and retrieve the namespaces present in the platform and the database instances in each namespace. Please refer to the multi_namespace_db_instances design document for more details.
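
To make the idea concrete, the sketch below parses a small JSON document that lists one database configuration per namespace. The layout of `SAMPLE` (an `INCLUDES` list with optional `namespace` keys) and the function name are assumptions for illustration only; the authoritative schema of database_global.json is defined in the multi_namespace_db_instances design document.

```python
import json

# Illustrative content only; the real schema may differ.
SAMPLE = """
{
    "INCLUDES": [
        {"include": "../../redis/sonic-db/database_config.json"},
        {"namespace": "asic0", "include": "../../redis0/sonic-db/database_config.json"},
        {"namespace": "asic1", "include": "../../redis1/sonic-db/database_config.json"}
    ]
}
"""


def namespaces_from_global_config(text):
    """Map namespace -> included config path.

    An entry without a "namespace" key is treated as the global database
    and stored under the empty string.
    """
    result = {}
    for entry in json.loads(text)["INCLUDES"]:
        result[entry.get("namespace", "")] = entry["include"]
    return result
```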

Config/Show commands changes

The configuration commands below will take an additional argument "-n namespace" with which the user can specify the interface namespace. If it is not given, the namespace is derived from the interface name.

  • "config interface transceiver lpmode"
  • "config interface reset"
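
The fallback behaviour of the "-n" option can be sketched as below. The example uses the standard library's `argparse` purely for illustration and is not the actual command implementation; the program name, the function names, and the `asic<N>` namespace naming are assumptions.

```python
import argparse


def parse_args(argv):
    """Parse an interface name and an optional -n/--namespace argument."""
    parser = argparse.ArgumentParser(prog="lpmode-sketch")
    parser.add_argument("interface_name")
    parser.add_argument("-n", "--namespace", default=None)
    return parser.parse_args(argv)


def resolve_namespace(interface_name, port_to_asic, namespace=None):
    """Use the given namespace, else derive it from the interface name."""
    if namespace is not None:
        return namespace
    # Assumption: the namespace name is derived from the ASIC index.
    return "asic{}".format(port_to_asic[interface_name])
```

For example, with `port_to_asic = {"Ethernet16": 1}`, the arguments `["Ethernet16"]` resolve to `asic1`, while `["-n", "asic0", "Ethernet16"]` resolve to the explicitly given `asic0`.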

The sfputil scripts need to be updated to parse multiple port_config.ini files for multi-asic.

The show commands that display the transceiver information, viz. "show interfaces transceiver", also need to be updated to fetch data from DBs in different namespaces.

SNMP changes

The snmp agent needs to connect to the STATE_DB in different namespaces depending on the interface and get the transceiver information.

Open questions

  1. In the case of chassis-based platforms, the interface tables will be in the DBs of the respective namespaces. But will the system-wide tables like PSU, EEPROM, TEMPERATURE control etc. be in the global DB on the linecard or on the supervisor?
    • The PSU tables will be on the supervisor card, while the EEPROM and TEMPERATURE CONTROL tables would be in both the supervisor and linecard DBs. There could be different ways to sync the linecard DB data with the Chassis DB on the supervisor.
  2. Which daemon in SONiC takes care of the system/unit LEDs?