OOM support, Sfputil not showing qsfp devices after powercycle bugfix #1338

Open
wants to merge 2 commits into
base: master

Balaselvi
Contributor

Added OOM (Open Optical Monitoring) support. Fixed a bug on the D7054 device where QSFP devices were not listed by sfputil or OOM after a power cycle.

Added the optoe driver and changed the init script accordingly. Added a bug fix in the Inventec swps driver.

The OOM git repository can be cloned onto the device, and the results can be seen on both the D7032 and D7054 devices. After a power cycle, both sfputil and OOM show the QSFP and SFP devices on the D7054 device.


@lguohan
Collaborator

lguohan commented Jan 26, 2018

We are moving to kernel 4.9 and have ported the sff_8436 driver to the 4.9 kernel. However, this OOM driver does not have a 4.9 kernel version. How do you plan to support 4.9?

@Balaselvi
Contributor Author

Hello Guohan,

May I know the schedule for porting to 4.9?
We will plan the porting in discussion with the OOM team.
Basically, optoe supports SFPs as well and reads all pages, which is why we plan to use optoe.

Thanks and regards,
Bala

@lguohan
Collaborator

lguohan commented Jan 27, 2018

I think this one supports SFP too and reads all pages.

sonic-net/sonic-linux-kernel#20

@Balaselvi
Contributor Author

Hello Guohan,

In that case, I will remove the optoe driver and check the OOM part with the sff driver.
I was seeing only 640 bytes of EEPROM with sff, whereas I saw 32896 bytes with optoe, so I guessed that not all pages were supported yet in sff.
I will check again.
If all pages are supported, I will edit the patch for port mapping to support OOM, keep the swps driver bug fix, and resubmit.

Thanks and regards,
Bala
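
One possible reading of the two sizes quoted above, offered as an assumption rather than something stated in this thread: both fit a layout of one 128-byte lower page followed by a number of 128-byte upper pages. A minimal sketch of that arithmetic (the helper name `eeprom_size` is hypothetical):

```python
PAGE_SIZE = 128  # bytes per page (assumed layout, not confirmed in this thread)


def eeprom_size(upper_pages: int) -> int:
    """Total bytes exposed: one lower page plus `upper_pages` upper pages."""
    return PAGE_SIZE * (1 + upper_pages)


print(eeprom_size(4))    # 640, the size seen with the sff driver
print(eeprom_size(256))  # 32896, the size seen with optoe
```

Under that assumption, the sff driver would be exposing four upper pages and optoe 256.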

@donboll

donboll commented Jan 31, 2018

As the author and supporter of optoe, I have left a comment in pull request #1327, which is a request from Accton/Edgecore to submit optoe to SONiC. Both requests raise the same issues, so I suggest we discuss them at #1327.

Don Bollinger

@lguohan
Collaborator

lguohan commented Feb 8, 2018

@Balaselvi, based on yesterday's discussion, we plan to wait for donboll to submit the optoe patch to sonic-linux-kernel and then let you adapt this PR to use the optoe driver from sonic-linux-kernel. Let me know if you have any concerns about this.

@lguohan
Collaborator

lguohan commented Feb 12, 2018

@Balaselvi, the optoe driver has been merged into the latest SONiC kernel. Can you update the PR to use that?

sonic-net/sonic-linux-kernel@3525f35

@lguohan
Collaborator

lguohan commented Aug 12, 2018

@Balaselvi, the optoe driver has been merged into the latest SONiC kernel. Can you update the PR to use that?

qiluo-msft pushed a commit that referenced this pull request Dec 13, 2020
**- Why I did it**
To support dynamic buffer calculation.
This PR also depends on the following PRs for sub modules
- [sonic-swss: [buffermgr/bufferorch] Support dynamic buffer calculation #1338](sonic-net/sonic-swss#1338)
- [sonic-swss-common: Dynamic buffer calculation #361](sonic-net/sonic-swss-common#361)
- [sonic-utilities: Support dynamic buffer calculation #973](sonic-net/sonic-utilities#973)

**- How I did it**
1. Introduce field `buffer_model` in `DEVICE_METADATA|localhost` to represent which buffer model is running in the system currently:
    - `dynamic` for the dynamic buffer calculation model
    - `traditional` for the traditional model in which the `pg_profile_lookup.ini` is used
2. Add the tables required for the feature:
   - ASIC_TABLE in platform/\<vendor\>/asic_table.j2
   - PERIPHERAL_TABLE in platform/\<vendor\>/peripheral_table.j2
   - PORT_PERIPHERAL_TABLE on a per-platform basis in device/\<vendor\>/\<platform\>/port_peripheral_config.j2 for each platform with gearbox installed.
   - DEFAULT_LOSSLESS_BUFFER_PARAMETER and LOSSLESS_TRAFFIC_PATTERN in files/build_templates/buffers_config.j2
   - Add lossless PGs (3-4) for each port in files/build_templates/buffers_config.j2
3. Copy the newly introduced j2 files into the image and render them when the system starts
4. Update the CLI options for buffermgrd so that it can start in dynamic mode
5. Fetch the ASIC vendor name in orchagent:
   - fetch the vendor name when creating the docker and pass it as a docker environment variable
   - `buffermgrd` can use this passed-in variable
6. Clear buffer related tables from STATE_DB when swss docker starts
7. Update the src/sonic-config-engine/tests/sample_output/buffers-dell6100.json according to the buffers_config.j2
8. Remove buffer pool sizes for ingress pools and egress_lossy_pool
   Update the buffer settings for dynamic buffer calculation
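
To illustrate item 1 above, a consumer could branch on the `buffer_model` field roughly as follows. This is a minimal sketch: the configuration is modeled as a plain dict rather than a real CONFIG_DB connection, and the helper name `get_buffer_model` and the fallback to `traditional` when the field is absent are assumptions, not something stated in this commit message.

```python
def get_buffer_model(config_db: dict) -> str:
    """Return the buffer model recorded in DEVICE_METADATA|localhost.

    Assumption: a missing field is treated as the traditional model.
    """
    metadata = config_db.get("DEVICE_METADATA", {}).get("localhost", {})
    return metadata.get("buffer_model", "traditional")


# Hypothetical in-memory stand-ins for CONFIG_DB content.
dynamic_cfg = {"DEVICE_METADATA": {"localhost": {"buffer_model": "dynamic"}}}
legacy_cfg = {"DEVICE_METADATA": {"localhost": {}}}

print(get_buffer_model(dynamic_cfg))
print(get_buffer_model(legacy_cfg))
```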
stephenxs added a commit to stephenxs/sonic-buildimage that referenced this pull request Dec 16, 2020
sonic-swss:
[Dynamic buffer calc]  Support dynamic buffer calculation (sonic-net#1338)
[dvs] Clean-up dvs_database and dvs_common (sonic-net#1541)
[VxlanMgr] changes for EVPN VXLAN (sonic-net#1266)
Statistics support for Tx and Rx counters of different frame sizes (sonic-net#1536)
[orchagent/phy]: Add firmware info propagation (sonic-net#1540)
[vxlanorch] Use PRI instead of %l to avoid warnings in 32-bit arch (sonic-net#1539)
[FDBSYNCD] Added support for EVPN as described in the PR sonic-net/SONiC#437 (sonic-net#1276)
[everflow] Add retry mechanism for mirror sessions and policers (sonic-net#1486)
Enable ACL table type  mirror_v6 for Innovium Platform (sonic-net#1527)
[fgnhgorch] Change format specifier %lu to %zu for size_t (sonic-net#1529)
[dvs] Fix issue where concurrent netns operations cause test setup to fail (sonic-net#1535)
Add support for headroom pool watermark (sonic-net#1453)
Change gAsicInstance to type string with max length limit (sonic-net#1526)

sonic-utilities:
[Dynamic buffer calc] Support dynamic buffer calculation (sonic-net#973)
show tech with platform dump option (sonic-net#1158)
[kdump]: Parse sonic_platform kernel command line parameter to read the platform identifier string (sonic-net#1291)
[pcieutil] Remove 'pcie-' prefix from arguments (sonic-net#1297)
Added 'detailed' option for 'show interface counters' command (sonic-net#1299)
Fix show ip route summary on pizzabox platforms (sonic-net#1302)
[acl_loader] Fix default DENY rule for V6 dataplane ACLs (sonic-net#1281)
Add show and clear commands for headroom pool watermark  (sonic-net#1144)
[unit test][CLI][pfcwd] Added pfcwd config tests for single and multi ASIC platform. (sonic-net#1248)
[sflow] Fix traceback seen for show sflow interface (sonic-net#1282)
[config/console][consutil] Support enable/disable console switch (sonic-net#1275)
[fast-reboot] Fix fast-reboot when NDP entries are present (sonic-net#1295)
Fast-reboot: add a new flag to ignore ASIC config checksum verification failures (sonic-net#1292)
Kdump improvements (sonic-net#1284)

Signed-off-by: Stephen Sun <stephens@nvidia.com>
Sabareesh-Kumar-Anandan pushed a commit to Sabareesh-Kumar-Anandan/sonic-buildimage that referenced this pull request Dec 20, 2020
[crm]: Typecast to unit64_t to avoid divide by 0 during overflow (sonic-net#1550)
[vxlanmgr] Fix build error when compiling for armhf (32-bit) (sonic-net#1552)
[Dynamic buffer calc]  Support dynamic buffer calculation (sonic-net#1338)
[dvs] Clean-up dvs_database and dvs_common (sonic-net#1541)
[VxlanMgr] changes for EVPN VXLAN (sonic-net#1266)
Statistics support for Tx and Rx counters of different frame sizes (sonic-net#1536)
[orchagent/phy]: Add firmware info propagation (sonic-net#1540)

Signed-off-by: Sabareesh Kumar Anandan <sanandan@marvell.com>
jleveque added a commit that referenced this pull request Dec 24, 2020
* src/sonic-swss c7ee75f...cadf28f (24):
  > Revert "Add support for headroom pool watermark (#1453)"
  > [VxlanOrch] pytest for EVPN VXLAN (#1318)
  > [restore_neighbors] python3 support for restore_neighbors.py (#1542)
  > [buffermgmt] more build error fixes when compiling for armhf (32-bit) (#1559)
  > Sflow fix to avoid NULL in field. (#1531)
  > [fgnhgorch] Fg Nhg link handling (#1537)
  > [dpb]: make sure port is in admin down state before remove port. (#1513)
  > [FPMSYNCD/FDBSYNCD] EVPN Type-5 route removing prefix-len for host route and removing junk character present in the mac (#1553)
  > Added support for EVPN L3 VXLAN as described in the PR sonic-net/SONiC#437 (#1267)
  > [crm]: Typecast to unit64_t to avoid divide by 0 during overflow (#1550)
  > [vxlanmgr] Fix build error when compiling for armhf (32-bit) (#1552)
  > [Dynamic buffer calc]  Support dynamic buffer calculation (#1338)
  > [dvs] Clean-up dvs_database and dvs_common (#1541)
  > [VxlanMgr] changes for EVPN VXLAN (#1266)
  > Statistics support for Tx and Rx counters of different frame sizes (#1536)
  > [orchagent/phy]: Add firmware info propagation (#1540)
  > [vxlanorch] Use PRI instead of %l to avoid warnings in 32-bit arch (#1539)
  > [FDBSYNCD] Added support for EVPN as described in the PR sonic-net/SONiC#437 (#1276)
  > [everflow] Add retry mechanism for mirror sessions and policers (#1486)
  > Enable ACL table type  mirror_v6 for Innovium Platform (#1527)
  > [fgnhgorch] Change format specifier %lu to %zu for size_t (#1529)
  > [dvs] Fix issue where concurrent netns operations cause test setup to fail (#1535)
  > Add support for headroom pool watermark (#1453)
  > Change gAsicInstance to type string with max length limit (#1526)
lguohan pushed a commit that referenced this pull request Jan 7, 2021
…xpected queue causing _brcm_sai_switch_assert () after warm reboot (#6374)

Starting from master build 176, warm reboot on the BRCM platform started to experience a syncd crash. Upon further debugging by Ying, it was determined that the crash was related to the following new change:
[Dynamic buffer calc] Support dynamic buffer calculation (#1338)

Ying also debugged further and found that the crash was caused by the buffer pool profile setting operation SAI_BUFFER_PROFILE_ATTR_SHARED_DYNAMIC_TH.

A case has been filed with BRCM. Meanwhile, a potential fix tried by Ying seems to have addressed the issue, and we are making this change available in the master branch so that it allows further feature validation and testing, especially in the warm reboot area.
Once an official fix is provided by BRCM, we will remove this in-house fix and apply the official fix.

- How to verify it
Perform a warm reboot with any master build 175 or above and you should see this issue. Issuing the following command will also cause the crash: "mmuconfig -p egress_lossy_profile -a 0"
yxieca added a commit to yxieca/sonic-buildimage that referenced this pull request Jan 8, 2021
- (HEAD, github/master) [storyteller] adding a grep wrapper with predefined scenarios (sonic-net#1349)
- Adding global-timeout, individual command timeout, log files collection (sonic-net#1249)
- Add FW dump with new SAI implementation (sonic-net#1338)
- [unit test][pfcwd] Fix tests that require sudo access (sonic-net#1340)

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
lguohan pushed a commit that referenced this pull request Jan 9, 2021
…xpected queue causing _brcm_sai_switch_assert () after warm reboot (#6374)

Starting from master build 176, warm reboot on the BRCM platform started to experience a syncd crash. Upon further debugging by Ying, it was determined that the crash was related to the following new change:
[Dynamic buffer calc] Support dynamic buffer calculation (#1338)

Ying also debugged further and found that the crash was caused by the buffer pool profile setting operation SAI_BUFFER_PROFILE_ATTR_SHARED_DYNAMIC_TH.

A case has been filed with BRCM. Meanwhile, a potential fix tried by Ying seems to have addressed the issue, and we are making this change available in the master branch so that it allows further feature validation and testing, especially in the warm reboot area.
Once an official fix is provided by BRCM, we will remove this in-house fix and apply the official fix.

- How to verify it
Perform a warm reboot with any master build 175 or above and you should see this issue. Issuing the following command will also cause the crash: "mmuconfig -p egress_lossy_profile -a 0"
yxieca added a commit that referenced this pull request Jan 11, 2021
- (HEAD, github/master) [storyteller] adding a grep wrapper with predefined scenarios (#1349)
- Adding global-timeout, individual command timeout, log files collection (#1249)
- Add FW dump with new SAI implementation (#1338)
- [unit test][pfcwd] Fix tests that require sudo access (#1340)

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
daall pushed a commit that referenced this pull request Jan 11, 2021
- (HEAD, github/master) [storyteller] adding a grep wrapper with predefined scenarios (#1349)
- Adding global-timeout, individual command timeout, log files collection (#1249)
- Add FW dump with new SAI implementation (#1338)
- [unit test][pfcwd] Fix tests that require sudo access (#1340)

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
lguohan pushed a commit that referenced this pull request Feb 12, 2021
sonic-utilities 28d358f...f5b8a1e (22):
> Fix deprecation warnings (#1423)
> Fix: initialize SonicDBConfig differently for single or multi_asic (continued) (#1417)
> [multi-asic] show ip interface changes for multi asic (#1396)
> [show mux]: Sort output by intf name (#1418)
> [ci] Test and build package using Azure Pipelines (#1406)
> [GitHub] Tweak PR and issue templates (#1419)
> Import 'mock' module from 'unittest' library (#1415)
> Revert "Add FW dump with new SAI implementation (#1338)" (#1407)
> [config reload]: Restart macsec container (#1410)
> [pcieutil] Remove the warning message and change the config file location (#1362)
> Fix: initialize SonicDBConfig differently for single or multi_asic (#1409)
> Support shared headroom pool on top of dynamic buffer calculation (#1348)
> Fix unsupported fs.squashfs extraction in sonic-installer (#1366)
> [show] Use proper variable to avoid exception in natshow script (#1383)
> Set up CI with Azure Pipelines
> [config reload]: Restart mux container (#1401)
> Advertise ipv6 link local address (#1402)
> [storyteller] Enhance the storyteller utility (#1400)
> [show] Fix int status when portchannel is in the system (#1376)
> [config][show] cli support for retrieving ber, eye-info and configuring prbs, loopback on Y-cable  (#1386)
> Skip route check for tun0 interfaces (#1399)
> do not parse stderr to get correct routing stack (#1398)
> [storyteller] allow storyteller to work on downloaded logs (#1388)
> [vrf]: Fix freezing during interface binding (#1325)
> Use SonicV2Connector/ConfigDBConnector/SonicDBConfig from swsscommon instread of swsssdk (#1392)

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
stepanblyschak pushed a commit to stepanblyschak/sonic-buildimage that referenced this pull request May 10, 2021
Remove mst dump

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
stepanblyschak pushed a commit to stepanblyschak/sonic-buildimage that referenced this pull request May 10, 2021
…nic-net#1407)

This reverts commit b10622e.

**What I did**
revert changes to call sdkdump and replace with old call to mstdump

**How I did it**
reverting a previous commit [Mellanox] Add FW dump with new SAI implementation and remove mst dump sonic-net#1338

**How to verify it**
run techsupport
theasianpianist pushed a commit to theasianpianist/sonic-buildimage that referenced this pull request Feb 5, 2022
)

**What I did**

***Support dynamic buffer calculation***

1. Extend the CLI options for buffermgrd:
   - -a: asic_table provided,
   - -p: peripheral_table provided

   `buffermgrd` starts in dynamic headroom calculation mode when -a is provided.
   Otherwise, it starts in legacy mode (pg_headroom_profile lookup).
2. A new class is provided for dynamic buffer calculation while the old one remains.
   The daemon will instantiate the corresponding class according to the CLI option when it starts.
3. In both modes, the `buffermgrd` will copy BUFFER_XXX tables from CONFIG_DB to APPL_DB and the `bufferorch` will consume BUFFER_XXX tables from APPL_DB
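
The mode selection described in item 1 can be sketched as follows. This is a hypothetical Python rendering for illustration only: the real daemon is C++ (`cfgmgr/buffermgrd.cpp`), and the function name `select_mode` and the program name `buffermgrd-sketch` are assumptions. Only the `-a` and `-p` options come from the description above.

```python
import argparse


def select_mode(argv):
    """Pick the buffer manager mode from the command line.

    Hypothetical sketch; not the actual buffermgrd option parsing.
    """
    parser = argparse.ArgumentParser(prog="buffermgrd-sketch")
    parser.add_argument("-a", dest="asic_table", help="path to the asic_table json")
    parser.add_argument("-p", dest="peripheral_table", help="path to the peripheral_table json")
    args = parser.parse_args(argv)
    # Dynamic mode is chosen whenever -a is provided; otherwise legacy lookup.
    return "dynamic" if args.asic_table else "legacy"


print(select_mode(["-a", "asic_table.json", "-p", "peripheral_table.json"]))
print(select_mode([]))
```

Note that in this sketch `-p` alone does not switch modes, which matches the statement that dynamic mode is keyed on `-a`.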

***Backward compatibility***
Backward compatibility is provided for legacy mode. As mentioned above, `buffermgrd` checks whether the json file representing the `ASIC_TABLE` exists when it starts.
- If it does, it starts in dynamic buffer calculation mode.
- Otherwise, it starts in compatible mode, which is the old lookup mode as implemented in the new code committed in this PR.
This logic is in `cfgmgr/buffermgrd.cpp`.

The buffer handling logic in `buffermgrd` is unchanged in legacy mode. The differences are:
- In legacy mode, which is the old code, there are no buffer related tables in `APPL_DB`. All tables are in `CONFIG_DB`.
  - `buffermgrd` listens to the `PORT` and `CABLE_LENGTH` tables in `CONFIG_DB` and inserts the buffer profiles into the `BUFFER_PROFILE` table.
  - `bufferorch` listens to the buffer related tables in `CONFIG_DB` and calls the SAI API accordingly.
- In compatible mode, `buffermgrd` listens to tables in `CONFIG_DB` and copies them into `APPL_DB`.
  - `buffermgrd`
    - listens to the `PORT` and `CABLE_LENGTH` tables in `CONFIG_DB` and inserts the buffer profiles into the `BUFFER_PROFILE` table in `CONFIG_DB` (unchanged)
    - listens to the buffer related tables in `CONFIG_DB` and copies them into `APPL_DB`
  - `bufferorch` listens to `APPL_DB` and calls the SAI API accordingly (the difference is the database it listens to).
  - `db_migrator` is responsible for copying the buffer related tables from `CONFIG_DB` to `APPL_DB` when the system is warm-booted from the old image to the new image for the first time.

The compatible code is in `cfgmgr/buffermgr.cpp`, `orchagent/bufferorch.cpp` and `db_migrator` (in the [sonic-utilities PR](sonic-net/sonic-utilities#973)).
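
The copy step in compatible mode can be pictured with a small sketch. The databases are modeled as nested dicts, the function name `copy_buffer_tables` is hypothetical, and table names are kept unchanged between the two databases for simplicity, which is an assumption rather than a description of the real key layout.

```python
import copy


def copy_buffer_tables(config_db: dict, appl_db: dict) -> None:
    """Copy every BUFFER_* table from the CONFIG_DB model into the APPL_DB model."""
    for table, entries in config_db.items():
        if table.startswith("BUFFER_"):
            # Deep copy so later edits to CONFIG_DB do not leak into APPL_DB.
            appl_db.setdefault(table, {}).update(copy.deepcopy(entries))


# Hypothetical sample content.
config_db = {
    "BUFFER_PROFILE": {"pg_lossless_100000_5m_profile": {"size": "1024"}},
    "BUFFER_QUEUE": {"Ethernet0|3-4": {"profile": "egress_lossless_profile"}},
    "CABLE_LENGTH": {"AZURE": {"Ethernet0": "5m"}},
}
appl_db = {}
copy_buffer_tables(config_db, appl_db)
print(sorted(appl_db))
```

Non-buffer tables such as `CABLE_LENGTH` stay in the CONFIG_DB model only, since `buffermgrd` consumes them itself.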

**Why I did it**

**How I verified it**

1. vs test
2. regression test [PR: [Dynamic buffer calc] Test cases for dynamic buffer calculation](sonic-net/sonic-mgmt#1971)

**Dynamic buffer details**

1. In the dynamic buffer calculation mode, 3 lua plugins are provided for vendor-specific operations:
   - buffer_headroom_<vendor>.lua, for calculating headroom size.
   - buffer_pool_<vendor>.lua, for calculating buffer pool size.
   - buffer_check_headroom_<vendor>.lua, for checking whether headroom exceeds the limit
2. During initialization, the daemon will:
   - load asic_table and peripheral_table from the given json file, parse them and push them into STATE_DB.ASIC_TABLE and STATE_DB.PERIPHERAL_TABLE respectively
   - load all plugins
   - try to load the STATE_DB.BUFFER_MAX_PARAM.mmu_size which is used for updating buffer pool size
   - start a timer for periodic buffer pool size audit
3. The daemon will listen to and handle the following tables from CONFIG_DB
   The tables will be cached internally in the daemon for the purpose of saving access time
   - BUFFER_POOL:
     - if the size is provided: insert the entry to APPL_DB
     - otherwise: cache them and push to APPL_DB after the size is calculated by lua plugin
   - BUFFER_PROFILE and BUFFER_PG:
     - items for ingress lossless headroom need to be cached and handled (according to the design)
     - other items will be inserted to the APPL_DB directly
   - PORT_TABLE, for ports' speed and MTU update
   - CABLE_LENGTH, for ports' cable length
4. Other tables will be copied to APPL_DB directly:
   - BUFFER_QUEUE
   - BUFFER_PORT_INGRESS_PROFILE_LIST
   - BUFFER_PORT_EGRESS_PROFILE_LIST
5. BufferOrch is modified accordingly:
   - Consume buffer relevant tables from APPL_DB instead of CONFIG_DB
   - For BUFFER_POOL, don't set ingress/egress and static/dynamic to SAI if the pool already exists because they are create-only
   - For BUFFER_PROFILE, don't set pool for the same reason
6. Warm reboot:
   - db_migrator is responsible for copying the data from CONFIG_DB to APPL_DB if the switch is warm-rebooted from an old image to the new image for the first time
   - no specific handling in the daemon side
7. Provide vstest script
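
The BUFFER_POOL handling in item 3 (publish immediately when a size is given, otherwise cache until the size is calculated) can be sketched as below. This is an illustration under stated assumptions: the class `PoolHandler` and its method names are hypothetical, APPL_DB is modeled as a dict, and the size that the real daemon obtains from a lua plugin is simply passed in as an argument.

```python
class PoolHandler:
    """Toy model of the BUFFER_POOL path described above."""

    def __init__(self):
        self.appl_db = {}   # stand-in for the APPL_DB buffer pool table
        self.pending = {}   # pools waiting for a calculated size

    def on_buffer_pool(self, name, fields):
        if "size" in fields:
            # Size provided in CONFIG_DB: insert the entry directly.
            self.appl_db[name] = dict(fields)
        else:
            # No size yet: cache until the calculation delivers one.
            self.pending[name] = dict(fields)

    def on_size_calculated(self, name, size):
        # Stand-in for the result of the vendor lua plugin.
        fields = self.pending.pop(name)
        fields["size"] = str(size)
        self.appl_db[name] = fields


handler = PoolHandler()
handler.on_buffer_pool("egress_lossy_pool", {"type": "egress", "size": "4096"})
handler.on_buffer_pool("ingress_lossless_pool", {"type": "ingress"})
handler.on_size_calculated("ingress_lossless_pool", 8192)
print(handler.appl_db)
```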
mssonicbld added a commit that referenced this pull request Jan 13, 2024
…tically (#17774)

#### Why I did it
src/sonic-sairedis
```
* 4f4c6d1 - (HEAD -> master, origin/master, origin/HEAD) Fix code coverage and ASAN not being enabled (#1338) (9 hours ago) [Saikrishna Arcot]
```
#### How I did it
#### How to verify it
#### Description for the changelog
mssonicbld added a commit that referenced this pull request Feb 15, 2024
…tically (#18083)

#### Why I did it
src/sonic-sairedis
```
* 23481f0 - (HEAD -> 202311, origin/202311) Skip FABRIC PORT Attributes from sairedis logging (#1339) (2 days ago) [saksarav-nokia]
* 682e860 - Revert "add if statement for module control mode support" (#1341) (4 days ago) [dbarashinvd]
* 3621a18 - SAI submodule update to pick the sai-thrift support added to read VOQ counters (#1332) (4 days ago) [saksarav-nokia]
* 52cd15b - Fix code coverage and ASAN not being enabled (#1338) (5 days ago) [Saikrishna Arcot]
```
#### How I did it
#### How to verify it
#### Description for the changelog