Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix] - Check memory calibration status before running host_exerciser #3094

Merged
merged 6 commits into from
Feb 5, 2024

Conversation

anandhv
Copy link
Contributor

@anandhv anandhv commented Feb 1, 2024

Description

This change addresses this bug/case.

The bug is that the host_exerciser utility (which pushes traffic between the host and FPGA-attached DDR) would fail non-gracefully if the DDR calibration had failed. Fortunately the calibration status is reported by the dfl-emif driver via sysfs entries (one per mem channel). This change simply errors-out gracefully, with a useful error message, after reading the sysfs entries.

The host_exerciser utility is built on top of the afu_test framework. The corresponding AFU is exposed as a PCIe Virtual Function (VF). As described in the docs, the user must first bind the VF endpoint to the vfio-pci driver. This seems to also result in OPAE using the opae-v (VFIO-specific) plugin rather than the xfpga plugin. The opae-v plugin only exposes a subset of the OPAE-API and notably omits functions related to sysobjects (which are used to access sysfs entries). Therefore the afu_test framework has also been updated to grab a handle to the FPGA_DEVICE (i.e. the FIM) so that we can access sysobjects.

I've noted in the code comments that the change is a bit kludgy, owing to the fact that we don't know how many mem channels are present on the given board and therefore don't know how many sysfs entries we need to poll. While xfpga's sysobject implementation supports wildcarded searches for sysfs entries (via glob) and can return arrays of such objects, it oddly, specifically, disables this behaviour if the glob string uses a recursive wildcard ("/**/"). I think such a wildcard is necessary since we shouldn't assume a fixed sysfs folder hierarchy. So future work is to remove this, likely, unnecessary restriction in the sysobject implementation. I just don't have time for this given the release deadline is next week.

Collateral (docs, reports, design examples, case IDs):

Tests added:

Tests run:

  • Manually tested on N6001. I don't know how to force a calibration failure in the hardware so I tested by simply flipping the polarity of the sysfs-read() result and ensuring the program fails gracefully.

@anandhv anandhv changed the title DRAFT bug fix [Fix] - Check memory calibration status before running host_exerciser Feb 1, 2024
@anandhv anandhv marked this pull request as ready for review February 2, 2024 22:09
@anandhv anandhv requested review from a team as code owners February 2, 2024 22:09
pcolberg
pcolberg previously approved these changes Feb 2, 2024
Copy link
Contributor

@pcolberg pcolberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @anandhv, I appreciate your in-depth commit message! One minor nit below.

Comment on lines 766 to 769
if (testobj) { // if !=null, the sysfs entry was found
// Error out if calibration has failed
if (testobj->read64(0)) { // Non-zero value (typically '1') means
// calibration has failed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These may be combined to avoid one level of indentation:

Suggested change
if (testobj) { // if !=null, the sysfs entry was found
// Error out if calibration has failed
if (testobj->read64(0)) { // Non-zero value (typically '1') means
// calibration has failed
if (testobj && testobj->read64(0)) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Peter, I have made the change.

@anandhv anandhv merged commit 44f9f53 into OFS:master Feb 5, 2024
19 checks passed
@anandhv anandhv deleted the host_ex_mem_cal branch February 5, 2024 17:06
anandaravuri added a commit that referenced this pull request Feb 15, 2024
…ties to enumerate afu

 Set PCIe segment, bus, and device properties to enumerate FPGA DEVICE (FME)
 Set PCIe segment, bus, device, and function properties to enumerate FPGA ACCELERATOR

 hssi hssi_10g works fine,
 hssi—pci-address b1:00.6 hssi_10g fails if the user provides pci-address cmd args due to PR change. #3094

 Set the HSSI/VF PCIe segment, bus, device, and function attributes to try to enumerate FPGA DEVICE (FME) triggers failure, to enumerate FPGA DEVICE set seegment, bus, device attributes to OPAE filter.

Signed-off-by: anandaravuri <ananda.ravuri@intel.com>
anandaravuri added a commit that referenced this pull request Feb 15, 2024
…ties to enumerate afu (#3103)

Set PCIe segment, bus, and device properties to enumerate FPGA DEVICE (FME)
 Set PCIe segment, bus, device, and function properties to enumerate FPGA ACCELERATOR

 hssi hssi_10g works fine,
 hssi—pci-address b1:00.6 hssi_10g fails if the user provides pci-address cmd args due to PR change. #3094

 Set the HSSI/VF PCIe segment, bus, device, and function attributes to try to enumerate FPGA DEVICE (FME) triggers failure, to enumerate FPGA DEVICE set seegment, bus, device attributes to OPAE filter.

Signed-off-by: anandaravuri <ananda.ravuri@intel.com>
anandaravuri added a commit that referenced this pull request Feb 15, 2024
…ties to enumerate afu (#3103)

Set PCIe segment, bus, and device properties to enumerate FPGA DEVICE (FME)
 Set PCIe segment, bus, device, and function properties to enumerate FPGA ACCELERATOR

 hssi hssi_10g works fine,
 hssi—pci-address b1:00.6 hssi_10g fails if the user provides pci-address cmd args due to PR change. #3094

 Set the HSSI/VF PCIe segment, bus, device, and function attributes to try to enumerate FPGA DEVICE (FME) triggers failure, to enumerate FPGA DEVICE set seegment, bus, device attributes to OPAE filter.

Signed-off-by: anandaravuri <ananda.ravuri@intel.com>
anandaravuri added a commit that referenced this pull request Feb 20, 2024
…ties to enumerate afu (#3103) (#3105)

Set PCIe segment, bus, and device properties to enumerate FPGA DEVICE (FME)
 Set PCIe segment, bus, device, and function properties to enumerate FPGA ACCELERATOR

 hssi hssi_10g works fine,
 hssi—pci-address b1:00.6 hssi_10g fails if the user provides pci-address cmd args due to PR change. #3094

 Set the HSSI/VF PCIe segment, bus, device, and function attributes to try to enumerate FPGA DEVICE (FME) triggers failure, to enumerate FPGA DEVICE set seegment, bus, device attributes to OPAE filter.

Signed-off-by: anandaravuri <ananda.ravuri@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants