Fail to enable sriov, status = fffffffb #18

Closed
markednmbr1 opened this issue Mar 3, 2019 · 8 comments

Comments

@markednmbr1

Hi All,

I'm trying to set up an AMD FirePro S7150 in Proxmox 5.3-11 (which is based on an up-to-date Debian Stretch), and I'm getting the following error when running modprobe gim:

[Fri Mar 1 12:31:49 2019] gim info:(enable_sriov:299) Enable SRIOV
[Fri Mar 1 12:31:49 2019] gim info:(enable_sriov:300) Enable SRIOV vfs count = 16
[Fri Mar 1 12:31:49 2019] pci 0000:61:02.0: [1002:692f] type 7f class 0xffffff
[Fri Mar 1 12:31:49 2019] pci 0000:61:02.0: unknown header type 7f, ignoring device
[Fri Mar 1 12:31:50 2019] gim error:(enable_sriov:311) Fail to enable sriov, status = fffffffb
[Fri Mar 1 12:31:50 2019] gim error:(set_new_adapter:668) Failed to properly enable SRIOV
[Fri Mar 1 12:31:50 2019] gim info:(gim_probe:91) AMD GIM probe: pf_count = 1

Hardware-wise, I am using an ASRock Rack EPYCD8-2T motherboard with an AMD EPYC 7351P processor.

In the BIOS, I have IOMMU, SR-IOV and ACS enabled.

Can anyone please advise why I might be having this issue? Or is there anything I can try?

Thank you!
Mark
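
For reference, the status value in the log above can be decoded. This is a minimal sketch, assuming the value is a negative Linux errno printed as unsigned 32-bit hex (the thread does not confirm how gim formats it); decode_status is a hypothetical helper name.

```python
import errno


def decode_status(hex_status):
    """Interpret a 32-bit hex status as a signed kernel-style return code."""
    raw = int(hex_status, 16)
    # Two's-complement conversion for a 32-bit value.
    signed = raw - (1 << 32) if raw & 0x80000000 else raw
    if signed < 0:
        # Negative values are looked up as errno numbers (assumption).
        name = errno.errorcode.get(-signed, "unknown")
    elif signed == 0:
        name = "OK"
    else:
        name = "unknown"
    return signed, name


print(decode_status("fffffffb"))
```

Under that assumption, fffffffb is -5, i.e. EIO (I/O error), which fits a device that does not respond to configuration reads.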

@vigchand2705

Could you check if ARI is enabled in BIOS?

What is the kernel version?

@markednmbr1
Author

Hi Vignesh,

Kernel is 4.18

There is no option for ARI in the BIOS that I can find.

Thanks,
Mark

@markednmbr1
Author

lspci does seem to show the ARI and SR-IOV capabilities on the card, though:

42:00.0 0300: 1002:6929 (prog-if 00 [VGA controller])
Subsystem: 1849:6929
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 5
NUMA node: 2
Region 0: Memory at fce0000000 (64-bit, prefetchable) [size=256M]
Region 2: Memory at fcf4000000 (64-bit, prefetchable) [size=2M]
Region 4: I/O ports at 3000 [size=256]
Region 5: Memory at eb400000 (32-bit, non-prefetchable) [size=256K]
Expansion ROM at eb440000 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported- RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+ EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [a0] MSI: Enable- Count=1/4 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010
Capabilities: [150 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [200 v1] #15
Capabilities: [270 v1] #19
Capabilities: [2b0 v1] Address Translation Service (ATS)
ATSCap: Invalidate Queue Depth: 00
ATSCtl: Enable+, Smallest Translation Unit: 00
Capabilities: [2c0 v1] Page Request Interface (PRI)
PRICtl: Enable- Reset-
PRISta: RF- UPRGI- Stopped+
Page Request Capacity: 00000020, Page Request Allocation: 00000000
Capabilities: [2d0 v1] Process Address Space ID (PASID)
PASIDCap: Exec+ Priv+, Max PASID Width: 10
PASIDCtl: Enable- Exec- Priv-
Capabilities: [328 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [330 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
IOVSta: Migration-
Initial VFs: 16, Total VFs: 16, Number of VFs: 0, Function Dependency Link: 00
VF offset: 16, stride: 1, Device ID: 692f
Supported Page Size: 00000553, System Page Size: 00000001
Region 0: Memory at 000000fbe0000000 (64-bit, prefetchable)
Region 2: Memory at 000000fcf0000000 (64-bit, prefetchable)
Region 5: Memory at e7400000 (32-bit, non-prefetchable)
VF Migration: offset: 00000000, BIR: 0
Capabilities: [400 v1] Vendor Specific Information: ID=0002 Rev=1 Len=070 <?>
Kernel modules: amdgpu

@vigchand2705

Oh, try blacklisting amdgpu.

@markednmbr1
Author

It is blacklisted (lsmod shows it is not loaded).

@vigchand2705

Hmm, I don't see any obvious problems so far.
Any idea where this gets logged from: "pci 0000:61:02.0: unknown header type 7f, ignoring device"? It doesn't seem to be GIM.
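
A note on that message: the "type 7f class 0xffffff" values are what one would expect if every configuration read to the new device came back as all ones, which is the usual result when nothing answers at that address. The sketch below illustrates the arithmetic, assuming the header type is the low 7 bits of the byte at config offset 0x0e and the class code is the upper 24 bits of the dword at offset 0x08 (standard PCI config layout); describe_config_reads is a hypothetical helper, not kernel code.

```python
def describe_config_reads(class_rev_dword, header_type_byte):
    """Format raw config-space values the way the log line above shows them."""
    # Class code: upper 24 bits of the class/revision dword (offset 0x08).
    class_code = class_rev_dword >> 8
    # Header type: low 7 bits; bit 7 is the multi-function flag.
    header_type = header_type_byte & 0x7F
    return f"type {header_type:02x} class {class_code:#08x}"


# All-ones reads, as returned when no device responds.
print(describe_config_reads(0xFFFFFFFF, 0xFF))
```

With all-ones input this yields "type 7f class 0xffffff", matching the log, so the virtual function at 61:02.0 was most likely not reachable at all rather than misconfigured.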

@markednmbr1
Author

I think it is logged while gim is bringing up the card. That address is in the slot the S7150 is in (sorry, the card has been moved from the slot where it was on bus 61). The updated log for this slot is:

[Mon Mar 4 14:32:19 2019] gim info:(enable_sriov:299) Enable SRIOV
[Mon Mar 4 14:32:19 2019] gim info:(enable_sriov:300) Enable SRIOV vfs count = 16
[Mon Mar 4 14:32:19 2019] pci 0000:42:02.0: [1002:692f] type 7f class 0xffffff
[Mon Mar 4 14:32:19 2019] pci 0000:42:02.0: unknown header type 7f, ignoring device
[Mon Mar 4 14:32:20 2019] gim error:(enable_sriov:311) Fail to enable sriov, status = fffffffb
[Mon Mar 4 14:32:20 2019] gim error:(set_new_adapter:668) Failed to properly enable SRIOV
[Mon Mar 4 14:32:20 2019] gim info:(gim_probe:91) AMD GIM probe: pf_count = 1
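
The address 42:02.0 in this log follows from the SR-IOV capability in the lspci output above (VF offset: 16, stride: 1) applied to the physical function at 42:00.0. Here is a minimal sketch of that calculation, using the standard SR-IOV rule that VF n has routing ID = PF routing ID + offset + n * stride; vf_addresses is a hypothetical helper name.

```python
def vf_addresses(pf_bdf, first_vf_offset, vf_stride, num_vfs):
    """List VF addresses in bus:device.function form for a given PF."""
    bus_s, devfn_s = pf_bdf.split(":")
    dev_s, fn_s = devfn_s.split(".")
    bus, dev, fn = int(bus_s, 16), int(dev_s, 16), int(fn_s, 16)
    # 16-bit routing ID: 8-bit bus, 5-bit device, 3-bit function.
    pf_rid = (bus << 8) | (dev << 3) | fn
    result = []
    for n in range(num_vfs):
        rid = pf_rid + first_vf_offset + n * vf_stride
        result.append(f"{rid >> 8:02x}:{(rid >> 3) & 0x1F:02x}.{rid & 0x7}")
    return result


vfs = vf_addresses("42:00.0", 16, 1, 16)
print(vfs[0], vfs[-1])
```

The 16 VFs land on 42:02.0 through 42:03.7, i.e. on device numbers 2 and 3. Without ARI, a PCIe port only addresses device 0 on the bus below it, so these functions would only become reachable once ARI forwarding is enabled on the port above the card.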

@markednmbr1
Author

The problem was ARI not being enabled. I spoke to ASRock Rack and got an as-yet unreleased BIOS that has the ARI Forwarding option. I enabled it and now everything is working!

Closing this.
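
For anyone hitting the same error, one way to check the setting from Linux is to look for the ARIFwd flag in the lspci -vvv output of the root or downstream port above the card. The sketch below assumes the flag appears on the DevCtl2 line as ARIFwd+ or ARIFwd-; the exact layout varies between pciutils versions, and both the sample line and the ari_forwarding_enabled helper are hypothetical, not taken from this system.

```python
import re


def ari_forwarding_enabled(lspci_text):
    """Return True/False from the ARIFwd flag on a DevCtl2 line, or None if absent."""
    for line in lspci_text.splitlines():
        line = line.strip()
        if line.startswith("DevCtl2:"):
            match = re.search(r"ARIFwd([+-])", line)
            if match:
                return match.group(1) == "+"
    return None


# Hypothetical sample of a port's DevCtl2 line (not from this thread).
sample = "DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd+"
print(ari_forwarding_enabled(sample))
```

A result of None means no ARIFwd flag was found on a DevCtl2 line, which is expected for the card itself: the flag belongs to the port upstream of it.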
