Wrong PCIe generation and lane width for some amdgpus #138

Closed
bachandi opened this issue Apr 4, 2022 · 5 comments

@bachandi

bachandi commented Apr 4, 2022

My AMD Radeon Pro WX 5100 graphics card is reported as PCIe GEN 6@1x, which is not correct, as the card only supports PCIe GEN 3.

I just quickly checked: the card reports linkSpeed=8000 and laneWidth=1, which results in a laneSpeed of 8000, which in turn causes pcieGen to be set to 6.

I checked the hwmon file system, where the card correctly reports 8.0 GT/s PCIe, which is the right speed for PCIe Gen 3@1x.

On another machine an amdgpu is reported as PCIe GEN 3@16x but is actually PCIe GEN 3@1x, and on yet another machine an amdgpu reported as PCIe GEN 3@16x is actually correct.

Maybe there is something off with the pcieGen and laneWidth detection?

@bachandi bachandi changed the title amdgpu reports wrong PCIe generation and link width. amdgpu reports wrong PCIe generation and lane width. Apr 4, 2022
@bachandi bachandi changed the title amdgpu reports wrong PCIe generation and lane width. Wrong PCIe generation and lane width for some amdgpus Apr 4, 2022
@Syllo
Owner

Syllo commented Apr 5, 2022

I am genuinely confused as to what the kernel/driver is reporting through sysfs.
On my side it reports 16.0 GT/s, which, if I understand correctly, should be generation 4, although my CPU and motherboard only support version 3.
Hence, I thought that I had to divide the 16 by the number of lanes to get the speed of one lane and deduce the generation.
I was obviously wrong about that.
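
For illustration, here is a minimal sketch of deducing the generation directly from the reported rate, without dividing by the lane count. It assumes the reported string is already a per-lane transfer rate in the form `8.0 GT/s PCIe`, as quoted earlier in this thread. The function name `pcie_generation` and the lookup table name are hypothetical and are not taken from the nvtop source.

```python
from typing import Optional

# Per-lane transfer rate in GT/s for each PCIe generation.
PCIE_GEN_BY_RATE = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5, 64.0: 6}


def pcie_generation(link_speed: str) -> Optional[int]:
    """Map a string such as '8.0 GT/s PCIe' to a PCIe generation.

    Assumption: the first whitespace-separated token is the per-lane
    rate in GT/s, so the lane count plays no role in the lookup.
    Returns None when the string cannot be interpreted.
    """
    try:
        rate = float(link_speed.split()[0])
    except (IndexError, ValueError):
        return None
    return PCIE_GEN_BY_RATE.get(rate)


if __name__ == "__main__":
    print(pcie_generation("8.0 GT/s PCIe"))
    print(pcie_generation("16.0 GT/s PCIe"))
```

With this lookup, a card reporting 8.0 GT/s maps to generation 3 regardless of whether it runs at x1 or x16.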

I'll investigate and try to come up with a fix.

@Syllo
Owner

Syllo commented Apr 6, 2022

Could you please verify that the patch 28fdcd1, merged in the master branch, fixed this issue?

@bachandi
Author

bachandi commented Apr 6, 2022

Thanks for the quick patch. The PCIe generation is now correct for the three cards I tested, but the link width is still wrong for one card, which is connected as PCIe GEN 3@1x but shows as PCIe GEN 3@16x.

But I suspect this could be a driver issue, as the pp_dpm_pcie file also has the wrong x16 information:

cat device/pp_dpm_pcie
0: 2.5GT/s, x8 
1: 8.0GT/s, x16 *

Whereas lspci -vv gives:

Subsystem: Dell Ellesmere [Radeon Pro WX 5100]
Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
	LnkCap:	Port #0, Speed 8GT/s, Width x16, ASPM L1, Exit Latency L1 <1us
		ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
	LnkSta:	Speed 8GT/s (ok), Width x1 (downgraded)
		TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

It seems that, for this card, pp_dpm_pcie reports the link capabilities instead of the link width that is actually in use. For a different card PCIe GEN 3@1x is reported correctly, and another with PCIe GEN 3@16x is also correct.
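
To make the capability/status distinction concrete, here is a rough sketch that pulls the `LnkCap` (what the card supports) and `LnkSta` (what was negotiated) fields out of `lspci -vv` text for a single device. The regular expression is an assumption based only on the output format quoted above, and the names `parse_link_fields` and `SAMPLE` are hypothetical.

```python
import re

# Matches lines like "LnkCap: Port #0, Speed 8GT/s, Width x16, ..." and
# "LnkSta: Speed 8GT/s (ok), Width x1 (downgraded)". The colon directly
# after the name keeps LnkCap2/LnkSta2 lines from matching.
LINK_RE = re.compile(
    r"^\s*(LnkCap|LnkSta):.*?Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)",
    re.MULTILINE,
)


def parse_link_fields(lspci_text):
    """Return {'LnkCap': (rate_gts, width), 'LnkSta': (rate_gts, width)}.

    Assumes the text describes one device; with several devices the
    later entries would overwrite the earlier ones.
    """
    fields = {}
    for name, rate, width in LINK_RE.findall(lspci_text):
        fields[name] = (float(rate), int(width))
    return fields


SAMPLE = (
    "\tLnkCap:\tPort #0, Speed 8GT/s, Width x16, ASPM L1, Exit Latency L1 <1us\n"
    "\t\tClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+\n"
    "\tLnkSta:\tSpeed 8GT/s (ok), Width x1 (downgraded)\n"
    "\t\tTrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-\n"
)

if __name__ == "__main__":
    links = parse_link_fields(SAMPLE)
    cap_width = links["LnkCap"][1]
    sta_width = links["LnkSta"][1]
    if sta_width < cap_width:
        print(f"link runs at x{sta_width}, card supports x{cap_width}")
```

For the output above, the capability width is 16 and the negotiated width is 1, which matches the "downgraded" note printed by lspci.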

@Syllo
Owner

Syllo commented Apr 6, 2022

Nice. Either way, the info in pp_dpm_pcie seems more trustworthy than the one reported by current_link_speed.
I opened a bug report for the info discrepancy. We'll see if that was a bug or not.
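
As a sketch only, reading the active entry from pp_dpm_pcie could look like the following. It assumes the file format shown earlier in this thread, where each line is `index: rate, xwidth` and the active entry ends with `*`. The function name `active_pcie_level` is hypothetical, and this is not the code from the patch.

```python
import re

# One entry per line, e.g. "1: 8.0GT/s, x16 *"; the trailing "*" marks
# the currently selected level.
ENTRY_RE = re.compile(r"^\s*\d+:\s*([\d.]+)GT/s,\s*x(\d+)")


def active_pcie_level(text):
    """Return (rate_gts, width) of the entry marked with '*', or None."""
    for line in text.splitlines():
        match = ENTRY_RE.match(line)
        if match and line.rstrip().endswith("*"):
            return float(match.group(1)), int(match.group(2))
    return None


if __name__ == "__main__":
    sample = "0: 2.5GT/s, x8 \n1: 8.0GT/s, x16 *\n"
    print(active_pcie_level(sample))
```

The function takes the file contents as a string, so it can be tried on pasted output without access to the machine in question.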

@Syllo Syllo closed this as completed Apr 6, 2022
@PIPIPIG233666

This could be specific to my setup, but posting here: the kernel I have (xanmod) does not load the amdgpu firmware by itself. After picking up the correct firmware, /sys/class/drm/card0/device/pp_dpm_pcie shows the correct speed.
