
pcie_bw issue with AMDGPU support #208

Open

Umio-Yasuno opened this issue Apr 29, 2023 · 7 comments

Comments

@Umio-Yasuno

Umio-Yasuno commented Apr 29, 2023

nvtop treats the PCIe bandwidth value as KiB/s, but the correct unit is B/s.
rocm_smi_lib calculates PCIe bandwidth usage (MiB/s) as number_of_received * max_packet_size (max_payload_size) / 1024.0 / 1024.0 and number_of_sent * max_packet_size (max_payload_size) / 1024.0 / 1024.0.
https://github.com/RadeonOpenCompute/rocm_smi_lib/blob/master/python_smi_tools/rocm_smi.py#L1862-L1883
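
For reference, here is a minimal C sketch of the same formula (not rocm_smi_lib's or nvtop's actual code, and the card0 path is only an example): it parses the three values the driver prints (received count, sent count, max payload size) and converts them to MiB/s.

#include <stdio.h>

int main(void) {
    unsigned long long received = 0, transmitted = 0;
    int max_payload_size = 0; /* bytes per PCIe packet, typically 128 or 256 */

    /* Hypothetical path; the real file lives under the GPU's sysfs device dir. */
    FILE *f = fopen("/sys/class/drm/card0/device/pcie_bw", "r");
    if (!f)
        return 1;
    if (fscanf(f, "%llu %llu %d", &received, &transmitted, &max_payload_size) != 3) {
        fclose(f);
        return 1;
    }
    fclose(f);

    /* The counts cover roughly the last second, so count * payload size is B/s. */
    double rx = (double)received    * max_payload_size / 1024.0 / 1024.0;
    double tx = (double)transmitted * max_payload_size / 1024.0 / 1024.0;
    printf("PCIe RX %.2f MiB/s, TX %.2f MiB/s\n", rx, tx);
    return 0;
}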

Also, reading the pcie_bw file takes at least 1 s, because the AMDGPU driver uses msleep(1000) while counting packets.
https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/amdgpu/vi.c#L1379
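
A quick way to see the stall (illustrative only, with the same hypothetical card0 path): time a single read of the file and it should come out at roughly one second.

#include <stdio.h>
#include <time.h>

int main(void) {
    char buf[128];
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    FILE *f = fopen("/sys/class/drm/card0/device/pcie_bw", "r"); /* hypothetical path */
    if (!f)
        return 1;
    if (!fgets(buf, sizeof buf, f)) { /* this read is where the ~1 s is spent */
        fclose(f);
        return 1;
    }
    fclose(f);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double elapsed = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("pcie_bw read took %.2f s: %s", elapsed, buf);
    return 0;
}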

The -d, --delay option of nvtop will not work as expected when pcie_bw is supported.
I think we should read pcie_bw from a separate thread if possible.
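
A rough sketch of that idea (invented names, not nvtop's actual code, build with -pthread): a worker thread does the blocking pcie_bw read in a loop and publishes the latest value atomically, so the interface loop can refresh at its own rate.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static _Atomic unsigned long long g_rx_bytes_per_s;
static _Atomic unsigned long long g_tx_bytes_per_s;

/* Worker: the blocking ~1 s pcie_bw read happens here, never in the UI loop. */
static void *pcie_bw_sampler(void *arg) {
    (void)arg;
    for (;;) {
        unsigned long long rx = 0, tx = 0;
        int mps = 0;
        FILE *f = fopen("/sys/class/drm/card0/device/pcie_bw", "r"); /* hypothetical path */
        if (f) {
            if (fscanf(f, "%llu %llu %d", &rx, &tx, &mps) == 3) {
                atomic_store(&g_rx_bytes_per_s, rx * (unsigned long long)mps);
                atomic_store(&g_tx_bytes_per_s, tx * (unsigned long long)mps);
            }
            fclose(f);
            /* No extra sleep needed: the blocking read already paces this loop. */
        } else {
            sleep(1); /* avoid spinning if the file does not exist */
        }
    }
    return NULL;
}

int main(void) {
    pthread_t tid;
    pthread_create(&tid, NULL, pcie_bw_sampler, NULL);

    /* The "interface" keeps its own refresh rate and only reads the last sample. */
    for (int i = 0; i < 8; i++) {
        printf("RX %llu B/s, TX %llu B/s\n",
               atomic_load(&g_rx_bytes_per_s), atomic_load(&g_tx_bytes_per_s));
        usleep(250 * 1000); /* 250 ms, independent of the 1 s sysfs read */
    }
    return 0;
}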

@Syllo
Owner

Syllo commented May 21, 2023

Nvtop shows B/KiB/MiB depending on how much data is being transferred.
The data is gathered from the pcie_bw interface and scaled accordingly https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/pm/amdgpu_pm.c#L1579
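
For what it's worth, the kind of autoscaling described above could look like this (illustrative only, not nvtop's actual formatting code):

#include <stdio.h>

/* Pick a unit based on magnitude, similar in spirit to what is described above. */
static void print_scaled(double bytes_per_second) {
    if (bytes_per_second >= 1024.0 * 1024.0)
        printf("%.1f MiB/s\n", bytes_per_second / (1024.0 * 1024.0));
    else if (bytes_per_second >= 1024.0)
        printf("%.1f KiB/s\n", bytes_per_second / 1024.0);
    else
        printf("%.0f B/s\n", bytes_per_second);
}

int main(void) {
    print_scaled(512.0);                 /* 512 B/s    */
    print_scaled(80.0 * 1024.0);         /* 80.0 KiB/s */
    print_scaled(3.5 * 1024.0 * 1024.0); /* 3.5 MiB/s  */
    return 0;
}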

@Syllo Syllo closed this as completed May 21, 2023
@Umio-Yasuno
Author

Umio-Yasuno commented May 21, 2023

Umm, NVML returns the value in KiB/s, but the AMDGPU driver returns it in B/s (packet_count * max_payload_size [bytes]).

https://docs.nvidia.com/deploy/nvml-api/group__nvmlDeviceQueries.html#group__nvmlDeviceQueries_1gd86f1c74f81b5ddfaa6cb81b51030c72

Does nvtop convert KiB/s to B/s (for NVIDIA GPU) or B/s to KiB/s (for AMD GPU)?
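
In other words (a toy example with made-up numbers, assuming nvmlDeviceGetPcieThroughput() on the NVIDIA side as in the linked docs), one of the two values has to be converted before they can share a display path:

#include <stdio.h>

int main(void) {
    /* Made-up sample values, just to show the unit mismatch. */
    unsigned long long nvml_kib_per_s = 2048;         /* NVML already reports KiB/s */
    unsigned long long amd_packets = 8192, mps = 256; /* amdgpu pcie_bw: count and payload size */

    unsigned long long amd_kib_per_s = amd_packets * mps / 1024; /* B/s -> KiB/s */
    printf("NVIDIA: %llu KiB/s, AMD: %llu KiB/s\n", nvml_kib_per_s, amd_kib_per_s);
    return 0;
}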

P.S. nvtop currently does not detect devices correctly in APU+dGPU environments, so I cannot test this, sorry.

@Syllo
Owner

Syllo commented May 21, 2023

Sorry I did not get what you meant the first time. Indeed the code was missing a division by 1024 to get in the kilobyte range, thanks. I pushed 04721e3 to fix it.

Could you please elaborate on what is wrong with APU+dGPU? Are one, the other or both GPUs not found or missing info?

@Umio-Yasuno
Author

> Sorry I did not get what you meant the first time. Indeed the code was missing a division by 1024 to get in the kilobyte range, thanks. I pushed 04721e3 to fix it.

Thanks.

> Could you please elaborate on what is wrong with APU+dGPU? Are one, the other or both GPUs not found or missing info?

nvtop detects both GPUs but uses the wrong index.
As a result, the processes on Device1 (RX 560) will be displayed as the processes on Device0 (APU).

#209

(screenshots: nvtop, amdgpu_top_000)

@Umio-Yasuno
Author

Fixed by 3e9ddef:

> nvtop detects both GPUs but uses the wrong index. As a result, the processes on Device1 (RX 560) will be displayed as the processes on Device0 (APU).

But the pcie_bw problem still remains.
PCIe RX/TX will always show 0 because maxPayloadSize (256) is divided by 1024 before the multiplication, and 256 / 1024 truncates to 0 in integer arithmetic.

04721e3

-      received *= maxPayloadSize;
-      transmitted *= maxPayloadSize;
+      // Compute received/transmitter in KiB
+      received *= maxPayloadSize / 1024;
+      transmitted *= maxPayloadSize / 1024;
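
Here is a small self-contained demo of why the committed expression yields 0, and one way the arithmetic could be ordered instead (illustrative; the eventual fix in nvtop may be phrased differently):

#include <stdio.h>

int main(void) {
    unsigned long long received = 4000;      /* packets counted by the driver */
    unsigned long long maxPayloadSize = 256; /* bytes per packet */

    /* Compound assignment evaluates the right-hand side first, so the committed
     * code computes received * (256 / 1024) == received * 0. */
    unsigned long long broken = received * (maxPayloadSize / 1024);
    /* Multiplying before dividing keeps the value. */
    unsigned long long fixed = received * maxPayloadSize / 1024;

    printf("broken: %llu KiB, fixed: %llu KiB\n", broken, fixed); /* 0 vs 1000 */
    return 0;
}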

Also, the pcie_bw sysfs file causes a ~1 s sleep on each read, during which the nvtop thread is stopped.
With multiple AMDGPUs that support pcie_bw, nvtop will probably stall for that long per device.

@Syllo Syllo reopened this Jun 11, 2023
@Syllo
Owner

Syllo commented Jun 11, 2023

Oh my, I did not think hard enough about operator precedence in that case, thanks!

So is reading the pcie_bw file blocking when nvtop reads it faster than the driver refresh rate (1 sec)?

I've been thinking about separating the data gathering and interface logic into two threads (and frankly should have done that from the start), but unfortunately I have little time to allocate to that right now.

@Umio-Yasuno
Author

> So is reading the pcie_bw file blocking when nvtop reads it faster than the driver refresh rate (1 sec)?

I'm not sure about blocking, but pcie_bw sysfs reads are synchronous, so the calling thread waits and both user input and interface updates stop for about 1 s.
This makes nvtop terribly difficult to use.

I am not confident in safely using multithreading in C.
I think it would be reasonable to remove pcie_bw sysfs support, or to allow pcie_bw sysfs reading to be disabled from the configuration.
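
A rough sketch of the second option; the option name and struct here are invented, not an existing nvtop setting:

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical option struct, not an existing nvtop setting. */
struct gpu_options {
    bool read_pcie_bw; /* let the user opt out of the 1 s blocking read */
};

static void refresh_pcie_bw(const struct gpu_options *opts) {
    if (!opts->read_pcie_bw)
        return; /* skip the blocking sysfs read entirely */
    /* ... open and parse pcie_bw here ... */
}

int main(void) {
    struct gpu_options opts = { .read_pcie_bw = false };
    refresh_pcie_bw(&opts);
    puts("interface stays responsive because pcie_bw was not read");
    return 0;
}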
