PCIe Bus Error when running vector_copy sample #32
Comments
Perhaps you aren't using the first motherboard slot? Looks like the second may be wired through the chipset. How were you able to install ROCm on Ubuntu 16.04?
We found the issue: the system BIOS is not configuring the CPU correctly and is not enabling PCIe Platform Atomics. Gigabyte has to fix the SBIOS. greg
@gstoner Would you report it to Gigabyte?
@gstoner Will atomics work through the CPU->DMI->Chipset->PCIe path that's used for the 2nd slot of that motherboard?
PCIe Atomics only work with PCIe Gen 3. Here is what Atomic Operations do. Atomic Operations – Goal: Support SMP-type operations across a PCIe network to allow for things like offloading tasks between CPU cores and accelerators like a GPU. The spec says this enables advanced synchronization mechanisms that are particularly useful with multiple producers or consumers that need to be synchronized in a non-blocking fashion. Three new atomic non-posted requests were added, plus the corresponding completion (the address must be naturally aligned with the operand size or the TLP is malformed):
Since AtomicOps are not locked they don’t have the performance downsides of the PCI locked protocol. Compared to locked cycles, they provide “lower latency, higher scalability, advanced synchronization algorithms, and dramatically lower impact on other PCIe traffic.” The lock mechanism can still be used across a bridge to PCIe to achieve the desired operation. AtomicOps can go from device to device, device to host, or host to device. Each completer indicates whether it supports this capability and guarantees atomic access if it does. The ability to route AtomicOps is also indicated in the registers for a given port.
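One way to look at those capability and routing bits on Linux is the verbose output of `lspci -vv`, which on the pciutils versions I have seen prints an `AtomicOpsCap:` line under DevCap2 and an `AtomicOpsCtl:` line under DevCtl2. The sketch below parses that text. The function name is hypothetical, and the exact field layout varies between pciutils versions, so treat the parsing as an assumption to verify against your own output.

```python
import re

def parse_atomic_ops(lspci_text):
    """Extract AtomicOps flags from `lspci -vv` text for a single device.

    Returns e.g. {"AtomicOpsCap": {"32bit": True, ...}, "AtomicOpsCtl": {...}}.
    Fields that are not present in the text are simply omitted.
    """
    result = {}
    for field in ("AtomicOpsCap", "AtomicOpsCtl"):
        match = re.search(rf"{field}:\s*(.*)", lspci_text)
        if not match:
            continue
        flags = {}
        # lspci marks each flag with a trailing '+' (set) or '-' (clear).
        for name, sign in re.findall(r"(\w+)([+-])", match.group(1)):
            flags[name] = (sign == "+")
        result[field] = flags
    return result
```

Typical usage would be to run `lspci -vv -s <bus:dev.fn>` as root for the GPU and for each bridge above it, then pass the captured text to `parse_atomic_ops`. A root port or switch port that reports `Routing-` would be a candidate explanation for atomics failing in that slot.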
Thank you for the background information. Can you tell us what hardware scenarios are or aren't supported? Here is an illustration of the Z170 chipset (first image in the article). As you can see, some PCIe slots may be connected through the Z170 PCH. It seems that atomics do not work in this case. Is this a hardware or a BIOS limitation? Do you plan to create a utility to analyze which slots are compatible, or a database of hardware that can be consulted prior to purchase? Do PCIe switches (such as those in the ASUS X99E-WS) always work?
To be clear, Z170 advertises PCIe 3.0.
The issue was that the card was in the wrong slot. He had the device in an x16 slot that is wired as x4, which may not be coming off the main CPU. It could be a Gen2 slot, or the BIOS may not be configured correctly. The CPU supports PCIe Gen3 with Atomics. greg
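A quick way to check for the situation described above (an x16 slot that trains at x4, or a link running at Gen2) is to read the negotiated link speed and width that Linux exposes in sysfs. The sketch below assumes the standard `current_link_speed`, `current_link_width`, `max_link_speed`, and `max_link_width` attributes under `/sys/bus/pci/devices/<address>/`. The function names are hypothetical, and the speed string format differs between kernel versions (for example `8 GT/s` versus `8.0 GT/s PCIe`), so the parser only looks at the leading number.

```python
from pathlib import Path

# Per-lane transfer rate in GT/s mapped to PCIe generation.
_RATE_TO_GEN = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4}

def pcie_generation(speed_text):
    """Map a sysfs speed string such as '8 GT/s' to a generation, or None."""
    try:
        rate = float(speed_text.split()[0])
    except (ValueError, IndexError):
        return None  # e.g. "Unknown speed" or an empty attribute
    return _RATE_TO_GEN.get(rate)

def read_link(address, root="/sys/bus/pci/devices"):
    """Read negotiated and maximum link parameters for one PCI device."""
    dev = Path(root) / address

    def attr(name):
        return (dev / name).read_text().strip()

    return {
        "current_gen": pcie_generation(attr("current_link_speed")),
        "current_width": int(attr("current_link_width")),
        "max_gen": pcie_generation(attr("max_link_speed")),
        "max_width": int(attr("max_link_width")),
    }
```

For example, `read_link("0000:01:00.0")` (a placeholder address; use the one `lspci` reports for your GPU) returning `current_width` of 4 on a card with `max_width` of 16 would suggest the slot is electrically x4. Link speed and width alone do not prove atomics support, so this is only a first check.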
I followed the README to install the ROCm driver. However, when I try to run the vector_copy sample, the program hangs, and I see a report about a PCIe error in dmesg.
Configuration:
uname -r: 4.4.0-kfd-compute-rocm-rel-1.2-31
Kernel log (dmesg) messages after running the vector_copy sample:
/opt/rocm/rocm-smi -a output: