CPU Node [0] has no GPU connected #23

Closed
phschaad opened this issue Jun 9, 2017 · 12 comments

Comments

@phschaad

phschaad commented Jun 9, 2017

I am continuously running into the same error when installing the ROCm stack from the Ubuntu repositories after a fresh system install (tried multiple times, starting with a fresh Ubuntu 16.10 install every time). The setup is using an AMD R9 Nano and runs Ubuntu 16.10.
After installing by following the instructions from here (both with and without rocm-opencl), running the included vector_copy sample produces the following output:

CPU Node [0] has no GPU connected
Initializing the hsa runtime succeeded.
Checking finalizer 1.0 extension support succeeded. 
Generating function table for finalizer succeeded.
Getting a gpu agent failed.

Installing and running clinfo results in:

CPU Node [0] has no GPU connected
Number of platforms                               0

This is the dmesg output. It contains some ACPI errors (line 699 onward) and, further down, some EDAC errors (line 1242 onward). These errors do not appear when booting into the non-ROCm kernel. Are they possibly related?

@jedwards-AMD
Contributor

Can you check if you have a valid kfd device: 'ls -la /dev/kfd'

The permissions on it should be 600. Also, check dmesg to see if your device is getting registered:

'dmesg | grep kfd'

Send the output of that.
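
For anyone who wants to script the device-node check above, here is a minimal sketch in Python. The helper name `describe_device_node` is hypothetical and not part of ROCm; it only reports whether the path exists, whether it is a character device, and its permission bits in octal.

```python
import os
import stat


def describe_device_node(path):
    """Report existence, node type, and octal permission bits for a path."""
    try:
        st = os.stat(path)
    except OSError:
        # Missing node or no permission to stat it.
        return {"exists": False, "is_char_device": False, "mode": None}
    return {
        "exists": True,
        # /dev/kfd is expected to be a character device (the leading "c" in ls -la).
        "is_char_device": stat.S_ISCHR(st.st_mode),
        # Permission bits only, formatted the way chmod expects (e.g. "600").
        "mode": format(stat.S_IMODE(st.st_mode), "03o"),
    }


if __name__ == "__main__":
    print(describe_device_node("/dev/kfd"))
```

This covers only the `ls -la /dev/kfd` half of the request; the dmesg check still has to be run separately.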

@phschaad
Author

The output of ls -la /dev/kfd reads

crw-rw-rw- 1 root root 244, 0 Jun 10 08:09 /dev/kfd

and dmesg | grep kfd returns

[    0.000000] Linux version 4.9.0-kfd-compute-rocm-rel-1.5-99 (jenkins@jenkins-raptor-7) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) ) #1 SMP Wed May 31 09:59:28 CDT 2017
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.9.0-kfd-compute-rocm-rel-1.5-99 root=UUID=199481e3-f301-448e-a0e5-96b38e278c20 ro quiet splash vt.handoff=7
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.9.0-kfd-compute-rocm-rel-1.5-99 root=UUID=199481e3-f301-448e-a0e5-96b38e278c20 ro quiet splash vt.handoff=7
[    2.033178] usb usb1: Manufacturer: Linux 4.9.0-kfd-compute-rocm-rel-1.5-99 ehci_hcd
[    2.053167] usb usb2: Manufacturer: Linux 4.9.0-kfd-compute-rocm-rel-1.5-99 ehci_hcd
[    2.053951] usb usb3: Manufacturer: Linux 4.9.0-kfd-compute-rocm-rel-1.5-99 xhci-hcd
[    2.054152] usb usb4: Manufacturer: Linux 4.9.0-kfd-compute-rocm-rel-1.5-99 xhci-hcd
[    2.054652] usb usb5: Manufacturer: Linux 4.9.0-kfd-compute-rocm-rel-1.5-99 xhci-hcd
[    2.054847] usb usb6: Manufacturer: Linux 4.9.0-kfd-compute-rocm-rel-1.5-99 xhci-hcd
[    2.148395] kfd kfd: Initialized module
[    2.434140] kfd kfd: skipped device (1002:7300), PCI rejects atomics

@jedwards-AMD
Contributor

This shows that the PCIe interface your card is plugged into doesn't support PCIe 3.0 atomics. The interface must support PCIe 3.0 (preferably x16) and the atomics extension.
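
To spot this condition automatically, the relevant dmesg line can be matched with a regular expression. The following is a minimal sketch, assuming the message wording is exactly as in the log above (other kernel versions may phrase it differently); `find_rejected_devices` is a hypothetical helper name.

```python
import re

# Matches the kfd message seen in this thread; the wording is an assumption
# taken from the 4.9.0-kfd-compute-rocm-rel-1.5-99 log above.
SKIPPED_RE = re.compile(
    r"kfd kfd: skipped device \(([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\), "
    r"PCI rejects atomics"
)


def find_rejected_devices(dmesg_text):
    """Return (vendor_id, device_id) pairs for devices kfd skipped over atomics."""
    return [m.groups() for m in SKIPPED_RE.finditer(dmesg_text)]


SAMPLE = """\
[    2.148395] kfd kfd: Initialized module
[    2.434140] kfd kfd: skipped device (1002:7300), PCI rejects atomics
"""

if __name__ == "__main__":
    print(find_rejected_devices(SAMPLE))  # [('1002', '7300')]
```

An empty result only means this particular message was not found, not that atomics are confirmed to work.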

@insujang

insujang commented Jul 7, 2017

@jedwards-AMD
I have the same issue, but with a different machine setup. I'm trying to use ROC in a q35-based QEMU-KVM virtual machine. It seems the emulated hardware provided by QEMU does not support atomic operations, because the same setup runs fine on the host machine.

My hardware consists of:
CPU: Intel Core i7 6700
GPU: AMD Radeon RX 480
I confirmed that it works on the host machine, but it does not work in a guest machine.

I want to run it in a virtual machine that does not support atomic operations. Is that possible?
Thanks!

@gstoner
Contributor

gstoner commented Jul 7, 2017

No, it is not possible to run without PCIe Atomics support. We need this for signaling and some other functions with GFX8 and GFX9 GPUs. QEMU-KVM needs to be augmented to support PCIe Atomics.

@insujang

insujang commented Jul 7, 2017

@gstoner
Thank you for your reply. Very sad, but helpful for saving my time. :)

Thanks!

@gstoner
Contributor

gstoner commented Jul 7, 2017

It's not sad, since this functionality has been supported by Intel since the release of the Ivy Bridge Xeon E5 v2, and in all Pentium, Core i3, Core i5, and Core i7 processors since Haswell. It is also supported in Ryzen and EPYC processors, as well as in Cavium ThunderX and ThunderX2 ARM processors and IBM POWER9.

Here is more information https://rocm.github.io/ROCmPCIeFeatures.html

@gstoner closed this as completed Jul 7, 2017
@insujang

insujang commented Jul 7, 2017

@gstoner
Umm, my CPU does support PCIe Atomics, as it is a Skylake part. However, PCIe Atomics just cannot be used in a guest virtual machine, and that's the reason I'm sad. :)

@sunway513

Hi @insujang, ROCm supports Docker. Please check out the following repo for details:
https://github.com/RadeonOpenCompute/ROCm-docker

@fxkamd
Contributor

fxkamd commented Jul 7, 2017

PCIe atomics are disabled by default at boot-up. The GPU driver enables them when it gets loaded. However, that doesn't work in a guest virtual machine; they have to be enabled in the hypervisor. I think a script that manually pokes PCI config space should do the trick. We've done this before, I need to find someone who knows the details ...
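
The script mentioned above is not shown in this thread. As a read-only illustration of which config-space bits are involved, here is a minimal sketch that decodes the AtomicOp fields from a raw dump of a device's PCI configuration space. The offsets and masks follow the names in Linux's `pci_regs.h`; the function names are hypothetical, and the sketch assumes the device exposes a version 2 PCI Express capability (which it does not verify). It does not write anything and is not a way to enable atomics.

```python
import struct

# Offsets and bit masks as named in Linux's include/uapi/linux/pci_regs.h.
PCI_STATUS = 0x06
PCI_STATUS_CAP_LIST = 0x10
PCI_CAPABILITY_LIST = 0x34
PCI_CAP_ID_EXP = 0x10
PCI_EXP_DEVCAP2 = 0x24
PCI_EXP_DEVCAP2_ATOMIC_ROUTE = 0x40
PCI_EXP_DEVCAP2_ATOMIC_COMP32 = 0x80
PCI_EXP_DEVCAP2_ATOMIC_COMP64 = 0x100
PCI_EXP_DEVCTL2 = 0x28
PCI_EXP_DEVCTL2_ATOMIC_REQ = 0x40


def find_pcie_capability(cfg):
    """Walk the capability list; return the PCI Express capability offset or None."""
    if len(cfg) < 0x40:
        return None
    status = struct.unpack_from("<H", cfg, PCI_STATUS)[0]
    if not status & PCI_STATUS_CAP_LIST:
        return None
    ptr = cfg[PCI_CAPABILITY_LIST] & 0xFC
    visited = set()  # guards against malformed, looping lists
    while ptr >= 0x40 and ptr not in visited and ptr + 1 < len(cfg):
        visited.add(ptr)
        if cfg[ptr] == PCI_CAP_ID_EXP:
            return ptr
        ptr = cfg[ptr + 1] & 0xFC
    return None


def atomics_state(cfg):
    """Decode AtomicOp capability and enable bits, or None if unavailable."""
    cap = find_pcie_capability(cfg)
    if cap is None or cap + PCI_EXP_DEVCTL2 + 2 > len(cfg):
        return None
    devcap2 = struct.unpack_from("<I", cfg, cap + PCI_EXP_DEVCAP2)[0]
    devctl2 = struct.unpack_from("<H", cfg, cap + PCI_EXP_DEVCTL2)[0]
    return {
        "routing_supported": bool(devcap2 & PCI_EXP_DEVCAP2_ATOMIC_ROUTE),
        "completer_32bit": bool(devcap2 & PCI_EXP_DEVCAP2_ATOMIC_COMP32),
        "completer_64bit": bool(devcap2 & PCI_EXP_DEVCAP2_ATOMIC_COMP64),
        "requester_enabled": bool(devctl2 & PCI_EXP_DEVCTL2_ATOMIC_REQ),
    }


if __name__ == "__main__":
    # Synthetic config space for demonstration only, not data from a real device.
    cfg = bytearray(256)
    struct.pack_into("<H", cfg, PCI_STATUS, PCI_STATUS_CAP_LIST)
    cfg[PCI_CAPABILITY_LIST] = 0x40
    cfg[0x40] = PCI_CAP_ID_EXP
    struct.pack_into("<I", cfg, 0x40 + PCI_EXP_DEVCAP2, 0x1C0)
    print(atomics_state(bytes(cfg)))
```

On Linux, the raw bytes can typically be read as root from `/sys/bus/pci/devices/<address>/config`, where `<address>` is a placeholder for the device's PCI address; unprivileged reads are usually truncated to the first 64 bytes, in which case `atomics_state` returns `None`.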

@insujang

insujang commented Jul 8, 2017

Thank you for the suggestion, @sunway513. But I need a virtualized PCIe controller that I can modify for study. Anyway thank you :D

@insujang

insujang commented Jul 8, 2017

@fxkamd Thank you for the information! I should find that script.

Do you mean pci_enable_atomic_request() in kernel/driver/pci/pci.c?
It is called from amdkfd_topology.c but returns a negative integer, which suggests the virtual machine may not support it?

So I modified it to return 0 unconditionally, but then the ROC runtime hangs when I run the sample.
