"GPU fault detected" for concurrency::parallel_for_each #71

Closed
LWisteria opened this issue Jun 6, 2016 · 5 comments

Comments

@LWisteria

I'm trying C++ AMP with a simple "vector_add" program, but it doesn't work.

My environment is as follows:

  • Ubuntu 14.04.4
  • Core i7-3770K
  • AMD Radeon R9 FuryX
  • DDR3-1600 2GBx4

The steps to reproduce are as follows:

  1. Clean-install Ubuntu 14.04.4 ("Erase disk and install Ubuntu").
  2. Boot normally; I get "low level graphic mode" (this is expected for the Fury X, right?).
  3. Enter console mode with "Ctrl+Alt+F1".
  4. Install hcc, following https://github.com/RadeonOpenCompute/ROCm#add-the-rocm-apt-repository
  5. Reboot.
  6. Enter console mode again.
  7. Build vector_add.cpp.
  8. Run ./vector_add, which fails with "Aborted (core dumped)".
  9. dmesg shows the following:
[ 1634.643988] amdgpu 0000:01:00.0: GPU fault detected: 146 0x003ac414
[ 1634.646454] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000007
[ 1634.648978] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x110C4014
[ 1634.650654] VM fault (0x14, vmid 8) at page 7, write from '' (0x00000000) (196)

No core dump occurs when "concurrency::parallel_for_each" is commented out.
I don't think the problem is in "vector_add.cpp" itself, because it builds and runs with Visual Studio 2015.

What's the problem, and how can I fix it?

@whchung
Collaborator

whchung commented Jun 6, 2016

@LWisteria From dmesg it seems something is wrong in the kernel. Could you first check whether the ROCm stack is working properly on your system?

In the "Verify Installation" section at:
https://github.com/RadeonOpenCompute/ROCm#add-the-rocm-apt-repository

there's a "vector_copy" example which uses only the HSA API and doesn't involve HCC. Please check whether it works.

@whchung
Collaborator

whchung commented Jun 6, 2016

@LWisteria I've changed your test case a bit; please check my updated version. It works fine on my system with an AMD R9 Nano + Ubuntu 14.04 + the ROCm 1.1 stack.

I changed:

  1. Use "clamp-config" instead of "hcc-config". For pure C++ AMP code, "clamp-config" is preferred.
  2. Iterate over devices from cbegin() to cend() instead of from crbegin() to crend().
  3. Capture the array_view objects by copy, not by reference.

According to the C++ AMP spec, concurrency::array_view instances shall be captured by copy, and concurrency::array instances shall be captured by reference. The HCC front end doesn't check this yet, so you need to be careful about how you capture variables in your kernel.
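To illustrate the capture rule without requiring an AMP toolchain, here is a minimal host-only sketch in standard C++. It is not C++ AMP code: `FloatView` and `for_each_index` are hypothetical stand-ins for concurrency::array_view and concurrency::parallel_for_each, and `vector_add` is an illustrative name. The point is only that with `[=]` the closure carries its own copies of the view handles, whereas with `[&]` it would carry addresses of host-side local variables.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for concurrency::array_view<float, 1>:
// a cheap, copyable handle to data that is owned elsewhere.
struct FloatView {
    float* data;
    std::size_t size;
    float& operator[](std::size_t i) const { return data[i]; }
};

// Hypothetical stand-in for concurrency::parallel_for_each:
// invokes the kernel once per index, sequentially on the host.
template <typename Kernel>
void for_each_index(std::size_t count, Kernel kernel) {
    for (std::size_t i = 0; i < count; ++i) {
        kernel(i);
    }
}

// Element-wise sum of a and b (assumed to have equal length).
std::vector<float> vector_add(std::vector<float> a, std::vector<float> b) {
    assert(a.size() == b.size());
    std::vector<float> c(a.size());

    FloatView av{a.data(), a.size()};
    FloatView bv{b.data(), b.size()};
    FloatView cv{c.data(), c.size()};

    // Views are captured by copy ([=]): the closure holds its own handles,
    // not references to the local variables av, bv and cv.
    for_each_index(c.size(), [=](std::size_t i) { cv[i] = av[i] + bv[i]; });
    return c;
}
```

In real C++ AMP code the same choice appears in the lambda passed to concurrency::parallel_for_each: `[=]` when the kernel uses array_view objects, and by-reference capture for concurrency::array objects.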

@LWisteria
Author

@whchung

Thank you for your answer.

I had overlooked https://github.com/RadeonOpenCompute/ROCm#verify-installation. I tried it and found that hsa_signal_wait_acquire() doesn't return.
Does this mean the problem is not in hcc but in the ROCm runtime or somewhere else?
Could you tell me where I should report it?

And thanks for modifying my code.
I ran your version. The "GPU fault detected" error is gone, but ./vector_add doesn't return, probably hanging in concurrency::parallel_for_each.
It seems to be the same problem as with the pure HSA version.
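A wait that never returns is easier to diagnose and report when it is bounded by a deadline. The following is a minimal sketch of that idea in standard C++ only; it is not HSA code. The `std::atomic<long>` stands in for a completion signal, and `wait_for_value` is a hypothetical helper name.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Polls `signal` until it equals `expected` or `timeout` elapses.
// Returns true if the value was observed, false if the deadline passed.
bool wait_for_value(const std::atomic<long>& signal, long expected,
                    std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (signal.load(std::memory_order_acquire) != expected) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // Timed out: the caller can log and exit instead of hanging.
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return true;
}
```

A caller would treat a `false` result as "the device never signalled completion" and print diagnostics at that point, which gives a concrete symptom to include in a bug report.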

@whchung
Collaborator

whchung commented Jun 6, 2016

@LWisteria I noticed you are using an Intel i7-3770K, which is Ivy Bridge. To my knowledge, to get the ROCm stack working properly you need a Haswell, Skylake, or later CPU that supports PCIe 3.0 atomics.

I would still recommend raising the question at the ROCm git repo so folks there can help you triage your issue better.

@LWisteria
Author

@whchung

Thank you for your very helpful advice. I'm filing the problem as an issue on the ROCm repo.
I will also try to get a Haswell or later CPU and run the code on it.

Please close this issue if there is nothing else on your side.
Thanks again!

@whchung whchung closed this as completed Jun 6, 2016