ROCm 5.0 Segmentation fault #1686

Closed
sampie opened this issue Feb 19, 2022 · 5 comments

Comments

sampie commented Feb 19, 2022

Hi

/opt/rocm/bin/rocminfo works fine and lists the GPUs, but when I build and run a simple helloworld program, I get a segmentation fault.

I have an RDNA2 GPU (6800M). What could be the problem? I am not running the kernel module from the ROCm 5.0 package; I am running the vanilla Linux kernel 5.16.9 instead. Has the ROCm 5.0 kernel module already been incorporated into the upstream Linux kernel?

--- The simple helloworld.cpp ---

#include <iostream>
#include <hip/hip_runtime.h>

int main(int argc, char* argv[])
{

    std::cout << "Before." << std::endl;

    hipDeviceProp_t devProp;
    hipGetDeviceProperties(&devProp, 0);

    std::cout << "After." << std::endl;

    return 0;
}


--- GDB showing the crash dump ---

Core was generated by `./helloworld'.
Program terminated with signal SIGSEGV, Segmentation fault.

warning: Section `.reg-xstate/887' in core file too small.
#0 0x00007f79e0257688 in rocr::image::ImageRuntime::GetImageInfoMaxDimension(hsa_agent_s, hsa_agent_info_t, void*) () from /opt/rocm-5.0.0/hip/lib/../../lib/libhsa-runtime64.so.1
[Current thread is 1 (Thread 0x7f79e00fdec0 (LWP 887))]
(gdb) bt
#0 0x00007f79e0257688 in rocr::image::ImageRuntime::GetImageInfoMaxDimension(hsa_agent_s, hsa_agent_info_t, void*) () from /opt/rocm-5.0.0/hip/lib/../../lib/libhsa-runtime64.so.1
#1 0x00007f79e02567b1 in rocr::image::hsa_amd_image_get_info_max_dim(hsa_agent_s, hsa_agent_info_t, void*) () from /opt/rocm-5.0.0/hip/lib/../../lib/libhsa-runtime64.so.1
#2 0x00007f79e0194679 in rocr::AMD::GpuAgent::GetInfo(hsa_agent_info_t, void*) const () from /opt/rocm-5.0.0/hip/lib/../../lib/libhsa-runtime64.so.1
#3 0x00007f79e01ba0ff in rocr::HSA::hsa_agent_get_info(hsa_agent_s, hsa_agent_info_t, void*) () from /opt/rocm-5.0.0/hip/lib/../../lib/libhsa-runtime64.so.1
#4 0x00007f79e0c45bb6 in ?? () from /opt/rocm-5.0.0/hip/lib/libamdhip64.so.5
#5 0x00007f79e0c4738d in ?? () from /opt/rocm-5.0.0/hip/lib/libamdhip64.so.5
#6 0x00007f79e0c47c79 in ?? () from /opt/rocm-5.0.0/hip/lib/libamdhip64.so.5
#7 0x00007f79e0c02fde in ?? () from /opt/rocm-5.0.0/hip/lib/libamdhip64.so.5
#8 0x00007f79e0c3af56 in ?? () from /opt/rocm-5.0.0/hip/lib/libamdhip64.so.5
#9 0x00007f79e0a7bdfe in ?? () from /opt/rocm-5.0.0/hip/lib/libamdhip64.so.5
#10 0x00007f79e0a9c365 in hipGetDeviceProperties () from /opt/rocm-5.0.0/hip/lib/libamdhip64.so.5
#11 0x0000000000201b09 in main (argc=, argv=) at helloworld.cpp:10

@langyuxf

Is your configuration an APU (Cezanne or Renoir) plus an RX 6800M?
Cezanne and Renoir series APUs have problems in the ROCm 5.0 release.

To run a HIP application on the RX 6800M, first identify its device index, which is either 0 or 1.

Assuming it is 0, pass that index to hipGetDeviceProperties and restrict the visible devices when running:

hipDeviceProp_t devProp;
hipGetDeviceProperties(&devProp, 0);

$ ROCR_VISIBLE_DEVICES=0 ./helloworld

sampie commented Feb 21, 2022

Hi,

Yes, I have a laptop with a 5900HX + 6800M.

Thank you! Running with ROCR_VISIBLE_DEVICES=0 worked!

BTW: Apparently it is 0, but how can I identify the 6800M's deviceIndex? Is it visible somewhere in the rocminfo output (I did not see such a field)?

@langyuxf

langyuxf commented Feb 22, 2022

You can use the following fields in the rocminfo output.

Node: 1
Device Type: GPU

If Device Type is GPU, then the device index should be Node - 1.
You can confirm it with the following code by checking the device's name, gcnArchName, pciDeviceID, etc.

hipDeviceProp_t props;
hipGetDeviceProperties(&props, deviceIndex);

std::cout << props.name << std::endl;
std::cout << props.gcnArchName << std::endl;
std::cout << props.pciDeviceID << std::endl;
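
The Node - 1 rule above can also be applied mechanically to the rocminfo text. The following is only a minimal sketch, not part of ROCm: the names GpuAgent and gpuAgentsFromRocminfo are made up for illustration, and it assumes each agent block prints its "Node:" line before its "Device Type:" line, as in the two fields quoted above. The Node - 1 mapping itself assumes a single CPU node listed ahead of the GPUs, so the result should still be confirmed with hipGetDeviceProperties.

```cpp
#include <cstddef>
#include <exception>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record: one GPU agent found in the rocminfo text.
struct GpuAgent {
    int node;         // value of the "Node:" field
    int deviceIndex;  // Node - 1, following the rule given in this thread
};

// Strip leading and trailing spaces, tabs and carriage returns.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Scan "key: value" lines and pair each "Device Type: GPU" line with the
// most recent "Node:" line seen before it (assumed field order).
std::vector<GpuAgent> gpuAgentsFromRocminfo(const std::string& text) {
    std::vector<GpuAgent> gpus;
    std::optional<int> node;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;
        const std::string key = trim(line.substr(0, colon));
        const std::string value = trim(line.substr(colon + 1));
        if (key == "Node") {
            try {
                node = std::stoi(value);
            } catch (const std::exception&) {
                node.reset();  // not a number: ignore this agent
            }
        } else if (key == "Device Type") {
            if (value == "GPU" && node) {
                gpus.push_back({*node, *node - 1});
            }
            node.reset();  // each Node value belongs to one agent only
        }
    }
    return gpus;
}
```

To use it, capture the output of /opt/rocm/bin/rocminfo into a string and pass it to gpuAgentsFromRocminfo. Each returned deviceIndex is only a candidate: check props.name or props.gcnArchName for that index before relying on it.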

@ROCmSupport

Thanks @sampie for reaching out.
It looks like your issue is resolved by the suggestions from @xfyucg.
Please note that ROCm does not work on integrated GPUs and APUs, so always run on the discrete GPU by mapping the corresponding node.
Thank you.

sampie commented Feb 23, 2022

Yes, the main issue is resolved, and the 6800M seems to work under ROCm with all the code I have run on it so far.

However, if possible, it would be good to fix ROCm so that it does not crash when ROCR_VISIBLE_DEVICES is not set. rocminfo works fine, so I wonder why hipGetDeviceProperties crashes.
