Vega 56 strange behaviour with OpenCL #689

Closed
ghost opened this issue Jan 25, 2019 · 1 comment

ghost commented Jan 25, 2019

Hi ROCm team
My newly installed RX Vega 56 behaves rather strangely when I run OpenCL apps on it:

  • nothing is executed, yet no errors appear
    I ran a short benchmark with some crypto mining software.
    The GPU is recognized and the OpenCL compute kernel is compiled for the Vega.
    According to the app, the compute kernel is loaded properly and the benchmark starts.
    In the end, however, the benchmark results are zero (see benchmark.txt).

  • at the same time, the GPU changes neither clocks nor voltages and shows GPU% as 0%
    According to rocm-smi, no values have changed since the benchmark started.
    It also shows an average power consumption of 9W (see rocm-smi.txt).

  • after a reset, the Vega suddenly heats up
    When I reset the GPU via sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover,
    it still does nothing, but it suddenly starts heating up (see rocm-smi_temperature.txt).
    I did the reset minutes after the benchmark had finished.
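
For anyone trying to reproduce the symptoms above without rocm-smi, the same load, temperature and power readings can usually be taken straight from the amdgpu sysfs files. The following is only a rough sketch: the helper names read_attr and gpu_snapshot are made up for illustration, the device path is an assumption (card1 is a guess based on the dri/1 path used for the reset), and gpu_busy_percent is only present on kernels whose amdgpu driver exposes it.

```shell
#!/bin/sh
# Sketch: sample amdgpu sysfs counters to see whether the GPU is doing any work.
# ASSUMPTION: the Vega is /sys/class/drm/card1; check /sys/class/drm/ on your machine.

# Print the first line of a sysfs attribute, or "n/a" if it is missing or unreadable.
read_attr() {
    if [ -r "$1" ]; then
        head -n 1 "$1"
    else
        echo "n/a"
    fi
}

# Print one line with GPU load, temperature and power.
# $1: device directory, e.g. /sys/class/drm/card1/device (hypothetical path)
gpu_snapshot() {
    local dev hwmon h busy temp power
    dev="$1"
    hwmon=""
    # The hwmon index differs between machines, so take the first directory found.
    for h in "$dev"/hwmon/hwmon*; do
        if [ -d "$h" ]; then
            hwmon="$h"
            break
        fi
    done
    busy=$(read_attr "$dev/gpu_busy_percent")
    temp=$(read_attr "$hwmon/temp1_input")       # millidegrees Celsius
    power=$(read_attr "$hwmon/power1_average")   # microwatts
    echo "busy=${busy}% temp_mC=${temp} power_uW=${power}"
}
```

Running something like `while sleep 1; do gpu_snapshot /sys/class/drm/card1/device; done` in a second terminal during the benchmark would show whether the load ever leaves 0% and when the temperature starts to climb.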

The GPU is initialized properly, appears in clinfo/rocminfo and can be controlled via rocm-smi.
Even setting memoryoverdrive to 10% works.
But as I said, it does nothing but heat up when I try to run OpenCL apps on it.

I first tested this GPU in a test machine at home, with a simple board and an old i5 560.
It worked perfectly there with ROCm 2.0, so I decided to move it to my production machine.
It only behaves like this in the production machine.

One thing to mention:
The board has an onboard GPU (AST2300), which is enabled and which I use to configure the machine.
It also appears in the syslog at boot (see bootlog.txt).
This exact setup previously worked fine with two R9 390 cards on ROCm 1.9 (until 2.0 came out, which broke the Hawaii setup, see #668).

Do you have any idea what could be causing this?
Please let me know if you need more information.

EDIT: I also tried rolling back to ROCm 1.9.2 via ROCM_Experimental, but the Vega behaved the same.

ghost changed the title from "Vega 56 strange behaviour with OpenCL on ROCm 2.0" to "Vega 56 strange behaviour with OpenCL" on Jan 25, 2019

ghost commented Feb 18, 2019

Everything works fine with amdgpu-pro 18.50 and the additional package libdrm-amdgpu1, installed via

sudo apt install libdrm-amdgpu1

I have no idea why this package is necessary, though.
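
To compare a working and a non-working machine, it may help to check whether the package is actually present before and after the install. This is a minimal sketch for Debian/Ubuntu systems only; the function names status_is_installed and pkg_installed are made up for illustration, and it says nothing about why the package is needed.

```shell
#!/bin/sh
# Sketch: report whether a Debian/Ubuntu package is fully installed.

# Return 0 if a dpkg status string means "fully installed".
status_is_installed() {
    [ "$1" = "install ok installed" ]
}

# Return 0 if dpkg-query reports the named package as installed.
# Returns non-zero if the package is unknown or dpkg-query is unavailable.
pkg_installed() {
    local status
    status=$(dpkg-query -W -f='${Status}' "$1" 2>/dev/null) || return 1
    status_is_installed "$status"
}

# Example (not run automatically):
#   pkg_installed libdrm-amdgpu1 && echo "present" || echo "missing"
```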

Anyway, it works now.

Thank you

ghost closed this as completed on Feb 18, 2019