After a Restart Today, Completely Unable to run more than 1 vega 56 #430

Closed
Sayyiditow opened this issue Jun 6, 2018 · 10 comments


@Sayyiditow

I have been running 6 Vega 56 cards on an ASUS B250 mining board with ROCm without problems for the last two days. I had to restart Ubuntu for updates, and since then, although ROCm recognizes the cards, they don't hash. The only card that hashes is the one in the PCIe 3.0 slot. I am out of ideas: I have reinstalled Ubuntu 10 times and still only one of the 6 cards mines. I have even tried with only 2 cards, and only 1 hashes. I am unsure what else to do. Please help.

Stats GPU 0 - lyra2z: 5.628Mh/s (5.609Mh/s)
[2018-06-06 03:54:09] Stats GPU 1 -
[2018-06-06 03:54:09] Stats Total - lyra2z: 5.628Mh/s (5.609Mh/s)

GPU 0 is always the only card hashing. GPU 0 is the one in the PCIe 3.0 slot.

I followed these instructions as usual - which worked perfectly the last two days: https://github.com/RadeonOpenCompute/ROCm

./rocm-smi shows:
==================== ROCm System Management Interface ====================

GPU  Temp   AvgPwr  SCLK     MCLK    Fan     Perf    SCLK OD
1    N/A    N/A     N/A      N/A     0%      N/A     N/A
2    35.0c  15.0W   1474Mhz  800Mhz  70.98%  manual  0%
0    59.0c  150.0W  1312Mhz  800Mhz  70.98%  manual  0%

==================== End of ROCm SMI Log ====================

As you can see, only GPU 0 is working.

Any help is appreciated.

Thank you!

@issie81

issie81 commented Jun 6, 2018

I am having more or less the same issue. I use ROCm; the cards are seen in Linux and even the miner detects them, yet it only hashes with 1 card. I am also using Vega 56. Software for Vegas seems to be a bit weak :(

@Sayyiditow
Author

But this just started today. It was working perfectly with 6 Vegas for 2 days continuously.

@issie81

issie81 commented Jun 6, 2018

That is really strange, so I would blame this on the OS (since it stopped working after the update)? I assume you mine with tdxminer. Do you use ROCm 1.7.1 or the new 1.8.1?
This has to have an easy fix.

@Sayyiditow
Author

Sayyiditow commented Jun 6, 2018

tdxminer, ROCm 1.8.151. Yes, same here, it must have been the system update. Not sure when we can get a fix :(

@gstoner

gstoner commented Jun 6, 2018

Your issue is in 1.8.1. We found a bug in the SDMA firmware and had to turn PCIe atomics back on; they are working on a fix for it in the SDMA firmware. But we needed to address an issue with RHEL/CentOS 7.5 support to simplify

You need to disable SDMA, which goes back to PCIe-atomics-free mode. To do this, you must set HSA_ENABLE_SDMA=0.

@Sayyiditow
Author

Hi @gstoner, how do we set that? Sorry, I'm not so good with Ubuntu. Thanks so much.

@Sayyiditow
Author

Ah, it is done in the terminal. Never mind. Trying it now.

@Sayyiditow
Author

I can confirm that HSA_ENABLE_SDMA=0 works perfectly. I have set it up in "Startup Applications". All GPUs are hashing. Thanks @gstoner!
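
For anyone who prefers a file-based setup over "Startup Applications", here is a minimal sketch of one way to make the setting persist across logins. It assumes a shell profile that is read at login (`~/.profile` is typical on Ubuntu, but this may differ on your system); the `PROFILE_FILE` override and the variable names in the script are illustrative only and are not part of ROCm.

```shell
# Assumption: ~/.profile is sourced at login on this system.
# PROFILE_FILE is a hypothetical override so the target file can be changed.
profile_file="${PROFILE_FILE:-$HOME/.profile}"
line='export HSA_ENABLE_SDMA=0'

# Append the line only if it is not already present, so repeated runs are harmless.
if ! grep -qxF "$line" "$profile_file" 2>/dev/null; then
    # The leading newline protects against a file that lacks a trailing newline.
    printf '\n%s\n' "$line" >> "$profile_file"
fi
```

The change takes effect at the next login. A miner launched from a session that was started before the edit will not see the variable.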

@gstoner

gstoner commented Jun 6, 2018

You use
export HSA_ENABLE_SDMA=0

I also cleaned up the language in the README.

One thing you can look at is all the debug flags for ROCm here:
http://rocm-documentation.readthedocs.io/en/latest/Other_Solutions/Other-Solutions.html
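
A quick way to check that the `export` above actually reaches programs started from the same terminal is sketched below. It only confirms that a child process inherits the variable; it does not query ROCm itself, and the `child_value` name is just for illustration.

```shell
# Set the variable for this shell session and everything launched from it.
export HSA_ENABLE_SDMA=0

# Start a child process and read the variable back from its environment.
# A miner started from this same shell would inherit it in the same way.
child_value=$(sh -c 'printf "%s" "$HSA_ENABLE_SDMA"')
echo "HSA_ENABLE_SDMA seen by child process: ${child_value}"
```

The variable only lives as long as that terminal session. To apply it to a single command instead, it can be given as a prefix, for example `HSA_ENABLE_SDMA=0 ./your-miner`, where `./your-miner` is a placeholder for whatever binary you run.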

@Sayyiditow
Author

Thank you!
