Vega 56/64 overclock/undervolt under ROCm and upstream Linux kernels in Ubuntu 16.04/18.04 [How-to] #463
Re: "(you may catch some errors; it means you need to install missing packages for building kernels/packages. Simply install them and repeat the failed step)"

sudo apt-get install git
These are what you need to install in order to compile.

I have followed all of your steps and have tdxminer working on the new kernel. Commands (if you have an Intel GPU, it will be card0, so you must change "card0" to each Vega you have: card1, card2, card3, card4, etc.):

sudo -i
echo "m 3 945 905" > /sys/class/drm/card1/device/pp_od_clk_voltage
echo "s 3 1269 935" > /sys/class/drm/card1/device/pp_od_clk_voltage
echo "2" > /sys/class/drm/card1/device/pp_sclk_od
cat /sys/class/drm/card1/device/pp_od_clk_voltage
cat /sys/class/drm/card1/device/pp_sclk_od
echo "c" > /sys/class/drm/card1/device/pp_od_clk_voltage
echo "c" > /sys/class/drm/card1/device/pp_sclk_od

However, what is interesting is that even with all of these changes and configurations, and even if you change the clocks and voltages, performance remains identical and power is not reduced on 16.04. Maybe someone can report something different, but this seems to be more of a visual hack than a real one: the PP tables only appear to change things, when in reality nothing changes.
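The per-card edits above can be collected into a small shell function, which makes it harder to forget a card or the final commit. This is a minimal sketch, not the OP's script: the function name `set_vega_od` is hypothetical, the clock/voltage values are the ones quoted above, and it assumes that writing `manual` to `power_dpm_force_performance_level` is the "manual performance mode" mentioned later in the thread. Run it from a root shell such as `sudo -i`, because with plain `sudo echo ... > file` the redirection is performed by your unprivileged shell.

```shell
#!/bin/sh
# Sketch: apply the OverDrive edits from the comment above to one card.
# Assumption: $1 is a card's device directory, e.g. /sys/class/drm/card1/device
# (card0 may be an Intel iGPU). Writing to the real sysfs files requires root.
set_vega_od() {
    dev="$1"
    # Manual performance mode, so the custom levels are not overridden.
    echo "manual" > "$dev/power_dpm_force_performance_level" || return 1
    # Memory state 3: 945 MHz @ 905 mV.
    echo "m 3 945 905" > "$dev/pp_od_clk_voltage" || return 1
    # Core (sclk) state 3: 1269 MHz @ 935 mV.
    echo "s 3 1269 935" > "$dev/pp_od_clk_voltage" || return 1
    # Commit the edited table to the card.
    echo "c" > "$dev/pp_od_clk_voltage" || return 1
    # Print the resulting table for a quick sanity check.
    cat "$dev/pp_od_clk_voltage"
}
```

For example, `for n in 1 2 3; do set_vega_od /sys/class/drm/card$n/device; done`, adjusting the card numbers to your rig. Given the reports in this thread, confirm the effect with a watt meter rather than trusting the values read back.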
4.18.0-rc5 #1 SMP Sun Jul 22 13:52:18 CEST 2018 x86_64 x86_64 x86_64 GNU/Linux
Why?
Mining was slower when I tried this on 16.04. Also, I couldn't get the overclocking to really work: it would show that the clocks had changed, but neither performance nor power usage at the wall changed. Also, make sure the commands are run with sudo and that you have switched to manual performance mode.
But the card is already in manual mode.
Maybe I'm missing something?
Oh! Now it works.
Thanks.
Is it possible to run 4.18.0-rc5 and keep the same hash rate? I used the first post's guide on 16.04. I can undervolt my Vega, but I get about an 8% lower hash rate.
Voltage and power consumption do not change with these patches. Measured at the wall with a watt meter, power consumption is stuck at 300 W per card regardless of the voltage setting. I don't know whether this is expected behavior or whether Vega support is still in beta.
I need your help to understand why this happens.
@Hackintoshihope Just some info of my own that I wanted to share. I was also having trouble getting undervolting and custom power play tables to work properly. The trick turned out to be writing the power play tables in binary form to /sys/class/drm/card$1/device/pp_table instead of using what is suggested in the OP.

Someone put up a new repo with some useful tools for mining with Vega: https://github.com/xmrminer01102018/VegaToolsNConfigs It contains scripts and documents on mining CNv1 at full speed with up to 6 Vegas using custom power play tables. At the time, I was only interested in applying custom power play tables and undervolting for running tdxminer with ROCm.

The repo includes a few useful tools: SoftPPT-1.0.0.jar and setPPT.sh. SoftPPT-1.0.0.jar converts hex power play tables to binary, and setPPT.sh pushes your binary power play tables to /sys/class/drm/card$1/device/pp_table. Combined, they let you build custom power play tables, convert them to binary, and soft-deploy them on your cards. My two Vega FEs' stats using this method on the latest ROCm:
Before, running them at these clocks always resulted in them consuming 220 watts. Now they run at a much more respectable 145 watts (190-ish watts at the wall?)
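To tell whether an undervolt is actually taking effect, as opposed to the unchanged wall readings reported earlier in the thread, it can help to compare the wall meter against the board power the driver reports. A sketch, assuming the amdgpu hwmon file `power1_average` (reported in microwatts) under `/sys/class/drm/cardN/device/hwmon/`; `gpu_watts` is a hypothetical helper. The driver's figure excludes PSU losses and the rest of the system, so it should read lower than the wall.

```shell
#!/bin/sh
# Sketch: print the GPU-reported average power in whole watts.
# $1: an amdgpu hwmon power1_average file (value in microwatts), e.g.
#     /sys/class/drm/card1/device/hwmon/hwmon0/power1_average
gpu_watts() {
    uw=$(cat "$1") || return 1
    echo $(( uw / 1000000 ))      # integer microwatts -> watts (truncated)
}
```

For example: `for f in /sys/class/drm/card*/device/hwmon/hwmon*/power1_average; do echo "$f: $(gpu_watts "$f") W"; done`.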
@mdai843 I've waited a long time for confirmation of my issues. This post detailed methods to get the functionality to work, but the implementation did nothing to reduce power. If I understand correctly, you were able to modify voltages and get lower power consumption using the additional tweaks you provided? Or just by pushing the changes to this path instead: /sys/class/drm/card$1/device/pp_table? I've been anxious to implement this. If what you are saying is correct, I'm about to cool down my Vegas considerably.
Some initial testing I did with underclocks and undervolts for ROCm 1.8.3 + tdxminer + 4.18 kernel on 2 Vega FEs (~120 watts from the wall):
@Hackintoshihope I too have been working on this problem for a long time and couldn't solve it. It would be great if you could confirm my results, since this seems too good to be true. I combed through everything from tekcomm but never got it sorted out, and I finally think this might be it.

Again, the method is to generate power play tables in hex form, convert them to binary, and then push them to /sys/class/drm/card$1/device/pp_table (the repo includes a helper script called setPPT.sh for this). The tools you need for converting hex -> binary are in the repo above. You can generate custom power play tables here: https://docs.google.com/spreadsheets/d/1-rhYsaRXO1ahk3PyrEgT9gXzs7ImAzh-sbqtgwy8HQg/edit#gid=964538665

The hex power play tables need to have all newlines, commas, and backslashes stripped out. I'll try to show a sample of my work here. For example, the power play table for my test above initially looks like:
I stripped out all the non-essentials, leaving plain hex digits, like so:
You then convert this to binary like so:
Then push this to each card:
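The convert-and-push steps above can also be done in plain shell, as an alternative to SoftPPT-1.0.0.jar. This is not what SoftPPT or setPPT.sh actually do internally; it is a sketch that assumes the input file already contains only hex digits (whitespace is tolerated), and the function names `hex_to_bin` and `push_pp_table` are hypothetical.

```shell
#!/bin/sh
# Sketch: turn a cleaned hex power play table into binary and push it to a card.

# hex_to_bin IN OUT: IN holds hex digits only (whitespace is ignored).
hex_to_bin() {
    hex=$(tr -d ' \t\r\n' < "$1") || return 1
    case $hex in
        *[!0-9A-Fa-f]*) echo "hex_to_bin: non-hex characters in $1" >&2; return 1 ;;
    esac
    if [ $(( ${#hex} % 2 )) -ne 0 ]; then
        echo "hex_to_bin: odd number of hex digits in $1" >&2; return 1
    fi
    while [ -n "$hex" ]; do
        rest=${hex#??}
        byte=${hex%"$rest"}          # first two hex digits
        hex=$rest
        # Emit the byte via an octal escape, e.g. "ff" -> \377.
        printf "\\$(printf '%03o' "$((0x$byte))")"
    done > "$2"
}

# push_pp_table BIN DEVDIR: e.g. DEVDIR=/sys/class/drm/card1/device (root needed).
push_pp_table() {
    cat "$1" > "$2/pp_table"
}
```

From a root shell: `hex_to_bin vega.hex vega.bin`, then `push_pp_table vega.bin /sys/class/drm/cardN/device` for each card. Saving the original first (`cat /sys/class/drm/cardN/device/pp_table > original.bin`) gives you something to push back if a table misbehaves.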
@mdai843 Also, concerning that undervolt: it is effectively double the efficiency of what I am currently getting. If that is correct, 30 MH/s+ for 6 Vegas might be the best performance per watt I've seen. I will load up my test rig and see if I can replicate your results. Exciting stuff.
I just tested this on my larger rig with 6 Vega FEs to make sure it actually scales up and works on risers. Using the same clocks as before but with real undervolts, I was able to drop my power draw by about 70 watts: before, I was pulling 915 watts from the wall; now I am pulling 845 watts, with no hash rate drop since the clocks remained the same. I'm sure one can do much better, but I'm convinced that this actually works.
@mdai843 Any additional information will help.
I didn't deviate from the OP when setting up the machine, aside from how I set up the power play tables; his original instructions will suffice. The order in which I do things is:
Tekcomm's repos are offline, as he removed all traces of himself. You can find some of his old stuff, like tdxminer, here: https://github.com/earlvanze/AMD-ROCm-Miner
Tekcomm left for some reason, but he did not develop tdxminer; he modified it to work with the RX 550 and RX 560. Regardless, it seems you've done it. I'll report back after a fresh install.
Good luck. I spent more time on this than I care to admit, so I really hope this helps anyone else struggling with undervolting Vegas on Linux.
@mdai843 I did need to install Java, but in essence it works as you say. I would like to know exactly what you are doing to clean up your hex, though. Is there an easy way to do this?
That's awesome. Thanks for validating these steps. To clean it up, I just used search and replace in vi. I'm sure you could write a little script to do the regex replacements: it's just dropping the commas, the newlines, the backslashes, and all the text before the actual hex code. As for why this works? Not sure yet. I suppose this will be one of those things that gets fixed over time; for now, we'll just have to do it this way.
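The vi search-and-replace described above can be scripted. A minimal sketch, assuming the table arrives in Windows registry export form, i.e. a first line like `"PP_PhmSoftPowerPlayTable"=hex:b6,02,08,\` followed by indented continuation lines. That input format and the function name `clean_pp_hex` are assumptions, so adjust the `sed` expression if the text before your hex code looks different.

```shell
#!/bin/sh
# Sketch: strip a registry-style hex dump down to plain hex digits.
# Assumed input: '"PP_PhmSoftPowerPlayTable"=hex:b6,02,...,\' plus continuation lines.
clean_pp_hex() {
    # 1) drop everything up to and including "=hex:" (the text before the hex code)
    # 2) delete commas, backslashes, spaces, tabs, CRs and newlines
    sed 's/^.*=hex://' "$1" | tr -d ',\\ \t\r\n'
}
```

For example, `clean_pp_hex table.reg > table.hex` produces a single line of hex digits, ready for the hex-to-binary step.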
You will need to convert my hex table above to binary, just as you showed me. Would you be able to test this hex and report the hash rate you get? It modifies three states: 5, 6, and 7. To access each state and measure its performance, run the corresponding command each time the miner starts:

/opt/rocm/bin/rocm-smi --setsclk 5 --setfan 205
/opt/rocm/bin/rocm-smi --setsclk 6 --setfan 205
/opt/rocm/bin/rocm-smi --setsclk 7 --setfan 205

The states are: 1138 MHz @ 800 mV, 1407 MHz @ 900 mV, and 1607 MHz @ 970 mV (all with 500 MHz HBM2). I am getting about a 12-20% hash rate drop at the same clocks compared with what I got before undervolting.
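To step through states 5, 6 and 7 with the miner already running, the three commands can be wrapped in a loop that pauses between states so you can note the hash rate and the wall reading. A sketch only: `sweep_states` and the `ROCM_SMI` override variable are hypothetical, and it uses only the `--setsclk` and `--setfan` options quoted above. Run it as root if your rocm-smi setup requires it.

```shell
#!/bin/sh
# Sketch: cycle through the modified core-clock states while the miner runs.
# ROCM_SMI can be overridden; the default path is the one used in this thread.
ROCM_SMI=${ROCM_SMI:-/opt/rocm/bin/rocm-smi}

# sweep_states DWELL: hold each state for DWELL seconds.
sweep_states() {
    dwell=${1:-600}                       # default: 10 minutes per state
    for state in 5 6 7; do
        echo "state $state: watch hash rate and wall power" >&2
        "$ROCM_SMI" --setsclk "$state" --setfan 205 || return 1
        sleep "$dwell"
    done
}
```

For example, `sweep_states 900` holds each state for 15 minutes. If a state hangs a card, as reported below for states 6 and 7, the loop cannot recover it; a reboot is still needed.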
@Hackintoshihope My results are below. /opt/rocm/bin/rocm-smi --setsclk 5 --setfan 205 (376 watts from the wall; notice that the voltage is 900 mV even though we are fixed at 500 MHz memory and 1134 MHz core. For reference, the PP table in my example above runs 1138/800 @ 800 mV, pulling ~300 W from the wall. I don't know why; vagaries of working with AMD cards, I suppose.)
/opt/rocm/bin/rocm-smi --setsclk 6 --setfan 205 (unstable on one card, which hangs; I will report stats for the other card. 330 watts from the wall, including the hung GPU)
/opt/rocm/bin/rocm-smi --setsclk 7 --setfan 205 (both cards hang)
These settings are extremely experimental and are being used on Vega 64 and Vega 56 cards; they only ran for 10-15 min. If I remember correctly, you are using a Vega FE? Your results do seem to mirror what I am getting for the tests you have run.
Should this work with RX 400 / 500 series cards as well?
setting
Oops, I see in
How do I enable this? Update: OK, I figured it out. I don't like kernel command lines; it can be enabled by adding a modprobe.conf file and rebuilding the initrd.
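A minimal sketch of that modprobe route. It assumes the setting being enabled is amdgpu's `ppfeaturemask` module option and that OverDrive is bit `0x4000` of the mask (`PP_OVERDRIVE_MASK` in the kernel's amd_shared.h); the file name `/etc/modprobe.d/amdgpu-od.conf` and both function names are hypothetical. Many guides simply use `ppfeaturemask=0xffffffff`; ORing in only the OverDrive bit leaves the other feature bits as they are.

```shell
#!/bin/sh
# Sketch: enable amdgpu OverDrive via a modprobe option instead of the kernel command line.

# with_overdrive MASK: print MASK (decimal or 0x-hex) with bit 0x4000 set.
with_overdrive() {
    printf '0x%x\n' "$(( $1 | 0x4000 ))"
}

# write_od_conf MASK FILE: write the modprobe option line to FILE.
write_od_conf() {
    echo "options amdgpu ppfeaturemask=$(with_overdrive "$1")" > "$2"
}
```

As root, something like `write_od_conf "$(cat /sys/module/amdgpu/parameters/ppfeaturemask)" /etc/modprobe.d/amdgpu-od.conf` followed by `update-initramfs -u` (Ubuntu's initrd rebuild) and a reboot. The current mask may be printed in decimal or hex depending on kernel version; the function accepts either.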
Hi @akostadinov Yep, in order to use the OverDrive settings like For instance, in the 4.20 upstream driver, the default is Note also that, rather than writing directly to Note also that, when defining the rules in your new DPM tables, you will need to follow some of the rules that I mention in these posts:
Finally, you may be interested in this discussion about changing the maximum power setting for your GPU and this discussion on performance loss due to thermal throttling / fan speed choices. I just tested the
Hello,
I want to share with you some information on how to overclock/undervolt GFX9 GPUs (Vega 56/Vega 64) under Ubuntu 16.04 and 18.04:
If you have problems with the standard installation of rocm-dkms, install ROCm without the rocm-dkms and rock-dkms packages (verify the package list; it can vary):
If you want to thank me - please send some BTC 3JS1m8XSvS4fcprByLRuMjjjuck9dne4rm
Thanks to everyone who shared interesting information on the web.
Please share your experience with others; maybe our world will become less hot and more productive :)