Optimized for some Nvidia cards #228
…formance by 15%, and GTX1060 with 1 GPC performance by more than 30%. Meanwhile, it also increases performance on GTX1070 by 3% and on Tesla M60 by 2%, and should also benefit other chips. However, we also found a 5% decrease on the Nvidia GRID K520.
The first commit is wrong because I added the 3 files to the root directory "cpp-ethereum", while the correct path is "cpp-ethereum/libethash-cuda". The second commit fixes this issue.
@davilizh I'm curious what the CUDA hashrate looks like for the GRID K520 with current builds. Got any benchmarks to share?
I don't have a K520 at hand; it was tested by someone else in another community: "min/mean/max: 0/4858402/7689557 H/s"
Would love to test this build with my GTX 1060 graphics cards. I'm not a developer or programmer, and I'm completely new to GitHub, so I hope my comment here is not too annoying.
Hi, compiled it successfully under Ubuntu 16.04.2 with CUDA 8 and NVIDIA drivers 381.22:
I have 2 GTX 1060 graphics cards:
Previously, each line showed 41.94 MH/s. With your patch, it more often shows 47.19 MH/s. So it works for me. Do I need to adjust the graphics or memory clock to make 47 MH/s stable? MANY THANKS!
With this update (Genoil/cpp-ethereum#228), ethminer will need the CUDA 8 tools. We already have the 381.22 drivers, but CUDA 8.0 comes with driver version 375, which doesn't support the GTX 1080 Ti. As a result, installing CUDA from apt-get doesn't work, since it installs that driver version. Instead, install only `cuda-command-line-tools-8-0` to opt out of installing the driver.
Wow, nice one! I don't have any Nvidia cards anymore to test with, but it is tempting to merge! I suggest you submit this to @chfast's fork too, if you haven't already.
@Cyclenerd Thank you very much, good to know. I don't think you need to adjust the graphics or memory clock; the chip boosts to its maximum working frequency automatically when the workload is large. But don't let it overheat: we find that performance degrades if the temperature is too high. @FUNtasticOne which operating system are you using? If Windows, you can download the exe from here: https://ci.appveyor.com/project/ethereum-mining/ethminer/build/93/job/ss7k95dsy1kly4vl/artifacts. If Linux, follow Cyclenerd's steps. @Genoil It would be great if my code could be merged into the master branch.
Thank you @davilizh. I'm on Windows and just downloaded the exe. For me there is no improvement in the hashrate. Please let me know if you're interested in additional information. Thank you again!
@FUNtasticOne Can you check the runtime frequency of your DRAM? Mine runs at 4.5 GHz, and I guess yours is 4.0 GHz (from https://www.lelong.com.my/9734-palit-gtx1060-jetstream-6gb-ddr5-192bit-gtx1060-cwchoo85-188799203-2017-04-Sale-P.htm).
@davilizh My DRAM frequency is 4.0 GHz stock, overclocked to 4.1 GHz. GPU-Z shows 2052 MHz, which has to be doubled.
Will this update make it into the releases section?
@seedlord, for Windows you can test this build: https://ci.appveyor.com/project/ethereum-mining/ethminer/build/93/job/ss7k95dsy1kly4vl/artifacts.
Thanks for the link.
As already mentioned above, I don't see any difference with the downloadable exe either.
The Windows exe is dated 9th July. The optimized code came out 6 days ago, so shouldn't the build be from around 20th July?
The build comes from a PR to ethminer, ethereum-mining/ethminer#18, but I'm guessing it implements the same optimization as this one.
Okay, I've done some testing on 6x 1060 3GB:
@deadgray What are your OC settings on your 3GB cards? I've gone from ~19.5 MH/s (~117 total) with Claymore to just under 23 (137) with the enhancements. Very, very impressed, sirs.
@ggilyeat +100 core, +860 mem; the cards have Samsung memory.
How can I display the average hashrate to see how much it increases with this patch? I'm testing with my Asus 1060 Dual OC 3GB.
I tested my Asus GTX 1060 Dual OC 3GB.
Specs:
- GPU power: only 80%
- GPU clock: 2000 MHz (OC stable)
- Memory clock: 9300 MHz (OC stable)

Ethminer before (`ethminer -U -M`):
Trial 1... 22953902
min/mean/max: 22953902/22999718/23045626 H/s

Ethminer after (`ethminer -U -M`):
Trial 1... 24377389
min/mean/max: 24361208/24514868/25032318 H/s
inner mean: 24393604 H/s

Increase: 6%

Claymore's v9.4 with the same specs: 22 MH/s

So: ethminer 24.393604 MH/s vs. Claymore 22 MH/s without the dev fee (using a proxy named nofee 5.0), an increase of 10.88%. Ethminer wins.
What version of Genoil's ethminer should I use? I just replaced the .exe file from the version
Use this one: https://ci.appveyor.com/api/buildjobs/ss7k95dsy1kly4vl/artifacts/build%2Fethminer-0.11.0.dev0-Windows.zip. I think the changes have not been merged into master.
I've downloaded this, but I'm missing the ethminer version.
@marvykkio you can find those files on your system, or in Visual Studio.
@deadgray Terribly sorry, I hope this does not affect your use of the code. It seems that someone else has helped me fix it and created a new pull request.
@Genoil Hi Genoil. Is there any possibility that this code could be merged into your master branch? I have added a switch named `--cuda-parallel-hash` to enable and disable my optimization, and the code is now merged into chfast's master branch. This switch lets people scale parallel-hash from 1 to 8 to find the best value for their card.
@davilizh good joke :-)
@deadgray What's your DRAM frequency?
@davilizh +860 MHz, so 4665 MHz.
@davilizh Hi David, I have chosen to cease further development of the fork, so it's unlikely that the patch will be applied. I don't think you have to worry that people won't be able to find the new ethminer fork by Pavel. That said, I am looking at your code and find the term PARALLEL_HASH slightly confusing. As far as I'm concerned, the hashes were already done in parallel in the first place. The fundamental difference is that rather than doing a single coalesced global read of 128 bytes (8 threads * uint4) for a single hash, you do 4 in series of 32 bytes (2 threads * uint4), for 2 different hashes. I guess it's the reduced memory bus width of GP106 that makes this more efficient. Nevertheless, I like it very much.
@deadgray I don't know why. Maybe you have a different DRAM type. BTW, if you are already getting the maximum DRAM bandwidth, my code cannot push it any higher.
@davilizh Even if I can't get more, I'm quite impressed with your patch, good work. My total mining hashrate is now up 30 MH/s without any investment, which is worth a new 1070 card :-)
@Genoil, we just need to make sure that the 32B accesses are coalesced. From the texture unit's point of view, both use the full texture bandwidth. Downstream, we can issue more load instructions to saturate the memory.
@azazhu I know, and I might have even implemented something similar in the last few years, but either I did it the wrong way or the hardware I used didn't benefit from it. Nice to see such a dramatic improvement 2 years later.
@Genoil The architecture has changed a lot :)
Hi @davilizh! Is there any way to apply this to Claymore on Windows?
Tested on Windows 10 with a 1060 6GB, OC'd +130/+870. Should I try on Linux?
@diversuss Sorry, I don't know, because I'm not familiar with the details of Claymore.
I'm seeing an improvement over Claymore on my 1070/1060 machine. This is a bit off-topic and I'm still searching, but is there any way to get the reported hashrate to show up with this miner? The calculated hashrate is looking better on nanopool :)
Is there any way to monitor the miner and take action if it stops?
Hey guys. I am trying to run it on 4x 1070 cards in Windows 10. I get "Application was not able to start correctly (0xc000007b). Click OK to close application" with `ethminer.exe --farm-recheck 200 -U -S eth-us-west1.nanopool.org:9999 -FS eth-us-` Thank you.
Is your Windows 32-bit or 64-bit? (64-bit is a must.)
My Windows is 10 x64.
Do I have to download and install Visual Studio in addition to the VC++ 2015 x64 redistributable?
No.
My driver is the latest, 382.53, on Windows 10 x64. The Visual C++ redistributable is installed.
Got it working. For anyone having the same problem as I did (missing files and the 0xc error): download and install the all-in-one runtimes package from
@Genoil :)
Hello all, I'm completely new to mining and don't have a lot of stats to compare with, but I started with the previous version of ethminer, where I could get a stable 31 MH/s with my GTX 1070 (with only a memory OC and power reduced to 80%), and now with this new version I get above 32 MH/s, so there is definitely a very nice improvement! I do have another question, though. When running the new version, my antivirus warned that there is a Trojan/Win64.BitCoinMiner inside the executable. This was not the case with the previous version. I downloaded the file from here: https://github.com/ethereum-mining/ethminer/releases/tag/v0.11.0rc1 Is there any reason to be worried? Thanks!
@Klintistwood Antivirus software mostly flags miners because malware (call it 'viruses', 'ransomware', or whatever you like) currently deploys mining software on victims' computers to mine coins for the attacker. This means it's a false positive; just ignore it. The miner is safe. If you are worried, you can compile the miner yourself, but that is very technical and should be discussed on e.g. the Gitter chat rather than on this issue tracker. And yes, you are right: the newest version has a very good performance improvement. Btw, the project has moved to a new Git repository here: https://github.com/ethereum-mining/ethminer/
The code is optimized for the GTX 1060: it improves performance by 15% on GTX 1060 chips with 2 GPCs and by more than 30% on GTX 1060 chips with 1 GPC. It also increases performance by 3% on the GTX 1070 and by 2% on the Tesla M60, and should benefit other chips as well.
We have commented out `__launch_bounds__` in the code. Launch bounds are discussed in detail in http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#axzz4fzSzZc9p.
The `state` array in `compute_hash` in dagger_shuffle.cuh is modified.
Every thread is the master for calculating one hash value. Each thread initializes its own copy of `state` using `keccak_f1600_init`. Then, in the main loop: when i=0, threads 0-7 copy the values of thread 0's state[0-7] into each thread's shuffle[0-7], do the main computation, and then thread 0 captures the result in shuffle[0-3] into its state[8-11]. On the next iteration, when i=1, threads 0-7 copy the values of thread 1's state[0-7] into each thread's shuffle[0-7], do the main computation, and then thread 1 captures the result in shuffle[0-3] into its state[8-11].
With the modification, if PARALLEL_HASH=2, this changes as follows: when i=0, threads 0-7 copy the values of thread 0's state[0-7] into each thread's shuffle[0][0-7] and thread 1's state[0-7] into each thread's shuffle[1][0-7]. They do the main computation on these 2 shuffle vectors in parallel. Then thread 0 captures the result in shuffle[0][0-3] into its state[8-11], and thread 1 captures the result in shuffle[1][0-3] into its state[8-11].
Since the input argument uint2 *s is changed in dagger_shuffle.cuh, we have to modify keccak_f1600_init and keccak_f1600_final in keccak.cuh accordingly.