-
Notifications
You must be signed in to change notification settings - Fork 290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cpuminer: Significantly optimize mining workers. #2977
Conversation
This reworks the speed stat tracking to make it more accurate as well as significantly reduce lock contention. A high level overview of the changes is as follows: - Each worker is given its own stats state to independently update - Each worker now has a unique ID which is used to store and clean up said state - The elapsed hashing time per worker is now tracked - The individual speed stats are updated via lock-free atomic primitives versus the previous single mutex that all workers shared - Each worker now periodically updates its individual speed stats every fixed number of nonces instead of relying on a ticker - The speed monitor goroutine periodically sums the results from all of the individual worker stats Some local test mining shows that prior to these changes the reported hash speed varies pretty wildly between 43 kh/s and 427 kh/s and more wild swings with more workers. With the changes, the reporting is more in line with what I would expect without additional changes and comes in a much tigher (and a bit higher due to a bit of reduced lock contention) range of 408 kh/s and 568 kh/s.
60bccef
to
7176ea9
Compare
The existing code prior to this change updated the relevant fields in the header and called its own hash method which internally makes use of the existing writer interface based serialization and shared buffers to prevent a lot of allocations. That makes sense most places in the code, but it is really bad for concurrent performance in a tight mining loop. Thus, this significantly optimizes the CPU miner workers to instead remove all mutex contention by serializing the header once outside of the loop, directly updating the relevant bytes in the serialized data, using the allocation-free hash method to compute the hash from the serialized bytes directly, and finally updating the block template header when/if a solution is found. It also now makes use of the much more efficient primitives and uint256 packages instead of big integers to further reduce allocations and speed up the calculations. The net result is a major speedup when mining with a single core and nearly linear scalar with multiple cores. Concretely, prior to these changes, I was only seeing around 400-570 kh/s with 1 or 2 cores and mining on more cores actually made the performance worse due to the all of the contention. With these changes, I'm seeing around 1.2Mh/s on a single core and over 10MH/s with 10 cores.
7176ea9
to
5c49ffb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to this PR, but dcrctl generate 1
doesn't report on the speed. Found out after not seeing the H/s report :P
I can confirm the significant speedup in mining using this PR.
// new value. | ||
littleEndian.PutUint64(header.ExtraData[:], extraNonce+enOffset) | ||
// Update the extra nonce in the serialized header bytes directly. | ||
const enSerOffset = 144 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double checked
const bitsOffset = 116 | ||
const timestampOffset = 136 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double checked
It could probably be updated now due to the speed stat tracking changes, but that was actually intentional since it didn't report properly for discrete mining and the // HashesPerSecond returns the number of hashes per second the normal mode
// mining process is performing. 0 is returned if the miner is not currently
// mining anything in normal mining mode.
//
// This function is safe for concurrent access.
func (m *CPUMiner) HashesPerSecond() float64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Can confirm massive hashrate improvement.
This significantly optimizes the CPU miner that is useful for mining on testnet.
The existing CPU mining code prior to this change updated the relevant fields in the header and called its own hash method which internally makes use of the existing writer interface based serialization and shared buffers to prevent a lot of allocations. That makes sense most places in the code, but it is really bad for concurrent performance in a tight mining loop.
Thus, this modifies the CPU miner workers to instead remove all mutex contention by serializing the header once outside of the loop, directly updating the relevant bytes in the serialized data, using the allocation-free hash method to compute the hash from the serialized bytes directly, and finally updating the block template header when/if a solution is found.
It also now makes use of the much more efficient
primitives
anduint256
packages instead of big integers to further reduce allocations and speed up the calculations.The net result is a major speedup when mining with a single core and nearly linear scalar with multiple cores.
It also includes a separate commit that reworks the speed stat tracking to make it more accurate as well as significantly reduce its lock contention.
A high level overview of the changes is as follows:
primtivies
anduint256
packagesConcretely, prior to these changes, I was only seeing around 400-570 kh/s with 1 or 2 cores and mining on more cores actually made the performance worse due to the all of the contention. With these changes, I'm seeing around 1.2Mh/s on a single core and over 10MH/s with 10 cores.