Improve EncodeBase58 performance #7656
Conversation
 while (pbegin != pend && *pbegin == 0) {
     pbegin++;
     zeroes++;
 }
 // Allocate enough space in big-endian base58 representation.
-std::vector<unsigned char> b58((pend - pbegin) * 138 / 100 + 1); // log(256) / log(58), rounded up.
+int size = (pend - pbegin) * 138 / 100 + 1;
+std::vector<unsigned char> b58(size); // log(256) / log(58), rounded up.
Nit: Shouldn't this comment go up one line?
👍
Force-pushed from 56472ea to 3252208.
Interesting result. I was not aware anything was bottlenecked by base58 encoding.
Indeed, thanks for the concrete result. Concept ACK.
Ping @luke-jr, I think it makes sense for you to review this because of libbase58.
Untested ACK. This should result in an asymptotic 2x speedup. I didn't expect that it would matter anywhere, but as it seems it does, great.
 assert(carry == 0);
 length = i;
Do you really need both `i` and `length`? It seems you could simply `++length;` here. I see your `i < length` condition, but I don't see how it can possibly ever be true.
It isn't simply `++length`, it can be `length += 0|1|2`, but it is possible to remove `i`:

length = it - b58.begin()
mhmm, I need to read this more deeply for a utACK, I think. I retract my utACK but maintain the concept ACK.
Very nice, utACK besides nit.
utACK 3252208
 // Apply "b58 = b58 * 256 + ch".
-for (std::vector<unsigned char>::reverse_iterator it = b58.rbegin(); it != b58.rend(); it++) {
+for (std::vector<unsigned char>::reverse_iterator it = b58.rbegin(); (carry != 0 || i < length) && (it != b58.rend()); it++, i++) {
Wouldn't `++it` be faster than `it++`?
Presumably, yes: http://stackoverflow.com/a/35085/2084795
for iterators that may well be the case (prefix increment saves a copy)
This depends on the compiler and target machine anyway, but my understanding is that on x86 there are separate instructions with different performance (how big the difference is, I have no idea). I also suspect that many compilers are smart enough to do this for you.
So if it may not do anything, but may do something good, why not?
Of course this is not to say we should change it everywhere. But in new code, why not? It may be useful on some platforms. The cons from Stack Overflow seem very weak, but I'm curious whether anybody has more solid concerns or benchmarks showing this is not really worth thinking about (or data showing that, yes, compilers are already smart about this too). This is a recurring discussion that I would like to settle one way or the other.
> This should depend on the compiler and target machine anyway, but it is my understanding that in x86 there are separated instructions with different performance
For integers the compiler is certainly smart enough that there is no difference between prefix and postfix `++` when the result is not actually used.
But this doesn't have much to do with instructions, just with language design. Iterators are objects. When overloading them, the definitions of prefix and postfix `++` are, respectively:

    Point& operator++();    // Prefix increment operator.
    Point operator++(int);  // Postfix increment operator.

So: the postfix operation returns a copy (the old value), whereas prefix increments in place and returns a reference (to self). This means prefix can, in principle, be implemented more efficiently.
Maybe change it++ to ++it while we're optimising... (it++ creates an unnecessary copy of the iterator)
https://github.com/cryptocoinjs/base-x/blob/d33156e62ea435073e4b73640f433756124f89d8/src/basex.cc#L51
     carry += 256 * (*it);
     *it = carry % 58;
     carry /= 58;
 }

 assert(carry == 0);
This could be transformed into an `assert(it != b58.rend());` within the loop.
Concept ACK and (in-depth) utACK 3252208
3252208 Improve EncodeBase58 performance (João Barbosa)
aa633c8 CBase58Data::SetString: cleanse the full vector (Kaz Wesley)
0b77f8a Improve readability of DecodeBase58Check(...) (practicalswift)
dadadee Use prefix operator in for loop of DecodeBase58. (Jiaxing Wang)
305c382 base58: Improve DecodeBase58 performance. (Jiaxing Wang)
3cb418b Improve EncodeBase58 performance (João Barbosa)
4d17f71 don't try to decode invalid encoded ext keys (Jonas Schnelli)
eef4ec6 extend bip32 tests to cover Base58c/CExtKey decode (furszy)
7239aaf fix and extend CBitcoinExtKeyBase template - fix Decode call (req. only one param) - add constructor for base58c->CExtKey (furszy)
13f09c3 remove unused inv from ConnectTip() (furszy)

Pull request description:
Coming from:
* bitcoin#6468
* bitcoin#7656
* bitcoin#7922
* bitcoin#8736
* bitcoin#10961

ACKs for top commit:
random-zebra: ACK aa633c8
Fuzzbawls: ACK aa633c8

Tree-SHA512: 3add3106a847b0b3d768df2c4ab5eae7d009e3820998fb5a4cd274169c64622e83ecd14dca51c31f3f6053199834129a1c6920b7ef1193632339297a041173d6
This modifies `DecodeBase58` and `EncodeBase58` by processing multiple input bytes at once. For `DecodeBase58` we can take 9 bytes at once, and for `EncodeBase58` 7. This reduces the number of calls of the inner conversion loop. Benchmark results:
* 37.78 -> 13.73 ns/byte for `Base58Decode`, ~2.8 times faster.
* 28.81 -> 7.02 ns/byte for `EncodeBase58`, ~4.1 times faster.

Note that I tried to improve `blockToJSON` with this change, but the difference there is not really significant. This optimization might still be relevant though for e.g. `listunspents`, see bitcoin#7656
This change consists of avoiding filling the `b58` buffer with zeros. The improvement is about 30% - 45%. For example, calling `listunspents` with my wallet results in 313ms in `EncodeBase58`, whereas before it was 578ms.