
Improve EncodeBase58 performance #7656

Merged
merged 1 commit into bitcoin:master from enhancement/speedup-encodebase58 on Mar 21, 2016

Conversation

promag
Member

@promag promag commented Mar 9, 2016

This change consists of avoiding filling the b58 buffer with zeros. The improvement is about 30%–45%.

For example, calling listunspents with my wallet results in 313 ms spent in EncodeBase58, whereas before it was 578 ms.

  while (pbegin != pend && *pbegin == 0) {
      pbegin++;
      zeroes++;
  }
  // Allocate enough space in big-endian base58 representation.
- std::vector<unsigned char> b58((pend - pbegin) * 138 / 100 + 1); // log(256) / log(58), rounded up.
+ int size = (pend - pbegin) * 138 / 100 + 1;
+ std::vector<unsigned char> b58(size); // log(256) / log(58), rounded up.
Member

Nit: Shouldn't this comment go up one line?

Member Author

👍
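
As an aside, the 138/100 factor in the allocation hunk above can be sanity-checked: log(256)/log(58) ≈ 1.366, which 138/100 = 1.38 over-approximates, and the + 1 absorbs the integer truncation. A minimal standalone check (illustrative only, not part of the PR):

#include <cassert>
#include <cmath>

int main() {
    for (int n = 1; n <= 1000; ++n) {
        int bound = n * 138 / 100 + 1; // bound used when sizing the b58 buffer
        int needed = (int)std::ceil(n * std::log(256.0) / std::log(58.0)); // exact digit count for n bytes
        assert(bound >= needed);
    }
    return 0;
}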

@promag promag force-pushed the enhancement/speedup-encodebase58 branch from 56472ea to 3252208 on March 9, 2016 10:10
@laanwj
Member

laanwj commented Mar 11, 2016

> For example, calling listunspents with my wallet results in 313 ms spent in EncodeBase58, whereas before it was 578 ms.

Interesting result. I was not aware anything was bottlenecked by base58 encoding.
As you've demonstrated a concrete performance improvement: concept ACK

@gmaxwell
Contributor

Indeed, thanks for the concrete result. Concept ack.

@laanwj
Member

laanwj commented Mar 11, 2016

Ping @luke-jr, I think it makes sense for you to review this because of libbase58

@sipa
Member

sipa commented Mar 12, 2016

Untested ACK.

This should result in an asymptotic 2x speedup. I didn't expect that it would matter anywhere, but it seems it does, so great.
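
A back-of-the-envelope view of that 2x, for illustration (n input bytes, buffer size ≈ 1.37·n base58 digits):

before: each of the n passes walks the whole buffer    → n · 1.37n ≈ 1.37·n² digit updates
after:  pass i only walks the digits produced so far   → 1.37·(1 + 2 + … + n) ≈ 0.68·n² digit updates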

assert(carry == 0);
length = i;
Contributor

Do you really need both i and length?
It seems you could simply ++length; here. I see your i < length condition, but I don't see how it can possibly ever be true.

Contributor

It isn't simply ++length; it can be length += 0|1|2. But it is possible to remove i:
length = it - b58.begin()
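
For illustration, a sketch of that variant inside the per-byte pass (hypothetical, not what was merged; note rbegin() rather than begin(), since `it` is a reverse_iterator, and the iterator is declared outside the for statement so it survives the loop; pbegin, b58 and length come from the surrounding function):

int carry = *pbegin;
std::vector<unsigned char>::reverse_iterator it = b58.rbegin();
for (; (carry != 0 || it - b58.rbegin() < length) && it != b58.rend(); ++it) {
    carry += 256 * (*it);
    *it = carry % 58;
    carry /= 58;
}
assert(carry == 0);
length = it - b58.rbegin(); // digits touched in this pass, replacing the separate counter i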

Contributor

Mhmm, I need to read this more deeply for a utACK, I think. I retract my utACK but maintain the concept ACK.

@jtimon
Contributor

jtimon commented Mar 16, 2016

Very nice, utACK besides the nit.

@dcousens
Contributor

utACK 3252208

// Apply "b58 = b58 * 256 + ch".
for (std::vector<unsigned char>::reverse_iterator it = b58.rbegin(); it != b58.rend(); it++) {
for (std::vector<unsigned char>::reverse_iterator it = b58.rbegin(); (carry != 0 || i < length) && (it != b58.rend()); it++, i++) {
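
For context, assembling this hunk with the ones quoted above, the "process the bytes" loop after this change reads roughly as follows (a reconstruction from the quoted hunks, not a verbatim copy of the merged file):

int length = 0;
while (pbegin != pend) {
    int carry = *pbegin;
    int i = 0;
    // Apply "b58 = b58 * 256 + ch", but only over the digits produced so far
    // (plus any new carry), instead of over the whole zero-initialized buffer.
    for (std::vector<unsigned char>::reverse_iterator it = b58.rbegin(); (carry != 0 || i < length) && (it != b58.rend()); it++, i++) {
        carry += 256 * (*it);
        *it = carry % 58;
        carry /= 58;
    }
    assert(carry == 0);
    length = i; // number of base58 digits in use after this pass
    pbegin++;
}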
Contributor

Wouldn't ++it be faster than it++?

Member

For iterators that may well be the case (prefix increment saves a copy).

Contributor

This should depend on the compiler and target machine anyway, but it is my understanding that on x86 there are separate instructions with different performance (how big the difference is, I have no idea). I also suspect that many compilers are smart enough to do this for you.
So if at worst it does nothing and at best it does something good, why not?
Of course this is not to say we should do it everywhere. But in new code, why not? It may be useful for some platforms. The cons from Stack Overflow seem very weak, but I'm very curious whether anybody else has more solid concerns or benchmarks showing this is not really something to think much about (or just data showing that, yes, compilers are currently smart about this too). This is a recurring discussion that I would like to stop thinking about one way or the other.

Member

> This should depend on the compiler and target machine anyway, but it is my understanding that on x86 there are separate instructions with different performance

For integers the compiler is certainly smart enough that there is no difference between prefix and postfix ++ when used bare (the result is not actually used).

But this doesn't have much to do with the instructions, just with language design. Iterators are objects. The definition of prefix and postfix ++ respectively is, when overloading them:

Point& operator++();       // Prefix increment operator.
Point operator++(int);     // Postfix increment operator.

So: the postfix operation returns a copy (the old value), whereas the prefix one increments in place and returns a reference (to self). This means prefix can, in principle, be implemented more efficiently.
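
A minimal illustration of the difference with a hypothetical iterator-like type (not from the PR):

struct Iter {
    const unsigned char* p;
    Iter& operator++() { ++p; return *this; }                    // prefix: advance in place, return *this
    Iter operator++(int) { Iter old = *this; ++p; return old; }  // postfix: copy the old value, advance, return the copy
};

A bare it++ in a loop still constructs and discards that copy unless the optimizer removes it; a bare ++it never creates one. For std::vector iterators the copy is trivial and compilers usually elide it, so in practice this is a style preference more than a measurable win.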

Member

Maybe change it++ to ++it while we're optimising... (it++ creates an unnecessary copy of the iterator)

@fanatid
Contributor

fanatid commented Mar 18, 2016

https://github.com/cryptocoinjs/base-x/blob/d33156e62ea435073e4b73640f433756124f89d8/src/basex.cc#L51
another base58 encoding implementation (it was an attempt to speed up base58 for node.js).
@promag, if you apply .reserve to digits I think it can be even faster than it is now.
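
For reference, the .reserve suggestion applies to the linked base-x style code, whose digits vector grows with push_back; a sketch of the idea (illustrative, not bitcoin's b58 buffer, which this PR already allocates at its final size):

std::vector<unsigned char> digits;
digits.reserve((pend - pbegin) * 138 / 100 + 1); // worst-case base58 length, so push_back never reallocates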

carry += 256 * (*it);
*it = carry % 58;
carry /= 58;
}

assert(carry == 0);
Member

This could be transformed into an assert(it != b58.rend()); within the loop.
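
That transformation would look roughly like this (a sketch of the suggestion, not what was merged; carry, i, length and b58 come from the surrounding loop):

for (std::vector<unsigned char>::reverse_iterator it = b58.rbegin(); carry != 0 || i < length; ++it, ++i) {
    assert(it != b58.rend()); // the allocation bound guarantees we never run off the end
    carry += 256 * (*it);
    *it = carry % 58;
    carry /= 58;
}
// carry == 0 is now implied by the loop condition, so the trailing assert becomes redundant.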

@luke-jr
Member

luke-jr commented Mar 20, 2016

Concept ACK and (in-depth) utACK 3252208

@laanwj laanwj merged commit 3252208 into bitcoin:master Mar 21, 2016
laanwj added a commit that referenced this pull request Mar 21, 2016
3252208 Improve EncodeBase58 performance (João Barbosa)
@ruimarinho ruimarinho deleted the enhancement/speedup-encodebase58 branch April 6, 2016 19:27
@laanwj laanwj mentioned this pull request Apr 15, 2016
random-zebra added a commit to PIVX-Project/PIVX that referenced this pull request Jun 27, 2020
aa633c8 CBase58Data::SetString: cleanse the full vector (Kaz Wesley)
0b77f8a Improve readability of DecodeBase58Check(...) (practicalswift)
dadadee Use prefix operator in for loop of DecodeBase58. (Jiaxing Wang)
305c382 base58: Improve DecodeBase58 performance. (Jiaxing Wang)
3cb418b Improve EncodeBase58 performance (João Barbosa)
4d17f71 don't try to decode invalid encoded ext keys (Jonas Schnelli)
eef4ec6 extend bip32 tests to cover Base58c/CExtKey decode (furszy)
7239aaf fix and extend CBitcoinExtKeyBase template - fix Decode call (req. only one param) - add constructor for base58c->CExtKey (furszy)
13f09c3 remove unused inv from ConnectTip() (furszy)

Pull request description:

  Coming from:

  * bitcoin#6468 .
  * bitcoin#7656 .
  * bitcoin#7922
  * bitcoin#8736 .
  * bitcoin#10961 .

ACKs for top commit:
  random-zebra:
    ACK aa633c8
  Fuzzbawls:
    ACK aa633c8

Tree-SHA512: 3add3106a847b0b3d768df2c4ab5eae7d009e3820998fb5a4cd274169c64622e83ecd14dca51c31f3f6053199834129a1c6920b7ef1193632339297a041173d6
martinus added a commit to martinus/bitcoin that referenced this pull request Feb 14, 2021
This modifies `DecodeBase58` and `EncodeBase58` by processing multiple
input bytes at once. For `DecodeBase58` we can take 9 bytes at once,
and for `EncodeBase58` 7. This reduces the number of calls of the inner
conversion loop.

Benchmark results:

* 37.78 -> 13.73 ns/byte for `Base58Decode`, ~2.8 times faster.
* 28.81 -> 7.02 ns/byte for `EncodeBase58`, ~4.1 times faster.

Note that I tried to improve `blockToJSON` with this change, but the
difference there is not really significant. This optimization might still be
relevant though for e.g. `listunspents`, see bitcoin#7656
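
The batch sizes are presumably chosen so that the per-digit multiply-accumulate stays within 64-bit arithmetic:

256^7 = 2^56      →  57 · 2^56 + carry fits in a uint64_t
256^8 = 2^64      →  would overflow
58^9  ≈ 2^52.7    →  255 · 58^9 + carry fits in a uint64_t
58^10 ≈ 2^58.6    →  255 · 58^10 would overflow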
martinus added three further commits to martinus/bitcoin that referenced this pull request Feb 14, 2021, with the same commit message as above.
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021