
Base58 en-/de-coding #1828

Closed
wants to merge 5 commits into from

7 participants

Christian von Roques Gavin Andresen BitcoinPullTester Wladimir J. van der Laan Luke-Jr Pieter Wuille Jeff Garzik
Christian von Roques

The base58 codec uses full length bignum operations for each digit and strchr(pszBase58, *p) to find the value of each base58 digit. I've cleaned up the implementation by:

  • using bignum operations with word-sized operands
  • using a pre-computed decoding table
  • adding comments
  • getting rid of useless copy operations
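The pre-computed decoding table replaces a strchr(pszBase58, *p) search per digit with one array lookup. The sketch below builds such a table once, following the recipe described later in the diff: fill with "bad", mark the six whitespace characters, then store each digit's value. It also checks that the table agrees with the linear search it replaces. MakeBase58Table, DigitByStrchr, kBad and kSpace are hypothetical names; only the alphabet and the -1/-2 sentinel convention come from the PR.

```cpp
#include <array>
#include <cstring>
#include <initializer_list>

// Same alphabet as pszBase58 in src/base58.h.
static const char* const kBase58Alphabet =
    "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

// Sentinels mirroring the PR's RBASE58_BAD and RBASE58_SPACE (hypothetical names).
const signed char kBad = -1;
const signed char kSpace = -2;

// Hypothetical helper: builds the 256-entry reverse lookup table once.
std::array<signed char, 256> MakeBase58Table()
{
    std::array<signed char, 256> table;
    table.fill(kBad);
    for (unsigned char c : {' ', '\t', '\n', '\v', '\f', '\r'})
        table[c] = kSpace;
    for (int i = 0; i < 58; i++)
        table[static_cast<unsigned char>(kBase58Alphabet[i])] = static_cast<signed char>(i);
    return table;
}

// The approach being replaced: a linear strchr() search per digit.
// Returns the digit value, or -1 if c is not a base-58 digit.
int DigitByStrchr(char c)
{
    if (c == '\0')
        return -1;  // strchr() would otherwise match the terminating NUL
    const char* p = std::strchr(kBase58Alphabet, c);
    return p ? static_cast<int>(p - kBase58Alphabet) : -1;
}
```

A lookup costs one indexed load regardless of which digit it is. The strchr() version scans up to 58 characters for every digit.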

As with my other recent pull requests, I think I've made the source easier to understand; the speedup is just a side benefit.

However, I could not restrain myself and added a final commit with a much faster version that converts 5 or 10 digits at a time (depending on sizeof(BN_ULONG)), resulting in an overall speedup of over 10 times.
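The "5 or 10 digits" figure is the largest k for which 58^k still fits in one BN_ULONG. The sketch below derives it at compile time and checks the two BASE58_CHUNK_MOD constants used in the diff. It assumes std::uint32_t and std::uint64_t as stand-ins for a 32- or 64-bit BN_ULONG. MaxBase58DigitsPerWord and Pow58 are hypothetical helpers, and the loops inside constexpr functions require C++14.

```cpp
#include <cstdint>
#include <limits>

// Largest k such that 58^k still fits in the unsigned type T, i.e. how many
// base-58 digits one machine word can accumulate before being flushed.
template <typename T>
constexpr int MaxBase58DigitsPerWord()
{
    int k = 0;
    T pow = 1;
    while (pow <= std::numeric_limits<T>::max() / 58) {  // pow * 58 still fits
        pow *= 58;
        ++k;
    }
    return k;
}

// 58^k computed in type T (caller keeps k within MaxBase58DigitsPerWord<T>()).
template <typename T>
constexpr T Pow58(int k)
{
    T r = 1;
    while (k-- > 0)
        r *= 58;
    return r;
}

// uint32_t / uint64_t stand in for a 32- or 64-bit BN_ULONG.
static_assert(MaxBase58DigitsPerWord<std::uint32_t>() == 5, "5 digits per 32-bit word");
static_assert(MaxBase58DigitsPerWord<std::uint64_t>() == 10, "10 digits per 64-bit word");
static_assert(Pow58<std::uint32_t>(5) == 0x271f35a0u, "58^5");
static_assert(Pow58<std::uint64_t>(10) == 0x5fa8624c7fba400ull, "58^10");
```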

If your primary interest is simple code, please pull all but the last commit.
If you don't mind somewhat longer but much faster code, pull the whole branch.

Gavin Andresen

My primary interest is neither simple code nor faster code. It is secure, backwards-compatible code.

Given the level of scrutiny the existing bitcoin codebase has had so far, I am very reluctant to commit changes to core code unless the benefits of the changes clearly outweigh the risks that the changes will have unintended side effects or open up new security holes.

That's not a "no" for these changes, but I personally don't think the benefits outweigh the cost of the hour or so it would take me to thoroughly review these changes and convince myself they're not dangerous. If you can get some other coders I trust to spend the time reviewing, then great...

BitcoinPullTester

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/80173de16776c468b187348f74cd16245c3e0416 for binaries and test log.

Wladimir J. van der Laan
Owner

I'm divided on this one. Some changes do make the code shorter and easier to read / more maintainable, others are somewhat more doubtful shortcuts made for speed.

But for example reducing the borderline-crazy loop in DecodeBase58 to:

while ((v = pszRBase58[*p]) >= 0)
{
    bn *= 58;
    bn += v;
    p++;
}

is a pretty nice feat.

Is there a use case for fast base58?

BTW, thanks for adding tests; we'll surely pull that one.

Christian von Roques

Thanks for the reviews! Based on them I've pushed some improvements.

  • added a comment explaining base58 encoding and how leading 0-bytes are encoded to preserve them (BTW, should it be base-58 or base58 in comments?)
  • corrected the Hungarian notation of pszRBase58 to rgi8RBase58 and declared it const (is there a style guide? I'm unsure when it's OK not to use Hungarian notation)
  • synchronized rgi8RBase58's understanding of isspace() with POSIX (0xa0 does not count as space, whereas "\f" (form feed) does)
  • fixed the base59 typo
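The whitespace change can be checked mechanically. In the default "C" locale, std::isspace() accepts exactly the six standard whitespace characters: space, \t, \n, \v, \f and \r. That set is the one rgi8RBase58 marks as RBASE58_SPACE. The sketch below assumes the program has not called setlocale(), so it is still running in the "C" locale. IsBase58Space and MatchesCLocaleIsspace are hypothetical helpers.

```cpp
#include <cctype>

// The six characters the PR's rgi8RBase58 table marks as RBASE58_SPACE.
bool IsBase58Space(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
}

// Compares the table's notion of whitespace with std::isspace() for every byte.
// Assumes the "C" locale is active, where 0xA0 is not whitespace.
bool MatchesCLocaleIsspace()
{
    for (int c = 0; c < 256; c++)
        if ((std::isspace(c) != 0) != IsBase58Space(static_cast<unsigned char>(c)))
            return false;
    return true;
}
```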
Christian von Roques
BitcoinPullTester

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/8a4bff050503b75c35ad3acd08d78be18c6b3da6 for binaries and test log.

Luke-Jr

Needs rebase.

roques added some commits
Christian von Roques roques EncodeBase58: don't convert little- to big-endian just to undo it 8ed9c48
Christian von Roques roques CBigNum: use BN_foo_word() where possible 1d34bb3
Christian von Roques roques add test of DecodeBase58 skipping whitespace 749e898
Christian von Roques roques reimplement De-/En-codeBase58()
* Make use of faster word-sized bignum operators.
* Use table-lookup instead of linear search to
  find the value of a base58-digit.
* Explain base-58 encoding
7cb83a0
Christian von Roques roques optimize De-/En-codeBase58()
Convert chunks of digits at a time.
Bigger, but still readable code,
Additional speedup ~4.5
eb7000f
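The "convert chunks of digits at a time" commit replaces one multi-word multiply-add per digit with one per chunk of 5 or 10 digits. The final partial chunk is multiplied by 58^n, where n is the number of leftover digits. The sketch below shows that both orders produce the same number. It assumes a toy little-endian vector of 32-bit limbs in place of OpenSSL's BIGNUM, with a chunk size of 5. Big, MulAdd, DecodeDigits and DecodeChunked are hypothetical names, not part of the PR.

```cpp
#include <cstdint>
#include <vector>

// Little-endian base-2^32 integer; an empty vector represents 0.
using Big = std::vector<std::uint32_t>;

// x = x * m + a: the word-sized analogue of BN_mul_word() followed by BN_add_word().
void MulAdd(Big& x, std::uint32_t m, std::uint32_t a)
{
    std::uint64_t carry = a;
    for (std::uint32_t& limb : x) {
        std::uint64_t t = static_cast<std::uint64_t>(limb) * m + carry;  // cannot overflow
        limb = static_cast<std::uint32_t>(t);
        carry = t >> 32;
    }
    if (carry != 0)
        x.push_back(static_cast<std::uint32_t>(carry));
}

// Digit at a time: one multi-word operation per base-58 digit.
Big DecodeDigits(const std::vector<int>& digits)
{
    Big x;
    for (int d : digits)
        MulAdd(x, 58, static_cast<std::uint32_t>(d));
    return x;
}

// Chunked: accumulate up to 5 digits in one word (58^5 < 2^32), then do a
// single multi-word MulAdd per chunk.
Big DecodeChunked(const std::vector<int>& digits)
{
    const int kChunk = 5;
    Big x;
    std::uint32_t acc = 0, mul = 1;
    int n = 0;
    for (int d : digits) {
        acc = acc * 58 + static_cast<std::uint32_t>(d);
        mul *= 58;
        if (++n == kChunk) {
            MulAdd(x, mul, acc);
            acc = 0;
            mul = 1;
            n = 0;
        }
    }
    if (n > 0)
        MulAdd(x, mul, acc);  // partial final chunk: multiply by 58^n
    return x;
}
```

The chunked version does the same arithmetic. It moves most of the work into single-word operations and calls the multi-word step about 5 times less often (or about 10 times less often with a 64-bit word).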
Christian von Roques

Rebased on master as of a few minutes ago.

Pieter Wuille
Owner

I do appreciate the readability of the code, and to a lesser extent the possibility for optimization. But it seems you're trying to bypass CBigNum wherever possible, directly calling OpenSSL routines. I agree CBigNum is only a thin wrapper, though, so perhaps we should see it as "C++ interface for OpenSSL's bn" and not "abstract C++ bignum type".

Jeff Garzik
Owner

agree w/ @sipa

BitcoinPullTester

Automatic sanity-testing: FAILED BUILD/TEST, see http://jenkins.bluematt.me/pull-tester/eb7000f6e5be6cbc9b238bdd781ecb25496152ca for binaries and test log.

This could happen for one of several reasons:
1. It changes paths in makefile.linux-mingw or otherwise changes build scripts in a way that made them incompatible with the automated testing scripts
2. It does not build on either Linux i386 or Win32 (via MinGW cross compile)
3. The test suite fails on either Linux i386 or Win32
4. The block test-cases failed (look up the first bNN identifier that failed in https://github.com/TheBlueMatt/test-scripts/blob/master/FullBlockTestGenerator.java)

Gavin Andresen

Closing this as "risks outweigh benefits"

Luke-Jr

Might be good to dig the tests out of this at least.

Commits on Nov 13, 2012
  1. Christian von Roques
  2. Christian von Roques
  3. Christian von Roques
  4. Christian von Roques

    reimplement De-/En-codeBase58()

    roques authored
    * Make use of faster word-sized bignum operators.
    * Use table-lookup instead of linear search to
      find the value of a base58-digit.
    * Explain base-58 encoding
  5. Christian von Roques

    optimize De-/En-codeBase58()

    roques authored
    Convert chunks of digits at a time.
    Bigger, but still readable code,
    Additional speedup ~4.5
Showing with 184 additions and 68 deletions.
  1. +134 −58 src/base58.h
  2. +44 −10 src/bignum.h
  3. +6 −0 src/test/base58_tests.cpp
192 src/base58.h
@@ -12,6 +12,18 @@
// - E-mail usually won't line-break if there's no punctuation to break at.
// - Double-clicking selects the whole number as one word if it's all alphanumeric.
//
+// This base-58 codec encodes a sequence of bytes, not just a big-endian number.
+// The difference is that we preserve the exact number of leading 0 bytes.
+// Each leading zero is represented by a leading 0-value base-58 digit ('1').
+// The remaining bytes, starting from the first non-zero byte, are then interpreted
+// as a big-endian binary number and converted into a big-endian base-58 number.
+//
+// Example:
+// base-58 encoded: "127" == binary: 0x00 0x40
+// "1" leading 0-byte --------------------^^ ^^
+// "2" 1-valued base-58 digit ----> 1*58 + 6 = 64
+// "7" 6-valued base-58 digit -------------^
+//
#ifndef BITCOIN_BASE58_H
#define BITCOIN_BASE58_H
@@ -25,44 +37,56 @@
static const char* pszBase58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
+// We convert in chunks of BASE58_CHUNK_DIGITS base-58 digits.
+// BASE58_CHUNK_DIGITS is the maximum number of base-58 digits fitting into a BN_ULONG.
+// BASE58_CHUNK_MOD is pow(58, BASE58_CHUNK_DIGITS)
+enum {
+ BASE58_CHUNK_DIGITS = (sizeof(BN_ULONG) == 8 ? 10 : 5),
+ BASE58_CHUNK_MOD = (sizeof(BN_ULONG) == 8 ? 0x5fa8624c7fba400ULL : 0x271f35a0ULL), // 58^10 : 58^5
+};
+
// Encode a byte sequence as a base58-encoded string
inline std::string EncodeBase58(const unsigned char* pbegin, const unsigned char* pend)
{
- CAutoBN_CTX pctx;
- CBigNum bn58 = 58;
- CBigNum bn0 = 0;
+ const unsigned char* p;
- // Convert big endian data to little endian
- // Extra zero at the end make sure bignum will interpret as a positive number
- std::vector<unsigned char> vchTmp(pend-pbegin+1, 0);
- reverse_copy(pbegin, pend, vchTmp.begin());
+ // Convert bignum to std::string
+ std::string str;
+ // Expected size increase from base58 conversion is log(256)/log(58) approximately 1.36566
+ // use 350/256 to be safe
+ str.reserve(((pend - pbegin) * 350) / 256 + 1);
- // Convert little endian data to bignum
+ // Leading zeros encoded as base58 zeros
+ for (p = pbegin; p < pend && *p == 0; p++)
+ str += pszBase58[0];
+ ptrdiff_t nLeadingZeros = p - pbegin;
+
+ // Convert big endian data to bignum
CBigNum bn;
- bn.setvch(vchTmp);
+ BN_bin2bn(p, pend - p, &bn);
- // Convert bignum to std::string
- std::string str;
- // Expected size increase from base58 conversion is approximately 137%
- // use 138% to be safe
- str.reserve((pend - pbegin) * 138 / 100 + 1);
- CBigNum dv;
- CBigNum rem;
- while (bn > bn0)
+ BN_ULONG rem;
+ while (1)
{
- if (!BN_div(&dv, &rem, &bn, &bn58, pctx))
- throw bignum_error("EncodeBase58 : BN_div failed");
- bn = dv;
- unsigned int c = rem.getulong();
- str += pszBase58[c];
+ rem = BN_div_word(&bn, BASE58_CHUNK_MOD);
+ if (rem == (BN_ULONG) -1)
+ throw bignum_error("EncodeBase58 : BN_div_word failed");
+ if (!bn)
+ break; // Not a full chunk
+ for (int i = 0; i < BASE58_CHUNK_DIGITS; i++)
+ {
+ str += pszBase58[rem % 58];
+ rem /= 58;
+ }
+ }
+ while (rem != 0)
+ {
+ str += pszBase58[rem % 58];
+ rem /= 58;
}
- // Leading zeroes encoded as base58 zeros
- for (const unsigned char* p = pbegin; p < pend && *p == 0; p++)
- str += pszBase58[0];
-
- // Convert little endian std::string to big endian
- reverse(str.begin(), str.end());
+ // Convert little endian std::string after leading zeros to big endian
+ reverse(str.begin() + nLeadingZeros, str.end());
return str;
}
@@ -76,47 +100,99 @@ inline std::string EncodeBase58(const std::vector<unsigned char>& vch)
// returns true if decoding is successful
inline bool DecodeBase58(const char* psz, std::vector<unsigned char>& vchRet)
{
- CAutoBN_CTX pctx;
- vchRet.clear();
- CBigNum bn58 = 58;
+ // use unsigned char as array index
+ const unsigned char* p = (const unsigned char*)psz;
CBigNum bn = 0;
- CBigNum bnChar;
- while (isspace(*psz))
- psz++;
+
+ // map base58 digit to number, BAD, or SPACE
+ enum RBASE58 {
+ // 0 .. 57 // base58 digit of value 0 .. 57
+ RBASE58_BAD = -1, // neither base58, nor white space
+ RBASE58_SPACE = -2 // space, tab, newline, vtab, form feed, carriage return
+ };
+ static const signed char rgi8RBase58[256] =
+ {-1,-1,-1,-1,-1,-1,-1,-1,-1,-2,-2,-2,-2,-2,-1,-1,
+ -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
+ -2,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
+ -1, 0, 1, 2, 3, 4, 5, 6, 7, 8,-1,-1,-1,-1,-1,-1,
+ -1, 9,10,11,12,13,14,15,16,-1,17,18,19,20,21,-1,
+ 22,23,24,25,26,27,28,29,30,31,32,-1,-1,-1,-1,-1,
+ -1,33,34,35,36,37,38,39,40,41,42,43,-1,44,45,46,
+ 47,48,49,50,51,52,53,54,55,56,57,-1,-1,-1,-1,-1,
+ -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
+ -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
+ -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
+ -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
+ -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
+ -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
+ -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
+ -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1 };
+
+ // The above initializer was calculated using:
+ // memset(rgi8RBase58, RBASE58_BAD, 256);
+ // rgi8RBase58[' '] = RBASE58_SPACE;
+ // rgi8RBase58['\t'] = RBASE58_SPACE;
+ // rgi8RBase58['\n'] = RBASE58_SPACE;
+ // rgi8RBase58['\v'] = RBASE58_SPACE;
+ // rgi8RBase58['\f'] = RBASE58_SPACE;
+ // rgi8RBase58['\r'] = RBASE58_SPACE;
+ //
+ // for (int i = 0; i < 58; i++)
+ // rgi8RBase58[(unsigned char)pszBase58[i]] = i;
+
+ // Skip whitespace
+ while (rgi8RBase58[*p] == RBASE58_SPACE)
+ p++;
+
+ // Count leading zeros
+ int nLeadingZeros;
+ for (nLeadingZeros = 0; *p == pszBase58[0]; p++)
+ nLeadingZeros++;
// Convert big endian string to bignum
- for (const char* p = psz; *p; p++)
+ // We accumulate digits in acc and count them
+ BN_ULONG acc = 0;
+ int nDigits = 0;
+ int v;
+ while ((v = rgi8RBase58[*p]) >= 0)
{
- const char* p1 = strchr(pszBase58, *p);
- if (p1 == NULL)
+ acc *= 58;
+ acc += v;
+ nDigits++;
+ if (nDigits == BASE58_CHUNK_DIGITS)
{
- while (isspace(*p))
- p++;
- if (*p != '\0')
- return false;
- break;
+ // push accumulated digits into bn
+ bn *= BASE58_CHUNK_MOD;
+ bn += acc;
+ acc = 0;
+ nDigits = 0;
}
- bnChar.setulong(p1 - pszBase58);
- if (!BN_mul(&bn, &bn, &bn58, pctx))
- throw bignum_error("DecodeBase58 : BN_mul failed");
- bn += bnChar;
+ p++;
+ }
+ // push remaining digits
+ if (nDigits > 0)
+ {
+ BN_ULONG mul = 58;
+ while (--nDigits > 0)
+ mul *= 58;
+ bn *= mul;
+ bn += acc;
}
- // Get bignum as little endian data
- std::vector<unsigned char> vchTmp = bn.getvch();
+ // Skip whitespace after base58 string
+ while (rgi8RBase58[*p] == RBASE58_SPACE)
+ p++;
- // Trim off sign byte if present
- if (vchTmp.size() >= 2 && vchTmp.end()[-1] == 0 && vchTmp.end()[-2] >= 0x80)
- vchTmp.erase(vchTmp.end()-1);
+ // Fail if there is junk at the end
+ if (*p != '\0')
+ return false;
- // Restore leading zeros
- int nLeadingZeros = 0;
- for (const char* p = psz; *p == pszBase58[0]; p++)
- nLeadingZeros++;
- vchRet.assign(nLeadingZeros + vchTmp.size(), 0);
+ // Fill in leading zeros and make space for bn
+ vchRet.assign(nLeadingZeros + BN_num_bytes(&bn), 0);
+
+ // Fill big endian bn into the right place
+ BN_bn2bin(&bn, &vchRet[nLeadingZeros]);
- // Convert little endian data to big endian
- reverse_copy(vchTmp.begin(), vchTmp.end(), vchRet.end() - vchTmp.size());
return true;
}
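The byte-sequence semantics documented at the top of the base58.h diff can be reproduced without OpenSSL. Each leading zero byte maps to a leading '1', and the remaining bytes are treated as one big-endian number. The sketch below uses schoolbook base conversion on byte arrays and checks the "127" == 0x00 0x40 example from the comment and the "skip" == 0x971a55 test vector. Encode and Decode are hypothetical names. For brevity, Decode uses strchr() rather than a lookup table, and unlike the PR's DecodeBase58 it does not skip surrounding whitespace.

It also avoids one detail in the diff above. For inputs such as "" or "1", BN_num_bytes(&bn) is 0, so &vchRet[nLeadingZeros] appears to index one past the end of vchRet. std::vector::operator[] does not permit that, even though BN_bn2bin would then write nothing.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

static const char* const kAlphabet =
    "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

// Leading 0x00 bytes become leading '1's; the rest is one big-endian number.
std::string Encode(const std::vector<unsigned char>& in)
{
    std::size_t zeros = 0;
    while (zeros < in.size() && in[zeros] == 0)
        zeros++;
    std::vector<unsigned char> digits;  // base-58 digits, least significant first
    for (std::size_t i = zeros; i < in.size(); i++) {
        int carry = in[i];  // digits = digits * 256 + in[i]
        for (unsigned char& d : digits) {
            carry += d * 256;
            d = static_cast<unsigned char>(carry % 58);
            carry /= 58;
        }
        while (carry > 0) {
            digits.push_back(static_cast<unsigned char>(carry % 58));
            carry /= 58;
        }
    }
    std::string out(zeros, kAlphabet[0]);
    for (auto it = digits.rbegin(); it != digits.rend(); ++it)
        out += kAlphabet[*it];
    return out;
}

// Inverse of Encode; returns false on any character outside the alphabet.
bool Decode(const std::string& s, std::vector<unsigned char>& out)
{
    std::size_t zeros = 0;
    while (zeros < s.size() && s[zeros] == kAlphabet[0])
        zeros++;
    std::vector<unsigned char> bytes;  // base-256 digits, least significant first
    for (std::size_t i = zeros; i < s.size(); i++) {
        const char* p = s[i] ? std::strchr(kAlphabet, s[i]) : nullptr;
        if (p == nullptr)
            return false;
        int carry = static_cast<int>(p - kAlphabet);  // bytes = bytes * 58 + digit
        for (unsigned char& b : bytes) {
            carry += b * 58;
            b = static_cast<unsigned char>(carry & 0xff);
            carry >>= 8;
        }
        while (carry > 0) {
            bytes.push_back(static_cast<unsigned char>(carry & 0xff));
            carry >>= 8;
        }
    }
    out.assign(zeros, 0);
    out.insert(out.end(), bytes.rbegin(), bytes.rend());
    return true;
}
```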
54 src/bignum.h
@@ -380,12 +380,26 @@ class CBigNum : public BIGNUM
return *this;
}
+ CBigNum& operator+=(const BN_ULONG b)
+ {
+ if (!BN_add_word(this, b))
+ throw bignum_error("CBigNum::operator+= : BN_add_word failed");
+ return *this;
+ }
+
CBigNum& operator-=(const CBigNum& b)
{
*this = *this - b;
return *this;
}
+ CBigNum& operator-=(const BN_ULONG b)
+ {
+ if (!BN_sub_word(this, b))
+ throw bignum_error("CBigNum::operator-= : BN_sub_word failed");
+ return *this;
+ }
+
CBigNum& operator*=(const CBigNum& b)
{
CAutoBN_CTX pctx;
@@ -394,18 +408,42 @@ class CBigNum : public BIGNUM
return *this;
}
+ CBigNum& operator*=(const BN_ULONG b)
+ {
+ if (!BN_mul_word(this, b))
+ throw bignum_error("CBigNum::operator*= : BN_mul_word failed");
+ return *this;
+ }
+
CBigNum& operator/=(const CBigNum& b)
{
*this = *this / b;
return *this;
}
+ CBigNum& operator/=(const BN_ULONG b)
+ {
+ BN_ULONG r = BN_div_word(this, b);
+ if (r == (BN_ULONG)-1)
+ throw bignum_error("CBigNum::operator/= : BN_div_word failed");
+ return *this;
+ }
+
CBigNum& operator%=(const CBigNum& b)
{
*this = *this % b;
return *this;
}
+ CBigNum& operator%=(const BN_ULONG b)
+ {
+ BN_ULONG r = BN_mod_word(this, b);
+ if (r == (BN_ULONG)-1)
+ throw bignum_error("CBigNum::operator%= : BN_mod_word failed");
+ BN_set_word(this, r);
+ return *this;
+ }
+
CBigNum& operator<<=(unsigned int shift)
{
if (!BN_lshift(this, this, shift))
@@ -417,11 +455,9 @@ class CBigNum : public BIGNUM
{
// Note: BN_rshift segfaults on 64-bit if 2^shift is greater than the number
// if built on ubuntu 9.04 or 9.10, probably depends on version of OpenSSL
- CBigNum a = 1;
- a <<= shift;
- if (BN_cmp(&a, this) > 0)
+ if ((int)shift >= BN_num_bits(this))
{
- *this = 0;
+ BN_zero(this);
return *this;
}
@@ -434,8 +470,8 @@ class CBigNum : public BIGNUM
CBigNum& operator++()
{
// prefix operator
- if (!BN_add(this, this, BN_value_one()))
- throw bignum_error("CBigNum::operator++ : BN_add failed");
+ if (!BN_add_word(this, 1))
+ throw bignum_error("CBigNum::operator++ : BN_add_word failed");
return *this;
}
@@ -450,10 +486,8 @@ class CBigNum : public BIGNUM
CBigNum& operator--()
{
// prefix operator
- CBigNum r;
- if (!BN_sub(&r, this, BN_value_one()))
- throw bignum_error("CBigNum::operator-- : BN_sub failed");
- *this = r;
+ if (!BN_sub_word(this, 1))
+ throw bignum_error("CBigNum::operator-- : BN_sub_word failed");
return *this;
}
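The operator>>= change keeps the guard against the BN_rshift problem described in the code comment. It replaces "build 2^shift and compare" with a bit count. The two tests agree because x < 2^shift exactly when x has at most shift significant bits. The sketch below demonstrates this equivalence on a single word. It uses std::uint64_t in place of a BIGNUM, and NumBits is a hypothetical stand-in for BN_num_bits(). OldGuard and NewGuard are likewise hypothetical names.

```cpp
#include <cstdint>

// Number of significant bits in x; a single-word stand-in for BN_num_bits().
int NumBits(std::uint64_t x)
{
    int n = 0;
    while (x != 0) {
        n++;
        x >>= 1;
    }
    return n;
}

// Old guard: build a = 2^shift and test a > x.
// Shifts of 64 or more are treated as "2^shift exceeds any 64-bit x".
bool OldGuard(std::uint64_t x, unsigned int shift)
{
    return shift >= 64 || (std::uint64_t(1) << shift) > x;
}

// New guard from the diff: shift >= BN_num_bits(x).
bool NewGuard(std::uint64_t x, unsigned int shift)
{
    return static_cast<int>(shift) >= NumBits(x);
}
```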
6 src/test/base58_tests.cpp
@@ -55,6 +55,12 @@ BOOST_AUTO_TEST_CASE(base58_DecodeBase58)
}
BOOST_CHECK(!DecodeBase58("invalid", result));
+
+ // check that DecodeBase58 skips whitespace, but still fails with unexpected non-whitespace at the end.
+ BOOST_CHECK(!DecodeBase58(" \t\n\v\f\r skip \r\f\v\n\t a", result));
+ BOOST_CHECK( DecodeBase58(" \t\n\v\f\r skip \r\f\v\n\t ", result));
+ std::vector<unsigned char> expected = ParseHex("971a55");
+ BOOST_CHECK_EQUAL_COLLECTIONS(result.begin(), result.end(), expected.begin(), expected.end());
}
// Visitor to check address type