Wrong Endianess? #580

Closed
SergioDemianLerner opened this Issue Sep 27, 2014 · 6 comments


Location: https://bitcoin.org/en/developer-reference#getblocktemplate

The text reads:
Each object in the array contains the rawtransaction data in hex and the hash of the data in little-endian hex.

I looked at the code of base_uint.GetHex() in uint256.cpp and the output hex value is in big-endian format (the internal representation is in little-endian format and GetHex() reverses it).

Am I interpreting it wrong or is the documentation wrong?

Contributor

harding commented Sep 27, 2014

@SergioDemianLerner thanks for the report! The reference for that part of the doc is BIP22, which says:

hash | String | hash/id encoded in little-endian hexadecimal

The help RPC for getblocktemplate reports the same thing on 0.9.2, so if Bitcoin Core is returning big-endian results, a bug needs to be filed with Bitcoin Core. You may want to coordinate with @luke-jr, as he's the author of BIP22 and most of the GetBlockTemplate code.

I'll try to confirm this problem myself later today. Thanks again for the report!

Contributor

harding commented Sep 28, 2014

@SergioDemianLerner I haven't looked at the Bitcoin Core code, but it looks to me like getblocktemplate (GBT) is returning hashes in the same little-endian order used elsewhere in Bitcoin. Here's the code I ran:

#!/usr/bin/env python

import sys, hashlib

## Last I heard, Bitcoin Core only runs on little-endian systems, but
## check anyway
print "Confirm this system is little endian: ", sys.byteorder

## According to https://en.bitcoin.it/wiki/Dump_format#General_note_about_hashes
## the little-endian double-sha256 hash for 0x00 is
## 9a538906e6466ebd2617d321f71bc94e56056ce213d366773699e28158e00614
zero="\x00"
double_sha256_zero = hashlib.sha256(hashlib.sha256(zero).digest()).digest()

print "Expected Double-SHA256 Hash For 0x00:  ", "9a538906e6466ebd2617d321f71bc94e56056ce213d366773699e28158e00614"
print "Little Endian 0x00 Double-SHA256 Hash: ", double_sha256_zero[::-1].encode('hex_codec')
print

## Randomly selected corresponding values from manually running
## 0.9.2 bitcoin-cli -testnet getblocktemplate
raw_tx="0100000001a9cea77bea3bbf61182e93e02d1ce6a7eb679ab0da87ba99dcae5fb805671161000000008a4730440220648ee681ba2f105105383d9aceca2fee1284c8209a2f70ed87af40ac1908fbcf02207b6d79a203a1c081277117f7c26a3d88825986ee5077d1183c73242a76289f73014104a34b99f22c790c4e36b2b3c2c35a36db06226e41c692fc82b8b56ac1c540c5bd5b8dec5235a0fa8722476c7709c02559e3aa73aa03918ba2d492eea75abea235ffffffff012a0c0f01000000001976a914b5bd079c4d57cc7fc28ecf8213a6b791625b818388ac00000000"
gbt_tx_hash="8f10b1f4e7703e404af887fe75eb7ac87283d0a0aa981ecb29f50aed94145dea"

## Convert little-endian raw tx to binary and double-sha256 it
tx_binary = raw_tx.decode("hex")
tx_hash = hashlib.sha256(hashlib.sha256(tx_binary).digest()).digest()

## Hash returned by getblock template
print "Expected Hash (from GBT): ", gbt_tx_hash

## Reverse byte order to little endian (Bitcoin hash format)
print "Little Endian Hash:       ", tx_hash[::-1].encode('hex_codec')

## Normal (network) way of printing a hash (big endian)
print "Big Endian Hash:          ", tx_hash.encode('hex_codec')

And here are the results I obtained:

Confirm this system is little endian:  little
Expected Double-SHA256 Hash For 0x00:   9a538906e6466ebd2617d321f71bc94e56056ce213d366773699e28158e00614
Little Endian 0x00 Double-SHA256 Hash:  9a538906e6466ebd2617d321f71bc94e56056ce213d366773699e28158e00614

Expected Hash (from GBT):  8f10b1f4e7703e404af887fe75eb7ac87283d0a0aa981ecb29f50aed94145dea
Little Endian Hash:        8f10b1f4e7703e404af887fe75eb7ac87283d0a0aa981ecb29f50aed94145dea
Big Endian Hash:           ea5d1494ed0af529cb1e98aaa0d08372c87aeb75fe87f84a403e70e7f4b1108f

Can you confirm that you're getting hashes in the wrong byte order before we investigate further? Thanks!
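For readers on Python 3, the first check in the script above can be sketched as follows. This is a minimal port for illustration, not the script that produced the results above; the helper names `double_sha256` and `reversed_hex` are made up here, and the expected value is the one quoted from the wiki page.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Return SHA-256(SHA-256(data)) as raw bytes, in digest order."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def reversed_hex(digest: bytes) -> str:
    """Hex-encode a digest with its bytes reversed."""
    return digest[::-1].hex()


# Expected value for the single byte 0x00, as quoted from the wiki page above.
EXPECTED_ZERO = "9a538906e6466ebd2617d321f71bc94e56056ce213d366773699e28158e00614"

digest = double_sha256(b"\x00")
print("Digest order:      ", digest.hex())
print("Reversed order:    ", reversed_hex(digest))
print("Matches wiki value:", reversed_hex(digest) == EXPECTED_ZERO)
```

The same helpers should apply to the raw transaction: `reversed_hex(double_sha256(bytes.fromhex(raw_tx)))` can be compared against the hash returned by getblocktemplate.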

I think I understand the problem: there is no such thing as a little-endian or big-endian transaction hash digest.
Endianness matters only in the last stage of SHA-256, where the result is written out: the digest must end up in memory as the same byte array on every platform, so on a little-endian platform the result is split into uint32s and each uint32 is byte-swapped. Once SHA-256 has finished, the hash digest is just an array of bytes in memory, and reading that memory byte by byte (if the machine allows it) yields the same array of bytes on any platform.

The documentation should read: hash digest values are printed REVERSED. This is because SHA-256 digests are neither big-endian nor little-endian. Again, a hash digest is just an array of bytes.
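A small sketch of that point, assuming Python 3 and the standard `hashlib` module: the digest comes back as a plain byte string, and an endianness only appears once the caller decides to read those bytes as a single number. The variable names are illustrative.

```python
import hashlib

digest = hashlib.sha256(b"\x00").digest()

# The digest is a 32-byte string; printing it byte by byte involves no endianness.
natural_hex = digest.hex()

# Endianness only enters when the same bytes are read as one big integer.
as_big_endian = int.from_bytes(digest, "big")
as_little_endian = int.from_bytes(digest, "little")

print("Bytes in digest order:", natural_hex)
print("Read as big-endian:   ", hex(as_big_endian))
print("Read as little-endian:", hex(as_little_endian))
```

Both integers come from the same 32 bytes; only the chosen interpretation differs.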

Another misconception is present in https://en.bitcoin.it/wiki/Dump_format which reads:
"Output from hash functions are usually thought of as big-endian byte strings. ".
This is wrong: output from a hash function is thought of as a byte string and PRINTED accordingly, with the first byte printed before the second and so on. Big-endianness has no meaning here.

You can check this here: http://stackoverflow.com/questions/6269719/little-endian-data-and-sha-256

The reason we call transaction hashes little- or big-endian is that we also use SHA-256 to compute the block hash digest, and then cast that digest to a uint256 in order to compare it with the target as a big integer. When the hash is cast, the code assigns an arbitrary "endianness" to the hash digest. That arbitrary endianness is little-endian, because Bitcoin was originally written for x86.
The cast is done by storing the hash digest directly into a little-endian uint32 array.
This is done with the code...
inline uint256 Hash(const T1 pbegin, const T1 pend)
{
    static const unsigned char pblank[1] = {};
    uint256 result;
    CHash256().Write(pbegin == pend ? pblank : (const unsigned char*)&pbegin[0], (pend - pbegin) * sizeof(pbegin[0]))
              .Finalize((unsigned char*)&result); /// <--- direct copy of hash digest into little-endian array
    return result;
}
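The effect of that direct copy can be sketched in Python by reading the digest as a little-endian integer before comparing it with a target. This is an illustration only: `digest_to_uint256`, `meets_target`, and the sample digest and target values are made up here and are not Bitcoin Core code.

```python
def digest_to_uint256(digest: bytes) -> int:
    """Interpret a 32-byte digest as a little-endian 256-bit integer."""
    if len(digest) != 32:
        raise ValueError("expected a 32-byte digest")
    return int.from_bytes(digest, "little")


def meets_target(digest: bytes, target: int) -> bool:
    """Compare the digest, read as a little-endian integer, with a target."""
    return digest_to_uint256(digest) <= target


# Hypothetical digest: 0xff first, then 31 zero bytes.
sample_digest = bytes([0xFF]) + bytes(31)
# Hypothetical target, chosen only for this example.
sample_target = 2**224 - 1

print(hex(digest_to_uint256(sample_digest)))       # 0xff
print(meets_target(sample_digest, sample_target))  # True
```

Read as a big-endian integer, the same 32 bytes would be far above this target, which is why the choice of interpretation matters for the comparison even though the bytes themselves have no endianness.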
When the hash is printed, it's reversed with this code...
std::string base_uint::GetHex() const
{
    char psz[sizeof(pn)*2 + 1];
    for (unsigned int i = 0; i < sizeof(pn); i++)
        sprintf(psz + i*2, "%02x", ((unsigned char*)pn)[sizeof(pn) - i - 1]);
    return std::string(psz, psz + sizeof(pn)*2);
}
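A rough Python equivalent of this loop, for illustration only (the name `get_hex` mirrors the C++ method, but the function and the sample value are hypothetical): it walks the stored bytes from last to first, so the printed string is the stored byte array reversed. Read as a hexadecimal number, that string equals the little-endian integer interpretation of the same bytes.

```python
def get_hex(stored: bytes) -> str:
    """Hex-encode the stored bytes from last to first, like the loop above."""
    return "".join(f"{stored[len(stored) - i - 1]:02x}" for i in range(len(stored)))


# Hypothetical 32-byte value: bytes 0x00, 0x01, ..., 0x1f in memory order.
stored = bytes(range(32))

printed = get_hex(stored)
print(printed)  # starts with "1f1e1d" and ends with "020100"
print(int(printed, 16) == int.from_bytes(stored, "little"))  # True
```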

From this code we get the interpretation "digest reversed == big-endian".
See https://en.bitcoin.it/wiki/Block_hashing_algorithm for an example of how this interpretation is used for blocks.
For transaction hashes, the interpretation was the opposite: "digest reversed == little-endian". Clearly this doesn't make sense, because it is exactly the opposite of the interpretation used for the block hash.

So, again, the documentation should read: hash digest values are printed REVERSED, without any mention of endianness.

Do you agree with this explanation?

Contributor

harding commented Sep 29, 2014

@SergioDemianLerner I think that makes a lot of sense. I'll prepare a pull request updating all the areas in the docs where we refer to hash endianness. I'll @ mention you on that pull request so you can review it.

Thanks!

Contributor

harding commented Sep 30, 2014

Note: pull #583 hopefully addresses this issue.

Contributor

harding commented Oct 25, 2014

Closing: this was fixed in commit d5900e3 / pull #583. Thanks again @SergioDemianLerner for reporting it!

harding closed this Oct 25, 2014

harding added the Dev Docs label Dec 13, 2014
