New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Divergence between 32- and 64-bit when hashing >4GB affects `gettxoutsetinfo` #7848

Merged
merged 3 commits into from Apr 18, 2016

Conversation

Projects
None yet
7 participants
@laanwj
Member

laanwj commented Apr 9, 2016

I was seeing differerent UTXO hashes from gettxoutsetinfo on my ARM nodes than on x86 (64 bit). After the initial panic had weared off, I eventually managed to trace this to the following:

The counters for SHA256 and friends are supposed to be 64 bit. size_t, the type currently used, is 32-bit on architectures with a 32-bit address space. So after processing 4GB of data, the counter will wrap around on 32-bit and the hashes will start to diverge compared to those computed on 64-bit architectures.

I am sure this has no effect on any other use of hashes by the project, as there is no other place where >4GB of data is hashed in one go.

N.B. as this changes the output of gettxoutsetinfo on many platforms anyway (needs mention in release notes) I've taken the liberty of addressing the direct concern in #7758 as well in a second commit.

Before first commit

// [32-bit]
{
  "height": 406339,
  "bestblock": "0000000000000000030eccfc56131ea1a560e965d22957098f978cb070f6374a",
  "transactions": 9254293,
  "txouts": 35345291,
  "bytes_serialized": 1214932313,
  "hash_serialized": "992696b26016449d90736ff28d0776167014a5d916298963331eb2c9d9d36928",
  //                  ⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧
  "total_amount": 15408336.11047113
}

// [64-bit]
{
  "height": 406339,
  "bestblock": "0000000000000000030eccfc56131ea1a560e965d22957098f978cb070f6374a",
  "transactions": 9254293,
  "txouts": 35345291,
  "bytes_serialized": 1214932313,
  "hash_serialized": "acaf2a5c17a8264ea7ee8aecba3dccb9efe6c8ee1861492fff5013ff4b7a14fc",
  //                  ⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧
  "total_amount": 15408336.11047113
}

After first commit

// [32-bit]
{
  "height": 406339,
  "bestblock": "0000000000000000030eccfc56131ea1a560e965d22957098f978cb070f6374a",
  "transactions": 9254293,
  "txouts": 35345291,
  "bytes_serialized": 1214932313,
  "hash_serialized": "acaf2a5c17a8264ea7ee8aecba3dccb9efe6c8ee1861492fff5013ff4b7a14fc",
  //                  ⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧
  "total_amount": 15408336.11047113
}

[64-bit]
// {
  "height": 406339,
  "bestblock": "0000000000000000030eccfc56131ea1a560e965d22957098f978cb070f6374a",
  "transactions": 9254293,
  "txouts": 35345291,
  "bytes_serialized": 1214932313,
  "hash_serialized": "acaf2a5c17a8264ea7ee8aecba3dccb9efe6c8ee1861492fff5013ff4b7a14fc",
  //                  ⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧
  "total_amount": 15408336.11047113
}

After second commit

// [32-bit]
{
  "height": 406339,
  "bestblock": "0000000000000000030eccfc56131ea1a560e965d22957098f978cb070f6374a",
  "transactions": 9254293,
  "txouts": 35345291,
  "bytes_serialized": 1214932313,
  "hash_serialized": "1814733d4f783cccfc4e5046b69e08077af4b26dc2279825a87c7d67522112a1",
  //                  ⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧
  "total_amount": 15408336.11047113
}

// [64-bit]
{
  "height": 406339,
  "bestblock": "0000000000000000030eccfc56131ea1a560e965d22957098f978cb070f6374a",
  "transactions": 9254293,
  "txouts": 35345291,
  "bytes_serialized": 1214932313,
  "hash_serialized": "1814733d4f783cccfc4e5046b69e08077af4b26dc2279825a87c7d67522112a1",
  //                  ⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧
  "total_amount": 15408336.11047113
}
@jonasschnelli

This comment has been minimized.

Show comment
Hide comment
@jonasschnelli

jonasschnelli Apr 11, 2016

Member

What would be the downside of a forced 64bit type on 32bit platform? I guess the performance reduction and the slightly higher memory consumption is totally negligible?

Would fixing it "on the other side" (detect size_t overflow while hashing, force the 32bit reset on 64bit platforms) not involve less risks?

Member

jonasschnelli commented Apr 11, 2016

What would be the downside of a forced 64bit type on 32bit platform? I guess the performance reduction and the slightly higher memory consumption is totally negligible?

Would fixing it "on the other side" (detect size_t overflow while hashing, force the 32bit reset on 64bit platforms) not involve less risks?

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Apr 11, 2016

Member
Member

sipa commented Apr 11, 2016

@theuni

This comment has been minimized.

Show comment
Hide comment
@theuni

theuni Apr 12, 2016

Member

@laanwj Nice catches. utACK.

@jonasschnelli I think it's not worth the trouble. For platforms constrained enough for the change to be significant, I suspect that a separate hashing implementation would've already been patched in.

Member

theuni commented Apr 12, 2016

@laanwj Nice catches. utACK.

@jonasschnelli I think it's not worth the trouble. For platforms constrained enough for the change to be significant, I suspect that a separate hashing implementation would've already been patched in.

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Apr 14, 2016

Member

@sipa I know you have bigger plans for this. Do you disagree, though, that this is an improvement? I'm using gettxoutsetinfo a lot so it would really help me if this is fixed.

@jonasschnelli I'd say doing the hashing as it is supposed to be done, with counters of the appropriate width - as done here - is less risky than trying to hack something with overflow detection? Adding a 64-bit with 32-bit value is cheap on 32-bit architectures as well, it even involves overflow detection (the carry flag) internally.

Member

laanwj commented Apr 14, 2016

@sipa I know you have bigger plans for this. Do you disagree, though, that this is an improvement? I'm using gettxoutsetinfo a lot so it would really help me if this is fixed.

@jonasschnelli I'd say doing the hashing as it is supposed to be done, with counters of the appropriate width - as done here - is less risky than trying to hack something with overflow detection? Adding a 64-bit with 32-bit value is cheap on 32-bit architectures as well, it even involves overflow detection (the carry flag) internally.

@gmaxwell

This comment has been minimized.

Show comment
Hide comment
@gmaxwell

gmaxwell Apr 14, 2016

Member

The hash function code should be correct with large inputs regardless of what the utxoset hash stuff returns. utACK.

Member

gmaxwell commented Apr 14, 2016

The hash function code should be correct with large inputs regardless of what the utxoset hash stuff returns. utACK.

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Apr 14, 2016

Member

@laanwj Oops, I just responded to @jonasschnelli's comment, not seeing there was a patch too.

Yes, uint64_t should definitely be used for at least SHA1, SHA256 and RIPEMD160. In theory, we'd need to use a 128-bit integer for SHA512 (as it writes a 128-bit length descriptor in the padding), but I guess we'll never hit that.

Member

sipa commented Apr 14, 2016

@laanwj Oops, I just responded to @jonasschnelli's comment, not seeing there was a patch too.

Yes, uint64_t should definitely be used for at least SHA1, SHA256 and RIPEMD160. In theory, we'd need to use a 128-bit integer for SHA512 (as it writes a 128-bit length descriptor in the padding), but I guess we'll never hit that.

@gmaxwell

This comment has been minimized.

Show comment
Hide comment
@gmaxwell

gmaxwell Apr 14, 2016

Member

there can be an assert to catch the overflow in the sha512 case, if we want to be pedantic?

Member

gmaxwell commented Apr 14, 2016

there can be an assert to catch the overflow in the sha512 case, if we want to be pedantic?

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Apr 14, 2016

Member

In theory, we'd need to use a 128-bit integer for SHA512 (as it writes a 128-bit length descriptor in the padding), but I guess we'll never hit that.

Good catch!

Although implementing actual 128-bit counting is just as much work as adding an overflow assertion, I'd personally say adding an assert is preferable here, as we can never test this behavior.

Member

laanwj commented Apr 14, 2016

In theory, we'd need to use a 128-bit integer for SHA512 (as it writes a 128-bit length descriptor in the padding), but I guess we'll never hit that.

Good catch!

Although implementing actual 128-bit counting is just as much work as adding an overflow assertion, I'd personally say adding an assert is preferable here, as we can never test this behavior.

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Apr 14, 2016

Member

The assert should probably exist for SHA1/256/RIPEMD160 as well, as those are undefined for longer messages.

Member

sipa commented Apr 14, 2016

The assert should probably exist for SHA1/256/RIPEMD160 as well, as those are undefined for longer messages.

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Apr 14, 2016

Member

Hmm looking at the actual code I'll leave that for another time. At least this is trivially correct. I feel that the risk of someone hashing 18,446,744 TB of data is extremely low in the foreseeable future (even with the unfortunate rate of UTXO growth that will take a while), at least smaller than the risk of me introducing bugs there.

Member

laanwj commented Apr 14, 2016

Hmm looking at the actual code I'll leave that for another time. At least this is trivially correct. I feel that the risk of someone hashing 18,446,744 TB of data is extremely low in the foreseeable future (even with the unfortunate rate of UTXO growth that will take a while), at least smaller than the risk of me introducing bugs there.

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Apr 14, 2016

Member

utACK 088c270c2928089f1903db0ff01c82eaaa5f6e45

Member

sipa commented Apr 14, 2016

utACK 088c270c2928089f1903db0ff01c82eaaa5f6e45

@MarcoFalke

This comment has been minimized.

Show comment
Hide comment
@MarcoFalke

MarcoFalke Apr 15, 2016

Member

Concept ACK

Member

MarcoFalke commented Apr 15, 2016

Concept ACK

laanwj added some commits Apr 9, 2016

crypto: bytes counts are 64 bit
Byte counts for SHA256, SHA512, SHA1 and RIPEMD160 must be 64 bits.
`size_t` has a different size per platform, causing divergent results
when hashing more than 4GB of data.
rpc: make sure `gettxoutsetinfo` hash has txids
The key (transaction id for the following outputs) should be serialized
to the HashWriter.

This is a problem as it means different transactions in the same
position with the same outputs will potentially result in the same hash.

Fixes primary concern of #7758.
@gavinandresen

This comment has been minimized.

Show comment
Hide comment
@gavinandresen

gavinandresen Apr 15, 2016

Contributor

Tested ACK, on an OSX 64-bit machine.

On a debug build on my machine, hashing the txids makes gettxoutsetinfo about ten seconds slower (3 minutes versus 2 minutes 50 seconds).

Contributor

gavinandresen commented Apr 15, 2016

Tested ACK, on an OSX 64-bit machine.

On a debug build on my machine, hashing the txids makes gettxoutsetinfo about ten seconds slower (3 minutes versus 2 minutes 50 seconds).

@laanwj laanwj merged commit 28b400f into bitcoin:master Apr 18, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

laanwj added a commit that referenced this pull request Apr 18, 2016

Merge #7848: Divergence between 32- and 64-bit when hashing >4GB affe…
…cts `gettxoutsetinfo`

28b400f doc: update release-notes for `gettxoutsetinfo` change (Wladimir J. van der Laan)
76212bb rpc: make sure `gettxoutsetinfo` hash has txids (Wladimir J. van der Laan)
9ad1a51 crypto: bytes counts are 64 bit (Wladimir J. van der Laan)

codablock added a commit to codablock/dash that referenced this pull request Sep 16, 2017

Merge #7848: Divergence between 32- and 64-bit when hashing >4GB affe…
…cts `gettxoutsetinfo`

28b400f doc: update release-notes for `gettxoutsetinfo` change (Wladimir J. van der Laan)
76212bb rpc: make sure `gettxoutsetinfo` hash has txids (Wladimir J. van der Laan)
9ad1a51 crypto: bytes counts are 64 bit (Wladimir J. van der Laan)

codablock added a commit to codablock/dash that referenced this pull request Sep 19, 2017

Merge #7848: Divergence between 32- and 64-bit when hashing >4GB affe…
…cts `gettxoutsetinfo`

28b400f doc: update release-notes for `gettxoutsetinfo` change (Wladimir J. van der Laan)
76212bb rpc: make sure `gettxoutsetinfo` hash has txids (Wladimir J. van der Laan)
9ad1a51 crypto: bytes counts are 64 bit (Wladimir J. van der Laan)

@codablock codablock referenced this pull request Oct 20, 2017

Closed

[WIP] Update build system to Bitcoin 0.13.2 #1692

22 of 24 tasks complete

codablock added a commit to codablock/dash that referenced this pull request Dec 20, 2017

Merge #7848: Divergence between 32- and 64-bit when hashing >4GB affe…
…cts `gettxoutsetinfo`

28b400f doc: update release-notes for `gettxoutsetinfo` change (Wladimir J. van der Laan)
76212bb rpc: make sure `gettxoutsetinfo` hash has txids (Wladimir J. van der Laan)
9ad1a51 crypto: bytes counts are 64 bit (Wladimir J. van der Laan)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment