uint256::GetCheapHash bigendian compatibility #7078

Merged
merged 1 commit into from Nov 27, 2015

Conversation

Projects
None yet
6 participants
@arowser
Contributor

arowser commented Nov 23, 2015

Tested on PPC/MIPS :
Before change:

test_bitcoin --run_test=addrman_tests
Running 5 test cases...
test/addrman_tests.cpp(141): error in "addrman_new_collisions": check addrman.size() == 3 failed
test/addrman_tests.cpp(145): error in "addrman_new_collisions": check addrman.size() == 4 failed
test/addrman_tests.cpp(166): error in "addrman_tried_collisions": check addrman.size() == i failed
test/addrman_tests.cpp(166): error in "addrman_tried_collisions": check addrman.size() == i failed
test/addrman_tests.cpp(166): error in "addrman_tried_collisions": check addrman.size() == i failed

*** 5 failures detected in test suite "Bitcoin Test Suite"

After:

test_bitcoin --run_test=addrman_tests
Running 5 test cases...

*** No errors detected
@jgarzik

This comment has been minimized.

Show comment
Hide comment
@jgarzik

jgarzik Nov 23, 2015

Contributor

eh, use WriteLE64()?

Contributor

jgarzik commented Nov 23, 2015

eh, use WriteLE64()?

@dcousens

This comment has been minimized.

Show comment
Hide comment
@dcousens

dcousens Nov 24, 2015

Contributor

LGTM, utACK

edit: See @laanwj's comment

edit2: Conflicted, utACK on the code

Contributor

dcousens commented Nov 24, 2015

LGTM, utACK

edit: See @laanwj's comment

edit2: Conflicted, utACK on the code

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Nov 24, 2015

Member

This is on purpose! The "hash" is meant to be as fast as possible by using endian-dependent operations. E.g. within one program run it is stable and that is all that matters - it is not supposed to be used with anything that leaves the program.
This is mentioned, literally, in the comment:
https://github.com/bitcoin/bitcoin/blob/master/src/uint256.h#L118

Edit: as for the test failure, have you looked into why this causes a test failure? In principle a different hash function should not break the tests (or at least the correctness of the code), so this error may lie deeper.

Member

laanwj commented Nov 24, 2015

This is on purpose! The "hash" is meant to be as fast as possible by using endian-dependent operations. E.g. within one program run it is stable and that is all that matters - it is not supposed to be used with anything that leaves the program.
This is mentioned, literally, in the comment:
https://github.com/bitcoin/bitcoin/blob/master/src/uint256.h#L118

Edit: as for the test failure, have you looked into why this causes a test failure? In principle a different hash function should not break the tests (or at least the correctness of the code), so this error may lie deeper.

@laanwj laanwj added the Tests label Nov 24, 2015

@arowser

This comment has been minimized.

Show comment
Hide comment
@arowser

arowser Nov 25, 2015

Contributor

The failed test case is addrman_new_collisions and addrman_tried_collisions: https://github.com/bitcoin/bitcoin/blob/master/src/test/addrman_tests.cpp#L141

The testcase make a hash collision, The value of "hash1 % ADDRMAN_BUCKET_SIZE" https://github.com/bitcoin/bitcoin/blob/master/src/addrman.cpp#L29 is always 28 when addr1 is 250.1.1.1 or addr1 is 250.1.1.4 when little endian, but its different in big endian, its the reason that test failed.

addrman.Add(CAddress(addr1), source)
    {
    ...
    int nUBucketPos = infoOld.GetBucketPosition(nKey, true, nUBucket);
        --> {
            uint64_t hash1 = (CHashWriter(SER_GETHASH, 0) << nKey << (fNew ? 'N' : 'K') << nBucket << GetKey()).GetHash().GetCheapHash();
            return hash1 % ADDRMAN_BUCKET_SIZE; //this is same when little endian when addr1 is 250.1.1.1 or 250.1.1.4
            }
        ...
    ClearNew(int nUBucket, int nUBucketPos)
         -->{
            if (vvNew[nUBucket][nUBucketPos] != -1)
            ...
            vvNew[nUBucket][nUBucketPos] = -1;
            }
    }

The test result in big endian should be consistent with the little endian.

Contributor

arowser commented Nov 25, 2015

The failed test case is addrman_new_collisions and addrman_tried_collisions: https://github.com/bitcoin/bitcoin/blob/master/src/test/addrman_tests.cpp#L141

The testcase make a hash collision, The value of "hash1 % ADDRMAN_BUCKET_SIZE" https://github.com/bitcoin/bitcoin/blob/master/src/addrman.cpp#L29 is always 28 when addr1 is 250.1.1.1 or addr1 is 250.1.1.4 when little endian, but its different in big endian, its the reason that test failed.

addrman.Add(CAddress(addr1), source)
    {
    ...
    int nUBucketPos = infoOld.GetBucketPosition(nKey, true, nUBucket);
        --> {
            uint64_t hash1 = (CHashWriter(SER_GETHASH, 0) << nKey << (fNew ? 'N' : 'K') << nBucket << GetKey()).GetHash().GetCheapHash();
            return hash1 % ADDRMAN_BUCKET_SIZE; //this is same when little endian when addr1 is 250.1.1.1 or 250.1.1.4
            }
        ...
    ClearNew(int nUBucket, int nUBucketPos)
         -->{
            if (vvNew[nUBucket][nUBucketPos] != -1)
            ...
            vvNew[nUBucket][nUBucketPos] = -1;
            }
    }

The test result in big endian should be consistent with the little endian.

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Nov 25, 2015

Member

Bleh, so the test relies very strongly on the exact outcome of the hash function (as the algorithm is "randomized" by that), even though both results are correct inthemselves?

Member

laanwj commented Nov 25, 2015

Bleh, so the test relies very strongly on the exact outcome of the hash function (as the algorithm is "randomized" by that), even though both results are correct inthemselves?

@luke-jr

This comment has been minimized.

Show comment
Hide comment
@luke-jr

luke-jr Nov 25, 2015

Member

These addrman "tests" look pretty crappy in general... lots of checking internal implementation details rather than expected external behaviour. :/

Member

luke-jr commented Nov 25, 2015

These addrman "tests" look pretty crappy in general... lots of checking internal implementation details rather than expected external behaviour. :/

@arowser

This comment has been minimized.

Show comment
Hide comment
@arowser

arowser Nov 25, 2015

Contributor

The addrman_new_collisions first call addrman.MakeDeterministic(), it set addrman.nKey is NULL, then the addrman addr placement in new table will be deterministic, it seems try to simulate hash collisions when addrman.add() .

Contributor

arowser commented Nov 25, 2015

The addrman_new_collisions first call addrman.MakeDeterministic(), it set addrman.nKey is NULL, then the addrman addr placement in new table will be deterministic, it seems try to simulate hash collisions when addrman.add() .

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Nov 25, 2015

Member

The primary reason I'm protesting here is because GetCheapHash is used in CSignatureCacheHasher, which is a performance critical path, and also the only use beside addrman.

It is possible that adding a (conditional) byte swap has hardly impact on performance, but that would have to be benchmarked.

Member

laanwj commented Nov 25, 2015

The primary reason I'm protesting here is because GetCheapHash is used in CSignatureCacheHasher, which is a performance critical path, and also the only use beside addrman.

It is possible that adding a (conditional) byte swap has hardly impact on performance, but that would have to be benchmarked.

@laanwj

View changes

src/uint256.h
*/
uint64_t GetCheapHash() const
{
uint64_t result;
- memcpy((void*)&result, (void*)data, 8);
+ WriteLE64((unsigned char*)&result, *(uint64_t *)data);

This comment has been minimized.

@laanwj

laanwj Nov 25, 2015

Member

Nit: what you want here is ReadLE64, not WriteLE64:

return ReadLE64(data);
@laanwj

laanwj Nov 25, 2015

Member

Nit: what you want here is ReadLE64, not WriteLE64:

return ReadLE64(data);
@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Nov 26, 2015

Member

I doubt the performance penalty of ReadLE64 is going to matter really on any system.

I liked the idea of having an unportable fast function, though, but being able to rely on behaviour does make testing easier too.

Unsure.

Member

sipa commented Nov 26, 2015

I doubt the performance penalty of ReadLE64 is going to matter really on any system.

I liked the idea of having an unportable fast function, though, but being able to rely on behaviour does make testing easier too.

Unsure.

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Nov 27, 2015

Member

I liked the idea of having an unportable fast function, though, but being able to rely on behaviour does make testing easier too.

Right - in the case of addrman, the overhead is nothing compared to the double-SHA256, in all its uses:

src/addrman.cpp:    uint64_t hash1 = (CHashWriter(SER_GETHASH, 0) << nKey << GetKey()).GetHash().GetCheapHash();
src/addrman.cpp:    uint64_t hash2 = (CHashWriter(SER_GETHASH, 0) << nKey << GetGroup() << (hash1 % ADDRMAN_TRIED_BUCKETS_PER_GROUP)).GetHash().GetCheapHash();
src/addrman.cpp:    uint64_t hash1 = (CHashWriter(SER_GETHASH, 0) << nKey << GetGroup() << vchSourceGroupKey).GetHash().GetCheapHash();
src/addrman.cpp:    uint64_t hash2 = (CHashWriter(SER_GETHASH, 0) << nKey << vchSourceGroupKey << (hash1 % ADDRMAN_NEW_BUCKETS_PER_SOURCE_GROUP)).GetHash().GetCheapHash();
src/addrman.cpp:    uint64_t hash1 = (CHashWriter(SER_GETHASH, 0) << nKey << (fNew ? 'N' : 'K') << nBucket << GetKey()).GetHash().GetCheapHash();

So the two usecases of GetCheapHash are wildly different. In the case of sigcache every cycle counts, in the case of addrman we may just as well be using something else.

E.g. could also remove the function from uint256 class and have tailored implementations at the call sites

Member

laanwj commented Nov 27, 2015

I liked the idea of having an unportable fast function, though, but being able to rely on behaviour does make testing easier too.

Right - in the case of addrman, the overhead is nothing compared to the double-SHA256, in all its uses:

src/addrman.cpp:    uint64_t hash1 = (CHashWriter(SER_GETHASH, 0) << nKey << GetKey()).GetHash().GetCheapHash();
src/addrman.cpp:    uint64_t hash2 = (CHashWriter(SER_GETHASH, 0) << nKey << GetGroup() << (hash1 % ADDRMAN_TRIED_BUCKETS_PER_GROUP)).GetHash().GetCheapHash();
src/addrman.cpp:    uint64_t hash1 = (CHashWriter(SER_GETHASH, 0) << nKey << GetGroup() << vchSourceGroupKey).GetHash().GetCheapHash();
src/addrman.cpp:    uint64_t hash2 = (CHashWriter(SER_GETHASH, 0) << nKey << vchSourceGroupKey << (hash1 % ADDRMAN_NEW_BUCKETS_PER_SOURCE_GROUP)).GetHash().GetCheapHash();
src/addrman.cpp:    uint64_t hash1 = (CHashWriter(SER_GETHASH, 0) << nKey << (fNew ? 'N' : 'K') << nBucket << GetKey()).GetHash().GetCheapHash();

So the two usecases of GetCheapHash are wildly different. In the case of sigcache every cycle counts, in the case of addrman we may just as well be using something else.

E.g. could also remove the function from uint256 class and have tailored implementations at the call sites

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Nov 27, 2015

Member

@laanwj Even the signature cache now uses a SHA256 (though that should at some point be replaced with a fast non-cryptographic but highly collision-resistant hash function). A single byteswap (and only on BE platforms...) is not going to be measurable.

Member

sipa commented Nov 27, 2015

@laanwj Even the signature cache now uses a SHA256 (though that should at some point be replaced with a fast non-cryptographic but highly collision-resistant hash function). A single byteswap (and only on BE platforms...) is not going to be measurable.

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Nov 27, 2015

Member

OK, just going ahead and merging this then

Member

laanwj commented Nov 27, 2015

OK, just going ahead and merging this then

@laanwj laanwj merged commit c434940 into bitcoin:master Nov 27, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

laanwj added a commit that referenced this pull request Nov 27, 2015

Merge pull request #7078
c434940 uint256::GetCheapHash bigendian compatibility (daniel)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment