Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uint256::GetCheapHash bigendian compatibility #7078

Merged
merged 1 commit into from Nov 27, 2015
Merged

Conversation

@arowser
Copy link
Contributor

@arowser arowser commented Nov 23, 2015

Tested on PPC/MIPS :
Before change:

test_bitcoin --run_test=addrman_tests
Running 5 test cases...
test/addrman_tests.cpp(141): error in "addrman_new_collisions": check addrman.size() == 3 failed
test/addrman_tests.cpp(145): error in "addrman_new_collisions": check addrman.size() == 4 failed
test/addrman_tests.cpp(166): error in "addrman_tried_collisions": check addrman.size() == i failed
test/addrman_tests.cpp(166): error in "addrman_tried_collisions": check addrman.size() == i failed
test/addrman_tests.cpp(166): error in "addrman_tried_collisions": check addrman.size() == i failed

*** 5 failures detected in test suite "Bitcoin Test Suite"

After:

test_bitcoin --run_test=addrman_tests
Running 5 test cases...

*** No errors detected
@jgarzik
Copy link
Contributor

@jgarzik jgarzik commented Nov 23, 2015

eh, use WriteLE64()?

@arowser arowser force-pushed the arowser:ppc branch Nov 23, 2015
@dcousens
Copy link
Contributor

@dcousens dcousens commented Nov 24, 2015

LGTM, utACK

edit: See @laanwj's comment

edit2: Conflicted, utACK on the code

@laanwj
Copy link
Member

@laanwj laanwj commented Nov 24, 2015

This is on purpose! The "hash" is meant to be as fast as possible by using endian-dependent operations. E.g. within one program run it is stable and that is all that matters - it is not supposed to be used with anything that leaves the program.
This is mentioned, literally, in the comment:
https://github.com/bitcoin/bitcoin/blob/master/src/uint256.h#L118

Edit: as for the test failure, have you looked into why this causes a test failure? In principle a different hash function should not break the tests (or at least the correctness of the code), so this error may lie deeper.

@laanwj laanwj added the Tests label Nov 24, 2015
@arowser
Copy link
Contributor Author

@arowser arowser commented Nov 25, 2015

The failed test case is addrman_new_collisions and addrman_tried_collisions: https://github.com/bitcoin/bitcoin/blob/master/src/test/addrman_tests.cpp#L141

The testcase make a hash collision, The value of "hash1 % ADDRMAN_BUCKET_SIZE" https://github.com/bitcoin/bitcoin/blob/master/src/addrman.cpp#L29 is always 28 when addr1 is 250.1.1.1 or addr1 is 250.1.1.4 when little endian, but its different in big endian, its the reason that test failed.

addrman.Add(CAddress(addr1), source)
    {
    ...
    int nUBucketPos = infoOld.GetBucketPosition(nKey, true, nUBucket);
        --> {
            uint64_t hash1 = (CHashWriter(SER_GETHASH, 0) << nKey << (fNew ? 'N' : 'K') << nBucket << GetKey()).GetHash().GetCheapHash();
            return hash1 % ADDRMAN_BUCKET_SIZE; //this is same when little endian when addr1 is 250.1.1.1 or 250.1.1.4
            }
        ...
    ClearNew(int nUBucket, int nUBucketPos)
         -->{
            if (vvNew[nUBucket][nUBucketPos] != -1)
            ...
            vvNew[nUBucket][nUBucketPos] = -1;
            }
    }

The test result in big endian should be consistent with the little endian.

@laanwj
Copy link
Member

@laanwj laanwj commented Nov 25, 2015

Bleh, so the test relies very strongly on the exact outcome of the hash function (as the algorithm is "randomized" by that), even though both results are correct inthemselves?

@luke-jr
Copy link
Member

@luke-jr luke-jr commented Nov 25, 2015

These addrman "tests" look pretty crappy in general... lots of checking internal implementation details rather than expected external behaviour. :/

@arowser
Copy link
Contributor Author

@arowser arowser commented Nov 25, 2015

The addrman_new_collisions first call addrman.MakeDeterministic(), it set addrman.nKey is NULL, then the addrman addr placement in new table will be deterministic, it seems try to simulate hash collisions when addrman.add() .

@laanwj
Copy link
Member

@laanwj laanwj commented Nov 25, 2015

The primary reason I'm protesting here is because GetCheapHash is used in CSignatureCacheHasher, which is a performance critical path, and also the only use beside addrman.

It is possible that adding a (conditional) byte swap has hardly impact on performance, but that would have to be benchmarked.

@laanwj
laanwj reviewed Nov 25, 2015
View changes
src/uint256.h Outdated
*/
uint64_t GetCheapHash() const
{
uint64_t result;
memcpy((void*)&result, (void*)data, 8);
WriteLE64((unsigned char*)&result, *(uint64_t *)data);

This comment has been minimized.

@laanwj

laanwj Nov 25, 2015
Member

Nit: what you want here is ReadLE64, not WriteLE64:

return ReadLE64(data);
@arowser arowser force-pushed the arowser:ppc branch to c434940 Nov 25, 2015
@sipa
Copy link
Member

@sipa sipa commented Nov 26, 2015

I doubt the performance penalty of ReadLE64 is going to matter really on any system.

I liked the idea of having an unportable fast function, though, but being able to rely on behaviour does make testing easier too.

Unsure.

@laanwj
Copy link
Member

@laanwj laanwj commented Nov 27, 2015

I liked the idea of having an unportable fast function, though, but being able to rely on behaviour does make testing easier too.

Right - in the case of addrman, the overhead is nothing compared to the double-SHA256, in all its uses:

src/addrman.cpp:    uint64_t hash1 = (CHashWriter(SER_GETHASH, 0) << nKey << GetKey()).GetHash().GetCheapHash();
src/addrman.cpp:    uint64_t hash2 = (CHashWriter(SER_GETHASH, 0) << nKey << GetGroup() << (hash1 % ADDRMAN_TRIED_BUCKETS_PER_GROUP)).GetHash().GetCheapHash();
src/addrman.cpp:    uint64_t hash1 = (CHashWriter(SER_GETHASH, 0) << nKey << GetGroup() << vchSourceGroupKey).GetHash().GetCheapHash();
src/addrman.cpp:    uint64_t hash2 = (CHashWriter(SER_GETHASH, 0) << nKey << vchSourceGroupKey << (hash1 % ADDRMAN_NEW_BUCKETS_PER_SOURCE_GROUP)).GetHash().GetCheapHash();
src/addrman.cpp:    uint64_t hash1 = (CHashWriter(SER_GETHASH, 0) << nKey << (fNew ? 'N' : 'K') << nBucket << GetKey()).GetHash().GetCheapHash();

So the two usecases of GetCheapHash are wildly different. In the case of sigcache every cycle counts, in the case of addrman we may just as well be using something else.

E.g. could also remove the function from uint256 class and have tailored implementations at the call sites

@sipa
Copy link
Member

@sipa sipa commented Nov 27, 2015

@laanwj Even the signature cache now uses a SHA256 (though that should at some point be replaced with a fast non-cryptographic but highly collision-resistant hash function). A single byteswap (and only on BE platforms...) is not going to be measurable.

@laanwj
Copy link
Member

@laanwj laanwj commented Nov 27, 2015

OK, just going ahead and merging this then

@laanwj laanwj merged commit c434940 into bitcoin:master Nov 27, 2015
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
laanwj added a commit that referenced this pull request Nov 27, 2015
c434940 uint256::GetCheapHash bigendian compatibility (daniel)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

6 participants
You can’t perform that action at this time.