Make Hash independent of char signedness #5
laanwj wants to merge 1 commit into bitcoin-core:bitcoin-fork from laanwj:bitcoin-fork
|
Nice debugging! I wonder whether Hash() was intended to operate on unsigned data. If so, why doesn't it take an unsigned char pointer as input? IIRC, on x86 char is signed. |
|
@sipa Yes, on x86 char is signed. However, at the top of the function it operates on unsigned data (uint32_t, DecodeFixed32), so I assume that is the intent in the bottom part as well. Changing the type of the parameter is certainly a possibility too, but I wanted to limit code impact as much as possible. |
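To make the difference concrete, here is a minimal sketch of how the leftover bytes behave under each interpretation. It is not the actual Hash() code: `WidenAsSignedChar`, `WidenAsUnsignedChar` and `AddTail` are hypothetical names, and the shift-and-add layout of the tail step is an assumption made for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Widens a byte the way a platform with signed char does: values >= 0x80
// sign-extend, so the upper 24 bits of the result are set.
inline uint32_t WidenAsSignedChar(unsigned char byte) {
  const int32_t extended = static_cast<signed char>(byte);
  return static_cast<uint32_t>(extended);
}

// Widens a byte the way a platform with unsigned char does: zero-extension.
inline uint32_t WidenAsUnsignedChar(unsigned char byte) {
  return static_cast<uint32_t>(byte);
}

// Hypothetical tail step (an assumption, not the real function body):
// folds up to three leftover bytes into h using the given widening rule.
inline uint32_t AddTail(uint32_t h, const unsigned char* p, size_t n,
                        uint32_t (*widen)(unsigned char)) {
  if (n >= 3) h += widen(p[2]) << 16;
  if (n >= 2) h += widen(p[1]) << 8;
  if (n >= 1) h += widen(p[0]);
  return h;
}
```

With this sketch, inputs whose leftover bytes are all below 0x80 give the same result either way, which would explain why the mismatch only shows up for some keys. Casting to unsigned char before widening makes both kinds of platform take the zero-extension path.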
util/hash_test.cc
Is this new code? 2011 seems a bit in the past...
Yes, it has new code, but I copied the framework (including the comment) from another _test.cc file.
|
Upstream issue https://code.google.com/p/leveldb/issues/detail?id=237 |
Solves an elusive issue with BloomFilterPolicy and interoperability of data files between platforms with signed char (such as x86) and platforms with unsigned char (such as ARM).
- Add casts to Hash() to treat chars as unsigned (I think this was intended)
- Add tests for Hash
- Change name of algorithm in BloomFilterPolicy::Name to make sure that filters are recomputed; this provides forward and backward compatibility
|
Hmm, what is the behaviour of "filters being recomputed"? Does it actively rebuild the bloom filter at load time, or does the filter just fail until the sstable file is rewritten? My guess is the latter, but I haven't verified the code. If so, I wonder whether effectively disabling bloom filtering for all existing databases until files are rewritten will be acceptable upstream. An alternative would be replicating the "buggy" behaviour and making it use signed chars for the padding on every platform. |
|
I've thought about that, but replicating the buggy behaviour doesn't help; it will create the same problem in the other direction. There is no way to detect that a database used the wrong hash function, so as I see it the only solution is to blanket recompute all bloom filters. Not sure if this requires any manual action, but it indeed seems so: TableBuilder uses a FilterBlockBuilder to build the filter once the table is 'finished'. Seemingly the new function will only be used for new tables, and the filter will be ignored for old tables. (Then again, I don't know exactly how the implementation works either... does it create a new table every time the last table is full?) |
|
Tables are immutable. They're created once as the result of a compaction, and used until they are removed through another compaction. Compactions happen when log files grow too large, or periodically to keep performance up. |
|
Ok, then it's clear: only after the next compaction of a table will you get filter functionality back for that data. |
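As a rough illustration of why renaming the policy sidelines old filters instead of misreading them, here is a sketch that assumes a table stores its filter block under a key derived from the policy name. `MetaIndex`, `FindFilter`, `MustReadDataBlock` and the policy names in the usage note are hypothetical, not the real reader code.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a table's meta index: block name -> block contents.
using MetaIndex = std::map<std::string, std::string>;

// Returns the filter block written under the given policy name, or nullptr
// when the table was built under a different name (assumed key scheme).
inline const std::string* FindFilter(const MetaIndex& meta,
                                     const std::string& policy_name) {
  const auto it = meta.find("filter." + policy_name);
  return it == meta.end() ? nullptr : &it->second;
}

// Without a matching filter the reader cannot rule any key out, so every
// lookup falls through to the data blocks: slower, but never wrong.
inline bool MustReadDataBlock(const MetaIndex& meta,
                              const std::string& policy_name) {
  return FindFilter(meta, policy_name) == nullptr;
}
```

Under that assumption, a table written under a made-up name such as "ExampleBloom" returns no filter when opened with "ExampleBloom2", so reads skip the filter until compaction rewrites the table under the new name.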
|
I have tested this patch.
Node was completely borked with connect block errors.
Loaded the node on ARMv7 with no issues, sync continued and remains stable. |
|
This is part of leveldb upstream now; see bitcoin/bitcoin#5093 |
8d4eb08 Add HasAcceleratedCRC32C to port_win.h (Cory Fields)
77cfbfd crc32: move helper functions out of port_posix_sse.cc (Cory Fields)
4c1e9e0 silence compiler warnings about uninitialized variables (Cory Fields)

Pull request description:

Addresses bitcoin-core#4. As this file is compiled with sse42 flags, it's possible that the feature discovery ends up using an unsupported instruction at runtime. Fix this by adding CanAccelerateCRC32C to the port api, and requiring that it be checked before using AcceleratedCRC32C.

Tree-SHA512: 166cc0f4758bc0f22adda2126acad83e0251605a3a14d695fbb34a1d40f2328c4d938fbdcd624964281e6b9fcb3b233d3a8bde010ab889d82ae4f94479c6e545
…bitcoin-core#5.

8b1cd37 fixup define checks. Cleans up some oopses from bitcoin-core#5. (Cory Fields)

Pull request description:

- use "#if defined(foo)" rather than "#if foo"
- Use the same guard for the cpuid header and the function

Tree-SHA512: fe83895055faf9f5491b9af44262a4dc15d9f56ec8f818e7d66c1002bb6568a90345662828abc7baab0772baa646f9cf13f8ba586ebad5fc3678731b27585885
Solves an elusive issue with BloomFilterPolicy and interoperability of data files between platforms with signed char (such as x86) and platforms with unsigned char (such as ARM) (see bitcoin/bitcoin#2293).
I suppose we want to take this upstream first, but those that need this can already apply this patch.