Skip to content
This repository was archived by the owner on Jun 17, 2019. It is now read-only.

Make Hash independent of char signedness#5

Closed
laanwj wants to merge 1 commit intobitcoin-core:bitcoin-forkfrom
laanwj:bitcoin-fork
Closed

Make Hash independent of char signedness#5
laanwj wants to merge 1 commit intobitcoin-core:bitcoin-forkfrom
laanwj:bitcoin-fork

Conversation

@laanwj
Copy link
Copy Markdown
Member

@laanwj laanwj commented Jun 7, 2014

Solves an elusive issue with BloomFilterPolicy and interoperability of data files between platforms with signed char (such as x86) and platforms with unsigned char (such as ARM) (see bitcoin/bitcoin#2293).

  • Add casts to Hash() to treat chars as unsigned (I think this was intended).
  • Add tests for Hash.
  • Change name of algoritm in BloomFilterPolicy::Name to make sure that filters are recomputed; this provides forward and backward compatibility. BloomFilterPolicy has the only use of Hash() that is stored to disk.

I suppose we want to take this upstream first, but those that need this can already apply this patch.

@sipa
Copy link
Copy Markdown
Contributor

sipa commented Jun 7, 2014

Nice debugging!

I wonder whether Hash() was intended to operate on unsigned data. If so, why doesn't it take an unsigned char pointer as input? IIRC, on x86 char is signed.

@laanwj
Copy link
Copy Markdown
Member Author

laanwj commented Jun 7, 2014

@sipa Yes, on x86 char is signed. However at the top of the function it operates on unsigned data (uint32_t, DecodeFixed32). Hence I assume that this is the intent in the bottom part as well.

Changing the type of the parameter is certainly a possibility too but I wanted to limit code impact as much as possible.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this new code? 2011 seems a bit in the past...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it has new code, but I copied the framework (including the comment) from another _test.cc file.

@laanwj
Copy link
Copy Markdown
Member Author

laanwj commented Jun 7, 2014

Solves an illusive issue with BloomFilterPolicy and interoperability
of data files between platforms with signed char (such as x86) and
platforms with unsigned char (such as ARM).

- Add casts to Hash() to treat chars as unsigned (I think this was
  intended)
- Add tests for Hash
- Change name of algoritm in BloomFilterPolicy::Name to make sure that
  filters are recomputed; this provides forward and backward
  compatibility
@sipa
Copy link
Copy Markdown
Contributor

sipa commented Jun 8, 2014

Hmm, what is the behaviour of "filters being recomputed". Does it actively rebuild the bloom filter at load time, or does the filter just fail until the sstable file is rewritten? My guess is the latter, but I haven't verified the code.

If so, I wonder whether effectively disabling bloom filtering for all existing database until files are rewritten will be acceptable upstream. An alternative would be replicating the "buggy" behaviour and making it use signed chars for the padding on every platform.

@laanwj
Copy link
Copy Markdown
Member Author

laanwj commented Jun 9, 2014

I've thought about that, but replicating the buggy behaviour doesn't help, it will create the same problem but in the other direction. There is no way to detect that a database used the wrong hash function.
(well - apart from 'Uses BloomFilterPolicy and is created by leveldb < vXXXX' - but that doesn't tell you whether the database was written with signed or unsigned chars)

So as I see it the only solution is to blanket recompute all bloom filters. Not sure if this requires any manual action, but it indeed seems so, TableBuilder uses a FilterBlockBuilder to build the filter once the table is 'finished'. Seemingly the new function will only be used for new tables, and the filter will be ignored for old tables.

(then again I don't know exactly how the implementation works either... it creates new tables every time the last table is full?)

@sipa
Copy link
Copy Markdown
Contributor

sipa commented Jun 9, 2014

Tables are immutable. They're created once as the result of a compaction, and used until they are removed through another compactions. Compactions happen when log files grow too large, or periodically to keep performance up.

@laanwj
Copy link
Copy Markdown
Member Author

laanwj commented Jun 9, 2014

Ok, then it's clear: only after the next compaction of a table you'll get filter functionality back for that data.
In our case that would be acceptable, as correctness trumps performance, but indeed it's not clear what upstream is going to do. The ball is in their court now.

@dsnark
Copy link
Copy Markdown

dsnark commented Jun 21, 2014

I have tested this patch.

  • built bitcoind on ARMv7
  • copied in an x86 chainstate

Node was completely borked with connect block errors.

  • applied this patch and built on ARMv7
  • copied in an x86 chainstate

Loaded the node on ARMv7 with no issues, sync continued and remains stable.

@laanwj
Copy link
Copy Markdown
Member Author

laanwj commented Oct 21, 2014

This is part of leveldb upstream now; see bitcoin/bitcoin#5093

@laanwj laanwj closed this Oct 21, 2014
droark pushed a commit to droark/leveldb-1 that referenced this pull request Oct 6, 2018
8d4eb08 Add HasAcceleratedCRC32C to port_win.h (Cory Fields)
77cfbfd crc32: move helper functions out of port_posix_sse.cc (Cory Fields)
4c1e9e0 silence compiler warnings about uninitialized variables (Cory Fields)

Pull request description:

  Addresses bitcoin-core#4.

  As this file is compiled with sse42 flags, it's possible that the feature discovery ends up using an unsupported instruction at runtime.

  Fix this by adding CanAccelerateCRC32C to the port api, and requiring that it be checked before using AcceleratedCRC32C.

Tree-SHA512: 166cc0f4758bc0f22adda2126acad83e0251605a3a14d695fbb34a1d40f2328c4d938fbdcd624964281e6b9fcb3b233d3a8bde010ab889d82ae4f94479c6e545
droark pushed a commit to droark/leveldb-1 that referenced this pull request Oct 6, 2018
- use "#if defined(foo)" rather than "#if foo"
- Use the same guard for the cpuid header and the function
droark pushed a commit to droark/leveldb-1 that referenced this pull request Oct 6, 2018
…bitcoin-core#5.

8b1cd37 fixup define checks. Cleans up some oopses from bitcoin-core#5. (Cory Fields)

Pull request description:

  - use "#if defined(foo)" rather than "#if foo"
  - Use the same guard for the cpuid header and the function

Tree-SHA512: fe83895055faf9f5491b9af44262a4dc15d9f56ec8f818e7d66c1002bb6568a90345662828abc7baab0772baa646f9cf13f8ba586ebad5fc3678731b27585885
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants