New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite addrdb with less duplication using CHashVerifier #10248
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK d76b7ac079e140948d589f5c53a4af15ddcad6f5
src/validation.cpp
Outdated
@@ -1455,19 +1455,18 @@ bool UndoReadFromDisk(CBlockUndo& blockundo, const CDiskBlockPos& pos, const uin | |||
|
|||
// Read block | |||
uint256 hashChecksum; | |||
CHashVerifier<CAutoFile> verifier(&filein); // We need a CHashVerifier as reserializing may lose data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this comment is wrong if this is pulled out of #10195.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed; fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 0e8d567209ba198f64c7657acb0e05018bf0c5be
src/hash.h
Outdated
char data[1024]; | ||
while (nSize > 0) { | ||
size_t now = std::min<size_t>(nSize, 1024); | ||
source->read(data, now); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Introduce CHashVerifier to hash read data"
Could replace these two lines with just read(data, now);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
catch (const std::exception& e) { | ||
// de-serialization has failed, ensure addrman is left in a clean state | ||
bool ret = DeserializeDB(ssPeers, addr, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Deduplicate addrdb.cpp and use CHashWriter/Verifier"
Not your change, but it isn't clear to me why this why this function is a part of the CAddrDB class and is non-static. It doesn't seem to access any state or be directly related to the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made it static.
return error("%s: Failed to open file %s", __func__, pathAddr.string()); | ||
|
||
// use file size to size memory buffer | ||
uint64_t fileSize = fs::file_size(pathAddr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Deduplicate addrdb.cpp and use CHashWriter/Verifier"
It seems this logic to read the file into a memory buffer before deserializing it has gone away. Any idea why it was written that way in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it's to protect against the case where the data is unusually expensive to deserialize. I'm not particularly worried about (non-adverserial) corruption that causes expensive deserialization without causing an error somewhere along the way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC the point was to be able to check the checksum before starting any deserialization work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. I don't think that's particularly interesting, but if that remains a requirement, I'll drop this PR.
src/hash.h
Outdated
CHashVerifier(Source* source_) : CHashWriter(source_->GetType(), source_->GetVersion()), source(source_) {} | ||
|
||
void read(char* pch, size_t nSize) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please check for nSize == 0
to avoid problems like #10250
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that the invoked read and write method already need to deal with nSize==0 anyway?
src/hash.h
Outdated
Source* source; | ||
|
||
public: | ||
CHashVerifier(Source* source_) : CHashWriter(source_->GetType(), source_->GetVersion()), source(source_) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pass a reference instead, so that it can't be null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like using references for objects that persist. This risks calling CHashVerifier(function_that_returns_a_Source()).
utACK, other than the nits |
I think I've addressed or responded to all comments. |
The first commits of this PR are merged as part of #10195. I'm going to change the title to just reflect the effect of the last remaining commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to leave comment for future PR.
utACK cf68a48
pathBanlist = GetDataDir() / "banlist.dat"; | ||
// Write and commit header, data | ||
try { | ||
CHashWriter hasher(SER_DISK, CLIENT_VERSION); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems it may be nicer to add a more analygous hasher to CHashVerifyer here - why not hash while writing the data instead of serializing twice?
utACK cf68a48 |
cf68a48 Deduplicate addrdb.cpp and use CHashWriter/Verifier (Pieter Wuille) Tree-SHA512: 0301332e797f64da3a1588c9ebaf533af58da41e38f8a64206bff20102c5e82c2a7c630ca3150cf451b2ccf4acb3dd45e44259b6ba15e92786e9e9a2b225bd2f
…Verifier cf68a48 Deduplicate addrdb.cpp and use CHashWriter/Verifier (Pieter Wuille) Tree-SHA512: 0301332e797f64da3a1588c9ebaf533af58da41e38f8a64206bff20102c5e82c2a7c630ca3150cf451b2ccf4acb3dd45e44259b6ba15e92786e9e9a2b225bd2f
…Verifier cf68a48 Deduplicate addrdb.cpp and use CHashWriter/Verifier (Pieter Wuille) Tree-SHA512: 0301332e797f64da3a1588c9ebaf533af58da41e38f8a64206bff20102c5e82c2a7c630ca3150cf451b2ccf4acb3dd45e44259b6ba15e92786e9e9a2b225bd2f
…Verifier cf68a48 Deduplicate addrdb.cpp and use CHashWriter/Verifier (Pieter Wuille) Tree-SHA512: 0301332e797f64da3a1588c9ebaf533af58da41e38f8a64206bff20102c5e82c2a7c630ca3150cf451b2ccf4acb3dd45e44259b6ba15e92786e9e9a2b225bd2f
…Verifier cf68a48 Deduplicate addrdb.cpp and use CHashWriter/Verifier (Pieter Wuille) Tree-SHA512: 0301332e797f64da3a1588c9ebaf533af58da41e38f8a64206bff20102c5e82c2a7c630ca3150cf451b2ccf4acb3dd45e44259b6ba15e92786e9e9a2b225bd2f
…Verifier cf68a48 Deduplicate addrdb.cpp and use CHashWriter/Verifier (Pieter Wuille) Tree-SHA512: 0301332e797f64da3a1588c9ebaf533af58da41e38f8a64206bff20102c5e82c2a7c630ca3150cf451b2ccf4acb3dd45e44259b6ba15e92786e9e9a2b225bd2f
…+ Add file syncing logging and error handling d593e7e Migrate remaining FLATDATA serialization, we are natively supporting it now. (furszy) 051970d addrdb: Remove temporary files created in SerializeFileDB. Fixes non-determinism in unit tests. (practicalswift) 4be426f Add logging and error handling for file syncing (Wladimir J. van der Laan) 8662fb3 Remove unused double_safe_addition & double_safe_multiplication functions. (furszy) a7c3885 Add native support for serializing char arrays without FLATDATA (Pieter Wuille) 459ecb9 Deduplicate addrdb.cpp and use CHashWriter/Verifier (Pieter Wuille) a5d2f8a Fix copypasted comment. (Pavel Janík) Pull request description: Digging down the peers and ban databases commits path in upstream found some good stuff, back ported the following PRs: * bitcoin#9216. * bitcoin#10248. * bitcoin#12740. * bitcoin#13039. * bitcoin#16212. ACKs for top commit: random-zebra: ACK d593e7e Fuzzbawls: ACK d593e7e Tree-SHA512: 8d567c68a2c36f43c749d5faa4b7f8a299dbfdda133495df434ac642c1a2ac96dc2259123ad85decc9039c62c46602665f6be4aadf6d5bf729344c51692ec60e
This rewrites addrdb.cpp to use CHashVerifier (from #10195), and combines the code for addrdb and bandb to the extent possible.