Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
Fixes subscript 0 (&var[0]) where should use (var.data()) instead. #9804
Conversation
fanquake
added the
Refactoring
label
Feb 20, 2017
| @@ -229,7 +229,7 @@ vector<unsigned char> DecodeBase64(const char* p, bool* pfInvalid) | ||
| string DecodeBase64(const string& str) | ||
| { | ||
| vector<unsigned char> vchRet = DecodeBase64(str.c_str()); | ||
| - return (vchRet.size() == 0) ? string() : string((const char*)vchRet.data(), vchRet.size()); | ||
| + return string((const char*)vchRet.data(), vchRet.size()); |
|
Sorry for the bit of line noise, there was some UB when I removed some of the branches (memset/memcpy/read/write 0 bytes on nullptr/invalid ptr) that was causing one of the builds (with more aggressive runtime checks) to fail. I added them back in and all seems well now. |
|
Whoa, good catch. I'm shocked that &vch[0] use was still so rampant in the code. We'd introduced Anyhow, concept ACK. |
| @@ -571,7 +571,7 @@ void Serialize_impl(Stream& os, const prevector<N, T>& v, const unsigned char&) | ||
| { | ||
| WriteCompactSize(os, v.size()); | ||
| if (!v.empty()) | ||
| - os.write((char*)&v[0], v.size() * sizeof(T)); | ||
| + os.write((char*) v.data(), v.size() * sizeof(T)); |
laanwj
Feb 20, 2017
Owner
I wonder why we're casting to (char*) here iso (const char*), even though these are write operations and the type passed in is const.
There's a couple more of these. Though seems out of scope to fix in this pull.
JeremyRubin
Feb 20, 2017
Contributor
Yeah that seems obviously incorrect, but I think I understand why it was done (or at least have a theory). Because the methods are all generic w.r.t. the stream type, they probably are made to work with some arbitrary stream that can't accept a const char *. Hopefully not the case anymore...
Anyways, probably best to fix those up separately as you suggest.
laanwj
Feb 21, 2017
Owner
Sure, but it would be wrong for a stream write to take a char* i.s.o. const char*, so fixing those just means the const-correctness spreads.
|
There are still a few more places that were trickier/not possible to correct (where the type passed in is templated, such that it would be a pointer or a vector). I also didn't fix anything in subdirectories, because I felt this PR was getting a bit large already, and there are a few you can still grep for pretty easily there:
(there are a lot more worth checking if you don't use the hungarian notation |
| @@ -549,7 +549,7 @@ void Serialize(Stream& os, const std::basic_string<C>& str) | ||
| { | ||
| WriteCompactSize(os, str.size()); | ||
| if (!str.empty()) | ||
| - os.write((char*)&str[0], str.size() * sizeof(str[0])); | ||
| + os.write((char*) str.data(), str.size() * sizeof(typename std::basic_string<C>::value_type)); |
TheBlueMatt
Feb 20, 2017
•
Contributor
AFAICT, this is equivalent to sizeof(C) (std::basic_string<C>::value_type is std::basic_string<C, std::char_traits<C>>::value_type which is std::char_traits<C>::char_type, which is C).
JeremyRubin
Feb 22, 2017
Contributor
Yep, I was aware :) I felt that the version I chose was more similar to the existing code (the size of whatever str[0] returns), but if everyone is comfortable with sizeof(C) I can change it.
| @@ -229,7 +229,7 @@ vector<unsigned char> DecodeBase64(const char* p, bool* pfInvalid) | ||
| string DecodeBase64(const string& str) | ||
| { | ||
| vector<unsigned char> vchRet = DecodeBase64(str.c_str()); | ||
| - return (vchRet.size() == 0) ? string() : string((const char*)&vchRet[0], vchRet.size()); | ||
| + return std::string((const char*)vchRet.data(), vchRet.size()); |
TheBlueMatt
Feb 21, 2017
Contributor
Hmm, based on cppreference I cant tell if this is well-defined with vchRet.size() == 0 or not.
dcousens
Feb 21, 2017
•
Contributor
From my understanding (it doesn't seem ambiguous, but, it certainly isn't definitive), a len of 0 is well-defined, and simply an empty string. Aka, equivalent to std::string().
I don't have a definitive statement from the reference.
laanwj
Feb 21, 2017
Owner
Yes, a string with size 0 is well-defined. Calling .data() on a string of size 0 is also defined. This should be ok.
sipa
Feb 21, 2017
Owner
The question is whether calling std::string(garbage_pointer, 0) is well-defined. I can't tell from what cppreference.com says (The behavior is undefined if s does not point at an array of at least count elements of CharT, including the case when s is a null pointer.).
As an alternative, what about std::string(vchRet.begin(), vchRet.end())?
laanwj
Feb 21, 2017
•
Owner
I read that as: If the count is 0, it always points at least 0 elements, thus "point at an array of at least count elements of CharT" is always satisfied.
Alternatively: if count is 0 it would be wrong to dereference the pointer's address at all.
sipa
Feb 21, 2017
•
Owner
I'm not convinced that "a pointer to an array of 0 elements of type T" can be any pointer. I think it's extremely unlikely to matter, and that every C++ std library effectively is implemented in a way that makes the constructor, when invoked with (pointer, 0) effectively ignore pointer, however.
TheBlueMatt
Feb 21, 2017
Contributor
I mean when cppreference is ambiguous thats probably a good sign that trying to parse a wiki is going to get your an incorrect result...best to go to the standard if in doubt in that case. :/
JeremyRubin
Feb 22, 2017
Contributor
I don't think it is can be a garbage pointer. vector.data() is guaranteed to return a valid range (even when empty), which I think would require that it have proper alignment.
laanwj
Feb 22, 2017
Owner
Just to be complete: this is not the same as std::string(vchRet.begin(), vchRet.end()) in the general case. When creating an iterator range, if the type of vch is different from the type of string, a per-element cast will happen. So string.size() will be vch.size(). Usual issues with casting between signed and unsigned apply.
std::string((const char*)vchRet.data(), vchRet.size()) however just return the memory area with all the elements of the vector as a string. So string.size() will be vch.size()*sizeof(vch::value_type). It is likely also more efficient.
|
Concept ACK. |
|
Concept ACK! |
|
Rebased. |
| - const uint8_t* tail = (const uint8_t*)(&vDataToHash[0] + nblocks * 4); | ||
| + //---------- | ||
| + // tail | ||
| + const uint8_t* tail = (const uint8_t*)(vDataToHash.data() + nblocks * 4); |
ryanofsky
Mar 27, 2017
Contributor
In commit: "Cleanup (safe, it was checked) 0-subscript in MurmurHash3"
Could you drop this (const uint8_t*) cast here? It shouldn't be needed since we already call vDataToHash.data() with no cast above on line 27.
| @@ -227,8 +227,8 @@ std::vector<unsigned char> DecodeBase64(const char* p, bool* pfInvalid) | ||
| std::string DecodeBase64(const std::string& str) | ||
| { | ||
| - std::vector<unsigned char> vchRet = DecodeBase64(str.c_str()); | ||
| - return (vchRet.size() == 0) ? std::string() : std::string((const char*)&vchRet[0], vchRet.size()); | ||
| + vector<unsigned char> vchRet = DecodeBase64(str.c_str()); |
ryanofsky
Mar 27, 2017
Contributor
In commit "Fix 0-subscript in utilstrencodings.cpp"
Should put back std:: prefixes.
| @@ -227,8 +227,8 @@ std::vector<unsigned char> DecodeBase64(const char* p, bool* pfInvalid) | ||
| std::string DecodeBase64(const std::string& str) | ||
| { | ||
| - vector<unsigned char> vchRet = DecodeBase64(str.c_str()); | ||
| - return (vchRet.size() == 0) ? string() : string((const char*)vchRet.data(), vchRet.size()); | ||
| + std::vector<unsigned char> vchRet = DecodeBase64(str.c_str()); |
ryanofsky
Mar 27, 2017
Contributor
In commit "Remove uneccessary branches in utilstrencodings string constructors."
Some of the changes in this commit should be squashed into previous commit "In commit "Fix 0-subscript in utilstrencodings.cpp."
Also missing an n in unnecessary.
| @@ -414,8 +414,13 @@ std::vector<unsigned char> DecodeBase32(const char* p, bool* pfInvalid) | ||
| std::string DecodeBase32(const std::string& str) | ||
| { | ||
| +<<<<<<< 1f0401c11315b5090cfe2b37f6338205e2984372 |
ryanofsky
Mar 27, 2017
Contributor
In commit "Fix 0-subscript in utilstrencodings.cpp"
Bad merge here.
| @@ -385,7 +385,7 @@ class CDataStream | ||
| { | ||
| // Special case: stream << stream concatenates like stream += stream | ||
| if (!vch.empty()) | ||
| - s.write((char*)&vch[0], vch.size() * sizeof(vch[0])); | ||
| + s.write((char*) vch.data(), vch.size() * sizeof(typename decltype(vch)::value_type)); |
ryanofsky
Mar 27, 2017
Contributor
In commit "Fix subscript-0 in streams.h"
It would seem more direct (and shorter) to write sizeof(*vch.data()).
|
@ryanofsky feedback addressed. |
ryanofsky
reviewed
Jun 14, 2017
|
Needs rebase again (sorry). |
JeremyRubin
added some commits
Feb 19, 2017
laanwj
referenced
this pull request
Jul 11, 2017
Closed
Use vector::data() instead of &vch[0] in base58.cpp #10790
MeshCollider
commented
Jul 11, 2017
|
Looks good to me, utACK |
MeshCollider
added a commit
to MeshCollider/bitcoin
that referenced
this pull request
Jul 11, 2017
|
|
MeshCollider |
ea88711
|
MeshCollider
added a commit
to MeshCollider/bitcoin
that referenced
this pull request
Jul 11, 2017
|
|
MeshCollider |
eccc270
|
MeshCollider
added a commit
to MeshCollider/bitcoin
that referenced
this pull request
Jul 11, 2017
|
|
MeshCollider |
62acd9d
|
ryanofsky
reviewed
Jul 11, 2017
utACK 30ac768. Changes since previous review were addrdb changes going away and some rearranged commits.
|
utACK 30ac768 |
|
utACK 30ac768 |
|
utACK 30ac768 |
sipa
merged commit 30ac768
into
bitcoin:master
Jul 12, 2017
1 check passed
sipa
added a commit
that referenced
this pull request
Jul 12, 2017
|
|
sipa |
479afa0
|
JeremyRubin commentedFeb 19, 2017
•
edited
This PR fixes up a couple places where the codebase has semi-risky uses of 0 subscript. I don't think any of these are actively bugs, but some of them are unsafe (or unexpected-errors) for unsanitized inputs (such as in addrdb or key.cpp).
This also allows us to eliminate some bounds checks, so I guess this is a net-negative patch. I don't think any of those bounds checks are performance sensitive, but worth double checking in review.
p.s. I also fix some confusing code in murmurhash (with non obvious offset-math) as it took me a little bit to see that it was correct, could separate that out if desired.