Fixes subscript 0 (&var[0]) where should use (var.data()) instead. #9804

Merged
merged 12 commits into from Jul 12, 2017

Conversation

Projects
None yet
10 participants
Contributor

JeremyRubin commented Feb 19, 2017 edited

This PR fixes up a couple places where the codebase has semi-risky uses of 0 subscript. I don't think any of these are actively bugs, but some of them are unsafe (or unexpected-errors) for unsanitized inputs (such as in addrdb or key.cpp).

This also allows us to eliminate some bounds checks, so I guess this is a net-negative patch. I don't think any of those bounds checks are performance sensitive, but worth double checking in review.

p.s. I also fix some confusing code in murmurhash (with non obvious offset-math) as it took me a little bit to see that it was correct, could separate that out if desired.

src/utilstrencodings.cpp
@@ -229,7 +229,7 @@ vector<unsigned char> DecodeBase64(const char* p, bool* pfInvalid)
string DecodeBase64(const string& str)
{
vector<unsigned char> vchRet = DecodeBase64(str.c_str());
- return (vchRet.size() == 0) ? string() : string((const char*)vchRet.data(), vchRet.size());
+ return string((const char*)vchRet.data(), vchRet.size());
@dcousens

dcousens Feb 20, 2017

Contributor

std::string maybe?

Contributor

JeremyRubin commented Feb 20, 2017

Sorry for the bit of line noise, there was some UB when I removed some of the branches (memset/memcpy/read/write 0 bytes on nullptr/invalid ptr) that was causing one of the builds (with more aggressive runtime checks) to fail. I added them back in and all seems well now.

Owner

laanwj commented Feb 20, 2017

Whoa, good catch.

I'm shocked that &vch[0] use was still so rampant in the code. We'd introduced begin_ptr with the intention to replace those, and phased even that out again with C++11 .data().

Anyhow, concept ACK.

src/serialize.h
@@ -571,7 +571,7 @@ void Serialize_impl(Stream& os, const prevector<N, T>& v, const unsigned char&)
{
WriteCompactSize(os, v.size());
if (!v.empty())
- os.write((char*)&v[0], v.size() * sizeof(T));
+ os.write((char*) v.data(), v.size() * sizeof(T));
@laanwj

laanwj Feb 20, 2017

Owner

I wonder why we're casting to (char*) here iso (const char*), even though these are write operations and the type passed in is const.

There's a couple more of these. Though seems out of scope to fix in this pull.

@JeremyRubin

JeremyRubin Feb 20, 2017

Contributor

Yeah that seems obviously incorrect, but I think I understand why it was done (or at least have a theory). Because the methods are all generic w.r.t. the stream type, they probably are made to work with some arbitrary stream that can't accept a const char *. Hopefully not the case anymore...

Anyways, probably best to fix those up separately as you suggest.

@laanwj

laanwj Feb 21, 2017

Owner

Sure, but it would be wrong for a stream write to take a char* i.s.o. const char*, so fixing those just means the const-correctness spreads.

Contributor

JeremyRubin commented Feb 20, 2017

There are still a few more places that were trickier/not possible to correct (where the type passed in is templated, such that it would be a pointer or a vector).

I also didn't fix anything in subdirectories, because I felt this PR was getting a bit large already, and there are a few you can still grep for pretty easily there:

❯ grep '\&v[A-z0-9_]*\[0\]' */*.cpp | nl
    1	qt/signverifymessagedialog.cpp:    ui->signatureOut_SM->setText(QString::fromStdString(EncodeBase64(&vchSig[0], vchSig.size())));
    2	rpc/misc.cpp:    return EncodeBase64(&vchSig[0], vchSig.size());
    3	script/ismine.cpp:        CRIPEMD160().Write(&vSolutions[0][0], vSolutions[0].size()).Finalize(hash.begin());
    4	script/sigcache.cpp:        CSHA256().Write(nonce.begin(), 32).Write(hash.begin(), 32).Write(&pubkey[0], pubkey.size()).Write(&vchSig[0], vchSig.size()).Finalize(entry.begin());
    5	script/sign.cpp:        CRIPEMD160().Write(&vSolutions[0][0], vSolutions[0].size()).Finalize(h160.begin());
    6	script/standard.cpp:            CHash160().Write(&vSolutions[0][0], vSolutions[0].size()).Finalize(h160);
    7	test/getarg_tests.cpp:    ParseParameters(vecChar.size(), &vecChar[0]);
    8	test/skiplist_tests.cpp:        BOOST_CHECK(vIndex[from].GetAncestor(0) == &vIndex[0]);
    9	wallet/crypter.cpp:    size_t nLen = enc.Encrypt(&vchPlaintext[0], vchPlaintext.size(), &vchCiphertext[0]);
   10	wallet/crypter.cpp:    nLen = dec.Decrypt(&vchCiphertext[0], vchCiphertext.size(), &vchPlaintext[0]);
   11	wallet/rpcwallet.cpp:    return EncodeBase64(&vchSig[0], vchSig.size());
   12	wallet/wallet.cpp:    GetStrongRandBytes(&vMasterKey[0], WALLET_CRYPTO_KEY_SIZE);

(there are a lot more worth checking if you don't use the hungarian notation v prefix as a filter)

src/serialize.h
@@ -549,7 +549,7 @@ void Serialize(Stream& os, const std::basic_string<C>& str)
{
WriteCompactSize(os, str.size());
if (!str.empty())
- os.write((char*)&str[0], str.size() * sizeof(str[0]));
+ os.write((char*) str.data(), str.size() * sizeof(typename std::basic_string<C>::value_type));
@TheBlueMatt

TheBlueMatt Feb 20, 2017 edited

Contributor

AFAICT, this is equivalent to sizeof(C) (std::basic_string<C>::value_type is std::basic_string<C, std::char_traits<C>>::value_type which is std::char_traits<C>::char_type, which is C).

@JeremyRubin

JeremyRubin Feb 22, 2017

Contributor

Yep, I was aware :) I felt that the version I chose was more similar to the existing code (the size of whatever str[0] returns), but if everyone is comfortable with sizeof(C) I can change it.

@JeremyRubin

JeremyRubin Feb 22, 2017

Contributor

👍 fixed

src/utilstrencodings.cpp
@@ -229,7 +229,7 @@ vector<unsigned char> DecodeBase64(const char* p, bool* pfInvalid)
string DecodeBase64(const string& str)
{
vector<unsigned char> vchRet = DecodeBase64(str.c_str());
- return (vchRet.size() == 0) ? string() : string((const char*)&vchRet[0], vchRet.size());
+ return std::string((const char*)vchRet.data(), vchRet.size());
@TheBlueMatt

TheBlueMatt Feb 21, 2017

Contributor

Hmm, based on cppreference I cant tell if this is well-defined with vchRet.size() == 0 or not.

@dcousens

dcousens Feb 21, 2017 edited

Contributor

From my understanding (it doesn't seem ambiguous, but, it certainly isn't definitive), a len of 0 is well-defined, and simply an empty string. Aka, equivalent to std::string().
I don't have a definitive statement from the reference.

@laanwj

laanwj Feb 21, 2017

Owner

Yes, a string with size 0 is well-defined. Calling .data() on a string of size 0 is also defined. This should be ok.

@sipa

sipa Feb 21, 2017

Owner

The question is whether calling std::string(garbage_pointer, 0) is well-defined. I can't tell from what cppreference.com says (The behavior is undefined if s does not point at an array of at least count elements of CharT, including the case when s is a null pointer.).

As an alternative, what about std::string(vchRet.begin(), vchRet.end())?

@laanwj

laanwj Feb 21, 2017 edited

Owner

I read that as: If the count is 0, it always points at least 0 elements, thus "point at an array of at least count elements of CharT" is always satisfied.

Alternatively: if count is 0 it would be wrong to dereference the pointer's address at all.

@sipa

sipa Feb 21, 2017 edited

Owner

I'm not convinced that "a pointer to an array of 0 elements of type T" can be any pointer. I think it's extremely unlikely to matter, and that every C++ std library effectively is implemented in a way that makes the constructor, when invoked with (pointer, 0) effectively ignore pointer, however.

@TheBlueMatt

TheBlueMatt Feb 21, 2017

Contributor

I mean when cppreference is ambiguous thats probably a good sign that trying to parse a wiki is going to get your an incorrect result...best to go to the standard if in doubt in that case. :/

@JeremyRubin

JeremyRubin Feb 22, 2017

Contributor

I don't think it is can be a garbage pointer. vector.data() is guaranteed to return a valid range (even when empty), which I think would require that it have proper alignment.

@laanwj

laanwj Feb 22, 2017

Owner

Just to be complete: this is not the same as std::string(vchRet.begin(), vchRet.end()) in the general case. When creating an iterator range, if the type of vch is different from the type of string, a per-element cast will happen. So string.size() will be vch.size(). Usual issues with casting between signed and unsigned apply.

std::string((const char*)vchRet.data(), vchRet.size()) however just return the memory area with all the elements of the vector as a string. So string.size() will be vch.size()*sizeof(vch::value_type). It is likely also more efficient.

Member

gmaxwell commented Feb 21, 2017

Concept ACK.

Contributor

practicalswift commented Feb 28, 2017

Concept ACK! 👍

Contributor

JeremyRubin commented Mar 27, 2017

Rebased.

src/hash.cpp
- const uint8_t* tail = (const uint8_t*)(&vDataToHash[0] + nblocks * 4);
+ //----------
+ // tail
+ const uint8_t* tail = (const uint8_t*)(vDataToHash.data() + nblocks * 4);
@ryanofsky

ryanofsky Mar 27, 2017

Contributor

In commit: "Cleanup (safe, it was checked) 0-subscript in MurmurHash3"

Could you drop this (const uint8_t*) cast here? It shouldn't be needed since we already call vDataToHash.data() with no cast above on line 27.

src/utilstrencodings.cpp
@@ -227,8 +227,8 @@ std::vector<unsigned char> DecodeBase64(const char* p, bool* pfInvalid)
std::string DecodeBase64(const std::string& str)
{
- std::vector<unsigned char> vchRet = DecodeBase64(str.c_str());
- return (vchRet.size() == 0) ? std::string() : std::string((const char*)&vchRet[0], vchRet.size());
+ vector<unsigned char> vchRet = DecodeBase64(str.c_str());
@ryanofsky

ryanofsky Mar 27, 2017

Contributor

In commit "Fix 0-subscript in utilstrencodings.cpp"

Should put back std:: prefixes.

src/utilstrencodings.cpp
@@ -227,8 +227,8 @@ std::vector<unsigned char> DecodeBase64(const char* p, bool* pfInvalid)
std::string DecodeBase64(const std::string& str)
{
- vector<unsigned char> vchRet = DecodeBase64(str.c_str());
- return (vchRet.size() == 0) ? string() : string((const char*)vchRet.data(), vchRet.size());
+ std::vector<unsigned char> vchRet = DecodeBase64(str.c_str());
@ryanofsky

ryanofsky Mar 27, 2017

Contributor

In commit "Remove uneccessary branches in utilstrencodings string constructors."

Some of the changes in this commit should be squashed into previous commit "In commit "Fix 0-subscript in utilstrencodings.cpp."

Also missing an n in unnecessary.

src/utilstrencodings.cpp
@@ -414,8 +414,13 @@ std::vector<unsigned char> DecodeBase32(const char* p, bool* pfInvalid)
std::string DecodeBase32(const std::string& str)
{
+<<<<<<< 1f0401c11315b5090cfe2b37f6338205e2984372
@ryanofsky

ryanofsky Mar 27, 2017

Contributor

In commit "Fix 0-subscript in utilstrencodings.cpp"

Bad merge here.

src/streams.h
@@ -385,7 +385,7 @@ class CDataStream
{
// Special case: stream << stream concatenates like stream += stream
if (!vch.empty())
- s.write((char*)&vch[0], vch.size() * sizeof(vch[0]));
+ s.write((char*) vch.data(), vch.size() * sizeof(typename decltype(vch)::value_type));
@ryanofsky

ryanofsky Mar 27, 2017

Contributor

In commit "Fix subscript-0 in streams.h"

It would seem more direct (and shorter) to write sizeof(*vch.data()).

Contributor

JeremyRubin commented Mar 28, 2017

@ryanofsky feedback addressed.

@ryanofsky

utACK 275caff. Only changes since previous review (69fd8ab) were addressing feedback and combining / rearranging some of the commits.

Owner

laanwj commented Jun 26, 2017

Needs rebase again (sorry).

Contributor

JeremyRubin commented Jul 8, 2017

@laanwj rebased.

The addrdb commit went away because of cf68a48

Looks good to me, utACK

@MeshCollider MeshCollider added a commit to MeshCollider/bitcoin that referenced this pull request Jul 11, 2017

@MeshCollider MeshCollider Changing &vec[i] to vec.data() + i, what #9804 missed ea88711

@MeshCollider MeshCollider added a commit to MeshCollider/bitcoin that referenced this pull request Jul 11, 2017

@MeshCollider MeshCollider Changing &vec[i] to vec.data() + i, what #9804 missed eccc270

@MeshCollider MeshCollider added a commit to MeshCollider/bitcoin that referenced this pull request Jul 11, 2017

@MeshCollider MeshCollider Changing &vec[i] to vec.data() + i, what #9804 missed 62acd9d
@ryanofsky

utACK 30ac768. Changes since previous review were addrdb changes going away and some rearranged commits.

Owner

sipa commented Jul 11, 2017

utACK 30ac768

Contributor

TheBlueMatt commented Jul 11, 2017

utACK 30ac768

Owner

sipa commented Jul 12, 2017

utACK 30ac768

@sipa sipa merged commit 30ac768 into bitcoin:master Jul 12, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@sipa sipa added a commit that referenced this pull request Jul 12, 2017

@sipa sipa Merge #9804: Fixes subscript 0 (&var[0]) where should use (var.data()…
…) instead.


30ac768 Fix subscript[0] potential bugs in key.cpp (Jeremy Rubin)
4b1c0f2 Remove unnecessary branches in utilstrencodings string constructors. (Jeremy Rubin)
e19db7b Fix subscript[0] in utilstrencodings.cpp (Jeremy Rubin)
bc2e7fd Fix subscript[0] in streams.h (Jeremy Rubin)
4cac0d1 Fix subscript[0] in validation.cpp (Jeremy Rubin)
ac658e5 Fix subscript[0] in torcontrol (Jeremy Rubin)
b6856eb Fix subscript[0] in netaddress.cpp (Jeremy Rubin)
361d952 Fix subscript[0] in base58.cpp (Jeremy Rubin)
6896dbf Cleanup (safe, it was checked) subscript[0] in MurmurHash3 (and cleanup MurmurHash3 to be more clear). (Jeremy Rubin)
96f2119 Fix subscript[0] in compressor.cpp (Jeremy Rubin)
500710b Fix 2 subscript[0] bugs in pubkey.cpp, and eliminate one extra size check (Jeremy Rubin)
e0451e3 Fix subscript[0] bug in net.cpp if GetGroup returns a 0-sized vector (Jeremy Rubin)

Tree-SHA512: 5b9103652cf8c615bd8f4f32b3573d291d6b67c39e0308ce00100bc6625f346e8e016b4c999f4f34f5c37ae059490a83c3b513deb21f838af785227d06e02362
479afa0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment