New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Several changes to mruset #6064

Merged
merged 4 commits into from May 1, 2015

Conversation

Projects
None yet
4 participants
@sipa
Member

sipa commented Apr 25, 2015

Including:

  • @gavinandresen's #6066
  • Maintain the mru order using iterators rather than copies.
  • Use a ring buffer (implemented using a statically allocated vector) rather than a deque for keeping the iterators.
  • Replace the mruset unit tests with a single stronger unified one.

This reduces the memory consumption of a maximal setAddrKnown by +- half.

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Apr 25, 2015

Member

The HMAC for address inventory hashing is absolute overkill, but was easy to write. I doubt the performance matters.

Member

sipa commented Apr 25, 2015

The HMAC for address inventory hashing is absolute overkill, but was easy to write. I doubt the performance matters.

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Apr 25, 2015

Member

Rebased on top of @gavinandresen's #6066 (so that I don't need the address hasher here).

Removed the boost::unordered_set usage for now, as it is incompatible with older boost versions (which don't have unordered_set::reserve).

Member

sipa commented Apr 25, 2015

Rebased on top of @gavinandresen's #6066 (so that I don't need the address hasher here).

Removed the boost::unordered_set usage for now, as it is incompatible with older boost versions (which don't have unordered_set::reserve).

Show outdated Hide outdated src/mruset.h
{
set.clear();
queue.clear();
std::vector<iterator>(nMaxSizeIn, set.end()).swap(order);

This comment has been minimized.

@gavinandresen

gavinandresen Apr 26, 2015

Contributor

I'm probably just an old fuddy-duddy...
... but I find the "use swap to assign" idiom hard to review.
Why not just:
order.assign(nMaxSizeIn, set.end());

@gavinandresen

gavinandresen Apr 26, 2015

Contributor

I'm probably just an old fuddy-duddy...
... but I find the "use swap to assign" idiom hard to review.
Why not just:
order.assign(nMaxSizeIn, set.end());

This comment has been minimized.

@sipa

sipa Apr 26, 2015

Member

It's horrible, but it's the only way to make a vector release its memory, afaik.

@sipa

sipa Apr 26, 2015

Member

It's horrible, but it's the only way to make a vector release its memory, afaik.

This comment has been minimized.

@laanwj

laanwj Apr 30, 2015

Member

AFAIK too. Though this is only important if we expect the size to change (become smaller) dynamically.

I'm not sure I like having clear() double as set-maximum-size method. I'd prefer the maximum size to be passed to the constructor and be static over the lifetime of the object. If that's not possible, let's add a special resize() call for resizing (that happens to clear the structure too).

@laanwj

laanwj Apr 30, 2015

Member

AFAIK too. Though this is only important if we expect the size to change (become smaller) dynamically.

I'm not sure I like having clear() double as set-maximum-size method. I'd prefer the maximum size to be passed to the constructor and be static over the lifetime of the object. If that's not possible, let's add a special resize() call for resizing (that happens to clear the structure too).

Show outdated Hide outdated src/mruset.h
set.erase(order[first_used]);
order[first_used] = set.end();
first_used++;
if (first_used == nMaxSize) { first_used = 0; }

This comment has been minimized.

@gavinandresen

gavinandresen Apr 26, 2015

Contributor

Save a line of code with:
first_used = (first_used+1)%nMaxSize;

@gavinandresen

gavinandresen Apr 26, 2015

Contributor

Save a line of code with:
first_used = (first_used+1)%nMaxSize;

This comment has been minimized.

@sipa

sipa Apr 26, 2015

Member

Modulo operator is slow, if it's not a compile-time known power-of-two constant.

@sipa

sipa Apr 26, 2015

Member

Modulo operator is slow, if it's not a compile-time known power-of-two constant.

This comment has been minimized.

@sipa

sipa Apr 26, 2015

Member

Updated to use another one-line construct.

@sipa

sipa Apr 26, 2015

Member

Updated to use another one-line construct.

@gavinandresen

This comment has been minimized.

Show comment
Hide comment
@gavinandresen

gavinandresen Apr 26, 2015

Contributor

Code review ACK-- note I changed the CRollingBloomFilter implementation commit in #6066 .

Measured heap usage on OSX for a 1,000-entry mruset : 72,192 bytes.
(before this change: 100,992 bytes).

I'm running a node with these changes now.

Contributor

gavinandresen commented Apr 26, 2015

Code review ACK-- note I changed the CRollingBloomFilter implementation commit in #6066 .

Measured heap usage on OSX for a 1,000-entry mruset : 72,192 bytes.
(before this change: 100,992 bytes).

I'm running a node with these changes now.

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Apr 26, 2015

Member

Rebased on updated #6066, and addressed a nit.

Member

sipa commented Apr 26, 2015

Rebased on updated #6066, and addressed a nit.

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Apr 26, 2015

Member

@gavinandresen I measure around a 50 byte overhead per mruset entry now. Before, it was 50 bytes + double storage of the elements.

Member

sipa commented Apr 26, 2015

@gavinandresen I measure around a 50 byte overhead per mruset entry now. Before, it was 50 bytes + double storage of the elements.

@sandakersmann

This comment has been minimized.

Show comment
Hide comment
@sandakersmann

sandakersmann Apr 27, 2015

Contributor

Heading in src/mruset.h should be:
// Copyright (c) 2012-2015 The Bitcoin Core developers

Contributor

sandakersmann commented Apr 27, 2015

Heading in src/mruset.h should be:
// Copyright (c) 2012-2015 The Bitcoin Core developers

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Apr 27, 2015

Member

@sandakersmann Why? It's only worked on in 2012 and 2015.

Member

sipa commented Apr 27, 2015

@sandakersmann Why? It's only worked on in 2012 and 2015.

@sandakersmann

This comment has been minimized.

Show comment
Hide comment
@sandakersmann

sandakersmann Apr 27, 2015

Contributor

@sipa Last update was in 2014.

Contributor

sandakersmann commented Apr 27, 2015

@sipa Last update was in 2014.

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Apr 27, 2015

Member

@sandakersmann Ok, fixed.

Member

sipa commented Apr 27, 2015

@sandakersmann Ok, fixed.

Show outdated Hide outdated src/bloom.cpp
@@ -40,6 +40,17 @@ nFlags(nFlagsIn)
{
}
// Private constructor used by CRollingBloomFilter
CBloomFilter::CBloomFilter(unsigned int nElements, double nFPRate, unsigned int nTweakIn) :
vData((unsigned int)(-1 / LN2SQUARED * nElements * log(nFPRate)) / 8),

This comment has been minimized.

@laanwj

laanwj Apr 30, 2015

Member

Nit: indentation

@laanwj

laanwj Apr 30, 2015

Member

Nit: indentation

Show outdated Hide outdated src/bloom.cpp
CRollingBloomFilter::CRollingBloomFilter(unsigned int nElements, double fpRate, unsigned int nTweak) :
b1(nElements*2, fpRate, nTweak), b2(nElements*2, fpRate, nTweak)
{
// Implemented using two bloom filters of nElements each.

This comment has been minimized.

@laanwj

laanwj Apr 30, 2015

Member

Comment should be two filters of nElements*2 each, if I understand the above initialization correctly?

@laanwj

laanwj Apr 30, 2015

Member

Comment should be two filters of nElements*2 each, if I understand the above initialization correctly?

Show outdated Hide outdated src/bloom.cpp
b2.clear();
b1.insert(vKey);
b2.insert(vKey);
++nInsertions;

This comment has been minimized.

@laanwj

laanwj Apr 30, 2015

Member

I'd prefer to wrap this counter around at nBloomSize, this avoids some % operations, and also avoids trouble with integer overflows when UIINT_MAX is not divisible by nBloomSize.

@laanwj

laanwj Apr 30, 2015

Member

I'd prefer to wrap this counter around at nBloomSize, this avoids some % operations, and also avoids trouble with integer overflows when UIINT_MAX is not divisible by nBloomSize.

Show outdated Hide outdated src/main.cpp
@@ -4799,8 +4799,9 @@ bool SendMessages(CNode* pto, bool fSendTrickle)
BOOST_FOREACH(const CAddress& addr, pto->vAddrToSend)
{
// returns true if wasn't already contained in the set

This comment has been minimized.

@laanwj

laanwj Apr 30, 2015

Member

This comment is no longer relevant: contains is self-documenting in contrast to insert's second return value

@laanwj

laanwj Apr 30, 2015

Member

This comment is no longer relevant: contains is self-documenting in contrast to insert's second return value

}
}
}
// 16-bit permutation function
int static permute(int n)

This comment has been minimized.

@laanwj

laanwj Apr 30, 2015

Member

I've checked that the old test still passes on the new code (a drawback of replacing the tests as well as the code that is tested 😁)

@laanwj

laanwj Apr 30, 2015

Member

I've checked that the old test still passes on the new code (a drawback of replacing the tests as well as the code that is tested 😁)

Rolling bloom filter class
For when you need to keep track of the last N items
you've seen, and can tolerate some false-positives.

Rebased-by: Pieter Wuille <pieter.wuille@gmail.com>
@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Apr 30, 2015

Member

Made several changes:

  • The MRU size of mruset is now a constant (set at constructor time).
  • Restyled some of the rolling bloom filter code.

I think there is a significant improvement possible to the size of the rolling bloom filter, by using a different tweak for multiple filters, and checking several filters for a 'contains'. I'll check the math and write a separate PR for that if found useful.

Member

sipa commented Apr 30, 2015

Made several changes:

  • The MRU size of mruset is now a constant (set at constructor time).
  • Restyled some of the rolling bloom filter code.

I think there is a significant improvement possible to the size of the rolling bloom filter, by using a different tweak for multiple filters, and checking several filters for a 'contains'. I'll check the math and write a separate PR for that if found useful.

gavinandresen and others added some commits Apr 25, 2015

Replace mruset setAddrKnown with CRollingBloomFilter addrKnown
Use a probabilistic bloom filter to keep track of which addresses
we think we have given our peers, instead of a list.

This uses much less memory, at the cost of sometimes failing to
relay an address to a peer-- worst case if the bloom filter happens
to be as full as it gets, 1-in-1,000.

Measured memory usage of a full mruset setAddrKnown: 650Kbytes
Constant memory usage of CRollingBloomFilter addrKnown: 37Kbytes.

This will also help heap fragmentation, because the 37K of storage
is allocated when a CNode is created (when a connection to a peer
is established) and then there is no per-item-remembered memory
allocation.

I plan on testing by restarting a full node with an empty peers.dat,
running a while with -debug=addrman and -debug=net, and making sure
that the 'addr' message traffic out is reasonable.
(suggestions for better tests welcome)
@gavinandresen

This comment has been minimized.

Show comment
Hide comment
@gavinandresen

gavinandresen Apr 30, 2015

Contributor

@sipa: There's a good survey paper by Sasu Tarkoma et al: "Theory and Practice of Bloom Filters for Distributed Systems." Which pointed me to: https://home.kookmin.ac.kr/~mkyoon/publications/2010TKDE.pdf
... which has a more complicated scheme (maybe similar to what you're thinking).

Contributor

gavinandresen commented Apr 30, 2015

@sipa: There's a good survey paper by Sasu Tarkoma et al: "Theory and Practice of Bloom Filters for Distributed Systems." Which pointed me to: https://home.kookmin.ac.kr/~mkyoon/publications/2010TKDE.pdf
... which has a more complicated scheme (maybe similar to what you're thinking).

@gavinandresen

This comment has been minimized.

Show comment
Hide comment
@gavinandresen

gavinandresen Apr 30, 2015

Contributor

ACK

Contributor

gavinandresen commented Apr 30, 2015

ACK

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj May 1, 2015

Member

I think there is a significant improvement possible to the size of the rolling bloom filter, by using a different tweak for both, and checking both filters for a 'contains'. I'll check the math and write a separate PR for that if found useful.

@sipa sounds great to me, although let's do that in a later pull, this is already a great improvement compared to the current state, so let's get this reviewed and merged.

Member

laanwj commented May 1, 2015

I think there is a significant improvement possible to the size of the rolling bloom filter, by using a different tweak for both, and checking both filters for a 'contains'. I'll check the math and write a separate PR for that if found useful.

@sipa sounds great to me, although let's do that in a later pull, this is already a great improvement compared to the current state, so let's get this reviewed and merged.

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj May 1, 2015

Member

Tested ACK

Member

laanwj commented May 1, 2015

Tested ACK

@laanwj laanwj merged commit f46a680 into bitcoin:master May 1, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

laanwj added a commit that referenced this pull request May 1, 2015

Merge pull request #6064
f46a680 Better mruset unit test (Pieter Wuille)
d4d5022 Use ring buffer of set iterators instead of deque of copies in mruset (Pieter Wuille)
d81cff3 Replace mruset setAddrKnown with CRollingBloomFilter addrKnown (Gavin Andresen)
69a5f8b Rolling bloom filter class (Gavin Andresen)

Cryptoslave added a commit to Anoncoin/oldrepo-backup that referenced this pull request Mar 8, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment