Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Several changes to mruset #6064

Merged
merged 4 commits into from May 1, 2015

Conversation

@sipa
Copy link
Member

commented Apr 25, 2015

Including:

  • @gavinandresen's #6066
  • Maintain the mru order using iterators rather than copies.
  • Use a ring buffer (implemented using a statically allocated vector) rather than a deque for keeping the iterators.
  • Replace the mruset unit tests with a single stronger unified one.

This reduces the memory consumption of a maximal setAddrKnown by +- half.

@sipa

This comment has been minimized.

Copy link
Member Author

commented Apr 25, 2015

The HMAC for address inventory hashing is absolute overkill, but was easy to write. I doubt the performance matters.

@sipa

This comment has been minimized.

Copy link
Member Author

commented Apr 25, 2015

Rebased on top of @gavinandresen's #6066 (so that I don't need the address hasher here).

Removed the boost::unordered_set usage for now, as it is incompatible with older boost versions (which don't have unordered_set::reserve).

{
set.clear();
queue.clear();
std::vector<iterator>(nMaxSizeIn, set.end()).swap(order);

This comment has been minimized.

Copy link
@gavinandresen

gavinandresen Apr 26, 2015

Contributor

I'm probably just an old fuddy-duddy...
... but I find the "use swap to assign" idiom hard to review.
Why not just:
order.assign(nMaxSizeIn, set.end());

This comment has been minimized.

Copy link
@sipa

sipa Apr 26, 2015

Author Member

It's horrible, but it's the only way to make a vector release its memory, afaik.

This comment has been minimized.

Copy link
@laanwj

laanwj Apr 30, 2015

Member

AFAIK too. Though this is only important if we expect the size to change (become smaller) dynamically.

I'm not sure I like having clear() double as set-maximum-size method. I'd prefer the maximum size to be passed to the constructor and be static over the lifetime of the object. If that's not possible, let's add a special resize() call for resizing (that happens to clear the structure too).

set.erase(order[first_used]);
order[first_used] = set.end();
first_used++;
if (first_used == nMaxSize) { first_used = 0; }

This comment has been minimized.

Copy link
@gavinandresen

gavinandresen Apr 26, 2015

Contributor

Save a line of code with:
first_used = (first_used+1)%nMaxSize;

This comment has been minimized.

Copy link
@sipa

sipa Apr 26, 2015

Author Member

Modulo operator is slow, if it's not a compile-time known power-of-two constant.

This comment has been minimized.

Copy link
@sipa

sipa Apr 26, 2015

Author Member

Updated to use another one-line construct.

@gavinandresen

This comment has been minimized.

Copy link
Contributor

commented Apr 26, 2015

Code review ACK-- note I changed the CRollingBloomFilter implementation commit in #6066 .

Measured heap usage on OSX for a 1,000-entry mruset : 72,192 bytes.
(before this change: 100,992 bytes).

I'm running a node with these changes now.

@sipa sipa force-pushed the sipa:smallmru branch from 8ba8474 to 1d0997a Apr 26, 2015

@sipa

This comment has been minimized.

Copy link
Member Author

commented Apr 26, 2015

Rebased on updated #6066, and addressed a nit.

@sipa

This comment has been minimized.

Copy link
Member Author

commented Apr 26, 2015

@gavinandresen I measure around a 50 byte overhead per mruset entry now. Before, it was 50 bytes + double storage of the elements.

@sipa sipa force-pushed the sipa:smallmru branch from 1d0997a to 0835dae Apr 26, 2015

@sandakersmann

This comment has been minimized.

Copy link
Contributor

commented Apr 27, 2015

Heading in src/mruset.h should be:
// Copyright (c) 2012-2015 The Bitcoin Core developers

@sipa

This comment has been minimized.

Copy link
Member Author

commented Apr 27, 2015

@sandakersmann Why? It's only worked on in 2012 and 2015.

@sandakersmann

This comment has been minimized.

Copy link
Contributor

commented Apr 27, 2015

@sipa Last update was in 2014.

@sipa sipa force-pushed the sipa:smallmru branch from 0835dae to 5f35e5b Apr 27, 2015

@sipa

This comment has been minimized.

Copy link
Member Author

commented Apr 27, 2015

@sandakersmann Ok, fixed.

@@ -40,6 +40,17 @@ nFlags(nFlagsIn)
{
}

// Private constructor used by CRollingBloomFilter
CBloomFilter::CBloomFilter(unsigned int nElements, double nFPRate, unsigned int nTweakIn) :
vData((unsigned int)(-1 / LN2SQUARED * nElements * log(nFPRate)) / 8),

This comment has been minimized.

Copy link
@laanwj

laanwj Apr 30, 2015

Member

Nit: indentation

CRollingBloomFilter::CRollingBloomFilter(unsigned int nElements, double fpRate, unsigned int nTweak) :
b1(nElements*2, fpRate, nTweak), b2(nElements*2, fpRate, nTweak)
{
// Implemented using two bloom filters of nElements each.

This comment has been minimized.

Copy link
@laanwj

laanwj Apr 30, 2015

Member

Comment should be two filters of nElements*2 each, if I understand the above initialization correctly?

b2.clear();
b1.insert(vKey);
b2.insert(vKey);
++nInsertions;

This comment has been minimized.

Copy link
@laanwj

laanwj Apr 30, 2015

Member

I'd prefer to wrap this counter around at nBloomSize, this avoids some % operations, and also avoids trouble with integer overflows when UIINT_MAX is not divisible by nBloomSize.

@@ -4799,8 +4799,9 @@ bool SendMessages(CNode* pto, bool fSendTrickle)
BOOST_FOREACH(const CAddress& addr, pto->vAddrToSend)
{
// returns true if wasn't already contained in the set

This comment has been minimized.

Copy link
@laanwj

laanwj Apr 30, 2015

Member

This comment is no longer relevant: contains is self-documenting in contrast to insert's second return value

}
}
}

// 16-bit permutation function
int static permute(int n)

This comment has been minimized.

Copy link
@laanwj

laanwj Apr 30, 2015

Member

I've checked that the old test still passes on the new code (a drawback of replacing the tests as well as the code that is tested 😁)

Rolling bloom filter class
For when you need to keep track of the last N items
you've seen, and can tolerate some false-positives.

Rebased-by: Pieter Wuille <pieter.wuille@gmail.com>

@sipa sipa force-pushed the sipa:smallmru branch 2 times, most recently from ab42414 to 2fceed4 Apr 30, 2015

@sipa

This comment has been minimized.

Copy link
Member Author

commented Apr 30, 2015

Made several changes:

  • The MRU size of mruset is now a constant (set at constructor time).
  • Restyled some of the rolling bloom filter code.

I think there is a significant improvement possible to the size of the rolling bloom filter, by using a different tweak for multiple filters, and checking several filters for a 'contains'. I'll check the math and write a separate PR for that if found useful.

gavinandresen and others added 3 commits Apr 25, 2015
Replace mruset setAddrKnown with CRollingBloomFilter addrKnown
Use a probabilistic bloom filter to keep track of which addresses
we think we have given our peers, instead of a list.

This uses much less memory, at the cost of sometimes failing to
relay an address to a peer-- worst case if the bloom filter happens
to be as full as it gets, 1-in-1,000.

Measured memory usage of a full mruset setAddrKnown: 650Kbytes
Constant memory usage of CRollingBloomFilter addrKnown: 37Kbytes.

This will also help heap fragmentation, because the 37K of storage
is allocated when a CNode is created (when a connection to a peer
is established) and then there is no per-item-remembered memory
allocation.

I plan on testing by restarting a full node with an empty peers.dat,
running a while with -debug=addrman and -debug=net, and making sure
that the 'addr' message traffic out is reasonable.
(suggestions for better tests welcome)

@sipa sipa force-pushed the sipa:smallmru branch from 2fceed4 to f46a680 Apr 30, 2015

@gavinandresen

This comment has been minimized.

Copy link
Contributor

commented Apr 30, 2015

@sipa: There's a good survey paper by Sasu Tarkoma et al: "Theory and Practice of Bloom Filters for Distributed Systems." Which pointed me to: https://home.kookmin.ac.kr/~mkyoon/publications/2010TKDE.pdf
... which has a more complicated scheme (maybe similar to what you're thinking).

@gavinandresen

This comment has been minimized.

Copy link
Contributor

commented Apr 30, 2015

ACK

@laanwj

This comment has been minimized.

Copy link
Member

commented May 1, 2015

I think there is a significant improvement possible to the size of the rolling bloom filter, by using a different tweak for both, and checking both filters for a 'contains'. I'll check the math and write a separate PR for that if found useful.

@sipa sounds great to me, although let's do that in a later pull, this is already a great improvement compared to the current state, so let's get this reviewed and merged.

@laanwj

This comment has been minimized.

Copy link
Member

commented May 1, 2015

Tested ACK

@laanwj laanwj merged commit f46a680 into bitcoin:master May 1, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
laanwj added a commit that referenced this pull request May 1, 2015
Merge pull request #6064
f46a680 Better mruset unit test (Pieter Wuille)
d4d5022 Use ring buffer of set iterators instead of deque of copies in mruset (Pieter Wuille)
d81cff3 Replace mruset setAddrKnown with CRollingBloomFilter addrKnown (Gavin Andresen)
69a5f8b Rolling bloom filter class (Gavin Andresen)
Cryptoslave added a commit to Anoncoin/oldrepo-backup that referenced this pull request Mar 8, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.