Replace setInventoryKnown with a rolling bloom filter. #7100

Closed
wants to merge 5 commits into
from

Conversation

Projects
None yet
6 participants
@gmaxwell
Member

gmaxwell commented Nov 26, 2015

Mruset setInventoryKnown was reduced to a remarkably small 1000
entries as a side effect of sendbuffer size reductions in 2012.

This removes setInventoryKnown filtering from merkleBlock responses
because false positives there are especially unattractive and
also because I'm not sure if there aren't race conditions around
the relay pool that would cause some transactions there to
be suppressed. (Also, ProcessGetData was accessing
setInventoryKnown without taking the required lock.)

Replace setInventoryKnown with a rolling bloom filter.
Mruset setInventoryKnown was reduced to a remarkably small 1000
 entries as a side effect of sendbuffer size reductions in 2012.

This removes setInventoryKnown filtering from merkleBlock responses
 because false positives there are especially unattractive and
 also because I'm not sure if there aren't race conditions around
 the relay pool that would cause some transactions there to
 be suppressed. (Also, ProcessGetData was accessing
 setInventoryKnown without taking the required lock.)
@dcousens

This comment has been minimized.

Show comment
Hide comment
@dcousens

dcousens Nov 26, 2015

Contributor

@gmaxwell with this change, mruset can be completely removed.

Contributor

dcousens commented Nov 26, 2015

@gmaxwell with this change, mruset can be completely removed.

@pstratem

This comment has been minimized.

Show comment
Hide comment
@pstratem

pstratem Nov 26, 2015

Contributor

@gmaxwell how large is the filter?

Contributor

pstratem commented Nov 26, 2015

@gmaxwell how large is the filter?

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Nov 26, 2015

Member
Member

sipa commented Nov 26, 2015

@pstratem

This comment has been minimized.

Show comment
Hide comment
@pstratem

pstratem Nov 26, 2015

Contributor

@sipa in that case utACK

Contributor

pstratem commented Nov 26, 2015

@sipa in that case utACK

@gmaxwell

This comment has been minimized.

Show comment
Hide comment
@gmaxwell

gmaxwell Nov 26, 2015

Member

hm. Is it that big? My math must be wrong, I was thinking it was more like 350k. In any case, I didn't want to make it smaller than the maximum INV size, since having a single INV blow it out would be unfortunate.

Member

gmaxwell commented Nov 26, 2015

hm. Is it that big? My math must be wrong, I was thinking it was more like 350k. In any case, I didn't want to make it smaller than the maximum INV size, since having a single INV blow it out would be unfortunate.

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Nov 26, 2015

Member
Member

sipa commented Nov 26, 2015

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Nov 26, 2015

Member

Eh, messed up my math. This should result in 700 KiB per peer.

Member

sipa commented Nov 26, 2015

Eh, messed up my math. This should result in 700 KiB per peer.

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Nov 26, 2015

Member

Confirmed with a sneaky fprintf call: 35 KiB for the address tables, and 702 KiB for the inventory tables.

Member

sipa commented Nov 26, 2015

Confirmed with a sneaky fprintf call: 35 KiB for the address tables, and 702 KiB for the inventory tables.

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Nov 26, 2015

Member

utACK. Also, mruset can go away indeed.

Member

sipa commented Nov 26, 2015

utACK. Also, mruset can go away indeed.

@dcousens

This comment has been minimized.

Show comment
Hide comment
@dcousens

dcousens Nov 27, 2015

Contributor

utACK

Contributor

dcousens commented Nov 27, 2015

utACK

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Nov 27, 2015

Member

700kiB per peer is quite a lot for a P2P application in itself, but more relevant is how it compares to mruset overhead when keeping the functionality at the same level. I suppose it's much better?

Member

laanwj commented Nov 27, 2015

700kiB per peer is quite a lot for a P2P application in itself, but more relevant is how it compares to mruset overhead when keeping the functionality at the same level. I suppose it's much better?

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Nov 27, 2015

Member
Member

sipa commented Nov 27, 2015

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Nov 27, 2015

Member

Yes, that's why I mentioned "functionally equivalent", comparing against the current state isn't fair as it's broken.

Member

laanwj commented Nov 27, 2015

Yes, that's why I mentioned "functionally equivalent", comparing against the current state isn't fair as it's broken.

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Nov 27, 2015

Member
Member

sipa commented Nov 27, 2015

@pstratem

This comment has been minimized.

Show comment
Hide comment
@pstratem

pstratem Nov 27, 2015

Contributor

@sipa Care to share? :)

Contributor

pstratem commented Nov 27, 2015

@sipa Care to share? :)

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Nov 27, 2015

Member

Do we really need a 1/1000000 false positive rate? An FP will just cause us
to not relay a particular transaction to one peer, but we'll still relay it
to others (they have other nonces).

Ideally what we'd want here is the opposite of a bloom filter. False positives are bad, as they cause peers to miss out on transactions, but the occasional false negative is ok, as they might get it INVed another time.

Member

laanwj commented Nov 27, 2015

Do we really need a 1/1000000 false positive rate? An FP will just cause us
to not relay a particular transaction to one peer, but we'll still relay it
to others (they have other nonces).

Ideally what we'd want here is the opposite of a bloom filter. False positives are bad, as they cause peers to miss out on transactions, but the occasional false negative is ok, as they might get it INVed another time.

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Nov 27, 2015

Member

Agree... unfortunately I don't know of any better way to do that than an mruset...

Anyway, #7113 reduces this to 526 KiB per peer.

Member

sipa commented Nov 27, 2015

Agree... unfortunately I don't know of any better way to do that than an mruset...

Anyway, #7113 reduces this to 526 KiB per peer.

@jtimon

View changes

src/net.cpp
@@ -2370,6 +2370,7 @@ CNode::CNode(SOCKET hSocketIn, const CAddress& addrIn, const std::string& addrNa
nSendOffset = 0;
hashContinue = uint256();
nStartingHeight = -1;
+ setInventoryKnown.reset();

This comment has been minimized.

@jtimon

jtimon Nov 27, 2015

Member

Won't this undo the setInventoryKnown(50000, 0.000001) initialization?

@jtimon

jtimon Nov 27, 2015

Member

Won't this undo the setInventoryKnown(50000, 0.000001) initialization?

This comment has been minimized.

@sipa

sipa Nov 27, 2015

Member

No, it only wipes the set, not the parameters.

@sipa

sipa Nov 27, 2015

Member

No, it only wipes the set, not the parameters.

@jtimon

This comment has been minimized.

Show comment
Hide comment
@jtimon

jtimon Nov 27, 2015

Member

utACK

Member

jtimon commented Nov 27, 2015

utACK

@gmaxwell

This comment has been minimized.

Show comment
Hide comment
@gmaxwell

gmaxwell Nov 27, 2015

Member

@sipa I'm not really excited with having false positives at all here; mostly due to the case where you only have one peer, a FP means you may not relay your own transaction.

In a universe where software complexity had no cost, I'd give MRUsets to (or even just the first few) outbound peers, and higher FP rolling bloom filters to all the others.

Member

gmaxwell commented Nov 27, 2015

@sipa I'm not really excited with having false positives at all here; mostly due to the case where you only have one peer, a FP means you may not relay your own transaction.

In a universe where software complexity had no cost, I'd give MRUsets to (or even just the first few) outbound peers, and higher FP rolling bloom filters to all the others.

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Nov 28, 2015

Member

utACK

Member

sipa commented Nov 28, 2015

utACK

@pstratem

This comment has been minimized.

Show comment
Hide comment
@pstratem

pstratem Nov 29, 2015

Contributor

needs rename of setInventoryKnown to filterInventoryKnown

won't a false positive here break block relaying?

Contributor

pstratem commented Nov 29, 2015

needs rename of setInventoryKnown to filterInventoryKnown

won't a false positive here break block relaying?

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Nov 29, 2015

Member
Member

sipa commented Nov 29, 2015

@pstratem

This comment has been minimized.

Show comment
Hide comment
@pstratem

pstratem Nov 29, 2015

Contributor

there doesn't seem to be a strong reason to filter block inv messages

i say just send them without filtering

Contributor

pstratem commented Nov 29, 2015

there doesn't seem to be a strong reason to filter block inv messages

i say just send them without filtering

pstratem added some commits Nov 29, 2015

@pstratem

This comment has been minimized.

Show comment
Hide comment
@pstratem

pstratem Nov 29, 2015

Contributor

ACK

Contributor

pstratem commented Nov 29, 2015

ACK

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Nov 29, 2015

Member

Updated #7125 to not use setInventoryKnown for blocks anymore.

Member

sipa commented Nov 29, 2015

Updated #7125 to not use setInventoryKnown for blocks anymore.

@@ -492,8 +491,9 @@ class CNode
{
{
LOCK(cs_inventory);
- if (!setInventoryKnown.count(inv))
- vInventoryToSend.push_back(inv);
+ if (inv.type == MSG_TX && filterInventoryKnown.contains(inv.hash))

This comment has been minimized.

@sipa

sipa Nov 29, 2015

Member

This test is not enough. SendMessages checks the filter again.

@sipa

sipa Nov 29, 2015

Member

This test is not enough. SendMessages checks the filter again.

Actually only use filterInventoryKnown with MSG_TX inventory messages.
Previously this logic could erroneously filter a MSG_BLOCK inventory message.
@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Nov 30, 2015

Member

Needs rebase on top of #7129.

Member

sipa commented Nov 30, 2015

Needs rebase on top of #7129.

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Nov 30, 2015

Member

See #7133.

Member

sipa commented Nov 30, 2015

See #7133.

@gmaxwell

This comment has been minimized.

Show comment
Hide comment
@gmaxwell

gmaxwell Dec 2, 2015

Member

replaced by #7133.

Member

gmaxwell commented Dec 2, 2015

replaced by #7133.

@gmaxwell gmaxwell closed this Dec 2, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment