New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework addnode behaviour #8113

Merged
merged 3 commits into from Jun 16, 2016

Conversation

Projects
None yet
6 participants
@sipa
Member

sipa commented May 28, 2016

This is an alternative to #8097 that:

  • Maintains consistency between ThreadOpenAddedNodeConnections and getaddednodeinfo, without any DNS resolving from RPC.
  • Deals with the use case of added names whose IP mapping changes over time (switching over to the new IP only when the connection to the old one closes, though).
  • Gets rid of the fDns argument to getaddednodeinfo (turning it into a dummy)
  • Simplifies the code significantly by introducing a shared GetAddedNodeInfo in net.cpp.
  • The existing addnode logic that tries all resolved addresses for a name is pushed down into ConnectSocketByName (picking at random).
  • The existing addnode logic to prevent multiple connections to the same IP is pushed down into ConnectNode (by storing colliding names into the CNode's addrName field).

Untested.

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj May 30, 2016

Member

This is a more invasive change than #8097, but concept ACK.

Untested.

Looks like we don't have any RPC tests for getaddednodeinfo, nor addnode besides onetry mode.

Member

laanwj commented May 30, 2016

This is a more invasive change than #8097, but concept ACK.

Untested.

Looks like we don't have any RPC tests for getaddednodeinfo, nor addnode besides onetry mode.

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa May 30, 2016

Member

Basic testing for getaddednodeinfo/addnode should certainly be added, but the specific case being changed here would be very hard to test without beind able to mock out name resolving...

Member

sipa commented May 30, 2016

Basic testing for getaddednodeinfo/addnode should certainly be added, but the specific case being changed here would be very hard to test without beind able to mock out name resolving...

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj May 31, 2016

Member

Yes, agreed. Maybe that's a future possibility after the net refactor. IMO the most important privacy invariant for DNS changes is that we don't ever do DNS queries when -dns=0 is set. Would be nice to have an auto test of some kind for that.

In any case that's a tangent. I think #8097 was nice in that it was only a local change to an (apparently rarely used) RPC call. This involves some regression risk for code outside that as well. On the other hand this is a more thorough solution, and at least one person (@TheBlueMatt) cares about the functionality it seems better to do that.

@theuni can you take a look?

Member

laanwj commented May 31, 2016

Yes, agreed. Maybe that's a future possibility after the net refactor. IMO the most important privacy invariant for DNS changes is that we don't ever do DNS queries when -dns=0 is set. Would be nice to have an auto test of some kind for that.

In any case that's a tangent. I think #8097 was nice in that it was only a local change to an (apparently rarely used) RPC call. This involves some regression risk for code outside that as well. On the other hand this is a more thorough solution, and at least one person (@TheBlueMatt) cares about the functionality it seems better to do that.

@theuni can you take a look?

@theuni

This comment has been minimized.

Show comment
Hide comment
@theuni

theuni May 31, 2016

Member

Concept ACK. Thanks, @sipa!

Will review in detail.

Member

theuni commented May 31, 2016

Concept ACK. Thanks, @sipa!

Will review in detail.

@theuni

This comment has been minimized.

Show comment
Hide comment
@theuni

theuni Jun 1, 2016

Member

This looks great. A few notes:

  • GetAddedNodeInfo() doesn't check for pnode->fDisconnect. I'm not sure if it needs to bother (probably not).
  • ConnectSocketByName() only tries 1 of the resolved results, which seems to conflict with the PR description. Unless I'm reading incorrectly, adding "foo.com" which resolves to 10 ips will mean 2 min between tries for its ips, with random collisions mixed in.
  • It looks like this will mean up to a 2min lag from addnode rpc to attempted first connection. We can introduce a condvar after fc2561f in lieu of the hard-coded wait.

I'll work on some real testing. Yes, I have planned (for later, not initially) for functionality to spoof resolved results for easier dns testing as part of the net refactor. For now, I'll just spoof system-wide.

Member

theuni commented Jun 1, 2016

This looks great. A few notes:

  • GetAddedNodeInfo() doesn't check for pnode->fDisconnect. I'm not sure if it needs to bother (probably not).
  • ConnectSocketByName() only tries 1 of the resolved results, which seems to conflict with the PR description. Unless I'm reading incorrectly, adding "foo.com" which resolves to 10 ips will mean 2 min between tries for its ips, with random collisions mixed in.
  • It looks like this will mean up to a 2min lag from addnode rpc to attempted first connection. We can introduce a condvar after fc2561f in lieu of the hard-coded wait.

I'll work on some real testing. Yes, I have planned (for later, not initially) for functionality to spoof resolved results for easier dns testing as part of the net refactor. For now, I'll just spoof system-wide.

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Jun 1, 2016

Member

GetAddedNodeInfo() doesn't check for pnode->fDisconnect. I'm not sure if it needs to bother (probably not).

I don't think that matters. If fDisconnect is set, it tends to be processed quickly.

ConnectSocketByName() only tries 1 of the resolved results, which seems to conflict with the PR description. Unless I'm reading incorrectly, adding "foo.com" which resolves to 10 ips will mean 2 min between tries for its ips, with random collisions mixed in.

Only the random order changed. The old code would consecutively try all IPs from all addnodes, but with an inner loop over the -addnode arguments (with 0.5s delay), and an outer loop (with 120s delay) over the multiple lookup results.

It looks like this will mean up to a 2min lag from addnode rpc to attempted first connection. We can introduce a condvar after fc2561f in lieu of the hard-coded wait.

Sounds like a good idea, but that's independent of this PR, right?

Member

sipa commented Jun 1, 2016

GetAddedNodeInfo() doesn't check for pnode->fDisconnect. I'm not sure if it needs to bother (probably not).

I don't think that matters. If fDisconnect is set, it tends to be processed quickly.

ConnectSocketByName() only tries 1 of the resolved results, which seems to conflict with the PR description. Unless I'm reading incorrectly, adding "foo.com" which resolves to 10 ips will mean 2 min between tries for its ips, with random collisions mixed in.

Only the random order changed. The old code would consecutively try all IPs from all addnodes, but with an inner loop over the -addnode arguments (with 0.5s delay), and an outer loop (with 120s delay) over the multiple lookup results.

It looks like this will mean up to a 2min lag from addnode rpc to attempted first connection. We can introduce a condvar after fc2561f in lieu of the hard-coded wait.

Sounds like a good idea, but that's independent of this PR, right?

@theuni

This comment has been minimized.

Show comment
Hide comment
@theuni

theuni Jun 1, 2016

Member

Yikes, yet another misreading of this code on my part. I misunderstood the connection logic post-resolve. So nevermind the ConnectSocketByName comment.

Yes, the condvar can come later.

Member

theuni commented Jun 1, 2016

Yikes, yet another misreading of this code on my part. I misunderstood the connection logic post-resolve. So nevermind the ConnectSocketByName comment.

Yes, the condvar can come later.

@theuni

This comment has been minimized.

Show comment
Hide comment
@theuni

theuni Jun 4, 2016

Member

@sipa This needs theuni@c22dc8e in order for 'getaddednodeinfo' to properly report the connected ip.

Tested (via RPC, at least) ACK with the fix above. The rest looks sane. Though I suspect @TheBlueMatt will object to the behavioral change here. IMO it's more sane in that it lines up with my expectations of how this would work, but it breaks the "addnode a dns entry and it will quickly detect a change and create a new connection" persistence model.

I'd like to point out that this logic is also much easier to re-implement in the refactored code, as the internal connection handler doesn't need to keep up with application state. Instead, it will just retry a dns connection any time the current one disconnects.

Member

theuni commented Jun 4, 2016

@sipa This needs theuni@c22dc8e in order for 'getaddednodeinfo' to properly report the connected ip.

Tested (via RPC, at least) ACK with the fix above. The rest looks sane. Though I suspect @TheBlueMatt will object to the behavioral change here. IMO it's more sane in that it lines up with my expectations of how this would work, but it breaks the "addnode a dns entry and it will quickly detect a change and create a new connection" persistence model.

I'd like to point out that this logic is also much easier to re-implement in the refactored code, as the internal connection handler doesn't need to keep up with application state. Instead, it will just retry a dns connection any time the current one disconnects.

CNode* pnode = FindNode((CService)addrConnect);
if (pnode)
{
pnode->AddRef();

This comment has been minimized.

@theuni

theuni Jun 4, 2016

Member

What Release()s this?

@theuni

theuni Jun 4, 2016

Member

What Release()s this?

This comment has been minimized.

@sipa

sipa Jun 6, 2016

Member

I just copied the behaviour of the "// Look for existing connection" logic above. I assume that ConnectNode always returns something with an incremented refcount that the caller is supposed to deal with.

@sipa

sipa Jun 6, 2016

Member

I just copied the behaviour of the "// Look for existing connection" logic above. I assume that ConnectNode always returns something with an incremented refcount that the caller is supposed to deal with.

This comment has been minimized.

@theuni

theuni Jun 8, 2016

Member

Ok. I was already suspicious of the current refcounting logic, but copying the current behavior makes sense. I'll look into that separately.

@theuni

theuni Jun 8, 2016

Member

Ok. I was already suspicious of the current refcounting logic, but copying the current behavior makes sense. I'll look into that separately.

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Jun 8, 2016

Member

Rebased, and incorporated @theuni's fix to the RPC output.

Member

sipa commented Jun 8, 2016

Rebased, and incorporated @theuni's fix to the RPC output.

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Jun 9, 2016

Member

@TheBlueMatt can you give your OK on whether this still supports your use case?

Member

laanwj commented Jun 9, 2016

@TheBlueMatt can you give your OK on whether this still supports your use case?

sipa added some commits May 28, 2016

Rework addnode behaviour
* Use CNode::addeName to track whether a connection to a name is already open
  * A new connection to a previously-connected by-name addednode is only opened when
    the previous one closes (even if the name starts resolving to something else)
  * At most one connection is opened per addednode (even if the name resolves to multiple)
* Unify the code between ThreadOpenAddedNodeConnections and getaddednodeinfo
  * Information about open connections is always returned, and the dns argument becomes a dummy
  * An IP address and inbound/outbound is only reported for the (at most 1) open connection
@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Jun 13, 2016

Member

Rebased.

Member

sipa commented Jun 13, 2016

Rebased.

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Jun 14, 2016

Member

Nit: let's make the dummy argument optional.

Member

laanwj commented Jun 14, 2016

Nit: let's make the dummy argument optional.

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Jun 14, 2016

Member

I don't have any luck using it to connect to hostnames. E.g. when trying to add a testnet seed (which I know for fact is returing valid addresses most of the time):

13:31:26 getaddednodeinfo false

13:31:26
[
  {
    "addednode": "testnet-seed.bitcoin.petertodd.org",
    "connected": false,
    "addresses": [
    ]
  }
]
Member

laanwj commented Jun 14, 2016

I don't have any luck using it to connect to hostnames. E.g. when trying to add a testnet seed (which I know for fact is returing valid addresses most of the time):

13:31:26 getaddednodeinfo false

13:31:26
[
  {
    "addednode": "testnet-seed.bitcoin.petertodd.org",
    "connected": false,
    "addresses": [
    ]
  }
]
@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Jun 14, 2016

Member

Got it to work now, after setting up a fake hostname instead of using a seed, and prodding it a bit:

  {
    "addednode": "testnettest.com",
    "connected": true,
    "addresses": [
      {
        "address": "52.72.156.74:18333",
        "connected": "outbound"
      }
    ]
  }
Member

laanwj commented Jun 14, 2016

Got it to work now, after setting up a fake hostname instead of using a seed, and prodding it a bit:

  {
    "addednode": "testnettest.com",
    "connected": true,
    "addresses": [
      {
        "address": "52.72.156.74:18333",
        "connected": "outbound"
      }
    ]
  }
@TheBlueMatt

This comment has been minimized.

Show comment
Hide comment
@TheBlueMatt

TheBlueMatt Jun 14, 2016

Contributor

I'm really not a huge fan of not following DNS changes until after the previous connection dies, but it does not effect me, personally. In any case, all of the addnode logic needs to be better documented somewhere.

Contributor

TheBlueMatt commented Jun 14, 2016

I'm really not a huge fan of not following DNS changes until after the previous connection dies, but it does not effect me, personally. In any case, all of the addnode logic needs to be better documented somewhere.

@gmaxwell

This comment has been minimized.

Show comment
Hide comment
@gmaxwell

gmaxwell Jun 15, 2016

Member

utACK. I think the new behavior is more useful and explicable.

Member

gmaxwell commented Jun 15, 2016

utACK. I think the new behavior is more useful and explicable.

@laanwj laanwj merged commit 1a5a4e6 into bitcoin:master Jun 16, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

laanwj added a commit that referenced this pull request Jun 16, 2016

Merge #8113: Rework addnode behaviour
1a5a4e6 Randomize name lookup result in ConnectSocketByName (Pieter Wuille)
f9f5cfc Prevent duplicate connections where one is by name and another by ip (Pieter Wuille)
1111b80 Rework addnode behaviour (Pieter Wuille)
@MarcoFalke

This comment has been minimized.

Show comment
Hide comment
@MarcoFalke

MarcoFalke Jun 17, 2016

Member

Not sure if related but this was reported on IRC by @jonasschnelli:

2016-06-13 23:42:03 receive version message: /bitcoinj:0.14.1/: version 70001, blocks=0, us=127.0.0.1:8333, peer=12469
2016-06-13 23:42:03 AdvertiseLocal: advertising address [2a01:4f8:172:10da::2]:8333
2016-06-13 23:42:05 receive version message: /bitcoinj:0.14.2/Bitcoin Wallet:4.55/: version 70001, blocks=416193, us=127.0.0.1:8333, peer=12470
2016-06-13 23:42:05 AdvertiseLocal: advertising address 138.201.55.219:8333
2016-06-13 23:42:06 POTENTIAL DEADLOCK DETECTED
2016-06-13 23:42:06 Previous lock order was:
2016-06-13 23:42:06  pnode->cs_vRecvMsg  net.cpp:1731 (TRY)
2016-06-13 23:42:06  cs_main  main.cpp:4441
2016-06-13 23:42:06  (1) pfrom->cs_filter  main.cpp:4497
2016-06-13 23:42:06  (2) cs_vSend  net.cpp:2437
2016-06-13 23:42:06 Current lock order is:
2016-06-13 23:42:06  (2) pnode->cs_vSend  net.cpp:1750 (TRY)
2016-06-13 23:42:06  cs_main  main.cpp:5693 (TRY)
2016-06-13 23:42:06  pto->cs_inventory  main.cpp:5896
2016-06-13 23:42:06  (1) pto->cs_filter  main.cpp:5919
2016-06-17 13:52:54
Member

MarcoFalke commented Jun 17, 2016

Not sure if related but this was reported on IRC by @jonasschnelli:

2016-06-13 23:42:03 receive version message: /bitcoinj:0.14.1/: version 70001, blocks=0, us=127.0.0.1:8333, peer=12469
2016-06-13 23:42:03 AdvertiseLocal: advertising address [2a01:4f8:172:10da::2]:8333
2016-06-13 23:42:05 receive version message: /bitcoinj:0.14.2/Bitcoin Wallet:4.55/: version 70001, blocks=416193, us=127.0.0.1:8333, peer=12470
2016-06-13 23:42:05 AdvertiseLocal: advertising address 138.201.55.219:8333
2016-06-13 23:42:06 POTENTIAL DEADLOCK DETECTED
2016-06-13 23:42:06 Previous lock order was:
2016-06-13 23:42:06  pnode->cs_vRecvMsg  net.cpp:1731 (TRY)
2016-06-13 23:42:06  cs_main  main.cpp:4441
2016-06-13 23:42:06  (1) pfrom->cs_filter  main.cpp:4497
2016-06-13 23:42:06  (2) cs_vSend  net.cpp:2437
2016-06-13 23:42:06 Current lock order is:
2016-06-13 23:42:06  (2) pnode->cs_vSend  net.cpp:1750 (TRY)
2016-06-13 23:42:06  cs_main  main.cpp:5693 (TRY)
2016-06-13 23:42:06  pto->cs_inventory  main.cpp:5896
2016-06-13 23:42:06  (1) pto->cs_filter  main.cpp:5919
2016-06-17 13:52:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment