
Decentralised NAT Signalling #365

Closed
tegefaulkes opened this issue Mar 28, 2022 · 52 comments · Fixed by #618
Labels: design (Requires design), enhancement (New feature or request), epic (Big issue with multiple subissues), r&d:polykey:core, activity 4 End to End Networking behind Consumer NAT Devices

Comments

@tegefaulkes
Contributor

tegefaulkes commented Mar 28, 2022

Requirements of this design

  1. Maintaining active connections between nodes to form a 'strongly connected' network.
  2. Decentralised connection forming and NAT punching.
  3. Decentralised message relaying.
  4. Possibly caching known connection chains to speed up connection forming.

Additional context

Specification

We need to be able to hole punch a connection to the target node in a decentralised way. This can happen in stages.

  1. Holding connections to form a strongly connected graph.
  2. Discovery of a connected path.
  3. Forming a direct connection using a connected path.

Holding connections

To ensure that a connection path is available we need an environment where a chain of connections to the target is likely to exist. To do this, each node needs to hold active connections to other nodes. The distribution of these connections needs to ensure that the overall network is strongly connected. Bonus points for minimising the number of hops.

Using the Kademlia structure we can maintain active connections to the N closest nodes, but we can also hold connections to other nodes across the network. The closest nodes maintain strong local connectivity, while the extra connections form shortcuts between parts of the network that are not closely related. Note that 'close' here means close in the Kademlia sense of XOR distance.
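As a sketch of what this could look like (the names and the `bigint` Node ID representation here are illustrative stand-ins, not the actual NodeGraph API), holding the N closest nodes plus a few random far nodes as shortcuts might be:

```typescript
// Hypothetical sketch: hold active connections to the N closest nodes
// (Kademlia XOR metric) plus a few random far nodes as shortcuts.

type NodeId = bigint;

// XOR metric used by Kademlia: a lower value means "closer".
function xorDistance(a: NodeId, b: NodeId): bigint {
  return a ^ b;
}

function selectNodesToHold(
  self: NodeId,
  known: NodeId[],
  nClosest: number,
  nRandom: number,
): NodeId[] {
  const sorted = [...known].sort((x, y) => {
    const dx = xorDistance(self, x);
    const dy = xorDistance(self, y);
    return dx < dy ? -1 : dx > dy ? 1 : 0;
  });
  const closest = sorted.slice(0, nClosest);
  // Random "shortcut" connections drawn from the remaining nodes; these
  // bridge parts of the network that are not close in XOR distance.
  const rest = sorted.slice(nClosest);
  const shortcuts: NodeId[] = [];
  while (shortcuts.length < nRandom && rest.length > 0) {
    const i = Math.floor(Math.random() * rest.length);
    shortcuts.push(rest.splice(i, 1)[0]);
  }
  return [...closest, ...shortcuts];
}
```

The random picks are what prevent the "clump" fragmentation discussed later in this thread.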

Discovery

The discovery process is very similar to the Kademlia FIND_NODE process. The difference here is that we only search over the active Proxy connections. Using this process we should be able to find a chain of connections that reaches our target node.

Generally the idea is that we are walking the graph of active connections to find a chain that leads to our target. Naively this would be an expensive, exhaustive search, but by utilising the Kademlia structure we can ensure that we only walk closer to our target at each step of the search.
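A minimal sketch of that walk, assuming hypothetical `ActivePeer`/`lookup` helpers and `bigint` Node IDs with XOR as the Kademlia metric:

```typescript
// Sketch of discovery over active connections only; `ActivePeer` and
// `lookup` are hypothetical. Node IDs are bigints, XOR is the metric.

type NodeId = bigint;

interface ActivePeer {
  id: NodeId;
  // The peers this node currently holds active connections to.
  connections(): NodeId[];
}

function discoverChain(
  start: ActivePeer,
  target: NodeId,
  lookup: (id: NodeId) => ActivePeer,
): NodeId[] {
  const chain: NodeId[] = [start.id];
  let current = start;
  let best = current.id ^ target;
  while (current.id !== target) {
    const peers = current.connections();
    if (peers.length === 0) break;
    // Greedily step to the connected peer closest to the target.
    const next = peers.reduce((a, b) =>
      (a ^ target) < (b ^ target) ? a : b,
    );
    const d = next ^ target;
    if (d >= best) break; // no progress: this route is a dead end
    best = d;
    current = lookup(next);
    chain.push(next);
  }
  return chain;
}
```

The `d >= best` guard is what makes the walk strictly approach the target rather than wander the graph.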

Forming direct connections

Given a chain of connections A>B>C>D we should be able to form a direct connection A>D. We can form incrementally closer connections by using each step of the chain as a signaller to punch a connection to the next part of the chain. I will refer to this increasingly closer connection as a Feeler.

With the chain A>B>C>D we can start by requesting B to signal a hole punch between A and C. If the hole punch is successful then we now have a connection chain of A>C>D. We can then repeat the step, using C as the new signaller to form a connection between A and D, thus completing the desired direct connection.


    
 ┌─┐     ┌─┐     ┌─┐     ┌─┐    This is the existing connection chain.
 │A├─────►B├─────►C├─────►D│    We start out by asking Node B to signal
 └─┘     └─┘     └─┘     └─┘    hole punching to node C thus creating a
                                connection between A and C.


 ┌─┐     ┌─┐     ┌─┐     ┌─┐
 │A├─────►B├─────►C├─────►D│    With an active connection to C we can use
 └┬┘     └─┘     └▲┘     └─┘    C to act as a signaller to form a connection
  │    Feeler     │             between A and D.
  └───────────────┘

 ┌─┐     ┌─┐     ┌─┐     ┌─┐    We now have a direct connection through
 │A├─────►B├─────►C├─────►D│    the NAT between A and D.
 └┬┘     └─┘     └─┘     └▲┘
  │         Feeler        │
  └───────────────────────┘

At each step the 'signalling' node is directly connected to both nodes and thus is aware of the 'NAT-ified' ports and addresses. It can easily coordinate a hole-punch between the two nodes. So long as we can find a connection chain we can form a connection between two nodes through NATs without a centralised signaller.
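The chain-collapsing loop could be sketched like this, where `signalHolePunch` is a hypothetical RPC asking the current signaller to coordinate a punch between ourselves and the next hop:

```typescript
// Sketch of the "feeler" chain-collapsing step; `signalHolePunch` is a
// hypothetical RPC asking `signaller` to coordinate a hole punch
// between ourselves and `next`.

type NodeId = string;

async function collapseChain(
  chain: NodeId[], // e.g. ['A', 'B', 'C', 'D']; chain[0] is ourselves
  signalHolePunch: (signaller: NodeId, next: NodeId) => Promise<boolean>,
): Promise<boolean> {
  // Each successful punch replaces A>B>C>D with A>C>D, then A>D.
  let signaller = chain[1];
  for (let i = 2; i < chain.length; i++) {
    const ok = await signalHolePunch(signaller, chain[i]);
    if (!ok) return false; // caller should look for an alternative chain
    signaller = chain[i]; // the feeler has moved one hop closer
  }
  return true; // direct connection to the last node in the chain
}
```

On the chain A>B>C>D this issues exactly two signalling requests: B signals a punch to C, then C signals a punch to D, matching the diagrams above.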

If we fail to form a more direct connection then an alternative chain will need to be found. Given a sufficiently connected graph there should be multiple routes to the target. If for some reason a direct connection can't be formed, this process should ideally connect us with a node that IS holding an active connection to the target node and can act as a relay for it. It's possible that not all nodes will want to act as a data relay, so we may need a way to negotiate with the target node to connect to a relay node that will relay data.

Full process

We can combine the discovery and connection-forming stages to create increasingly closer connections to our target. The process will look something like this.

  1. Get a list of active connections sorted by distance to the target.
  2. Pick N out of that list in order of closeness to the target.
  3. Form direct connections with the newly found nodes, using the previous connections as signallers.
  4. Repeat from step 1.
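Those four steps could be sketched as follows (all helper callbacks here are hypothetical stand-ins, not the real NodeConnectionManager API):

```typescript
// Sketch of the full loop: repeatedly discover closer peers via our
// current connections and punch towards them until the target is a
// direct connection. All callbacks are hypothetical stand-ins.

type NodeId = bigint;

async function connectTo(
  target: NodeId,
  activeConnections: () => NodeId[],
  requestCloserPeers: (via: NodeId, target: NodeId) => Promise<NodeId[]>,
  punchVia: (signaller: NodeId, next: NodeId) => Promise<boolean>,
  alpha: number = 3, // concurrent feelers, as in Kademlia's alpha
): Promise<boolean> {
  while (true) {
    const conns = [...activeConnections()].sort((a, b) =>
      (a ^ target) < (b ^ target) ? -1 : 1,
    );
    if (conns.includes(target)) return true; // direct connection held
    let progressed = false;
    for (const via of conns.slice(0, alpha)) {
      const closer = await requestCloserPeers(via, target);
      for (const next of closer) {
        // Only punch towards nodes strictly closer than the signaller.
        if ((next ^ target) < (via ^ target) && (await punchVia(via, next))) {
          progressed = true;
          break;
        }
      }
      if (progressed) break;
    }
    if (!progressed) return false; // no closer route: try relaying instead
  }
}
```

A real implementation would run the alpha feelers concurrently rather than sequentially as this sketch does.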

Eventually this should form a direct connection to our target. At any point during this process we can concurrently have alpha Feelers walking closer to the target.

If we had existing discovery knowledge about the network, such as known connection chains that lead to the target, then that can be used to greatly speed up the process. If we had knowledge of an existing connection chain that leads to the target then we could do the connection-forming process without any discovery. Likewise, if we had a partial chain that we know leads closer to our target then we can use that as a starting point and reduce how much we need to search.

Additional Resources

Sub-Issues & Sub-PRs created

  1. Decentralized Signalling #618
  2. MDNS integration and NodeGraph structure expansion #537
@tegefaulkes tegefaulkes added enhancement New feature or request design Requires design labels Mar 28, 2022
@tegefaulkes tegefaulkes self-assigned this Mar 28, 2022
@tegefaulkes
Contributor Author

There are a few things that come to mind that need to be addressed for this.

  1. Separate hole punching out of NodeConnection/Manager and just have the Proxy handle it.
  2. Proxy needs to maintain a minimum number of active connections as a superset of the NodeConnections to maintain network connectivity.
  3. Our Kademlia implementation needs to be modified slightly to support discovery over active connections.
  4. Implement connection forming across a chain. What do we call this? 'Chain collapsing'? 'Chain reducing'?

@tegefaulkes
Contributor Author

This will need a bit of discussion to refine the idea some more but I think we have a good basis here. For a start I think some refactoring work needs to be done for hole-punching and related messaging.

@CMCDragonkai
Member

A diagram here would help understand what the problem is.

Compare:

  1. Using gossip/kademlia-driven casting to flood the network with the relay message.
  2. Using 1-hop point to point hole punching, and thus combining the "resolving query" with the "relay query".

@CMCDragonkai
Member

Also the difference between closest node connections vs closest node entries in NodeGraph and the difference between NodeConnection and Proxy ConnectionForward.

@tegefaulkes
Contributor Author

To maintain network connectivity we need to maintain a number of open connections per node. This raises some questions.

  1. How many concurrent connections can we have before we're using too many resources?
  2. How many connections does each node need to maintain strong connectivity of the network? Is there a point where the network 'percolates'?
  3. If an existing connection is required to start the hole-punch process, is there any value in storing information in the NodeGraph for nodes we're not currently connected to?
  4. If only existing connections are useful, is there any point in having the NodeGraph be persistent in the DB?

The main one: is there any point in distinguishing whether entries in the NodeGraph are active connections or not? What can we do with the host/port information without an active connection? If there is no use for this information when we're not connected to the node, then shouldn't we only store information about nodes we're actively connected to?

@tegefaulkes
Contributor Author

The other approach we considered was to use Kademlia to gossip/broadcast a hole-punch relay message to the target we are trying to connect to. We would pick N nodes out of the nodes we were connected to that are closest to the target node and ask them to relay a hole-punch message. If each node receiving this message relayed it to the n of its connected nodes that were closest to the target node, then the message would eventually reach the target node.

This method has some disadvantages.

  1. No control over when the message reaches the target node, so we can't coordinate timing of the hole punching very well.
  2. This method has a multiplying effect, creating more messages at each stage with each message pointing towards the target, possibly DDoSing the target.
  3. We would still need some way to get the proper port information needed to punch through the NAT.
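Point 2 can be made concrete with a quick calculation: with a relay fan-out of n at each hop, the cumulative number of in-flight messages grows geometrically with hop count.

```typescript
// Geometric message amplification: each hop relays to `fanout` peers,
// so after `hops` hops the cumulative message count is
// fanout + fanout^2 + ... + fanout^hops.

function messagesAfterHops(fanout: number, hops: number): number {
  let total = 0;
  let layer = 1; // messages emitted at the current hop
  for (let i = 0; i < hops; i++) {
    layer *= fanout;
    total += layer;
  }
  return total;
}
```

With a fan-out of 3, four hops already produce 120 messages converging on the target, which is the amplification/DDoS concern above.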

@CMCDragonkai CMCDragonkai changed the title Decentralised NAT punching and discovery Decentralised NAT punching and discovery (dealing with bidirectional punch relay message) Mar 29, 2022
@tegefaulkes tegefaulkes added the epic Big issue with multiple subissues label Mar 30, 2022
@tegefaulkes
Contributor Author

tegefaulkes commented Mar 30, 2022

It may not be useful to store the address:port information of NodeIds that are not currently connected in the NodeGraph. But what would be extremely useful is storing previously seen connection chains. If during the discovery process we can obtain information about previously seen connection chains to our target then we can greatly speed up the process.

Keeping the previously seen address:port can also be useful if the target node can be connected to directly, in cases where:

  1. the target has a public-facing IP address.
  2. the target's ports were manually forwarded.
  3. the target automatically negotiated port forwarding using PCP, NAT-PMP, UPnP-IGD, or similar.
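A hedged sketch of the fallback order this implies: try each mapping protocol before resorting to hole punching. The `probes` here stand in for hypothetical PCP/NAT-PMP/UPnP-IGD probe functions; none of these names are a real library API.

```typescript
// Hypothetical fallback order: try each port-mapping protocol probe in
// turn; only if all fail do we need hole punching. A probe returns the
// mapped external port, or null if the protocol is unsupported.

async function negotiatePortMapping(
  internalPort: number,
  probes: Array<(port: number) => Promise<number | null>>,
): Promise<number | null> {
  for (const probe of probes) {
    const mapped = await probe(internalPort);
    if (mapped !== null) return mapped; // NAT now forwards this port
  }
  return null; // no protocol supported: fall back to hole punching
}
```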

@tegefaulkes
Contributor Author

There is a Node.js library for negotiating port mappings with the NAT. This will be very useful for avoiding hole punching in the first place, as well as reducing the number of keep-alive packets we may have to send.

https://github.com/freedomjs/freedom-port-control

@CMCDragonkai
Member

Persistent connections that are needed to maintain "NAT reachability" will need to be maintained at the NCM level, not at the proxy connection level. Once we have composed a proxy connection, when we terminate the GRPC client we cannot be sure that the GRPC server will handle this correctly without also receiving a TCP FIN packet. At the same time, this avoids adding additional complexity to the connection management at the proxy connection level.

Note that "NAT reachability" is the ability for a node to be reachable behind a NAT even while idle.

@CMCDragonkai
Member

Assigning @emmacasolin to this issue to be resolved after #357 is merged. It's a good continuation of that PR. Please review this issue details, and we can review it on Monday next sprint.

@emmacasolin
Contributor

emmacasolin commented Apr 11, 2022

Something like Dijkstra's algorithm might be useful here for forming increasingly closer connections to the target node, however, we would need to modify the definition of "closeness" to better meet our needs.

Rather than defining the "closest" node (to the target node) using solely the Kademlia way, we could incorporate a node's held proxy connections when deciding which node to use as a signaller. The "closest" node can be defined as the node that we have an open connection to that has an open connection to the closest (Kademlia definition) node to the target node. If we also maintain that every node must at least have an open connection to its k closest nodes then, provided k >= 2, we are guaranteed to be able to find the "shortest" path to the target node, since the next connection we make will always get us closer to the target. The connections made during this process could potentially be cached to speed up connection formation in the future.

Take for example this graph of nodes, where k = 2. The lines are active connections, and some nodes have more than k active connections. If all nodes have exactly k active proxy connections, then using the closest Kademlia-wise node to the target we're connected to will take the same number of connections as taking active connections into account. If any nodes have more than k active proxy connections, however, then using only the Kademlia definition can be either the same or slower.

[image: example graph of nodes with k = 2, active connections shown as lines]

In this example, Node 10 wants to connect to Node 2. If it chooses the next closest node to the target using the Kademlia definition of closest then it will choose 11 as a signalling node, followed by 0, then 1, then finally the target node 2. In this situation three new proxy connections would need to be established.

If it were instead to choose the signalling node as the node with an active connection to the next closest node to the target using the Kademlia definition then it will choose 9 as a signalling node, followed by 1 and then the target node 2, needing only 2 new proxy connections to be established.
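That selection rule could be sketched as follows, assuming a hypothetical `neighbours` view of each connected peer's own open connections (not a real NodeGraph method):

```typescript
// Sketch of the modified "closest" selection: among our open
// connections, choose the signaller whose own open connections include
// the node closest (plain Kademlia XOR sense) to the target.

type NodeId = bigint;

function chooseSignaller(
  target: NodeId,
  openConnections: NodeId[],
  neighbours: (id: NodeId) => NodeId[],
): NodeId {
  let bestSignaller = openConnections[0];
  let bestDistance = -1n; // sentinel: no candidate seen yet
  for (const peer of openConnections) {
    for (const n of neighbours(peer)) {
      const d = n ^ target;
      if (bestDistance < 0n || d < bestDistance) {
        bestDistance = d;
        bestSignaller = peer;
      }
    }
  }
  return bestSignaller;
}
```

This is the one-hop lookahead described above: we rank our signaller candidates by the best node they can reach, not by their own distance to the target.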

@tegefaulkes
Contributor Author

Looks good. This was generally the idea I had about discovery but re-reading the spec I see that it's pretty vague on this.

If I recall my thinking correctly. We would've asked all of our connected nodes for their X closest nodes to the target. Out of that larger set we would select the closest nodes and make connections to them. I think that's in line with your modified closest definition above.

Keep in mind that in terms of optimisation I think it's more important to minimise the number of alternate routes we explore than the number of hops we take to the target.

@CMCDragonkai
Member

It's a good idea to do a simulation of this in a simple script. You'll have to start off with some cases:

  1. You may know only the NodeID, or you may know the NodeID and NodeAddress. If you don't know the NodeAddress, you can combine the resolving operation with signalling requests somehow.
  2. You should be doing "ICE", which means doing both direct connection and signalling simultaneously; whichever finishes first wins, and the other operation is cancelled. Assume that we can do this.
  3. The list of active connections you have will always be a subset of your entire node graph contact list. However it may not be optimal to only use the list of active connections, because one of the inactive contacts may still be contactable to do the signalling. There are likely some tunable parameters/heuristics needed to make optimal use of both the entire contact list and just your active connections (preferring your active connections over inactive contacts, obviously).
  4. There may not be a globally optimal solution, so a "good enough" heuristic is sufficient.

@emmacasolin
Contributor

If we're only maintaining active connections to the k closest nodes, isn't it possible that sub-groups of close nodes could become "detached" from the rest of the network? Only maintaining connections to the closest nodes seems like it would encourage connections to be clumped together, meaning that clumps of nodes may be only connected to each other and be unreachable to other nodes outside of the "clump".

@CMCDragonkai
Member

Yea, so we may need to also maintain connections to far-off nodes too to balance it out.

@tegefaulkes
Contributor Author

It bears looking into. But we do want strong 'local' connectivity. I was thinking about maintaining some 'regional' connections spread across all buckets to act as shortcuts during discovery.

I don't think we can end up with disconnected groups if we just rely on the closest nodes though but that is something that should be checked.

@CMCDragonkai
Member

Imagine k is 20: if every node in the 20 closest nodes connects to every other node in that group of 20, you have a cluster of 20 close nodes with no outside connections.

@CMCDragonkai
Member

CMCDragonkai commented Apr 11, 2022

Actually I don't know if that's the case. @emmacasolin can you use #158 (comment) (script: #326 (comment)) and work out a visualisation/simulation.

If the minimum number of connections is 1, then it should be possible to have disconnected pairs.

But if the minimum number of connections is greater than 1, I imagine there's less likelihood for that to be the case, even if all nodes have the same complete node graph. But I'm not sure if this is the case, can you verify this?

Also remember that connections are directional. We have a directed graph because connections are client-to-server.

Also once another node has made a directional connection to you, you should be able to make the reverse directional connection without hole punching.

Active proxy connections may be divided between outgoing and incoming. These are separate.
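The clustering concern raised above can be checked with a toy reachability test: nodes that only connect within their own "clump" of close IDs partition the graph, and a single cross-clump connection reconnects it. This is a deliberately simplified model (integer IDs, closeness stood in for by ID adjacency), not the real simulation script.

```typescript
// Toy reachability check for the clustering concern.

// Count how many nodes are reachable from `start` following directed
// edges in the adjacency list `adj`.
function reachableCount(adj: number[][], start: number): number {
  const seen = new Set<number>([start]);
  const stack = [start];
  while (stack.length > 0) {
    const v = stack.pop()!;
    for (const w of adj[v]) {
      if (!seen.has(w)) {
        seen.add(w);
        stack.push(w);
      }
    }
  }
  return seen.size;
}

// Every node connects to all other nodes in its own clump and nothing
// else: the "closest nodes only" policy taken to its extreme.
function clumps(n: number, clumpSize: number): number[][] {
  return Array.from({ length: n }, (_, i) => {
    const base = Math.floor(i / clumpSize) * clumpSize;
    const out: number[] = [];
    for (let j = base; j < Math.min(base + clumpSize, n); j++) {
      if (j !== i) out.push(j);
    }
    return out;
  });
}
```

With `clumps(8, 4)`, node 0 can only reach 4 of the 8 nodes; adding one cross-clump edge restores full reachability, which is the role the random/far connections play.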

@CMCDragonkai
Member

CMCDragonkai commented Apr 11, 2022

You could use graphology, ngraph or cytoscape. https://github.com/anvaka/ngraph. Related is #367

@CMCDragonkai
Member

For future reference, most "visual" aspects of the JS ecosystem are outside of Node.js conventions. You're now in the territory of frontend JS development, which requires a different context. Here are some resources:

Some nodejs packages purport to provide terminal rendering. These are often quite niche, and more advanced visualisations often don't work or are just toys. Simpler visualisations are possible though with https://github.com/vadimdemedes/ink and https://github.com/kroitor/asciichart.

Some nodejs packages support desktop GUIs like imgui and other graphics-libraries. These tend to be OS-specific, and some are cross-platform. These are more advanced of course, and are intended to create general GUI programs.

Recommendations:

  1. If you are comfortable with web-dev, use the index.html; it ultimately leads to our Polykey-Desktop work which uses Electron as a browser frontend
  2. ASCII terminal ones can be fun for the pk CLI.

@CMCDragonkai
Member

So we are going to need this for Beta CLI launch.

I'm exploring more of this idea here https://chat.openai.com/share/de1b9d93-755b-4fb5-91e8-1288fa7c8300. I need someone to expand this conversation to include what we do with the MDNS discovered nodes, since they are not part of the node graph atm.

It seems a multi-hop signalling system is needed to find the closest nodes to connect from. However, I want this to be as efficient as possible. Can the operation to hole punch the closest nodes be done in a similar way to how we find the node information? We want to avoid amplification here.

@amydevs comments regarding MDNS required.

@tegefaulkes are there any new things coming from the nodes domain, which has been refactored recently?

@CMCDragonkai
Member

Also #599 is relevant to this discussion too since the mainnet/testnet page will need to acquire information from the seed node cluster and if nodes are connecting to only m/n of the seed cluster then this may impact the reconciliation of all of the information.

@tegefaulkes
Contributor Author

Information from MDNS is not relevant to this process. Address information within a single machine or a LAN is not useful for punching in from an external network.

Besides that, what has changed in the nodes domain since writing this up?

  1. Multi connection logic.
  2. Complete network stack overhaul.
  3. NAT signalling and hole punching better resemble what this spec was expecting: it's now an atomic operation between a source, target and signalling node.

@CMCDragonkai
Member

I need to make the operation generic from the perspective of A trying to connect to B. It's important that it's efficient even if internally it's doing multi-hop hole punching.

@tegefaulkes
Contributor Author

Depends on what you mean by multi-hop hole punching.

This spec describes a method of finding a common node to signal with by doing a Kademlia find operation over active connections in the network. We can optimise this in some ways to minimise the amount of searching that needs to be done. Distributing information about which public signalling nodes a given node is contactable through can help a lot. Tracking which nodes are public and using them as starting points for a search can help as well.

But as for sending a signalling message across multiple hops, I'm very resistant to doing this. Not that it's not an option. But how do we send a message like that quickly across multiple hops? How do we discover the connection chain between two nodes remotely without the amplification problem?

@CMCDragonkai
Member

Multi-hop means at each hop attempting a bridge to help facilitate a connection.

@CMCDragonkai CMCDragonkai self-assigned this Nov 1, 2023
@CMCDragonkai
Member

Ok, I just re-read everything in this issue. This is actually a very complex problem that has a lot of different kinds of solutions with various trade-offs.

I'm going to try to summarise the situation right now.

The root of this problem is the existence of NAT. Even with IPv6, NAT is likely to continue being a problem. This is because even without a NAT gateway, there may be firewalls that simply default-deny unknown packets. And this is likely to be true on mobile networks regardless of any technology evolution (ignoring the fact that hole punching itself would not work here, but solving this problem is necessary for efficient decentralized relaying too).

Currently we have a fixed set of seed nodes deployed into AWS. Let's take testnet.polykey.com as an example. Details on this are being discussed in #599.

Currently all PK nodes upon launching with --network testnet will connect to all well-known seed nodes. The seed node connections are considered special, and they are not subject to the node connection TTLs that all other node connections are subject to. That means they do not expire even if there's no RPC-level activity on them (requires confirmation on MatrixAI/js-rpc#52).

This ensures that all nodes have a port-mapping established.

The network stack has been completely replaced with QUIC. Furthermore, we now only have 1 single QUICSocket using 1 single UDP port. All connections to other nodes are always multiplexed into this single QUIC socket and thus single UDP port.

Suppose our seed cluster has 2 seed nodes. That means right now all nodes launched will have 2 QUIC connections, but 1 QUIC socket, and thus 1 port mapping in their NAT (assuming NAT exists for them).

During a recent testnet 6 test, we were able to connect 2 separate nodes, both behind hole-punched NATs, between Sydney and Austin. This was done with the help of the seed nodes facilitating the signaling operation. Atm, there's a bug with both nodes being able to hole punch each other #605, but there's ongoing progress on that from #609 (@tegefaulkes confirm please).

This works because all nodes are connected to the same seed nodes. Which means when node A wants to connect to node B, it can basically ask any or all of the seed nodes S1 and S2 to forward on the signaling message.

In relation to #365 (comment), this means there's a strict quorum, well it's completely overlapping basically.

Now why does an overlap matter? The reason is simple. The seed nodes all have different IPs. In a prior testnet integration #403 we realised that we are not going down the route of a "distributed and load balanced" seed node cluster architecture, as this goes against the principle of decentralisation. So, because the seed nodes have separate IPs, during port mapping at the NAT layer, if node B only connected to S2 while node A connected to S1, then node B's NAT would not accept packets from S1, as its IP and port are not part of the port-mapping structure in B's NAT. Basically they would be dropped as unknown packets.

From a NAT port mapping perspective, it is important to remember that these mappings are temporary, and may disappear after 20 seconds of zero packet activity. Primarily we can assume that they map a particular local IP and local port inside the LAN to an external IP and external port. Because we use the same QUIC socket for a given node, the local IP and local port are always the same, but the external IP and external port will vary for every external node it is connecting to.

So node A and node B must share a common seed node, in this issue's parlance, that is called the "signaling node", simply due to the need for the common signaling node's IP and port to be in place for both A and B's NAT port mapping so that both A and B accept packets from the common signaling node.

This is the core problem of this issue.

Now decentralized signaling means achieving 2 things:

  1. Scalability of the seed nodes
  2. Scalability of the entire network

These 2 are somewhat interrelated, enabling the seed node architecture to scale. By scalability we mean: for every new seed node we deploy, do we get superlinear, linear or sublinear returns to overall network capacity (where capacity is measured by the multi-constrained limits of how big a practical, realistic network can be)?

The current architecture does not scale, because every node is potentially doing 100% of the work of all signaling operations and 100% of all connections (which is to say, if there were 1000 nodes launched, there would be 1000 connections to every single seed node).

The ICE protocol in each node is asking all the seed nodes concurrently to perform the signaling operation. Although technically it could also just ask 1 out of all of them, with the assumption that all seed nodes are connected. @tegefaulkes can you confirm where in the source code this is decided upon?

Choosing only 1 out of all seed nodes to do this reduces the workload somewhat; however, all nodes still have to connect to the common set of seed nodes in its entirety.

As per #365 (comment), it is possible to work around this a bit by introducing strict or sloppy quorums, thus enabling at least one node to be the common seed node. However this introduces the problem of knowing which node is the actual common seed node to perform the signaling role. And such a solution would only work in the context of the blessed seed node cluster.

As an aside, #599 is also discussing the change in the DNS record architecture to allow for there to be a public dashboard of the network. Which actually fits into our designs for private network bootstrapping, and also means there will need to be a change from hardcoding trusted seed node IDs to instead relying on a root certificate for every node which would require #154. Which enables dynamic seed node autoscaling.

Now earlier in this issue, we imagined that getting every node to connect to its closest nodes is a decentralized co-ordination algorithm: it means we can expect every node to have connected to its closest nodes, thus placing a NAT port mapping for those closest nodes, and therefore being able to re-use Kademlia closeness to "find" a potential signaling node for connecting to a particular target node.

However we later realised, that this kind of decentralized co-ordination would not form good connectivity across the entire network graph. #365 (comment) and #372.

So even if we were always actively connected to our closest nodes (where active means regular packet activity, so the NAT port mapping is maintained), if node A is not in the same cluster as node B, then none of the potential signaling nodes will be reachable. This problem is transitive: if the potential signaling nodes are not reachable, then none of the potential signaling nodes of those signaling nodes will be reachable, etc.

Empirical simulation shows that we could have good connectivity with random node connections, but there's discussion about whether the reason this works is primarily due to having connections to the farthest bucket. #365 (comment)

@tegefaulkes, can you explain what the difference is between spreading random active connections across the buckets, versus just random active connections across the entire graph. My impression is that they are roughly the same.

You say:

We need to maintain connections with the furthest bucket to ensure connectivity with the wider network. Otherwise we end up with the clustering.

This conflicts with the empirical summary from the simulation saying that random connections work fine too.

Is your objection due to the fact that setting up random connections doesn't work with Kademlia atm, since we always try to find the closest 20 nodes first? Also remember that a node record is only added to the node graph upon a successful connection, meaning that we would likely connect up to 20 nodes to fulfil the bucket size, and this is likely to be spread through the first few buckets.

Then are you saying that we add additional methods into our node graph to also find the farthest 20 nodes too?

I'm just confused as to whether your comments came after the results showing random connections maintained connectivity with the least complexity, and are therefore really arguing that, while random works fine, structurally we need to fit into Kademlia's way of finding nodes to connect to anyway: thus we now need to find closest nodes, farthest nodes... or just 20 random nodes across the entire graph? This is in reference to #365 (comment) since you posted that on the same date as #372 (comment).

...MORE COMMENTARY COMING

@tegefaulkes
Contributor Author

tegefaulkes commented Nov 1, 2023

Replying to parts of the above comment...

"During a recent testnet 6 test, we were able to connect 2 separate nodes both with hole-punched NAT between Sydney and Austin. This was done with the help of the seed nodes facilitating the signaling operation. Atm, there's a bug with both nodes being able to hole punch each other #605, but there's on-going progress on that from #609 (@tegefaulkes confirm please)."

Yes, I'm nearly done with #609. #605 is planned but I think addressing memory problems with #598 is going to be done first.

"The ICE protocol in each node is asking all the seed nodes concurrently to perform the signaling operation. Although technically it could also just ask 1 out of all of them, with the assumption that all seed nodes are connected. @tegefaulkes can you confirm where in the source code this is decided upon?"

Hole punching is initiated with NodeConnectionManager.initiateHolePunch in src/nodes/NodeConnectionManager.

" @tegefaulkes, can you explain what the difference is between spreading random active connections across the buckets, versus just random active connections across the entire graph. My impression is that they are roughly the same. "

For connectivity we want a random spread of connections across the network. For searching to work with Kademlia, the 'closer' two nodes are, the more likely they should be to have a connection. But if you only held connections to your 'closest' nodes then the network would be fragmented into clusters. So we need a mix of random connections and close ones.

@CMCDragonkai
Copy link
Member

Continuing from the previous commentary, some discussions with @tegefaulkes indicates that:

  1. NCM manages connections and determines whether to idle them out. This basically means we need a way of identifying which connections should be subject to the idle timeout, and which connections are not.
  2. NM is what understands the node graph structure in terms of semantics. When deciding how best to find the signaling node chains to the target node, this would be done here.

Now global connectivity means we need at least random connections, but we must also have connections to the farthest bucket.

This implies that random selection by itself shouldn't be enough; we should have policy-directed randomness.

In order to do this, the NM and NG must provide a way to also find the farthest bucket.

So instead of only finding the closest nodes, we also want to find the farthest nodes and a random spread of other nodes to maintain active connections to.

From NG's perspective, finding farthest nodes or selecting random nodes in the node graph will be easy. The NG buckets are just prefix indexed, and we can always select nodes from the farthest bucket.

The seed nodes will receive connections from all over the Node ID space, so they are likely to have a fairly uniform split because they always receive contact first.

For the NCM to get the farthest bucket, it's just a matter of calling:

nodeGraph.getBucket(255);

As a side note I noticed that the NG has the KeyRing dependency injected. This works because the NG can ask for its own Node ID for certain operations. But it seems a bit tightly coupled atm. We talked about this earlier, but we might consider a push-flow abstraction. The only issue is push-flow won't be consistent, whereas we might want the whole node to be updated in lock-step when the Node ID changes. I think however a reactive property might work better. Basically NG only cares about the current node ID and has to react to changes to the current node ID.

There's also another method called NG.getClosestNodes. This allows you to pass in a node ID, which will then try to find the closest node records to that node ID in the local node graph.

Now the issue is that using nodeGraph.getBucket(255) will just get you the nodes in the farthest bucket relative to the current node's own node ID. If new nodes enter the network and are asking the seed nodes what their farthest nodes are, this won't work; we need something like NG.getClosestNodes or perhaps NG.getFarthestNodes.
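A hypothetical `NG.getFarthestNodes` could mirror `getClosestNodes` but sort descending by XOR distance to the given node ID. Sketched here over a flat list of IDs rather than prefix-indexed buckets; none of these names are actual Polykey API:

```typescript
// Hypothetical sketch: farthest known nodes relative to an arbitrary
// node ID, by sorting on XOR distance in descending order.
function getFarthestNodes(
  nodeIds: bigint[],
  nodeId: bigint,
  limit: number,
): bigint[] {
  return [...nodeIds]
    .sort((a, b) => ((a ^ nodeId) > (b ^ nodeId) ? -1 : 1))
    .slice(0, limit);
}

const farthest = getFarthestNodes([1n, 8n, 15n], 0n, 2);
console.log(farthest); // [ 15n, 8n ]
```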

I'm not sure how that would work in relation to random selections across the network. If we always connect closest and farthest, then why do we need random selection in between?

The first step, at the very least, is to be able to query the NG in more sophisticated ways. Subsequently we need to expose this information over the network. Then I am still confused as to whether kademlia will even help us find the intermediate signaling node necessary. Since closest won't help, how would we know which nodes that are closest or farthest away can act as a signaling node?

Maybe another selection strategy is required? If closest causes disconnected clustering, why not always pick the farthest? And use that as our signaling system?

According to #372 (comment), the bottom K strategy did work but wasn't as efficient as random K at maintaining connectivity with minimum number of active connections.

I think there may be a trade-off between the probability of finding a sufficient signaling chain to reach a target node and the number of active connections to be maintained for full connectivity.

Because if everything was connected to everything, then any path may lead me to the target. But if there are fewer active connections, then there are fewer paths to my target, and thus a better way of identifying which path to take is important.

@CMCDragonkai
Copy link
Member

Some additional notes.

The kademlia XOR property is subject to triangle inequality.

https://stackoverflow.com/a/25756549

This means:

If A to B is far away, the distance from A to B is still less than or equal to the distance from A to C plus the distance from C to B.

So if C is close to B, then C must also be far away from A, because the sum of the two distances has to be greater than or equal to the distance from A to B.

In other words:

If B is the target node, and C is a close node to B, and B is far away from A, then C is also far away from A, potentially even farther away actually.
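This can be checked concretely with small illustrative IDs (standing in for Polykey's 256-bit node IDs):

```typescript
// Illustrative check of the triangle inequality for the XOR metric.
const A = 0b0001n;
const B = 0b1110n; // the target, far from A
const C = 0b1111n; // a node close to B

const dAB = A ^ B; // 15n
const dAC = A ^ C; // 14n
const dCB = C ^ B; // 1n

// d(A,B) <= d(A,C) + d(C,B)
console.log(dAB <= dAC + dCB); // true
// C is within distance 1 of B, yet still distance 14 from A.
```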

@CMCDragonkai
Copy link
Member

This triangle inequality property is quite important to giving a geometric intuition to how this thing will work.

If B is the target, then we have to search backwards.

What we are searching for first is C. C is the node that has an active connection to B.

And if we are preferring active connections to be selected from close nodes, then C would relatively be far from A as well.

Upon finding C, we could attempt a connection to that first - because we have to in order to initiate the signal phases.

Then of course in order to connect to C, we have to find C', the intermediate node that we can use to help us connect to C.

Now there's a problem here: none of this means that C is any closer to A, the originating node, because the distance between A and C could be as large as the distance between A and B.

This kind of means, that if we keep trying to find C', and C'', each time we may just be finding another node that is just as far away, and potentially farther away in XOR distance to A.

Eventually we would exhaust everything that is close to B, C, C', C''... etc (assuming that C is always the closest node to B). In fact, if we are asking kademlia what the closest nodes to B are, then it would likely give us a list of C, C', C''... etc.

We're not actually "closer" to getting the node that is close to us.

@CMCDragonkai
Copy link
Member

Ok on another note, the procedure is roughly right now:

  1. Find a node ID
  2. Look it up locally and enter candidates into a priority queue
  3. Each time, ask another node to find the same node ID
  4. This gives us IP + port combinations
  5. Which we then use ICE to connect to; ICE is hardcoded to contact all seed nodes atm
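The resolution loop in the steps above could be sketched roughly like this; `queryPeer` and all other names here are assumptions standing in for the actual RPC calls, not Polykey's API:

```typescript
// Hedged sketch of the find-node loop: candidates are kept in a queue
// ordered by XOR distance to the target and queried in turn.
type NodeId = bigint;

function findNode(
  target: NodeId,
  localCandidates: NodeId[],
  queryPeer: (peer: NodeId, target: NodeId) => NodeId[],
): NodeId | undefined {
  const byDistance = (a: NodeId, b: NodeId): number =>
    (a ^ target) < (b ^ target) ? -1 : 1;
  // Priority queue kept as a sorted array for simplicity.
  const queue = [...localCandidates].sort(byDistance);
  const seen = new Set<NodeId>(queue);
  while (queue.length > 0) {
    const next = queue.shift()!;
    if (next === target) return next;
    // Ask the next-closest known node for its candidates.
    for (const candidate of queryPeer(next, target)) {
      if (!seen.has(candidate)) {
        seen.add(candidate);
        queue.push(candidate);
      }
    }
    queue.sort(byDistance);
  }
  return undefined;
}

// Toy network: each entry maps a node to the nodes it would return.
const network = new Map<NodeId, NodeId[]>([
  [1n, [3n]],
  [2n, [5n]],
  [3n, []],
  [5n, [7n]],
]);
const found = findNode(7n, [1n, 2n], (peer) => network.get(peer) ?? []);
console.log(found); // 7n
```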

The modification to ICE, is to not simply always contact all seed nodes. Instead to do this more intelligently.

We could use provenance here.

One idea is that whoever knows about this Node ID and was able to tell you about the IP + port can only have done so if they had previously connected to that Node ID. Because records do not go into the node graph until a successful connection has been made.

Therefore one makes an observation that if a node does not even have the relevant record in its node graph, then it is impossible for them to be a valid signalling node in the context of hole punching.

Conversely, if a node has a relevant record, it could be a valid signalling node if it still has that active connection to that node.

The reason it could and not will is because:

  1. That node may no longer have a connection to the target Node ID.
  2. That node may have learned about the Node ID through the refresh bucket mechanism - which adds a layer of random gossip to kademlia, which forces nodes to share contacts, however contacts are only retained if a successful connection has been made.

So the 2 reasons are really just 1 reason. That 1 reason becomes more lopsided over time: the longer a node has been running in the network, the more likely there will be nodes that had former connections to it. (By how much though? The refresh bucket system seems like it wouldn't necessarily always pick a node ID that exists.)

As a side note, the termination condition of the find node mechanism is not verified:

It appears that we need a few conditions to ensure that we do not end up contacting the entire network. Right now it appears the refresh bucket mechanism would result in every node slowly flooding the entire network by trying to connect to everybody. This needs to be prevented.

@CMCDragonkai
Copy link
Member

Provenance may be relevant here: if the NodeGraph kept track of who told it about this NodeId information prior to connecting to it, this may help trace back to those who would have a persistent connection to the target node.

On the other hand, this may just always lead back our seed nodes, and that isn't particularly great either.

@CMCDragonkai
Copy link
Member

To summarise the situation right now.

We have to modify ICE where it selects the node to do the signalling.

At that specific point in time we have a few choices to figure out here:

  1. Trying to find the closest nodes to the target node won't by itself get you a chain of active connections.
  2. But finding the closest nodes is one of the ways in which you essentially get to know the IP + port of a target node. Remember, at this point we may not even know the IP + port of the target node.

Ok so here's an idea.

Let's make nodes maintain connections to close nodes, but also have random connections to far away nodes.

So the first step is to resolve the Node ID, that means doing a find node operation which leads us to close nodes, that are likely to have active connections to the target node.

The second step, which is simultaneous with the first, is to ask our own active connections (some of which are close and some far) what their active connections are.

So basically this lets us understand who is connected to us, and who they are connected to, as well as who is connected to the target node.

This search process will eventually lead to a bridge point where there is a node that is in the set intersection, and is the missing link that can bridge the 2 networks together.

There is a high probability that all nodes are only separated by 6 degrees.
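The intersection search described above could be sketched roughly as follows. The adjacency map and all names here are assumptions for illustration; in practice each hop would be an RPC asking a peer for its active connections:

```typescript
// Hedged sketch: find a bridge node by intersecting the set reachable
// from A over active connections with the set connected to target B.
type NodeId = string;

const activeConnections = new Map<NodeId, NodeId[]>([
  ['A', ['X', 'Y']],
  ['X', ['Y', 'Z']],
  ['Y', ['A']],
  ['Z', ['B']],
  ['B', ['Z']],
]);

// Breadth-first expansion of active connections up to `hops` hops.
function reachable(from: NodeId, hops: number): Set<NodeId> {
  let frontier = new Set<NodeId>([from]);
  const seen = new Set<NodeId>(frontier);
  for (let i = 0; i < hops; i++) {
    const next = new Set<NodeId>();
    for (const n of frontier) {
      for (const m of activeConnections.get(n) ?? []) {
        if (!seen.has(m)) {
          seen.add(m);
          next.add(m);
        }
      }
    }
    frontier = next;
  }
  return seen;
}

// Nodes holding an active connection to the target B:
const connectedToB = new Set(
  [...activeConnections]
    .filter(([, peers]) => peers.includes('B'))
    .map(([n]) => n),
);
// The bridge is any node in both sets.
const bridge = [...reachable('A', 2)].find((n) => connectedToB.has(n));
console.log(bridge); // 'Z'
```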

@CMCDragonkai
Copy link
Member

Actually the 6 degrees of separation isn't proved.

We still need to work out the probabilistic longest chain.

@CMCDragonkai
Copy link
Member

So at 6 random connections, the small-world model says that the number of intermediate hops is log_2(N)/log_2(K). That means at 60,000 nodes, it will take about 6 hops.
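As a sanity check of that estimate:

```typescript
// The small-world estimate quoted above: expected hops = log(N) / log(K)
// for N nodes each holding K random connections.
function expectedHops(n: number, k: number): number {
  return Math.log2(n) / Math.log2(k);
}

// At N = 60,000 and K = 6 random connections:
console.log(Math.round(expectedHops(60000, 6))); // 6
```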

@CMCDragonkai
Copy link
Member

Going to try this implementation and see how it goes:

  1. During the find nodes procedure the return value will now contain both the closest nodes and active connections, this could be a separate list, a separate RPC call, or added as additional metadata to the close nodes.
  2. When you're trying to connect to any particular node, you have to look at your local node graph first anyway, and when doing an ICE to them, you have to try to do a direct connection, and if you need to do the signalling you have to rely on your active connections anyway. Without active connections, you would use the seed nodes.
  3. This means you now have a more complex multi-priority queue, where there are heuristics that can lead you to trying different things. You can try nodes that are close via direct connections or via active connections, or you can even try other nodes that are farther away but for which you have signalling available.

No matter what, just doing the find nodes (which is intended to just resolve the node ID to the IP + port) requires connecting to other nodes anyway. So ICE has more options to try.

If you have 6 active connections, you can ask those connections what their active connections are - basically whether they can help us connect. You can then make connections to them, and ask them what their active connections are. This is somewhat close to a flooding operation, where you are flooding through the active connection list. Actually, is this a maze traversal problem? Kind of like the micromouse challenge?

It seems the idea is that the find nodes operation could be used along with flood fill of active connections.

@CMCDragonkai
Copy link
Member

@tegefaulkes write up a summary comment about what has been done in #618, and how we may proceed to improve it.

Also we now begin empirical tests.

We need to create some EC2 machines and set up the NAT firewalls for all these machines. We can take code from the NAT simulations in PK CLI (which has been disabled) and re-use it to set up the nftables/iptables across these EC2 machines. We can do it at scale with Pulumi, and start stress testing the network. Observability can be improved with more events/metrics exposed to the dashboards #599.

@CMCDragonkai
Copy link
Member

@amydevs

@CMCDragonkai
Copy link
Member

Tomorrow morning make sure to get PK CLI setup with the new version of PK and deploy the new version to the testnet.
