Network collapses when Node is closed #98

Closed
FlorianG2021 opened this issue Jun 23, 2021 · 11 comments

@FlorianG2021

FlorianG2021 commented Jun 23, 2021

When I run multiple nodes (for example 1 bootstrap node, 1 to x regular nodes, and 1 or 2 get/set nodes), the network collapses as soon as I close one of the nodes. All nodes listen on different ports and are bootstrapped by the bootstrap node.

For example, I follow these steps:

  1. start the Bootstrap Node
  2. start Node 1 and Node 2
  3. start Setter Node 1 <- seems to break here (when the setter node has finished)
  4. start Node 3
  5. start Getter Node 1
  6. close Node 2
  7. start Node 4
  8. start Node 5
  9. start Node 6
  10. start Setter Node 2
  11. start Getter Node 2
  12. start Node 7

Edit: Removed the ZIP because there is now a Gist attached.

I get the same results when I keep Getter 1 and Setter 1 running in a while-true loop with "asyncio.sleep(5)". Then it crashes as soon as I close Node 2 and start Node 4 (see steps 6 and 7).
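
For readers who want to picture the while-true loop mentioned above, here is a minimal sketch of a periodic getter. It is not the code from the Gist: `poll_key`, `fake_get`, and the `rounds` parameter are hypothetical names used for illustration. With the kademlia library, the `get` argument would presumably be the `get` coroutine of a bootstrapped `Server`; a stand-in is used here so the sketch runs without a network.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


async def poll_key(
    get: Callable[[str], Awaitable[Optional[str]]],
    key: str,
    interval: float = 5.0,
    rounds: Optional[int] = None,
) -> List[Optional[str]]:
    """Call `get(key)` every `interval` seconds and collect the results.

    `rounds=None` loops forever, like a while-true loop; a number stops
    after that many lookups so the helper can be tried out quickly.
    """
    results: List[Optional[str]] = []
    done = 0
    while rounds is None or done < rounds:
        results.append(await get(key))
        done += 1
        # Sleep between lookups, but not after the final one.
        if rounds is None or done < rounds:
            await asyncio.sleep(interval)
    return results


async def fake_get(key: str) -> Optional[str]:
    # Stand-in for a real DHT lookup (assumption, for illustration only).
    return {"command": "run"}.get(key)


if __name__ == "__main__":
    print(asyncio.run(poll_key(fake_get, "command", interval=0.01, rounds=3)))
```

Passing the lookup function in as a parameter keeps the loop independent of the network layer, which makes it easier to tell whether a failure comes from the loop or from the DHT.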

I tried many other combinations with fewer or more nodes and the result is always the same. The network always collapses with messages like the following, until setting and getting key/value pairs is impossible and/or only the bootstrap node is left in the routing table:

Did not receive reply for msg id b'9vnyGd7GSadJyjd0ZHQRlMZ7g2w=' within 5 seconds
2021-06-23 11:20:09,156 - kademlia.protocol - WARNING - no response from 127.0.0.1:8472, removing from router
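
The second log line above has the shape produced by Python's standard `logging` module with a `name - level - message` layout. A minimal sketch for capturing more detail from the same loggers follows, assuming the library logs under the `kademlia` logger hierarchy (as the `kademlia.protocol` name in the log suggests). The helper name `enable_kademlia_logging` is hypothetical.

```python
import logging


def enable_kademlia_logging(level: int = logging.DEBUG) -> logging.Logger:
    """Attach a stream handler to the `kademlia` logger at the given level."""
    log = logging.getLogger("kademlia")
    handler = logging.StreamHandler()
    # Same layout as the log line quoted in the report.
    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    )
    log.addHandler(handler)
    log.setLevel(level)
    return log
```

Child loggers such as `kademlia.protocol` inherit the level set on the parent, so one call at startup should be enough to see which peers are being removed and when.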

The network should normally keep working when a node is closed, right?

I am using "PyCharm Community Edition 2020.2.3 x64" on Windows 10 and Python 3.9.

@bmuller
Owner

bmuller commented Jun 23, 2021

One thing that may be helpful to convey up front is that all nodes on the network are storage nodes. At point 3 (if I understand your list correctly) you start a 4th node (what you call a "setter" node) that announces to the network that it is available to store things. Your network has 4 nodes at that point. If your 4th node then shuts off, the other 3 nodes still think they're on a 4 node network. At that point you have shut off 25% of your storage capacity. When you try to do anything on the network at that point, the remaining 3 nodes will produce messages about not being able to reach the 4th node that has now disappeared.

A second item that may be helpful - so long as there is at least 1 node, and you are able to connect to it with a second node, your second node should be able to get/set values on the network (even with messages about not being able to reach nodes that may no longer be available).

With that in mind, I'd be interested to get more specifics about what you're experiencing compared to what you're expecting. For instance:

  1. What do you mean by "network collapse" (with as much specific information / detail as possible)?
  2. What specific behavior are you expecting and what specific behavior are you experiencing?
  3. What is the minimum number of steps needed to reproduce the unexpected behavior? If you could provide these steps in a gist, for instance, that could be helpful.

@FlorianG2021
Author

FlorianG2021 commented Jun 23, 2021

Okay, maybe I wrote it a bit clumsily because my English is not the best.
Yes, I know that all nodes are able to store and request key/value pairs. I tried to keep it as basic and as close to your example files as possible. But after I made the Gist I now see that it was way too complicated (sorry)!

  1. By "network collapses" I mean that even when I close just 1 node and still have many more nodes running (49 more, for example), not only is the closed node removed from all other nodes' routing tables, but all the other nodes that were in the routing table (and are still online) are removed from every other node's table as well. The result is that after closing 1 of about 50 nodes, the routing tables of all nodes break down to the point that I cannot set or get key/value pairs, because the nodes lose all their neighbors and cannot find any neighbors to perform lookups.
  2. I am trying to build a structure where one node issues commands by setting key/value pairs, and all other nodes periodically check those key/value pairs, act on them, and then save their results as key/value pairs. I expect to be able to add new nodes, close them again, and change the key/value pairs without the behavior I'm experiencing, where all nodes lose their neighbors and can no longer get or set key/value pairs.
  3. I made a Gist with better instructions and comments in the comment section: https://gist.github.com/FlorianG2021/f03b429d06e652d5804c8ea5a55e2d39
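
The expectation in point 1 can be stated as a small toy model. This is only an illustration of the expected outcome, not how the library's routing table works: real Kademlia tables use k-buckets rather than a full mesh, and `build_full_mesh` and `close_node` are hypothetical helpers.

```python
from typing import Dict, Set


def build_full_mesh(node_ids: Set[int]) -> Dict[int, Set[int]]:
    """Give every node a routing table holding all other nodes."""
    return {node: node_ids - {node} for node in node_ids}


def close_node(tables: Dict[int, Set[int]], closed: int) -> None:
    """Drop the closed node and remove it, and only it, from every table."""
    tables.pop(closed, None)
    for neighbours in tables.values():
        neighbours.discard(closed)


tables = build_full_mesh(set(range(50)))
close_node(tables, 7)
# 49 nodes remain and each still knows the other 48.
print(len(tables), min(len(t) for t in tables.values()))
```

The reported behavior differs from this model in that nodes which are still online also disappear from the tables.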

Let me know if you need better information. Thanks for your help and this awesome project!

@bmuller
Owner

bmuller commented Jun 23, 2021

Thanks @FlorianG2021 - if you're seeing nodes removed from the routing table when they are still reachable then that could definitely be signs of a bug. I'll try to dig in to see if I can reproduce, but it may take a little while before I have time.

@FlorianG2021
Author

FlorianG2021 commented Jun 23, 2021

No problem - I have to say thanks to you for this nice project. If I can help you to find/fix this by giving more details/logs just let me know. :)

@FlorianG2021
Author

FlorianG2021 commented Jun 24, 2021

I have a happy addition @bmuller.
After trying several things, like reinstalling PyCharm and testing on 2 different devices, I have now tried a fresh Kubuntu 20.04 install (instead of Windows 10) with the newest PyCharm.
With this combination the scenarios I linked in the Gist work without problems. I will run some more tests today and then post another comment to say whether it is still working.
I don't know what just happened.

@FlorianG2021
Author

For now it seems to work.
More tests will be done over the weekend.

@FlorianG2021
Author

Still no change. I can run various examples and projects of my own with the Kademlia library, but under Windows 10 + PyCharm I'm still losing the neighbors all the time. I even tried setting up different virtual interpreters and tried using a global system interpreter.
Kubuntu 20.04 + PyCharm works perfectly fine out of the box.

@bmuller
Owner

bmuller commented Jun 30, 2021

@FlorianG2021 could there be some sort of Windows firewall that's enabled?
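
One quick way to check the firewall question on the affected machine is a plain UDP round trip on localhost, assuming the nodes talk to each other over UDP on 127.0.0.1 as the log in the report suggests. The sketch below uses only the standard library; `udp_loopback_ok` is a hypothetical helper, and a passing result only shows that basic local UDP traffic gets through, not that the library itself is unaffected.

```python
import socket


def udp_loopback_ok(host: str = "127.0.0.1", timeout: float = 2.0) -> bool:
    """Send one datagram between two local UDP sockets and report success."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind((host, 0))  # port 0: let the OS pick a free port
        receiver.settimeout(timeout)
        sender.sendto(b"ping", receiver.getsockname())
        try:
            data, _ = receiver.recvfrom(1024)
        except socket.timeout:
            # Nothing arrived within the timeout: traffic may be blocked.
            return False
        return data == b"ping"


if __name__ == "__main__":
    print("UDP loopback works:", udp_loopback_ok())
```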

@FlorianG2021
Author

FlorianG2021 commented Jun 30, 2021

@bmuller
I just tried deactivating everything, but it doesn't change anything.
Also, a firewall would not explain why it works fine as long as I don't close a node.
I will try it tomorrow in a completely fresh Windows virtual machine.

Now I'm really curious and want to know if my Windows systems are broken (which would be strange, because I tried it on 2 different devices). :D

@FlorianG2021
Author

@bmuller the problem also occurs in a Windows 10 Enterprise test VM. I used a Python 3.8 interpreter (it worked out of the box in Kubuntu but not in Windows).

@bmuller
Owner

bmuller commented Jul 10, 2021

Hey @FlorianG2021 - thanks for that update. I don't have access to a Windows machine for testing, so unfortunately this isn't something I'm able to look into. If you're able to find a solution, I would happily accept a pull request, though. I'm going to close the issue for now, but please feel free to update with any other comments RE what you discover.

@bmuller bmuller closed this as completed Jul 10, 2021

2 participants