-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Meros's ULimit code doesn't support Windows and is untested. #158
Comments
This from an OSError that there's too many open files. As this is likely per OS, there's no good answer to how to solve this. The three solutions are: The second isn't preferred, especially due to how we handle seed nodes. If the seed nodes become filled, no new peers could join the network. We'd need to add a HandshakeRejection network message combined which also sends a list of alternate peers. |
The Ubuntu ulimit is 1024. It kills Meros off at 490-500 sockets (980-1000 files). At least on Macs, the default ulimit is 256. Leaving 44 files open for whatever else Meros is doing (and increasing to 64 to be sure), that leaves 192 files or 96 connections. We should add a config value for this, as some servers can raise their ulimit further, but the config default should be 96. In order to handle nodes which are full, I'd like to propose |
Also, to be clear, we can deal with ulimit if we just skip exceptions when accepting sockets, which we already do, and use an unlimited file related ulimit on the seed nodes. The reason this is an issue is:
|
To be clear, I actually have no idea what 44 files Meros is opening on its own, even when lowering the amount to 24 (low bound). The only files I know of are the 4 database files (DB/WalletDB) and two server sockets (peer server/RPC server). We need to run strace to see what else is creating files and make sure 64 is a proper allocation. |
Even with only 3 other peer connections, Meros will eventually refuse new connections. Something isn't closing its files. |
ulimit, combined with lingering file accesses, are causing nodes to technically remain running yet shut off their network activity. This isn't a crash but it is an unstable setup. |
For this issue to be closed:
|
I believe this is because error'ed sockets leave their FDs open, yet I'm not sure. |
This may or may not be fixed in the chronos branch. It adds several close statements to errored sockets and stops the socket overwriting that was happening as part of #159. That said, due to the same issue, I can't currently test it. I just wanted to note the progress. |
https://github.com/MerosCrypto/Meros/tree/ulimit has been created in response to this issue. It handles the max file limit, amount of open files, and whether or not new sockets can be created. It also adds a Busy message, currently undocumented, identical to Peers. It can be sent in response to any initial Handshake/Syncing. Before it can be merged, it needs documentation of the Busy message and thorough testing. As of right now, it also doesn't support Windows, but I plan on merging it Windows support (or even confirmed BSD/macOS support), creating a new issue for Windows support/verified BSD and macOS support. If this is properly implemented, combined with verifying there's no lingering sockets, we should have a stable net that, even when the seed nodes are at capacity, will still let new nodes join. Then we can merge this close this issue. |
The ulimit code works on Linux, yet is untested on macOS/BSD. Meros will go up to however many peers ulimit allows, and then start sending Busy. At this time:
That said, despite having theoretical macOS/BSD support, we don't have Windows support. It should be implementable via WinAPI's This will be merged, and moved to bug/untested, when a test for the second case is added. Bug will remain since BSD is only theoretically supported and Windows is unsupported. I'll try to spin up a FreeBSD VM some time. |
Alrighty. The test for the second case was added and passes. I'm running the tests locally, but then I'll merge this, re-title this issue, change the labels, and remove it from the milestone. |
We need to implement Windows support for the FileLimitTracker. We need to test:
It should be noted for that last bullet point, we either need to cap the amount of RPC sockets OR start disconnecting peers to regain the buffer. As for testing if we still leak sockets, which should've been fixed with Chronos, I feel the best way to test that will be to start up a net and see what happens. |
This does compile on macOS, yet I have no idea if it works as expected. The user who compiled it didn't run the tests as well, and I didn't press them as they aren't especially tech savvy. |
There's two problems here:
EDIT: The original title was "Meros will accept 500 client connections and then refuse to handle any. ". Please read the full issue to see how this has evolved.
EDIT: The second title "Meros's network activity eventually hangs." has also been replaced.
The text was updated successfully, but these errors were encountered: