Few small bug fixes in the connectors. #5
Comments
Hello Delian, Thanks for the feedback; our project always welcomes input from new people trying out our software. As for the bugfixes:
Greetings,
You may already know this, but S.close() does not help because of the unclean shutdown you do on a socket in libmist. The result is that the loop is blocked and the whole process stops performing. I intentionally used close(S.getSocket()) because it correctly closes the socket without blocking the process, bypassing libmist. The code you have in mistlib's socket.cpp ...
...will block forever (at least on Linux) if you have non-empty buffers in the socket (which is quite often the case). Also, if you look carefully, you will see that the while loop will block forever. Why don't you simply replace that code with:
... and leave the kernel to do the thing you were trying to do? This way your threads will not block and S.close() will perform correctly.
Hello Delian, Thanks for the reply. I am wondering, however, to what extent the shutdown of our sockets would be ... Regards,
On Mon, Jan 27, 2014 at 12:39 PM, Delian Delchev wrote:
Just do a little experiment: try to debug the httpprogressive process with S.close() in place while traffic is being sent to a user (a user with a lower-bandwidth connection, so the output socket buffer is full). An lsof excerpt from such a process: MistConnH 1733 root cwd DIR 252,0 4096 655618 /home/mist/mistserver. See also http://beej.us/guide/bgnet/output/html/singlepage/bgnet.html#closedown
I am doing a little experiment: replacing the close() code in mistlib with what I just described (a pure close), as I just realized from my logs that my Mist httpconnector has been blocked because the maximum number of threads was reached; they are not cleaned up because they are blocked in the close() method.
For the moment the tests I did behave well; I now have a lot fewer CLOSE_WAIT sockets, but I still have some. They may happen if a socket is not closed somewhere while the thread exits (and that would even keep the thread alive). Eventually that may exhaust the total per-process thread quota. So either this was not the reason for the CLOSE_WAIT, or it was not the only reason. I need to wait 24h to confirm, though. However, I have another suggestion: I would suggest that instead of doing
you replace them with a macro, defined for example in debug.h (where you do your checks). The macro can have a value like:
Then you can remove unistd.h and the other includes that you use in every file just for the debugging, and replace the #if DEBUG>xxx or #ifdef DEBUG blocks with just dbg("%s", ....);
Funny that you should mention this - I have just finished making exactly this change. On a different but related topic: calling shutdown before closing is necessary on some systems to signal EOF to the other end of the connection. However, calling shutdown on a socket that is still going to be used is of course horribly wrong - and I'm working on fixing this, right now. Same as above - code will land sometime today or tomorrow. Thanks again for your continued feedback and comments! If you find anything else that you think isn't behaving as it should, please don't hesitate to let us know. :-)
For reference - the code I spoke about just landed in the development branches of mistserver/mistlib. Make sure to check out and compile both, or it won't work. Not all debug messages have been updated to the new format yet - and not all of them work at levels above 4 right now. Fixes for that will arrive soon, too.
Ok, I will look at it; I never even thought there might be different branches there :)
We keep the master branch at the currently stable version. The development branch has the latest changes and contains "soon to be stable" code. So, if you want the latest changes you'll want to keep up with the development branches :-) |
Hello to all,
I am very new to MistServer (I heard of it for the first time a couple of weeks ago), but I want to suggest a few bug fixes.
I am dealing with issues related to the Mist server under "heavy" load (a few hundred connections; not a heavy load in my experience, but it seems to be very heavy for Mist). It probably will not surprise you to hear that this software in particular is not really designed to scale well. However, that is a different topic of discussion.
During my investigation of the code I found a bunch of kiddish bugs that also affect the stability and scalability of the software. They are present in all of the releases I've checked: 1.0, 1.2 and 1.3.
As I am not a contributor and I really couldn't find how to report a bug or submit a patch, I am writing to you here.
I have found bugs and issues in the buffer process, the connector process and the RTMP process. However, this message covers only the connectors, as I don't have much time to write more.
The connector design is quite inefficient: it forks a process for each new customer connection. It has two kiddish bugs associated with this -
There is no signal interception in the master. So sometimes when a child dies, it becomes a zombie and slowly exhausts the machine's process table (although memory and everything else is freed).
Fixing this is quite simple.
You just need to add, at the top of the file:
and somewhere near the beginning of the main function:
I did it and it works fine for me.
However, the major problem in every connector's code is that in the accept -> fork loop for new socket sessions, after forking a child, the master does not close the accepted connection.
As a result the master keeps a file descriptor open for every accepted connection. Each of those descriptors is copied into the next forked child (fork copies all file descriptors), the count in the master keeps growing, and the ever-larger set is copied into each subsequent child, and so on.
The effect is that every new fork becomes slower and slower (all the fds have to be copied), and the number of descriptors held by the master (and by the system in general) grows with every accepted connection, as they are never closed. On a Mist server under heavy load (a constant stream of short-lived customer connections being opened), the file descriptor limit of the process/system will eventually be exhausted. At some point there will not be enough resources for a new child, and the fork will fail. The old connections will still work perfectly, though.
I can only imagine the pain of all the users of this software, who have just one way to resolve this issue when it happens: restarting mistserver periodically.
However, the fix is quite, quite simple.
You just need to add close(S.getSocket()) in the master process. Every connector has a while loop with code like this (taken from HTTPProgressive):
You just need to add "close(S.getSocket());" there (where I have shown) and everything will be fine.
You can easily verify the bug fix with lsof.
If everything is correct, lsof -p PID-of-the-master-connector should show that the master has only two unix sockets open, and likewise every child.
If you see more sockets (and steady growth of the allocated file descriptors), then the patch has not been applied.
It works perfectly for me and fixes a lot of issues.
It has been quite annoying to have to apply this patch to versions 1.0, 1.2 and 1.3 of the Mist server over the last 3 days, so please integrate it into the main code!
If I have time, I will write to you about a few other small patches you can apply.