munged can deadlock if clients block while sending data #1
Comments
A simple way to reproduce this is to run N instances of munge under gdb and set a breakpoint at Original comment by |
This issue was updated by svn:r833 / 4106dba. Created issue-1 branch. Original comment by
|
This issue was updated by svn:r834 / 0aad63f. Fixed deadlock by adding (3-second) message timeouts via Original comment by |
Just noticed this TODO entry from svn:r356 / b9a647b:
Whoops. Well, better late than never. Original comment by |
Throughput degradation has been reduced to 5%. Original comment by |
This issue was closed by svn:r904 / e3ac6c6. Original comment by
|
What steps will reproduce the problem?
Have at least N clients (i.e., any user of libmunge) block while writing to the MUNGE unix domain socket, where N is the number of worker threads spawned by munged (the default is 2).
What is the expected output? What do you see instead?
munged should respond to new client requests. But if enough clients block while sending request data (where "enough" is defined as the number of worker threads), munged will stop responding to requests while continuing to accept new client connections.
What version of the software are you using? On what operating system?
munge-0.5.9
chaos-release-4.3-1.ch4.3
Red Hat Enterprise Linux Server release 5.4 (Tikanga)
Please provide any additional information below.
A given client request is handled by a munged worker thread. The worker reads each client request in two parts: the request header (containing the length of the request body), followed by the request body. [src/libcommon/m_msg.c:m_msg_recv()]
The read is performed by src/libcommon/fd.c:fd_read_n(), which keeps reading until either n bytes have been read or an error/eof occurs.
A timeout value needs to be specified for reading a client request.
Original issue reported on code.google.com by chris.m.dunlap on 12 Jul 2010 at 10:55