munged can deadlock if clients block while sending data #1

Closed
GoogleCodeExporter opened this issue May 15, 2015 · 8 comments

@GoogleCodeExporter

What steps will reproduce the problem?

Have at least N clients (i.e., any user of libmunge) block while writing to the MUNGE unix domain socket, where N is the number of worker threads spawned by munged (the default is 2).

What is the expected output? What do you see instead?

munged should respond to new client requests. But if enough clients block while sending request data (where "enough" is defined as the number of worker threads), munged will stop responding to requests while continuing to accept new client connections.

What version of the software are you using? On what operating system?

munge-0.5.9
chaos-release-4.3-1.ch4.3
Red Hat Enterprise Linux Server release 5.4 (Tikanga)

Please provide any additional information below.

A given client request is handled by a munged worker thread. The worker reads each client request in two parts: the request header (containing the length of the request body), followed by the request body. [src/libcommon/m_msg.c:m_msg_recv()]

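The read is performed by src/libcommon/fd.c:fd_read_n(), which keeps reading until either n bytes have been read or an error/EOF occurs. Since no timeout applies, a client that stalls partway through sending a request keeps its worker thread blocked in this read.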
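
For illustration, the read loop described above looks roughly like the sketch below. This is not the munge source: read_n_blocking() is a hypothetical name, and the real fd_read_n() may differ in its details. It only shows why a stalled peer holds the caller indefinitely.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical sketch (not the actual fd_read_n()): read exactly n bytes.
 * Returns the number of bytes read (less than n only on EOF), or -1 on error.
 * There is no timeout, so if the peer stops sending mid-message, read()
 * blocks and the calling thread stays stuck here. */
ssize_t read_n_blocking(int fd, void *buf, size_t n)
{
    unsigned char *p = buf;
    size_t nleft = n;

    while (nleft > 0) {
        ssize_t nread = read(fd, p, nleft);
        if (nread < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;              /* real error */
        }
        if (nread == 0)
            break;                  /* EOF: peer closed the connection */
        p += nread;
        nleft -= (size_t) nread;
    }
    return (ssize_t) (n - nleft);
}
```

With N worker threads each parked in a loop like this, the (N+1)th request has no thread left to serve it.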

A timeout value needs to be specified for reading a client request.

Original issue reported on code.google.com by chris.m.dunlap on 12 Jul 2010 at 10:55

@GoogleCodeExporter

A simple way to reproduce this is to run N instances of munge under gdb and set a breakpoint at src/libcommon/m_msg.c:m_msg_send(). The (N+1)th instance will block since all munged worker threads are blocked.

Original comment by chris.m.dunlap on 13 Jul 2010 at 8:50

@GoogleCodeExporter

Original comment by chris.m.dunlap on 13 Jul 2010 at 8:52

  • Changed title: munged can deadlock if clients block while sending data

@GoogleCodeExporter

This issue was updated by svn:r833 / 4106dba.

Created issue-1 branch.

Original comment by chris.m.dunlap on 26 Aug 2010 at 12:34

  • Changed state: Started

@GoogleCodeExporter

This issue was updated by svn:r834 / 0aad63f.

Fixed deadlock by adding (3-second) message timeouts via poll(), but degraded throughput by 20%.

Original comment by chris.m.dunlap on 26 Aug 2010 at 1:54

@GoogleCodeExporter

Just noticed this TODO entry from svn:r356 / b9a647b:

-add server timeout for unresponsive clients

Whoops. Well, better late than never.

Original comment by chris.m.dunlap on 27 Aug 2010 at 6:19

@GoogleCodeExporter

Throughput degradation has been reduced to 5%.

Original comment by chris.m.dunlap on 26 Jan 2011 at 11:57

@GoogleCodeExporter

This issue was closed by svn:r904 / e3ac6c6.

Original comment by chris.m.dunlap on 27 Jan 2011 at 2:02

  • Changed state: Fixed

@GoogleCodeExporter

Original comment by chris.m.dunlap on 4 Feb 2011 at 3:31

  • Added labels: Milestone-0.5.10

@dun added the bug label and removed the auto-migrated label on Jun 3, 2015
@dun added this to the 0.5.10 milestone on Jun 3, 2015