
ipc(sc) can deadlock if ipc_type is socket #46

Closed
jfriesse opened this issue Dec 12, 2012 · 8 comments

@jfriesse
Member

Simple reproducer:
corosync.conf:
qb {
ipc_type: socket
}

start corosync
start test/cpgverify

corosync + cpgverify will deadlock in the qb_ipc_us_sendv function because of this check:

if (errno == EAGAIN &&
(processed > 0 || iov_p > 0)) {
goto retry_send;

but processed is > 0, so it just retries forever (or until cpgverify is stopped).

@asalkeld
Contributor

Hi Honza,

For some strange reason I can't reproduce this, but I was thinking that the socket send buffer size might have something to do with it, so below is a patch that increases the send buffer size. Would you mind giving it a go?

-Angus

From: Angus Salkeld <asalkeld@redhat.com>
Date: Thu, 13 Dec 2012 12:03:36 +1100
Subject: [PATCH] IPC: try make sure the snd_buf size is set.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
---
 lib/ipc_us.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 49 insertions(+), 4 deletions(-)

diff --git a/lib/ipc_us.c b/lib/ipc_us.c
index 5cc238f..184d837 100644
--- a/lib/ipc_us.c
+++ b/lib/ipc_us.c
@@ -109,6 +109,38 @@ static void sigpipe_ctl(enum qb_sigpipe_ctl ctl)
 #define MSG_NOSIGNAL 0
 #endif

+static int32_t
+ensure_socket_buf_size(int fd, uint32_t required_max)
+{
+   uint32_t cur_buf_size;
+   socklen_t opt_len = sizeof(cur_buf_size);
+   int32_t rc = 0;
+
+   if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF,
+              &cur_buf_size,
+              &opt_len) < 0) {
+       rc = -errno;
+       qb_util_perror(LOG_ERR, "couldn't get the socket send size");
+       return rc;
+   }
+   if (cur_buf_size >= required_max) {
+       qb_util_log(LOG_INFO, "socket buf size is %d > %d",
+               cur_buf_size, required_max);
+       return 0;
+   } else {
+       qb_util_log(LOG_INFO, "socket buf size is %d < %d: increasing",
+               cur_buf_size, required_max);
+   }
+   if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF,
+              &required_max,
+              sizeof(required_max)) < 0) {
+       rc = -errno;
+       qb_util_perror(LOG_ERR, "couldn't set the socket send size");
+       return rc;
+   }
+   return 0;
+}
+
 ssize_t
 qb_ipc_us_send(struct qb_ipc_one_way *one_way, const void *msg, size_t len)
 {
@@ -416,7 +448,9 @@ retry_recv:

+qb_ipcc_us_sock_connect(const char *socket_name,
+           uint32_t max_msg_size,
+           int32_t * sock_pt)
 {
    int32_t request_fd;
    struct sockaddr_un address;
@@ -434,6 +468,11 @@ qb_ipcc_us_sock_connect(const char *socket_name, int32_t * sock_pt)
        goto error_connect;
    }

+   res = ensure_socket_buf_size(request_fd, max_msg_size);
+   if (res < 0) {
+       goto error_connect;
+   }
+
    memset(&address, 0, sizeof(struct sockaddr_un));
    address.sun_family = AF_UNIX;
 #ifdef HAVE_STRUCT_SOCKADDR_UN_SUN_LEN
@@ -480,7 +519,8 @@ qb_ipcc_us_setup_connect(struct qb_ipcc_connection *c,
    int on = 1;
 #endif

-   res = qb_ipcc_us_sock_connect(c->name, &c->setup.u.us.sock);
+   res = qb_ipcc_us_sock_connect(c->name, c->setup.max_msg_size,
+                     &c->setup.u.us.sock);
    if (res != 0) {
        return res;
    }
@@ -571,7 +611,8 @@ qb_ipcc_us_connect(struct qb_ipcc_connection *c,

    close(fd_hdr);

-   res = qb_ipcc_us_sock_connect(c->name, &c->event.u.us.sock);
+   res = qb_ipcc_us_sock_connect(c->name, c->event.max_msg_size,
+                     &c->event.u.us.sock);
    if (res != 0) {
        goto cleanup_hdr;
    }
@@ -954,7 +995,6 @@ cleanup_and_return:
 #ifdef SO_PASSCRED
    setsockopt(sock, SOL_SOCKET, SO_PASSCRED, &off, sizeof(off));
 #endif
-
    return res;
 }

@@ -1005,6 +1045,11 @@ retry_accept:
         */
        return 0;
    }
+   res = ensure_socket_buf_size(new_fd, setup_msg.max_msg_size);
+   if (res < 0) {
+       close(new_fd);
+       return 0;
+   }

    res = qb_ipcs_uc_recv_and_auth(new_fd, &setup_msg, sizeof(setup_msg),
                       &ugp);
-- 
1.7.11.7

@jfriesse
Member Author

Angus,
Actually, the patch seems to be malformed and I was not able to apply it automatically. I tried to apply it manually, but sadly it didn't help.

Just one note: the problem seems to appear reliably on my machine only if debug is set to trace, to_stderr is set to yes, and corosync is run as corosync -f.

What's even more interesting, only cpgverify seems to have this problem (or at least that's the only one I'm able to reproduce it with here); cpgbench and testcpg don't.

@jfriesse
Member Author

Angus,
I've sent you a better test case. That one seems to work 100% of the time (regardless of the to_stderr setting, ...).

@ghost ghost assigned asalkeld Dec 13, 2012
@asalkeld
Contributor

Honza, thanks for your help with the reproducer; it's a great help.

I have sent you 2 patches (hopefully easier to apply).

I still want to do some more testing; I'm a bit worried about the portability of the last patch.

-A

@asalkeld
Contributor

So those patches I sent only fix part of the problem; I am working on resolving this better.
Basically, the kernel has limits on both message size and the number of pending messages.

@asalkeld
Contributor

Can you give this a go?

https://github.com/asalkeld/libqb/tree/no-sock-block

@asalkeld
Contributor

Honza, I think this is fixed in master, but I can't hunt down your neat reproducer. Do you still have it by any chance?

@jfriesse
Member Author

Angus,
I've sent you again what I believe was the reproducer. The bug really seems to be fixed in master (or at least, after 15 minutes of running the reproducer there is no deadlock).
