IPC_SHM: sem_timedwait returns ETIMEDOUT in test test_ipc_server_fail #37

Closed
grueni opened this Issue May 31, 2012 · 8 comments

Comments

Projects
None yet
2 participants
@grueni
Contributor

grueni commented May 31, 2012

I compiled this fork of libqb with Oracle Solaris 11U7, gcc 4.5.2 and some packages of csw.
The ipc tests fail when shm is used.

In check_ipc.s in test "ipc_server_fail_shm" the function test_ipc_server_fail is called.
test_ipc_server_fail calls qb_ipcc_sendv_recv from lib/ipcc.c with a value of -1 for ms_timeout.
In the loop below qb_ipcc_recv is called which returns -ETIMEDOUT.
-ETIMEDOUT is the original return code of sem_timedwait.
The logic of the code is (as I understand it):

  • if ms_timeout lower than 0 then change -ETIMEDOUT to -EAGAIN
  • connectionstate ist not called, client keeps connected
  • if -EAGAIN and still connected then try again
    and this is an endless loop.
    I debugged all steps through the program up to the call of sem_timedwait and there are no changes to the errorcode or the connectionstate.

Maybe this is an problem when compiling with gcc 4.5.2 and Oracle Solaris 11.
Other systems may return another value as ETIMEDOUT.

The initialization of the values of timeout_now and timeout_rem in the loop should be before the loop.
It does not change anything in the actual implementation but it looks more logically (IHMO).

In lib/ipcc.c line 284:

    do {
        if (timeout_rem > QB_IPC_MAX_WAIT_MS || ms_timeout == -1) {
            timeout_now = QB_IPC_MAX_WAIT_MS;
        } else {
            timeout_now = timeout_rem;
        }

        res = qb_ipcc_recv(c, res_msg, res_len, timeout_now);
        if (res == -ETIMEDOUT) {
            if (ms_timeout < 0) {
                res = -EAGAIN;
            } else {
                timeout_rem -= timeout_now;
                if (timeout_rem > 0) {
                    res = -EAGAIN;
                }
            }
        } else  if (res < 0 && res != -EAGAIN) {
            errno = -res;
            qb_util_perror(LOG_DEBUG,
            "qb_ipcc_recv %d timeout:(%d/%d)",
            res, timeout_now, timeout_rem);
        }
    } while (res == -EAGAIN && c->is_connected);

My changes are here with debugging an without debugging code

        qb_util_log(LOG_DEBUG, "qb_ipcc_sendv_recv before do QB_IPC_MAX_WAIT_MS=%d timeout_rem=%d timeout_now=%d ms_timeout=%d",
           QB_IPC_MAX_WAIT_MS,timeout_rem,timeout_now,ms_timeout);
        if (timeout_rem > QB_IPC_MAX_WAIT_MS || ms_timeout == -1) {
           timeout_now = QB_IPC_MAX_WAIT_MS;
           timeout_rem = QB_IPC_MAX_WAIT_MS;
        } else {
           timeout_now = timeout_rem;
        }

        do {
            qb_util_log(LOG_DEBUG, "qb_ipcc_sendv_recv in do QB_IPC_MAX_WAIT_MS=%d timeout_rem=%d timeout_now=%d ms_timeout=%d",
               QB_IPC_MAX_WAIT_MS,timeout_rem,timeout_now,ms_timeout);

            res = qb_ipcc_recv(c, res_msg, res_len, timeout_now);
            qb_util_log(LOG_DEBUG, "qb_ipcc_sendv_recv in do res=%d ms_timeout=%d", res,ms_timeout);
            if (res == -ETIMEDOUT) {
            qb_util_log(LOG_DEBUG, "qb_ipcc_sendv_recv in do res=%d timeout_rem=%d timeout_now=%d diff=%d",
               res,timeout_rem,timeout_now,timeout_rem - timeout_now);
                timeout_rem -= timeout_now;
                if (timeout_rem > 0) {
                        res = -EAGAIN;
                } else {
                    _check_connection_state(c, -EIO);
                    res = -ENOTCONN;
                    errno = res;
                }
            } else  if (res < 0 && res != -EAGAIN) {
                errno = -res;
                qb_util_perror(LOG_DEBUG,
                               "qb_ipcc_recv %d timeout:(%d/%d)",
                               res, timeout_now, timeout_rem);
            }
        } while (res == -EAGAIN && c->is_connected);
        if (timeout_rem > QB_IPC_MAX_WAIT_MS || ms_timeout == -1) {
           timeout_now = QB_IPC_MAX_WAIT_MS;
           timeout_rem = QB_IPC_MAX_WAIT_MS;
        } else {
           timeout_now = timeout_rem;
        }

        do {
            res = qb_ipcc_recv(c, res_msg, res_len, timeout_now);
            if (res == -ETIMEDOUT) {
                timeout_rem -= timeout_now;
                if (timeout_rem > 0) {
                        res = -EAGAIN;
                } else {
                    _check_connection_state(c, -EIO);
                    res = -ENOTCONN;
                    errno = res;
                }
            } else  if (res < 0 && res != -EAGAIN) {
                errno = -res;
                qb_util_perror(LOG_DEBUG,
                               "qb_ipcc_recv %d timeout:(%d/%d)",
                               res, timeout_now, timeout_rem);
            }
        } while (res == -EAGAIN && c->is_connected);
@asalkeld

This comment has been minimized.

Show comment Hide comment
@asalkeld

asalkeld May 31, 2012

Contributor

Hi

Thanks for reporting, and help in digging into the code.

The way this works on linux is that in qb_ipcc_recv() we get the ETIMEDOUT
and it calls qb_ipc_us_recv_ready() which calls poll() on the unix socket
this returns POLLHUP and we convert that into ENOTCONN

On freebsd we don't get POLLHUP but when I last tested this seemed to work.
(I'll give that a quick go)

Contributor

asalkeld commented May 31, 2012

Hi

Thanks for reporting, and help in digging into the code.

The way this works on linux is that in qb_ipcc_recv() we get the ETIMEDOUT
and it calls qb_ipc_us_recv_ready() which calls poll() on the unix socket
this returns POLLHUP and we convert that into ENOTCONN

On freebsd we don't get POLLHUP but when I last tested this seemed to work.
(I'll give that a quick go)

@asalkeld

This comment has been minimized.

Show comment Hide comment
@asalkeld

asalkeld May 31, 2012

Contributor

Regarding your patch:

if the timeout is -1 (wait forever)
then going into the loop (timeout_rem == timeout_now == QB_IPC_MAX_WAIT_MS)
if (res == -ETIMEDOUT) {
timeout_rem -= timeout_now;
// now timeout_rem == 0
if (timeout_rem > 0) {
res = -EAGAIN;
} else {
// now disconnect, this is not right so after one loop we disconnect?
// the idea behind QB_IPC_MAX_WAIT_MS was to give the poll() a chance to
// figure out if the connection was dead, not as an absolute max timeout
_check_connection_state(c, -EIO);
res = -ENOTCONN;
errno = res;
}
} else if (res < 0 && res != -EAGAIN) {

Contributor

asalkeld commented May 31, 2012

Regarding your patch:

if the timeout is -1 (wait forever)
then going into the loop (timeout_rem == timeout_now == QB_IPC_MAX_WAIT_MS)
if (res == -ETIMEDOUT) {
timeout_rem -= timeout_now;
// now timeout_rem == 0
if (timeout_rem > 0) {
res = -EAGAIN;
} else {
// now disconnect, this is not right so after one loop we disconnect?
// the idea behind QB_IPC_MAX_WAIT_MS was to give the poll() a chance to
// figure out if the connection was dead, not as an absolute max timeout
_check_connection_state(c, -EIO);
res = -ENOTCONN;
errno = res;
}
} else if (res < 0 && res != -EAGAIN) {

@grueni

This comment has been minimized.

Show comment Hide comment
@grueni

grueni May 31, 2012

Contributor

I observed ETIMEOUT bubbling up to the loop where I made the changes.

I understood the intentions of the original code. The program should have several chances to get a connection before giving up.
With the original code this can't work because the variable to decrement is initialized with each ETIMEOUT as return value.
And the test must fail because ms_timeout equals -1 and this results in an endless loop because in this case the result was changed to EGAIN.

Delete this line:
timeout_rem = QB_IPC_MAX_WAIT_MS;
from the patch.
Then it should work again with chances for several polls depending on the values of ms_timeout.

if (timeout_rem > QB_IPC_MAX_WAIT_MS || ms_timeout == -1) {
           timeout_now = QB_IPC_MAX_WAIT_MS;
//           timeout_rem = QB_IPC_MAX_WAIT_MS;
        } else {
           timeout_now = timeout_rem;
Contributor

grueni commented May 31, 2012

I observed ETIMEOUT bubbling up to the loop where I made the changes.

I understood the intentions of the original code. The program should have several chances to get a connection before giving up.
With the original code this can't work because the variable to decrement is initialized with each ETIMEOUT as return value.
And the test must fail because ms_timeout equals -1 and this results in an endless loop because in this case the result was changed to EGAIN.

Delete this line:
timeout_rem = QB_IPC_MAX_WAIT_MS;
from the patch.
Then it should work again with chances for several polls depending on the values of ms_timeout.

if (timeout_rem > QB_IPC_MAX_WAIT_MS || ms_timeout == -1) {
           timeout_now = QB_IPC_MAX_WAIT_MS;
//           timeout_rem = QB_IPC_MAX_WAIT_MS;
        } else {
           timeout_now = timeout_rem;
@asalkeld

This comment has been minimized.

Show comment Hide comment
@asalkeld

asalkeld Jun 1, 2012

Contributor

I have pushed 1e81530
that I hope will sort your problem, can you test it and tell if how it goes?

Contributor

asalkeld commented Jun 1, 2012

I have pushed 1e81530
that I hope will sort your problem, can you test it and tell if how it goes?

@grueni

This comment has been minimized.

Show comment Hide comment
@grueni

grueni Jun 1, 2012

Contributor

Please see in the comments to the commit.
Where should I answer?

Contributor

grueni commented Jun 1, 2012

Please see in the comments to the commit.
Where should I answer?

@grueni grueni closed this Jun 1, 2012

@grueni grueni reopened this Jun 1, 2012

@grueni

This comment has been minimized.

Show comment Hide comment
@grueni

grueni Jun 1, 2012

Contributor

Erroneously closed, reopened.

Contributor

grueni commented Jun 1, 2012

Erroneously closed, reopened.

@asalkeld

This comment has been minimized.

Show comment Hide comment
@asalkeld

asalkeld Jun 1, 2012

Contributor

Thanks, for the re-test. I'll try merge your changes. You could also send
patch(s) to the mailing list/or do a pull request - I am easy.

There is also a mailing list: https://fedorahosted.org/mailman/listinfo/quarterback-devel

Contributor

asalkeld commented Jun 1, 2012

Thanks, for the re-test. I'll try merge your changes. You could also send
patch(s) to the mailing list/or do a pull request - I am easy.

There is also a mailing list: https://fedorahosted.org/mailman/listinfo/quarterback-devel

@asalkeld

This comment has been minimized.

Show comment Hide comment
@asalkeld

asalkeld Sep 3, 2012

Contributor

This should be fixed now. If you still have a problem please re-open.

Contributor

asalkeld commented Sep 3, 2012

This should be fixed now. If you still have a problem please re-open.

@asalkeld asalkeld closed this Sep 3, 2012

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment