mon: do crushtool test with fork and timeout, but w/o exec of crushtool #16025

liewegas · 2017-06-29T19:16:50Z

This basically copies the Subprocess hackery for the timeout of the forked
subprocess, but operates on a lambda (in this case, CrushTester::test()) instead
of running crushtool via exec(2).
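
For readers unfamiliar with the pattern, here is a minimal sketch of "fork, run a function, time out" without exec(2). It is not the fork_function() from this PR: the name `run_forked_with_timeout` is hypothetical, it uses a single fork() with alarm(2) in the child (an assumption made to keep the example short), and it ignores the usual caveats about calling fork() from a multi-threaded process.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper for illustration; not the code under review.
// Runs f() in a forked child and gives up after timeout_sec seconds
// (timeout_sec must be > 0, since alarm(0) would arm no timer).
// Returns -ETIMEDOUT on timeout, 128 + signo if the child died from
// another signal, otherwise the low 8 bits of f()'s return value.
inline int run_forked_with_timeout(unsigned timeout_sec,
                                   const std::function<int()>& f) {
  assert(f && "f must be callable");
  pid_t pid = fork();
  if (pid < 0) {
    return -errno;
  }
  if (pid == 0) {
    // Child: make sure SIGALRM is unblocked and has its default action
    // (terminate), then arm the timer and run the function in-process.
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    sigprocmask(SIG_UNBLOCK, &set, nullptr);
    signal(SIGALRM, SIG_DFL);
    alarm(timeout_sec);
    _exit(f() & 0xff);  // only the low 8 bits reach the parent
  }
  // Parent: wait for the child, retrying if a signal interrupts the wait.
  int status = 0;
  while (waitpid(pid, &status, 0) < 0) {
    if (errno != EINTR) {
      return -errno;
    }
  }
  if (WIFSIGNALED(status)) {
    return WTERMSIG(status) == SIGALRM ? -ETIMEDOUT : 128 + WTERMSIG(status);
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -EINVAL;
}
```

Because the function runs in a child, a crash or hang in it cannot take down the caller, which is the isolation this PR is after.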

trociny · 2017-07-07T08:24:16Z

src/common/fork_function.h

+      return 128 + WTERMSIG(status);
+    }
+    if (WIFEXITED(status)) {
+      int r = WEXITSTATUS(status);


@liewegas
WEXITSTATUS(status) evaluates to the low-order 8 bits of the argument passed to exit() by the child.
We could make the std::function return int8_t, and change this to:

int8_t r = WEXITSTATUS(status);

I suppose this would let us avoid the "unsigned awkwardness".
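
To make the 8-bit point concrete, here is a small sketch. The helper `exit_and_decode` and the struct `DecodedExit` are hypothetical names used only for illustration. It assumes the usual two's-complement conversion when narrowing to int8_t, which is guaranteed from C++20 on and is what mainstream compilers did before that.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct DecodedExit {
  int raw;        // WEXITSTATUS(status), always in 0..255
  int as_signed;  // the same byte reinterpreted as int8_t, in -128..127
};

// Hypothetical helper: fork a child that exits with `code`, then decode
// the status the parent observes both ways.
inline DecodedExit exit_and_decode(int code) {
  pid_t pid = fork();
  assert(pid >= 0);
  if (pid == 0) {
    _exit(code);  // only the low 8 bits of `code` survive
  }
  int status = 0;
  pid_t w;
  do {
    w = waitpid(pid, &status, 0);
  } while (w < 0 && errno == EINTR);
  assert(w == pid && WIFEXITED(status));
  (void)w;  // silence unused-variable warnings when NDEBUG is defined
  int raw = WEXITSTATUS(status);
  int8_t narrowed = static_cast<int8_t>(raw);
  return DecodedExit{raw, narrowed};
}
```

With this, a child that exits with -5 is seen as 251 through WEXITSTATUS and as -5 after the int8_t conversion. The trade-off is that only values in -128..127 round-trip.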

trociny · 2017-07-07T08:25:16Z

src/common/fork_function.h

+	errstr << ": timed out\n";
+	return -ETIMEDOUT;
+      }
+      errstr << ": exit status: " << r << "\n";


Probably it is better to output r -1 value?

trociny · 2017-07-07T08:40:40Z

After fork() the child shares all open files and sockets with the parent. In SubProcess::spawn() we have a trick to close all inherited descriptors. For fork_function this is less critical than for SubProcess (where the descriptors could end up shared with a foreign process), but I still think it is a good idea to close all descriptors in fork_function too, after the first fork.
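
A rough sketch of what closing inherited descriptors in the child can look like. This is not the SubProcess::spawn() code: `close_inherited_fds`, `child_drops_inherited_pipe`, and the scan cap `kScanCap` are hypothetical, and the cap is an arbitrary assumption to keep the loop bounded on systems that report a very large descriptor limit.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: close every descriptor >= lowest in the calling
// process, up to the reported limit or kScanCap, whichever is smaller.
inline void close_inherited_fds(int lowest = 3) {
  assert(lowest >= 0);
  const long kScanCap = 65536;  // arbitrary bound chosen for this sketch
  long maxfd = sysconf(_SC_OPEN_MAX);
  if (maxfd < 0 || maxfd > kScanCap) {
    maxfd = kScanCap;
  }
  for (long fd = lowest; fd < maxfd; ++fd) {
    close(static_cast<int>(fd));  // EBADF for unused slots is ignored
  }
}

// Demonstration: a pipe opened before fork() is gone in the child after
// close_inherited_fds(), while the parent's copy stays open.
inline bool child_drops_inherited_pipe() {
  int fds[2];
  if (pipe(fds) != 0) {
    return false;
  }
  pid_t pid = fork();
  if (pid < 0) {
    return false;
  }
  if (pid == 0) {
    close_inherited_fds(3);
    bool closed = fcntl(fds[0], F_GETFD) == -1 && errno == EBADF;
    _exit(closed ? 0 : 1);
  }
  int status = 0;
  pid_t w;
  do {
    w = waitpid(pid, &status, 0);
  } while (w < 0 && errno == EINTR);
  bool parent_open = fcntl(fds[0], F_GETFD) != -1;
  close(fds[0]);
  close(fds[1]);
  return w == pid && WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
         parent_open;
}
```

Descriptors 0 to 2 are left alone here so the child can still report errors. Whether fork_function wants the same choice is a question for the patch itself.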

liewegas · 2017-07-07T12:36:12Z

Updated, mind looking? Thanks!

trociny

otherwise LGTM

trociny · 2017-07-07T14:56:41Z

src/common/fork_function.h

+
+// Run a function post-fork, with a timeout.  The exit code encoding
+// weirdness does seem to want me to allow negative values, so we make
+// the std::function unsigned.  fork_function() itself is signed


@liewegas need to update the comment and the commit log message?

We see timeouts here, but I very much suspect they are due to the overhead of launching the crushtool process and not the test itself. We have perfectly good code already in our process, though; we just want to isolate failure and time out reliably. So, fork and timeout, without executing a new binary. Hopefully-fixes: http://tracker.ceph.com/issues/19964 Signed-off-by: Sage Weil <sage@redhat.com>

liewegas · 2017-07-07T15:12:09Z

made a few changes, will test again.

liewegas · 2017-07-08T14:00:58Z

http://pulpito.ceph.com/sage-2017-07-07_20:50:14-rados-wip-sage-testing-distro-basic-smithi/

liewegas force-pushed the wip-19964 branch from f377814 to 4d1fe94 Compare June 29, 2017 19:17

liewegas added bug-fix mon wip-sage-testing and removed wip-sage-testing labels Jun 29, 2017

tchaikov self-requested a review July 6, 2017 16:04

liewegas modified the milestone: luminous Jul 6, 2017

liewegas force-pushed the wip-19964 branch 2 times, most recently from 2b169fe to 093415d Compare July 6, 2017 21:32

liewegas added the needs-qa label Jul 6, 2017

liewegas requested a review from trociny July 6, 2017 22:16

liewegas added the wip-sage2-testing label Jul 7, 2017

liewegas force-pushed the wip-19964 branch from 093415d to 5b761d2 Compare July 7, 2017 02:42

trociny reviewed Jul 7, 2017

liewegas force-pushed the wip-19964 branch from 5b761d2 to 84be197 Compare July 7, 2017 12:36

trociny approved these changes Jul 7, 2017

liewegas added 5 commits July 7, 2017 11:11

common/fork_function: helper to run a function, forked, with a timeout

f46d76e

Signed-off-by: Sage Weil <sage@redhat.com>

common/fork_function: close all fds in children

55f76d5

Not strictly necessary, but a bit tidier. Signed-off-by: Sage Weil <sage@redhat.com>

crush/CrushTester: remove old test_with_crushtool helper

857867f

test_with_fork is superior in all ways :) Signed-off-by: Sage Weil <sage@redhat.com>

qa/workunits/cephtool/test.sh: remove two crushtool validation tests

65d9d66

Signed-off-by: Sage Weil <sage@redhat.com>

liewegas force-pushed the wip-19964 branch from 84be197 to 65d9d66 Compare July 7, 2017 15:11

liewegas removed the wip-sage2-testing label Jul 7, 2017

liewegas added the wip-sage-testing label Jul 7, 2017

liewegas merged commit a0ba660 into ceph:master Jul 8, 2017
