New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mon: do crushtool test with fork and timeout, but w/o exec of crushtool #16025

Merged
merged 5 commits into from Jul 8, 2017

Conversation

Projects
None yet
2 participants
@liewegas
Member

liewegas commented Jun 29, 2017

This basically copies the Subprocess hackery for the timeout of the forked
subprocess, but operates on a lamba (in this case, CrushTester::test()) instead
of running crushtool via exec(2).

@tchaikov tchaikov self-requested a review Jul 6, 2017

@liewegas liewegas modified the milestone: luminous Jul 6, 2017

@liewegas liewegas added the needs-qa label Jul 6, 2017

@liewegas liewegas requested a review from trociny Jul 6, 2017

return 128 + WTERMSIG(status);
}
if (WIFEXITED(status)) {
int r = WEXITSTATUS(status);

This comment has been minimized.

@trociny

trociny Jul 7, 2017

Contributor

@liewegas
WEXITSTATUS(status) evaluates to the low-order 8 bits of the argument passed to exit() by the child.
We could make the std::function int8_t, and change this to:

    int8_t r = WEXITSTATUS(status);

I suppose this would allow to avoid "unsigned awkwardness".

errstr << ": timed out\n";
return -ETIMEDOUT;
}
errstr << ": exit status: " << r << "\n";

This comment has been minimized.

@trociny

trociny Jul 7, 2017

Contributor

Probably it is better to output r -1 value?

@trociny

This comment has been minimized.

Contributor

trociny commented Jul 7, 2017

After fork() the child shares all opened files and sockets. In SubProcess::spawn() we have a trick to close all inherited descriptors. In the case of fork_function it is not so critical as it is in the SubProcess case (when it could share with a foreign process) still I think it is probably a good idea to close all descriptors in fork_function too, after the first fork.

@liewegas

This comment has been minimized.

Member

liewegas commented Jul 7, 2017

Updated, mind looking? Thanks!

@trociny

trociny approved these changes Jul 7, 2017

otherwise LGTM

// Run a function post-fork, with a timeout. The exit code encoding
// weirdness does seem to want me to allow negative values, so we make
// the std::function unsigned. fork_function() itself is signed

This comment has been minimized.

@trociny

trociny Jul 7, 2017

Contributor

@liewegas need to update the comment and the commit log message?

liewegas added some commits Jun 29, 2017

mon/OSDMonitor: test crush with fork but not crushtool spawn
We see timeouts here, but I very much suspect they are due to the overhead
of launching the crushtool process and not the test itself.  We have
perfectly code already in our process, though; we just want to isolate
failure and time out reliably.  So, fork and timeout, without executing
a new binary.

Hopefully-fixes: http://tracker.ceph.com/issues/19964
Signed-off-by: Sage Weil <sage@redhat.com>
crush/CrushTester: remove old test_with_crushtool helper
test_with_fork is superior in all ways :)

Signed-off-by: Sage Weil <sage@redhat.com>
common/fork_function: close all fds in children
Not strictly necessary, but a tidier.

Signed-off-by: Sage Weil <sage@redhat.com>
qa/workunits/cephtool/test.sh: remove two crushtool validation tests
Signed-off-by: Sage Weil <sage@redhat.com>
@liewegas

This comment has been minimized.

Member

liewegas commented Jul 7, 2017

made a few changes, will test again.

@liewegas liewegas merged commit a0ba660 into ceph:master Jul 8, 2017

4 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodified Submodules submodules for project are unmodified
Details
make check make check succeeded
Details
make check (arm64) make check succeeded
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment