Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mon: do crushtool test with fork and timeout, but w/o exec of crushtool #16025

Merged
merged 5 commits into from Jul 8, 2017

Conversation

liewegas
Copy link
Member

@liewegas liewegas commented Jun 29, 2017

This basically copies the Subprocess hackery for the timeout of the forked
subprocess, but operates on a lamba (in this case, CrushTester::test()) instead
of running crushtool via exec(2).

return 128 + WTERMSIG(status);
}
if (WIFEXITED(status)) {
int r = WEXITSTATUS(status);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liewegas
WEXITSTATUS(status) evaluates to the low-order 8 bits of the argument passed to exit() by the child.
We could make the std::function int8_t, and change this to:

    int8_t r = WEXITSTATUS(status);

I suppose this would allow to avoid "unsigned awkwardness".

errstr << ": timed out\n";
return -ETIMEDOUT;
}
errstr << ": exit status: " << r << "\n";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably it is better to output r -1 value?

@trociny
Copy link
Contributor

trociny commented Jul 7, 2017

After fork() the child shares all opened files and sockets. In SubProcess::spawn() we have a trick to close all inherited descriptors. In the case of fork_function it is not so critical as it is in the SubProcess case (when it could share with a foreign process) still I think it is probably a good idea to close all descriptors in fork_function too, after the first fork.

@liewegas
Copy link
Member Author

liewegas commented Jul 7, 2017

Updated, mind looking? Thanks!

Copy link
Contributor

@trociny trociny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise LGTM


// Run a function post-fork, with a timeout. The exit code encoding
// weirdness does seem to want me to allow negative values, so we make
// the std::function unsigned. fork_function() itself is signed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liewegas need to update the comment and the commit log message?

Not strictly necessary, but a tidier.

Signed-off-by: Sage Weil <sage@redhat.com>
We see timeouts here, but I very much suspect they are due to the overhead
of launching the crushtool process and not the test itself.  We have
perfectly code already in our process, though; we just want to isolate
failure and time out reliably.  So, fork and timeout, without executing
a new binary.

Hopefully-fixes: http://tracker.ceph.com/issues/19964
Signed-off-by: Sage Weil <sage@redhat.com>
test_with_fork is superior in all ways :)

Signed-off-by: Sage Weil <sage@redhat.com>
@liewegas
Copy link
Member Author

liewegas commented Jul 7, 2017

made a few changes, will test again.

@liewegas
Copy link
Member Author

liewegas commented Jul 8, 2017

@liewegas liewegas merged commit a0ba660 into ceph:master Jul 8, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants