mgr: die on bind() failure #20595

Merged
merged 1 commit into from Feb 28, 2018

@jcsp
Contributor

commented Feb 26, 2018

Previously, the daemon would get wedged if it
competed for the same port as another daemon
on the same host and lost.

Fixes: https://tracker.ceph.com/issues/23037
Signed-off-by: John Spray <john.spray@redhat.com>


@jcsp jcsp added bug fix mgr labels Feb 26, 2018

@liewegas liewegas added the needs-qa label Feb 26, 2018

@wjwithagen

Contributor

commented Feb 27, 2018

Why not do the same as with binding in MON and OSD?
Now systems that do not have systemd will have to start wrapping mgr with something to keep autostarting it. :(

@jcsp

Contributor Author

commented Feb 27, 2018

@wjwithagen I thought the mon and osd were also exiting when they had a bind failure?

@wjwithagen

Contributor

commented Feb 27, 2018

@jcsp

I know there is a difference, because FreeBSD and Linux do not see eye to eye on socket reuse and the options used for that. So there is some code in
./src/msg/async/AsyncMessenger.cc:96
That part loops on bind() problems and tries several ports several times before giving up.

Otherwise restarting MON and OSD would fail, because the socket would still be in use.
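
To make the retry behaviour described above concrete, here is a minimal sketch of a bind loop that sweeps a port range several times and gives up once the retries are exhausted, after which the caller exits. It is not the code in AsyncMessenger.cc: `bind_with_retry`, `start_daemon`, the `try_bind` callback, and the port and retry values are all hypothetical names and numbers chosen for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <iostream>

// Hypothetical helper (not Ceph's API): sweep [port_min, port_max] up to
// retry_count times and return the first port try_bind accepts, or -1.
int bind_with_retry(const std::function<bool(int)>& try_bind,
                    int port_min, int port_max, int retry_count) {
  assert(port_min <= port_max && retry_count > 0);
  for (int attempt = 0; attempt < retry_count; ++attempt) {
    for (int port = port_min; port <= port_max; ++port) {
      if (try_bind(port)) {
        return port;  // bound successfully, stop retrying
      }
    }
    // A real daemon would likely pause here before the next sweep.
  }
  return -1;  // every port failed on every sweep
}

// Hypothetical startup step: returns the exit status the daemon should use.
int start_daemon(const std::function<bool(int)>& try_bind) {
  // Illustrative values only; not read from any real configuration.
  const int port = bind_with_retry(try_bind, 6800, 6810, 3);
  if (port < 0) {
    // Exit instead of staying up in a wedged state without a bound socket.
    std::cerr << "unable to bind to any port, exiting" << std::endl;
    return EXIT_FAILURE;
  }
  std::cout << "bound to port " << port << std::endl;
  return EXIT_SUCCESS;
}
```

In this sketch `try_bind` stands in for the real `bind()` call, so the loop can be exercised without opening sockets. A process supervisor would then see the non-zero exit status and could restart the daemon.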

@jcsp

Contributor Author

commented Feb 27, 2018

Since ceph-mgr uses the same underlying messenger code, it would get the same ms_bind_retry_count, so I still don't see what the difference is between what mgr is doing and what mon/osd are doing.

@wjwithagen

Contributor

commented Feb 27, 2018

@jcsp
Neither do I, then. Both the Simple and the Async messenger play this trick.

So this can really only happen when there is a single MGR port that is retried several times until the retries are exhausted, or when there are several ports and all of them are taken.
So it feels more like a configuration problem?

Besides, this PR ends up doing what MON and OSD do in the end: abort and die.

@tchaikov tchaikov merged commit 55b4662 into ceph:master Feb 28, 2018

5 checks passed

Docs: build check OK - docs built
Signed-off-by all commits in this PR are signed
Unmodified Submodules submodules for project are unmodified
make check make check succeeded
make check (arm64) make check succeeded