New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mds: warn if insufficient standbys exist #12074

Merged
merged 2 commits into from Mar 28, 2017

Conversation

Projects
None yet
2 participants
@batrick
Member

batrick commented Nov 18, 2016

No description provided.

@@ -398,7 +398,8 @@ COMMAND("fs get name=fs_name,type=CephString", \
COMMAND("fs set " \
"name=fs_name,type=CephString " \
"name=var,type=CephChoices,strings=max_mds|max_file_size"
"|allow_new_snaps|inline_data|cluster_down|allow_multimds|allow_dirfrags|balancer " \
"|allow_new_snaps|inline_data|cluster_down|allow_multimds|allow_dirfrags|balancer" \
"|standby_count_desired " \

This comment has been minimized.

@jcsp

jcsp Nov 22, 2016

Contributor

The name of the variable is a bit awkward, but I can also see how something simpler like "standbys_wanted" isn't obviously a count rather than a boolean.

Naming is hard!

This comment has been minimized.

@batrick

batrick Nov 22, 2016

Member

"desired" in particular kinda strange. I think I'll changed that to "wanted".

This comment has been minimized.

@batrick

batrick Nov 22, 2016

Member

@jcsp, updated to "standby_count_wanted".

@jcsp

Some more comments from a more detailed look. I think we also need a test for this as it's a new user visible feature.

@@ -290,6 +292,15 @@ class MDSMap {
mds_rank_t get_max_mds() const { return max_mds; }
void set_max_mds(mds_rank_t m) { max_mds = m; }
mds_rank_t get_standby_count_wanted() {

This comment has been minimized.

@jcsp

jcsp Nov 22, 2016

Contributor

Method should be const

@@ -687,6 +696,8 @@ void FSMap::promote(
// An existing rank is being assigned to a replacement
info.state = MDSMap::STATE_REPLAY;
mds_map.failed.erase(assigned_rank);
if (mds_map.standby_count_wanted == 0)

This comment has been minimized.

@jcsp

jcsp Nov 22, 2016

Contributor

It's not obvious to me why this belongs here: it seems like we're only going to miss setting standby_count_wanted initially on a healthy system with one active and one standby -- if the standby dies we won't warn, unless the active has died and been replaced previously.

@@ -193,6 +193,7 @@ class MDSMap {
*/
mds_rank_t max_mds; /* The maximum number of active MDSes. Also, the maximum rank. */
mds_rank_t standby_count_wanted;

This comment has been minimized.

@jcsp

jcsp Nov 22, 2016

Contributor

Currently I think that if the user decides he doesn't like the warning and wants to reset this to zero, we're going to go ahead and set it back to 1 in the cases where it's auto set during promote(). That's probably not desired behaviour -- what if instead we defaulted/initialized standby_count_wanted to -1, and then only auto-set it to 1 from -1. If it was zero, we should assume that is a human-set value and not modify it for them.

@jcsp

This comment has been minimized.

Contributor

jcsp commented Nov 23, 2016

Please add docs for the new health warning in doc/cephfs/health-messages.rst (and probably mention how to configure it in doc/cephfs/standby.rst)

@batrick

This comment has been minimized.

Member

batrick commented Dec 7, 2016

@jcsp I've implemented your suggestions. I feel a little guilty about changing standby_count_wanted in a getter but it seemed like the right spot. Do you think it's okay?

I'm writing tests for this now.

@batrick

This comment has been minimized.

Member

batrick commented Dec 7, 2016

Oh, this doesn't work the way I thought it would... disregard. Need to make more adjustments.

@jcsp

This comment has been minimized.

Contributor

jcsp commented Jan 26, 2017

@batrick this is waiting for updates from you, right?

@batrick

This comment has been minimized.

Member

batrick commented Jan 26, 2017

Yes, I haven't forgotten... just on the back burner right now.

@batrick batrick assigned jcsp and unassigned batrick Feb 21, 2017

@batrick

This comment has been minimized.

Member

batrick commented Feb 21, 2017

@jcsp this is ready for another review.

batrick added some commits Nov 18, 2016

mds: simplify iteration code
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
mds: warn if insufficient standbys exist
Fixes: http://tracker.ceph.com/issues/17604

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
@batrick

This comment has been minimized.

Member

batrick commented Feb 28, 2017

Rebase on current master due to conflicts.

@batrick

This comment has been minimized.

Member

batrick commented Mar 21, 2017

@jcsp friendly reminder ping

@jcsp

This comment has been minimized.

Contributor

jcsp commented Mar 27, 2017

retest this please (jenkins)

@jcsp

jcsp approved these changes Mar 27, 2017

@jcsp jcsp merged commit d495900 into ceph:master Mar 28, 2017

3 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details
default Build finished.
Details

@batrick batrick deleted the batrick:i17604 branch May 18, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment