
mon: enable luminous monmap feature on full quorum #13379

Merged
merged 5 commits into from Mar 6, 2017

5 participants

jecluis commented Feb 13, 2017

We will automatically enable the luminous persistent monmap feature
when we have a quorum composed of all the monitors in the monmap.
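The full-quorum rule can be sketched as follows. This is a minimal, hypothetical illustration and not the code in this PR. It assumes the monmap is reduced to its size, the quorum to a set of ranks, and features to a 64-bit mask. `feature_mask_t`, `quorum_is_full()` and `next_persistent_features()` are invented names.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical feature bitmask type; real monmap features are not modelled.
using feature_mask_t = uint64_t;

// True when every monitor in an N-monitor monmap is in the quorum.
// `quorum` holds the ranks (0..monmap_size-1) of the current quorum members.
bool quorum_is_full(unsigned monmap_size, const std::set<int>& quorum) {
  for (int rank : quorum) {
    // A quorum member must be a monitor listed in the monmap.
    assert(rank >= 0 && static_cast<unsigned>(rank) < monmap_size);
  }
  // std::set holds distinct ranks, so matching sizes means all N are present.
  return quorum.size() == monmap_size;
}

// Persistent features to record next. The recorded set only grows, and only
// once the quorum contains all monitors (#noMonitorLeftBehind).
feature_mask_t next_persistent_features(feature_mask_t recorded,
                                        feature_mask_t quorum_supported,
                                        unsigned monmap_size,
                                        const std::set<int>& quorum) {
  if (!quorum_is_full(monmap_size, quorum)) {
    return recorded;  // a monitor is missing: leave the features untouched
  }
  return recorded | quorum_supported;  // persistent bits are never cleared
}
```

Take a three-monitor monmap. A quorum of ranks {0, 1} leaves the recorded features unchanged. Only {0, 1, 2} lets them grow.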

This patch series also drops the 'mon debug features' admin socket
interface and instead adds a proper 'mon features ...' interface on the
MonmapMonitor, which allows listing, setting and unsetting features.

Signed-off-by: Joao Eduardo Luis <joao@suse.de>

jecluis requested a review from liewegas Feb 13, 2017

jecluis commented Feb 13, 2017

Please note: just as I was pushing, I realized I forgot to add CLI tests to the branch. I'll take care of that once morning comes.

FEATURE_NONE
);
}

liewegas commented Feb 13, 2017

Do we need a new category for this? Can't we just apply the full-quorum rule to all persistent features? The fact that they're persistent implies we want to be careful about enabling them, since they can't be disabled...

jecluis commented Feb 13, 2017

No, we can simply rely on the persistent features. I just thought there could be some features we would want to set on a full quorum and others maybe not, but yeah... I suppose it would be nice to set all persistent features only when all the monitors belong to the quorum.

"mon", "r", "cli")
COMMAND("mon features set " \
"name=feature_name,type=CephString " \
"name=feature_type,type=CephChoices,strings=persistent|optional,req=false " \

liewegas commented Feb 13, 2017

This interface doesn't make sense to me. Any given feature is either persistent or not, and having the user pass that as an argument is just an opportunity for them to get it wrong (the mon already knows which it is). Unless it's meant as a safety check?

I'd suggest instead

ceph mon features set
ceph mon features set-persistent [--force]

Or perhaps 'feature' instead of 'features'. :/

jecluis commented Feb 13, 2017

In a way, I really dislike the idea of users modifying the monmap features manually, so this was a kind of sledgehammer forcing the user to provide as much information as possible, even though the 'persistent'/'optional' argument is not required (it defaults to optional).

In any case, I'm fine with adding a 'set-persistent' instead. If we don't like it, we can always refactor this bit in the future, but I'm guessing it will be fine, as these commands should not be used often.

COMMAND("mon features set " \
"name=feature_name,type=CephString " \
"name=feature_type,type=CephChoices,strings=persistent|optional,req=false " \
"name=by_value,type=CephChoices,strings=--by-value,req=false " \

liewegas commented Feb 13, 2017

Is there any reason to support setting a feature numerically?

jecluis commented Feb 13, 2017

For testing unknown features. But I just realized it will only work on a single-monitor cluster, because otherwise the monitors will NAK each other on probing and shut down.
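A minimal sketch of why an unknown feature breaks a multi-monitor cluster. It assumes the probe-time decision boils down to a subset test on feature bitmasks. `feature_mask_t` and `can_accept_peer()` are hypothetical names, not Ceph's actual probing code.

```cpp
#include <cstdint>

// Hypothetical feature bitmask type used for illustration.
using feature_mask_t = uint64_t;

// Returns true if a monitor supporting `local_supported` can accept a peer
// whose monmap requires `peer_required`. Any required bit the local monitor
// does not know about is a reason to refuse (NAK) the peer.
constexpr bool can_accept_peer(feature_mask_t local_supported,
                               feature_mask_t peer_required) {
  return (peer_required & ~local_supported) == 0;
}
```

A single-monitor cluster has no peer to probe, so this check never comes into play there.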

liewegas commented Feb 13, 2017

I don't really love the syntax here, but I'm not sure what would be better. Other parts of the system just use 'ceph osd set/unset ...', but those are called flags, not features. The mon features are sort of both: we calculate the intersection across the quorum and require them, but they are allowed to be unset. For the osdmap, by contrast, the 'require_*_osds' flags are simply coded so that they can't be unset (and aren't automatically set). So maybe the 'feature/features' part of the prefix makes sense, but singular is probably better?
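The "intersection across the quorum" above amounts to a bitwise AND over each member's supported-feature mask. Here is a minimal sketch, assuming features are 64-bit masks. `feature_mask_t` and `quorum_features()` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical feature bitmask type used for illustration.
using feature_mask_t = uint64_t;

// Features every quorum member supports: the bitwise AND of their masks.
feature_mask_t quorum_features(const std::vector<feature_mask_t>& member_supported) {
  if (member_supported.empty()) {
    return 0;  // no quorum members, so nothing can be assumed
  }
  feature_mask_t common = ~feature_mask_t{0};  // start from "all bits set"
  for (feature_mask_t f : member_supported) {
    common &= f;  // drop any bit a member lacks
  }
  return common;
}
```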

jecluis commented Feb 13, 2017

That was the rationale behind going with 'mon features set/unset/list' instead of 'mon set/unset/list'. Another reason is that a bare 'list' would need something more verbose to avoid confusion (e.g., 'list_features', which is kinda ugly).

As for the plural, sure, I'll change it. It just felt nicer with 'features list' than 'feature list'. I didn't think much about it and am definitely not attached to it ;)

wjwithagen commented Feb 13, 2017

@jecluis @liewegas
My experience with features in ZFS is that I really like having to set them myself.
Setting them automagically means unsetting is no longer an option, and I would never like that to happen. Nagging "the hell" out of the system if it is really that important is OK, but IMHO doing it auto-magically is not.

jecluis commented Feb 13, 2017

@wjwithagen these features are meant to make sure the monitors require them once all the monitors in the cluster support them, and they are never meant to be unset. So far we've been using them as upgrade checkpoints, but I can imagine a few scenarios in which we may rely on them for protocol updates as well. In all these cases, involving the user seems pointless, and I personally don't see a good reason to require an explicit administrator action to enable them.

The cli will also allow the administrator to set and unset persistent features, but I would like to point out that this is highly discouraged unless "you-are-really-really-sure".

OTOH, we also have 'optional' features, which the administrator must set/unset. Those are meant to enable/disable certain functionality (not currently used, but I can see a few use cases in the near future).

Also, the simple nature of the monitor cluster, backed by the monmap, makes deciding when to flip the switch on a given feature a lot simpler: the map contains N monitors, so we flip the switch when all N monitors are in the quorum, not before. Although it annoys me to suggest this, I can imagine a scenario in which an administrator who wants to postpone flipping the switch holds back a single monitor from being upgraded. Given all the possible downsides of doing so, checking the upgrade section of the release notes beforehand would be imperative.

liewegas commented Feb 13, 2017

jecluis commented Feb 13, 2017

We can do that, but internally they'll still be "features". It's easier that way, especially given we'll still have to compute the required quorum features, which will be the union of persistent and optional/flags.
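Here is a minimal sketch of that computation, assuming both feature classes are stored as 64-bit masks. `feature_mask_t`, `monmap_features_t` and `required_features()` are hypothetical names chosen for illustration.

```cpp
#include <cstdint>

// Hypothetical feature bitmask type used for illustration.
using feature_mask_t = uint64_t;

// Hypothetical split of monmap features, mirroring the discussion:
// persistent features (never unset once enabled) and optional features/flags
// (set and unset by the administrator).
struct monmap_features_t {
  feature_mask_t persistent;
  feature_mask_t optional;
};

// Features a monitor must support to take part in the quorum: the union
// (bitwise OR) of both classes.
constexpr feature_mask_t required_features(const monmap_features_t& f) {
  return f.persistent | f.optional;
}
```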

liewegas commented Feb 13, 2017

jecluis commented Feb 13, 2017

Using mgr for pg stats could be one instance. Another would be a monitor-specific network (mon<->mon), or having the monitors on both the cluster and public networks (for osd<->mon over the cluster network), or even an admin network (client.admin <-> mon). The flags/optional features/wtv would be a means of conveying to consumers of the monmap what they require to talk to the monitors (e.g., sending a message via the cluster network instead of the usual public one).

Another idea that has crossed my mind, although its usefulness may be disputable, would be enabling remote-site Paxos sync for DR.

We could map these flags to features, and require features depending on which flags we enable, but implementation-wise that would be pretty much the same logic we currently have, I think.

jecluis force-pushed the jecluis:wip-mon-luminous-features branch from c79e137 to b6c1d8d Feb 13, 2017

jecluis commented Feb 13, 2017

@liewegas pushed revised patches for two of the existing commits. Also, pushed an additional patch to be squashed against the others.

This last patch removes all references to 'optional' features from every CLI command, except 'list', which still reports the optional features in formatted output. This way we can wait a bit for features that leverage the 'optional' features, 'flags' or whatever, before we change them or rip them out. If we don't put them to good use, we can simply rip them out in a few releases' time. What do you think?

liewegas commented Feb 13, 2017

liewegas commented Feb 13, 2017

jecluis commented Feb 13, 2017

Yeah, I forgot to handle the '--yes-i-really-mean-it' flag, even though the command was declared with it. (Maybe I removed it in an earlier patch?)

Anyway, good point about unsetting them. We'll let the user do that via the monmap tool if they ever come across a situation in which that may be needed.
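The sketch below shows a 'set' handler that honours the '--yes-i-really-mean-it' confirmation and offers no CLI unset for persistent features. It is entirely hypothetical. The feature table, its names and bits, the function name, and the -ENOENT/-EPERM error codes are placeholders and assumptions, not what Ceph actually uses.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical feature bitmask type used for illustration.
using feature_mask_t = uint64_t;

// Placeholder name -> bit table; not Ceph's real feature names or bits.
const std::map<std::string, feature_mask_t> kPersistentFeatures = {
    {"feature_a", feature_mask_t{1} << 0},
    {"feature_b", feature_mask_t{1} << 1},
};

// Sketch of a 'set' handler for a persistent feature. Returns 0 on success
// or a negative errno value.
int set_persistent_feature(const std::string& name, bool yes_i_really_mean_it,
                           feature_mask_t& persistent) {
  auto it = kPersistentFeatures.find(name);
  if (it == kPersistentFeatures.end()) {
    return -ENOENT;  // no such feature
  }
  if (!yes_i_really_mean_it) {
    return -EPERM;  // persistent features cannot be undone from the CLI
  }
  persistent |= it->second;
  return 0;
}

// Deliberately no unset handler: clearing a persistent feature is left to
// offline monmap editing.
```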

liewegas commented Feb 14, 2017

liewegas commented Feb 17, 2017

ping

jecluis force-pushed the jecluis:wip-mon-luminous-features branch from b6c1d8d to 77a0f45 Feb 21, 2017

jecluis commented Feb 21, 2017

@liewegas pushed. Tests are passing locally, but let's wait for the checks to go green before merging.

jecluis commented Feb 21, 2017

Repushed. The failures seem to be due to a port collision in the test.

jecluis commented Feb 22, 2017

The latest failure is due to the jq version on Jenkins being older than the one I was working with. My patch relies on '--exit-status' to decide whether a check succeeded, but the jq on the build system is completely oblivious to the magic of not having to check return strings :(

Adjusting the patch atm and will push once it passes locally.

mon: drop weird/failed mon features debug cli
Signed-off-by: Joao Eduardo Luis <joao@suse.de>

jecluis force-pushed the jecluis:wip-mon-luminous-features branch from 4e21f93 to a3e9ca8 Feb 22, 2017

jecluis commented Feb 23, 2017

OK, now this is just getting silly. Looks like I forgot to move a few commands away from 'jq -e'. Sigh.

jecluis commented Feb 27, 2017

@liewegas if everything seems fine to you, I'll just squash that last commit where it belongs and it should then be okay to merge.

liewegas commented Feb 27, 2017

@jecluis yep, squash away!

jecluis force-pushed the jecluis:wip-mon-luminous-features branch from 4557394 to 462f601 Feb 27, 2017

jecluis commented Feb 27, 2017

@liewegas done.

liewegas added the needs-qa label Feb 27, 2017

pending_map.last_changed = ceph_clock_now();
propose = true;

dout(1) << __func__ << ss << "; new features will be: "

jecluis commented Mar 2, 2017

Wait, wat? How did that ';' get in there? /me checks

jecluis commented Mar 2, 2017

Never mind, it's part of the string. Why it's not compiling is a mystery to me, though, because "it works for me (tm)". Anyway, I'll use ss.str() instead for good measure.
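The ss.str() fix can be illustrated with a minimal sketch. It assumes a local std::ostringstream stands in for the dout(1) log stream, and `feature_log_line()` is a hypothetical helper. With a C++11-conforming standard library, streaming the stringstream object itself does not compile. The reason is that std::basic_ios only offers an explicit operator bool, so no operator<< accepts the stream object.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Formats the message from the reviewed dout(1) line; a local
// std::ostringstream stands in for the real log stream.
std::string feature_log_line(const std::ostringstream& ss) {
  std::ostringstream out;
  // `out << ss;` would be ill-formed: there is no operator<< that takes the
  // stream object. ss.str() returns the accumulated text as a std::string.
  out << ss.str() << "; new features will be: ";
  return out.str();
}
```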

jecluis commented Mar 2, 2017

I can't figure out why it was compiling locally, but as with any other compile mystery I'll just blame it on ccache and move on. Pushing the fixed branch now.

jecluis added some commits Feb 13, 2017

mon: better 'mon features' cli
Allows listing supported and currently set monmap features, as well as
setting and unsetting them.

Signed-off-by: Joao Eduardo Luis <joao@suse.de>
mon: enable persistent monmap features on full quorum
We will now only enable persistent features automatically
when ALL the monitors in the monmap are in the quorum.
 #noMonitorLeftBehind

Signed-off-by: Joao Eduardo Luis <joao@suse.de>
qa/workunits/ceph-helpers: add wait_for_quorum()
Takes optional timeout and desired quorum size

Signed-off-by: Joao Eduardo Luis <joao@suse.de>
mon: test 'mon feature' cli
Signed-off-by: Joao Eduardo Luis <joao@suse.de>

jecluis force-pushed the jecluis:wip-mon-luminous-features branch from 462f601 to 2374011 Mar 2, 2017

yuriw commented Mar 6, 2017

passed testing
@tchaikov

liewegas merged commit ef6da79 into ceph:master Mar 6, 2017

3 checks passed

Signed-off-by all commits in this PR are signed
Unmodifed Submodules submodules for project are unmodified
default Build finished.