mon: more aggressively convert crush rulesets -> distinct rules #17508
Conversation
Force-pushed: ab535b0 -> 5dc3ab0
src/mon/OSDMonitor.cc
Outdated
// Now, go ahead and renumber all the rules so that their
// rule_id field corresponds to their position in the array
auto old_to_new = newcrush.renumber_rules();
dout(1) << __func__ << "Rewrote " << old_to_new << " crush IDs:" << dendl;
nit, add a space before "Rewrote".
fixed
 * Return a map of old ID -> new ID. Caller must update OSDMap
 * to use new IDs.
 */
std::map<int, int> renumber_rules();
nit: make this method const.
It's not const: it modifies the ruleset field in the mask.
src/osd/OSDMap.cc
Outdated
auto& pool = i.second;
int ruleno = pool.get_crush_rule();
if (!newcrush->rule_exists(ruleno)) {
  if (ss)
ss is always valid, right?
y
src/osd/OSDMap.cc
Outdated
if (newcrush->get_rule_mask_ruleset(ruleno) != ruleno) {
  if (ss)
    *ss << "rule " << ruleno << " mask ruleset does not match rule id";
  return -EINVAL;
Can we also check

if (!newcrush->check_crush_rule(ruleno, pool.get_type(), pool.get_size(), *ss)) {
  return -EINVAL;
}

so we can verify that the type and size range of the rule are sane? But we might need to update check_crush_rule() so it prints out the rule id in the error message.
updated to validate that min/max size are ok
Force-pushed: 5dc3ab0 -> f382847
I haven't followed along in the earlier drafts here.
- Have we already made it impossible for admins to have set multiple rules to a ruleset?
- Is there anything we can do about existing ceph-external databases referencing rules by the wrong number, once renumbered? If we're just doing it automatically that seems a bit cruel (especially if we want to backport this to Luminous in a point release).
src/mon/OSDMonitor.cc
Outdated
// Rewrite CRUSH rule IDs if they are using legacy "ruleset"
// structure. Only do this if there is no pending crush map or pool
// creations/deletions/updates and we don't have to worry about
There doesn't seem to be any logic enforcing this constraint that we not have other stuff going on?
Oops, that comment is obsolete from when this code was elsewhere. It's now in create_pending(), so by definition there is nothing (yet) going on. Fixed.
Before, we only fixed the rulesets in the trivial case; this patch does it for any cluster (by also fixing up pg_pool_t::crush_rule as needed). So now we will prevent multiple rules in a ruleset. I think kraken had a weak guard to steer users away from doing that, but notably jewel allowed it.

What kind of external database do you have in mind? tendrl or something? The "good news" is that we've also changed the interface for setting this (ceph osd pool set $pool crush_rule ... instead of crush_ruleset), and we prevent any incoming crush map from doing anything untoward, so I don't think an unmodified external thing could accidentally do the wrong thing here (it would fail outright instead).
Force-pushed: f382847 -> a4c4978
Also, notably, 12.2.0 doesn't allow these to be created, so this conversion won't trigger for clusters that started on luminous; it will only trigger for clusters that upgraded from an older version.
   << " fall within rule " << ruleno
   << " min_size " << newcrush->get_rule_mask_min_size(ruleno)
   << " and max_size " << newcrush->get_rule_mask_max_size(ruleno);
return -EINVAL;
Can we check the pool's type against the rule's type as well?
good call, fixed!
Force-pushed: 59ef476 -> 5f4c8fc
Do we know anything about the ordering of existing rules, though? Maybe there aren't actually any clusters with multiple rules in a ruleset, but if there are, I presume this is not going to play nicely for them as we forcibly renumber stuff?

It doesn't need to be anything as sophisticated as tendrl if they've got cookbooks or playbooks that automatically set up new pools when hardware pods are added or something.
Can you explain what might not play nicely? We're renumbering the ruleset field to match the rule id, and then changing each pool to make sure it maps to the same rule (under its new ruleset and rule id) that it used before. No net change except that the ruleset ids are different. Any external tool that was fiddling with ruleset before will fail and need changes because the crush_ruleset pool property is now called crush_rule, so I don't think we need to worry about misbehavior there.
Ah okay, I guess I don't actually know how the IDs associate with rules and rulesets. I was concerned we could go from having e.g. rules 1,3,5 to 1,2,3.
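A toy example (hypothetical code, not the OSDMap implementation) of why that renumbering is safe for pools: pools holding old rule ids are rewritten through the same old->new table the renumbering produced, so even if sparse ids like 1,3,5 are compacted, each pool ends up pointing at the same rule it used before.

```cpp
#include <map>

// Toy model: pools reference rules by id. After a renumbering pass
// compacts sparse ids (e.g. 1,3,5 -> 0,1,2), pool references are
// rewritten through the same old->new table, so no pool changes which
// rule it actually uses.
std::map<int, int> remap_pools(const std::map<int, int>& pool_to_rule,
                               const std::map<int, int>& old_to_new) {
    std::map<int, int> out;
    for (const auto& [pool, rule] : pool_to_rule) {
        auto it = old_to_new.find(rule);
        // Rules absent from the table kept their id; leave those pools alone.
        out[pool] = (it != old_to_new.end()) ? it->second : rule;
    }
    return out;
}
```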
src/mon/OSDMonitor.cc
Outdated
   << osdmap.get_pool_name(pool_id) << " crush ruleset "
   << pool.crush_rule << " -> rule id " << new_rule_id << dendl;
pg_pool_t &updated = pending_inc.new_pools[pool_id];
updated = pool;
I think this comment still applies? https://github.com/ceph/ceph/pull/17255/files#r135267195
Probably... I was focusing just on the bit that I'd like to get backported to luminous.
Force-pushed: 5f4c8fc -> 337975c
Signed-off-by: John Spray <john.spray@redhat.com>
Previously, the rules were only modified in the trivial case, so we continued to potentially have CRUSH maps with the legacy ruleset functionality in use. In order to ultimately remove rulesets entirely, we need to do this more aggressively, renumbering all the rules and then updating any pools as needed.

Signed-off-by: John Spray <john.spray@redhat.com>
Force-pushed: 337975c -> 3823fc6
- move into OSDMap method
- ensure that rules exist for each pool
- ensure pool type matches rule type
- ensure rule mask min/max size cover the pool size

Signed-off-by: Sage Weil <sage@redhat.com>
Force-pushed: 3823fc6 -> 8f8f0b2
I think this is ready to merge; final review please?
The existing mon code only makes a half-hearted attempt to adjust crush rules so that ruleset == rule id. This PR converts rulesets more aggressively, renumbering all the rules and updating any pools that reference them.
Mimic will eventually remove the rule mask crap from the decompiled crush maps with the addition of a tunable flag, but it's too late to do that in luminous. Instead, let's just make sure that luminous clusters normalize the ruleset/ruleid business completely.