
mon: more aggressively convert crush rulesets -> distinct rules #17508

Merged: 3 commits, Sep 19, 2017

Conversation

liewegas (Member) commented Sep 5, 2017

The existing mon code only makes a half-hearted attempt to adjust crush rules so that ruleset == rule id. This PR

  • renumbers as needed so that they always match
  • improves validation on injected crush maps

Mimic will eventually remove the rule mask crap from the decompiled crush maps with the addition of a tunable flag, but it's too late to do that in luminous. Instead, let's just make sure that luminous clusters normalize the ruleset/ruleid business completely.
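As a rough sketch, the conversion amounts to the following (the surrounding loop and variable names here are assumptions; renumber_rules() and the pending_inc.new_pools fixup appear in the hunks below):

// Sketch only: renumber all rules so that rule_id == array position,
// then repoint every pool at the new ID of the rule it already used.
auto old_to_new = newcrush.renumber_rules();  // old rule ID -> new rule ID
for (auto& i : osdmap.get_pools()) {
  const int64_t pool_id = i.first;
  const pg_pool_t& pool = i.second;
  auto p = old_to_new.find(pool.get_crush_rule());
  if (p != old_to_new.end() && p->first != p->second) {
    pg_pool_t& updated = pending_inc.new_pools[pool_id];
    updated = pool;
    updated.crush_rule = p->second;  // same rule as before, new number
  }
}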

// Now, go ahead and renumber all the rules so that their
// rule_id field corresponds to their position in the array
auto old_to_new = newcrush.renumber_rules();
dout(1) << __func__ << "Rewrote " << old_to_new << " crush IDs:" << dendl;
Contributor:

nit, add a space before "Rewrote".

liewegas (Member Author):

fixed

* Return a map of old ID -> new ID. Caller must update OSDMap
* to use new IDs.
*/
std::map<int, int> renumber_rules();
Contributor:

nit, make this method const.

liewegas (Member Author):

it's not const... it modifies the ruleset field in the mask
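(For context, a sketch of the mutation in question, assuming the kernel-style struct crush_map layout in which each rule carries a mask with a ruleset field:)

// Sketch: renumbering writes the new ID back into each rule's mask,
// which is why renumber_rules() cannot be const.
for (unsigned i = 0; i < crush->max_rules; i++) {
  if (crush->rules[i])
    crush->rules[i]->mask.ruleset = i;  // keep mask in sync with rule id
}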

auto& pool = i.second;
int ruleno = pool.get_crush_rule();
if (!newcrush->rule_exists(ruleno)) {
  if (ss)
Contributor:

ss is always valid, right?

liewegas (Member Author):

y

if (newcrush->get_rule_mask_ruleset(ruleno) != ruleno) {
  if (ss)
    *ss << "rule " << ruleno << " mask ruleset does not match rule id";
  return -EINVAL;
Contributor:

can we also check

if (!newcrush->check_crush_rule(ruleno, pool.get_type(), pool.get_size(), *ss)) {
  return -EINVAL;
}

? so we can verify the type and size range of the rule are sane.

but we might need to update check_crush_rule(), so it prints out the ruleid in the error message.

liewegas (Member Author):

updated to validate that the min/max size are ok
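A sketch of the resulting size check, pieced together from the hunk further down (the exact error message wording is abbreviated):

// Sketch: reject a pool whose size falls outside the rule mask's bounds.
if (pool.get_size() < newcrush->get_rule_mask_min_size(ruleno) ||
    pool.get_size() > newcrush->get_rule_mask_max_size(ruleno)) {
  if (ss)
    *ss << "pool size " << pool.get_size()
        << " does not fall within rule " << ruleno
        << " min_size " << newcrush->get_rule_mask_min_size(ruleno)
        << " and max_size " << newcrush->get_rule_mask_max_size(ruleno);
  return -EINVAL;
}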

gregsfortytwo (Member) left a comment:

I haven't followed along in the earlier drafts here.

  • Have we already made it impossible for admins to have set multiple rules to a ruleset?
  • Is there anything we can do about existing ceph-external databases referencing rules by the wrong number, once renumbered? If we're just doing it automatically that seems a bit cruel (especially if we want to backport this to Luminous in a point release).


// Rewrite CRUSH rule IDs if they are using legacy "ruleset"
// structure. Only do this if there is no pending crush map or pool
// creations/deletions/updates and we don't have to worry about
Member:

There doesn't seem to be any logic enforcing this constraint that we not have other stuff going on?

liewegas (Member Author), Sep 7, 2017:

oops, that comment is obsolete from when this code was elsewhere. it's now in create_pending(), so by definition there is nothing (yet) going on. fixed

liewegas (Member Author) commented Sep 7, 2017

Before we only fixed the rulesets in the trivial case; this patch does it for any cluster (by also fixing up pg_pool_t::crush_rule as needed). So now we will prevent multiple rules in a ruleset.

I think kraken had a weak guard to steer users away from doing that, but notably jewel allowed it.

What kind of external database do you have in mind? tendrl or something? The "good news" is that we've also changed the interface for setting this (ceph osd pool set $pool crush_rule ... instead of crush_ruleset), and we prevent any incoming crush map from doing anything untoward, so I don't think an unmodified external thing could accidentally do the wrong thing here (it would fail outright instead).

liewegas (Member Author) commented Sep 7, 2017

Also notably, I guess 12.2.0 doesn't allow these to be created, so this conversion won't trigger for earlier luminous folks... it will only trigger for clusters that upgraded from an older version.

<< " fall within rule " << ruleno
<< " min_size " << newcrush->get_rule_mask_min_size(ruleno)
<< " and max_size " << newcrush->get_rule_mask_max_size(ruleno);
return -EINVAL;
Contributor:

can we check the pool's type against the rule's type as well?

liewegas (Member Author):

good call, fixed!
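The added type check plausibly reads along these lines (a sketch; the get_rule_mask_type() accessor name is an assumption):

// Sketch: the pool's type (replicated/erasure) must match the rule mask's.
if (newcrush->get_rule_mask_type(ruleno) != (int)pool.get_type()) {
  if (ss)
    *ss << "pool is type " << pool.get_type()
        << " but rule " << ruleno << " is type "
        << newcrush->get_rule_mask_type(ruleno);
  return -EINVAL;
}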

liewegas force-pushed the wip-crushrule-convert branch 2 times, most recently from 59ef476 to 5f4c8fc on September 7, 2017 19:08
gregsfortytwo (Member):

> Before we only fixed the rulesets in the trivial case; this patch does it for any cluster (by also fixing up pg_pool_t::crush_rule as needed). So now we will prevent multiple rules in a ruleset.

Do we know anything about the ordering on existing rules, though? Maybe there aren't actually any clusters with multiple rules in a ruleset, but if there are I presume this is not going to play nicely for them as we forcibly renumber stuff...?

> What kind of external database do you have in mind? tendrl or something? The "good news" is that we've also changed the interface for setting this (ceph osd pool set $pool crush_rule ... instead of crush_ruleset), and we prevent any incoming crush map from doing anything untoward, so I don't think an unmodified external thing could accidentally do the wrong thing here (it would fail outright instead).

It doesn't need to be anything as sophisticated as tendrl if they've got cookbooks or playbooks that automatically set up new pools when hardware pods are added or something.

liewegas (Member Author) commented Sep 7, 2017

Can you explain what might not play nicely? We're renumbering the ruleset field to match the rule id, and then changing each pool to make sure it maps to the new ruleset (and ruleid) that it used before. No net change except the ruleset ids are different.

Any external tool that was fiddling with ruleset before will fail and need changes because the crush_ruleset pool property is now called crush_rule, so I don't think we need to worry about misbehavior there.

gregsfortytwo (Member):

Ah okay, I don't actually know how the IDs associate with rules and rule sets. Was concerned we could go from having e.g. rules 1,3,5 to 1,2,3.
(I told you I hadn't followed along with the earlier crush work!)
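To make the 1,3,5 case concrete (hypothetical numbers, following the semantics described above):

// Hypothetical: a legacy map with rules numbered 1, 3, 5.
// renumber_rules() compacts them to array positions 0, 1, 2 and returns
//   old_to_new = { {1, 0}, {3, 1}, {5, 2} }
// A pool with crush_rule == 3 is rewritten to crush_rule == 1, so it keeps
// mapping through the same rule; only the numbering changes.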

<< osdmap.get_pool_name(pool_id) << " crush ruleset "
<< pool.crush_rule << " -> rule id " << new_rule_id << dendl;
pg_pool_t &updated = pending_inc.new_pools[pool_id];
updated = pool;
Contributor:

I think this comment still applies? https://github.com/ceph/ceph/pull/17255/files#r135267195

liewegas (Member Author):

Probably... I was focusing just on the bit that I'd like to get backported to luminous.

John Spray added 2 commits September 14, 2017 12:30
Signed-off-by: John Spray <john.spray@redhat.com>
Previously, the rules were only modified in the trivial case,
so we continued to potentially have CRUSH maps with the
legacy ruleset functionality in use.

In order to ultimately remove rulesets entirely, we need
to do this more aggressively, renumbering all the rules
and then updating any pools as needed.

Signed-off-by: John Spray <john.spray@redhat.com>
- move into OSDMap method
- ensure that rules exist for each pool
- ensure pool type matches rule type
- ensure rule mask min/max size cover the pool size

Signed-off-by: Sage Weil <sage@redhat.com>
liewegas (Member Author):

I think this is ready to merge; final review please?

tchaikov self-requested a review September 16, 2017 02:35