mon/OSDMonitor: implement cluster pg limit #17427
Conversation
src/common/options.cc
Outdated
@@ -1477,6 +1477,11 @@ std::vector<Option> get_global_options() {
     .set_default(65536)
     .set_description(""),

     Option("mon_max_pool_pg_num_per_osd", Option::TYPE_INT, Option::LEVEL_ADVANCED)
     .set_default(100)
this seems low - could make it 300 to match the warning on existing pools, mon_pg_warn_max_per_osd
... maybe just use that option for this check instead of adding a new one
I guess the other place PGs per OSD could change would be when changing pool size - could add the same check there

Yeah... it kind of seems like it's better to let them work around this limitation by increasing pool size than it is to make it hard to increase the replica count
force-pushed from e2c934c to a38b7fa
src/mon/OSDMonitor.cc
Outdated
int OSDMonitor::check_pg_num(int pg_num, int size, ostream *ss)
{
  int64_t max_pgs_per_osd = g_conf->mon_pg_warn_max_per_osd;
  int64_t max_pgs = max_pgs_per_osd * osdmap.get_num_osds();
counting only up+in osds might be useful if a bunch of osds are not running but not removed from crush
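The per-pool check being discussed can be sketched as a standalone function. This is an illustrative approximation, not the actual Ceph code: the `max_pgs_per_osd` and `num_in_osds` parameters stand in for `g_conf->mon_pg_warn_max_per_osd` and the `osdmap.get_num_osds()` / `get_num_in_osds()` call, and the error message text is made up.

```cpp
#include <algorithm>
#include <cerrno>
#include <cstdint>
#include <sstream>
#include <string>

// Cap a single pool's pg_num by (max PGs per OSD * OSD count) / replica count.
// Returns 0 if pg_num fits, -ERANGE with a message in *err otherwise.
int check_pg_num(int pg_num, int size,
                 int64_t max_pgs_per_osd, int num_in_osds,
                 std::string *err) {
  int64_t max_pgs = max_pgs_per_osd * num_in_osds;
  // Never let the cap drop below 1 PG, and guard against size == 0.
  int64_t max_pg_num = std::max<int64_t>(1, max_pgs / std::max(size, 1));
  if (pg_num > max_pg_num) {
    std::ostringstream ss;
    ss << "pg_num " << pg_num << " size " << size
       << " would exceed limit of " << max_pg_num << " pgs";
    *err = ss.str();
    return -ERANGE;
  }
  return 0;
}
```

With 3 in-OSDs and a limit of 200 PGs per OSD, a 3-replica pool is capped at 200 PGs, which matches the arithmetic in the hunk above.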
src/mon/OSDMonitor.cc
Outdated
int64_t max_pgs_per_osd = g_conf->mon_pg_warn_max_per_osd; | ||
int64_t max_pgs = max_pgs_per_osd * osdmap.get_num_osds(); | ||
int64_t max_pg_num = MAX(1, max_pgs / size); | ||
if (pg_num > max_pg_num) { |
shouldn't we check total PGs, rather than just those in this pool?
might get confusing with multiple crush roots, I guess we could account for those as well...
pool size changes are a lot less likely too, so I'm ok without the check there
force-pushed from a38b7fa to 45d02b2
@jdurgin updated to look at total pgs and to also guard pool size changes!
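"Look at total pgs" can be sketched as summing projected PG instances (pg_num * size) across every pool, substituting the new values for the pool being created or changed, and comparing that against the cluster-wide cap. The `PoolInfo` map and the function name here are illustrative stand-ins, not the OSDMap/OSDMonitor API.

```cpp
#include <cstdint>
#include <map>

struct PoolInfo { int pg_num; int size; };

// pool == -1 means "new pool, not yet in the map"; otherwise the
// existing pool's entry is replaced by (new_pg_num, new_size).
bool pg_total_ok(const std::map<int64_t, PoolInfo>& pools,
                 int64_t pool, int new_pg_num, int new_size,
                 int64_t max_pgs_per_osd, int num_in_osds) {
  int64_t projected = int64_t(new_pg_num) * new_size;  // pool being created/changed
  for (const auto& kv : pools) {
    if (kv.first == pool)
      continue;  // already counted with its new values
    projected += int64_t(kv.second.pg_num) * kv.second.size;
  }
  return projected <= max_pgs_per_osd * num_in_osds;
}
```

Because the changed pool's old values are skipped and its new values counted, the same check covers pool creation, pg_num increases, and size increases.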
src/mon/OSDMonitor.cc
Outdated
@@ -6052,6 +6086,10 @@ int OSDMonitor::prepare_command_pool_set(map<string,cmd_vartype> &cmdmap,
       ss << "pool size must be between 1 and 10";
       return -EINVAL;
     }
     int r = check_pg_num(-1, p.get_pg_num(), n, &ss);
pool instead of -1
oh it doesn't affect the result though, nvm
oh yes it does, since we pass the new size in
fixed
src/mon/OSDMonitor.cc
Outdated
int OSDMonitor::check_pg_num(int64_t pool, int pg_num, int size, ostream *ss)
{
  int64_t max_pgs_per_osd = g_conf->mon_pg_warn_max_per_osd;
  int64_t max_pgs = max_pgs_per_osd * osdmap.get_num_in_osds();
the 'make check' result suggests we should use MAX(num_in_osds, 1).
for the mon_pg_warn_max_per_osd == 0 case, which someone might have used to disable the warning, maybe we should default to 200 instead, since there's a nice message explaining it if that limit is hit here.
I used max with 3, since that seems like the minimum useful cluster size. (And it fixes the standalone tests, which create pools before any osds exist). Sound ok?
Yeah we can do that. In that case I think we should rename the config option to be more like mon_max_pgs_per_osd (remove the 'warn' part)?
Yeah, 3 seems fine, and renaming the config option is a good idea too.
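The clamp agreed on above can be sketched as a tiny helper: before any OSDs exist (so standalone tests can create pools), pretend the cluster has 3 OSDs, the smallest useful size. The helper name is hypothetical; in the PR this is folded into check_pg_num() using the renamed config option discussed above.

```cpp
#include <algorithm>
#include <cstdint>

// Cluster-wide PG budget, clamped so an empty cluster still gets a
// budget equivalent to 3 OSDs.
int64_t max_cluster_pgs(int64_t max_pgs_per_osd, int num_in_osds) {
  return max_pgs_per_osd * std::max(num_in_osds, 3);
}
```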
force-pushed from b4fdedc to 9fe34de
@@ -346,6 +346,7 @@ class OSDMonitor : public PaxosService {
                      const string &erasure_code_profile,
                      unsigned *stripe_width,
                      ostream *ss);
   int check_pg_num(int64_t pool, int pg_num, int size, ostream* ss);
mark this method const, if you please.
tests out okay. i think we need to decide what the final config option is going to be since this'll get backported? or should we keep the current config option for luminous?
This is 2x the recommended target (100 per OSD).
Signed-off-by: Sage Weil <sage@redhat.com>

Check total pg count for the cluster vs osd count and max pgs per osd before allowing pool creation, pg_num change, or pool size change. "in" OSDs are the ones we distribute data to, so this should be the right count to use. (Whether they happen to be up or down at the moment is incidental.) If the user really wants to create the pool, they can change the configurable limit.
Signed-off-by: Sage Weil <sage@redhat.com>

Signed-off-by: Sage Weil <sage@redhat.com>

This runs afoul of the new max pg per osd limit.
Signed-off-by: Sage Weil <sage@redhat.com>
force-pushed from 4028af5 to c9ffeee
Signed-off-by: Sage Weil <sage@redhat.com>
Fiddling with pgp_num doesn't help with TOO_MANY_PGS.
Signed-off-by: Sage Weil <sage@redhat.com>
backport #17814
Prevent total pg count from exceeding a max per osd value. Guard pool creation, pg_num change, and pool size change.