mon/osdmonitor: fix incorrect output of "osd df" due to osd out #10308
Conversation
@@ -796,14 +796,20 @@ class OSDUtilizationDumper : public CrushTreeDumper::Dumper<F> {

  bool get_osd_utilization(int id, int64_t* kb, int64_t* kb_used,
                           int64_t* kb_avail) const {
    // we always return true for OSDs as the specific OSDs can be out,
could you be more specific in the commit message, "chaos" is sort of vague, imho.
updated.
if an OSD could be marked out, can we just check its state in get_bucket_utilization() explicitly using osdmap.is_out(id)?
This is my first thought.
But there is no reason to stop processing the whole bucket if any one of the OSDs under it is abnormal because it is out, broken, or something similar. And if it is, resetting its stats should be fair enough.
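The point above can be sketched with a tiny standalone model (hypothetical names and types, not the real Ceph classes): if get_osd_utilization() reported failure for an out OSD, the caller would abort summing the whole bucket and lose the stats of every healthy sibling; returning true with zeroed stats lets the walk continue.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical miniature model of the bucket walk being discussed.
struct MiniOsd { bool out; int64_t kb, kb_used; };

static bool get_osd_utilization(const MiniOsd& o, int64_t* kb,
                                int64_t* kb_used) {
  if (o.out) {
    *kb = *kb_used = 0;  // reset stats instead of reporting failure
    return true;         // so the caller keeps walking the bucket
  }
  *kb = o.kb;
  *kb_used = o.kb_used;
  return true;
}

static bool get_bucket_utilization(const std::vector<MiniOsd>& bucket,
                                   int64_t* kb, int64_t* kb_used) {
  *kb = *kb_used = 0;
  for (const MiniOsd& o : bucket) {
    int64_t k = 0, ku = 0;
    if (!get_osd_utilization(o, &k, &ku))
      return false;  // a single failing child would wipe the bucket
    *kb += k;
    *kb_used += ku;
  }
  return true;
}
```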
> But there is no reason we stop to process the whole bucket if any one of the OSDs under this bucket is abnormal

agreed. and could you be specific why checking osdmap.is_out(id) in get_bucket_utilization() does not work, or is a bad idea?
we are tackling two cases here:
- we should not stop calculating a bucket's usage when running into an OSD that gets marked out: in this case, i'd suggest checking osdmap.is_out(id) explicitly. actually, marking an OSD out will reset its stat to 0, see PGMonitor::check_osd_map() and PGMap::apply_incremental().
- in other cases, i just want to be sure it's expected behavior; if it's not, i don't want to cover it up using this fix. if we destroy an OSD, its stat will be removed from PGMap::osd_stat, see PGMonitor::check_osd_map(); in that case, why is its parent bucket still holding it?
if (p->second & CEPH_OSD_EXISTS) {
// whether it was created *or* destroyed, we can safely drop
// it's osd_stat_t record.
dout(10) << __func__ << " osd." << p->first
<< " created or destroyed" << dendl;
pending_inc.rm_stat(p->first);
// and adjust full, nearfull set
pg_map.nearfull_osds.erase(p->first);
pg_map.full_osds.erase(p->first);
}
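The behavior the snippet above implements can be modeled minimally (hypothetical types; a plain map keyed by OSD id stands in for PGMap::osd_stat): when an OSD's CEPH_OSD_EXISTS bit changes (it was created or destroyed), its stat record is dropped, so it should no longer contribute to any bucket's totals.

```cpp
#include <map>

// Hypothetical sketch, not the real Ceph types.
struct OsdStat { long kb = 0; long kb_used = 0; };

static void on_exists_changed(std::map<int, OsdStat>& osd_stat, int id) {
  // analogous to pending_inc.rm_stat(p->first) in the snippet above:
  // a created-or-destroyed OSD loses its stat record entirely
  osd_stat.erase(id);
}
```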
> could you be specific why checking osdmap.is_out(id) in get_bucket_utilization() does not work, or is a bad idea?

@tchaikov That's not true; I think the is_out() method should work too (it was actually my first solution :-). But I have the following concerns too:
- I think https://github.com/ceph/ceph/pull/10308/files?diff=unified#diff-0a5db46a44ae9900e226289a810f10e8R803 is unreachable, as we have already done a sanity check in:
void dump_stray(F *f) {
for (int i = 0; i < osdmap->get_max_osd(); i++) {
if (osdmap->exists(i) && !this->is_touched(i))
dump_item(CrushTreeDumper::Item(i, 0, 0), f);
}
}
- I am afraid there are other possibilities where an OSD reports zeroed stats, please check:
void OSDService::update_osd_stat(vector<int>& hb_peers)
{
Mutex::Locker lock(stat_lock);
osd_stat.hb_in.swap(hb_peers);
osd_stat.hb_out.clear();
osd->op_tracker.get_age_ms_histogram(&osd_stat.op_queue_age_hist);
// fill in osd stats too
struct store_statfs_t stbuf;
int r = osd->store->statfs(&stbuf);
if (r < 0) {
derr << "statfs() failed: " << cpp_strerror(r) << dendl;
> I am afraid there is other possibilities that an OSD report zeroed stats, please check: [...]
Sorry, not zeroed but stale OSD stats...
So I'll try it your way.
I'll give an elaborate explanation when I am back at work :)
Force-pushed from c025a5f to 3d91ef4.
The change is locally tested and verified: the bucket statistics now correctly reflect the summarized statistics of all OSDs under the bucket, even if one or more OSDs are offline and marked as out.
Force-pushed from 49cf9f5 to fdadce8.
Updated. @tchaikov Can you take another look? Thanks!
@@ -796,6 +796,15 @@ class OSDUtilizationDumper : public CrushTreeDumper::Dumper<F> {

  bool get_osd_utilization(int id, int64_t* kb, int64_t* kb_used,
                           int64_t* kb_avail) const {
    if (osdmap->is_out(id)) {
can we do this in get_bucket_utilization()?
sure, done.
Force-pushed from fdadce8 to ab685f7.
    if (id >= 0) {
      if (osdmap->is_out(id)) {
        // return true if current OSD is marked as out,
        // so we can continue to process other OSDs under this bucket.
i don't think this comment is necessary, as the code is self-explanatory.
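The final shape of the fix can be sketched with a minimal standalone model (hypothetical surrounding types, not the real Ceph code; only the skip-if-out pattern mirrors the diff): get_bucket_utilization() walks a bucket's items and simply skips any item that is a marked-out OSD, so one out OSD no longer distorts the whole bucket's totals.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for OSDMap::is_out().
struct MiniMap {
  std::vector<bool> out;
  bool is_out(int id) const { return out[id]; }
};

struct MiniOsdStat { int64_t kb, kb_used; };

static void get_bucket_utilization(const MiniMap& osdmap,
                                   const std::vector<MiniOsdStat>& stats,
                                   const std::vector<int>& items,  // OSD ids
                                   int64_t* kb, int64_t* kb_used) {
  *kb = *kb_used = 0;
  for (int id : items) {
    if (id >= 0 && osdmap.is_out(id))
      continue;  // skip out OSDs, keep summing the remaining ones
    *kb += stats[id].kb;
    *kb_used += stats[id].kb_used;
  }
}
```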
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
and @xiexingguo, could you please file a ticket on the tracker and connect ab685f7 with it, so we can backport it to jewel?
If an osd is automatically marked as out, the output of "osd df" is not right, as follows:

    -5 10.00999        - 5586G 2989G 2596G     0    0 host ceph192-9-9-8
    11 0.90999   1.00000  931G  542G  388G 58.25 0.99     osd.11
    14 0.90999   1.00000  931G  530G  400G 56.97 0.97     osd.14
    20 0.90999   1.00000  931G  716G  214G 76.99 1.31     osd.20
    22 0.90999   1.00000  931G  477G  453G 51.29 0.87     osd.22
    26 0.90999         0     0     0     0     0    0     osd.26
    28 0.90999   1.00000  931G  587G  343G 63.09 1.07     osd.28
    30 0.90999   1.00000  931G  602G  328G 64.75 1.10     osd.30
    16 0.90999   1.00000  931G  589G  341G 63.34 1.08     osd.16
    18 0.90999   1.00000  931G  530G  400G 56.93 0.97     osd.18
    24 0.90999   1.00000  931G  202G  728G 21.77 0.37     osd.24
    32 0.90999   1.00000  931G  477G  454G 51.23 0.87     osd.32

Two problems are identified from the above output:
1. the total capacity (total, total used, total avail) only includes osd.32, osd.24, osd.18, osd.16, osd.30 and osd.28; other healthy osds such as osd.11, osd.14 etc. are excluded.
2. the average utilization/deviation are forcibly reset.

Fixes: http://tracker.ceph.com/issues/16706
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
Force-pushed from a3a1292 to 1e47354.
All addressed. Thanks @tchaikov
lgtm
retest this please
@tchaikov Thanks for the testing and merging, kefu. |