mon/osdmonitor: fix incorrect output of "osd df" due to osd out #10308
Conversation
@@ -796,14 +796,20 @@ class OSDUtilizationDumper : public CrushTreeDumper::Dumper<F> {

  bool get_osd_utilization(int id, int64_t* kb, int64_t* kb_used,
                           int64_t* kb_avail) const {
    // we always return true for OSDs as the specific OSDs can be out,
could you be more specific in the commit message, "chaos" is sort of vague, imho.
updated.
if an OSD could be marked out, can we just check its state in get_bucket_utilization() explicitly using osdmap.is_out(id)?
This is my first thought.
But there is no reason to stop processing the whole bucket if any one of the OSDs under it is abnormal because it is out, broken, or something similar. And if it is, resetting its stats should be fair enough.
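The point above can be sketched with a tiny standalone model (hypothetical names and types, not the real Ceph classes): if get_osd_utilization() reported failure for an out OSD, the caller would abort summing the whole bucket and lose the stats of every healthy sibling; returning true with zeroed stats lets the walk continue.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical miniature model of the bucket walk being discussed.
struct MiniOsd { bool out; int64_t kb, kb_used; };

static bool get_osd_utilization(const MiniOsd& o, int64_t* kb,
                                int64_t* kb_used) {
  if (o.out) {
    *kb = *kb_used = 0;  // reset stats instead of reporting failure
    return true;         // so the caller keeps walking the bucket
  }
  *kb = o.kb;
  *kb_used = o.kb_used;
  return true;
}

static bool get_bucket_utilization(const std::vector<MiniOsd>& bucket,
                                   int64_t* kb, int64_t* kb_used) {
  *kb = *kb_used = 0;
  for (const MiniOsd& o : bucket) {
    int64_t k = 0, ku = 0;
    if (!get_osd_utilization(o, &k, &ku))
      return false;  // a single failing child would wipe the bucket
    *kb += k;
    *kb_used += ku;
  }
  return true;
}
```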
> But there is no reason we stop to process the whole bucket if any one of the OSDs under this bucket is abnormal

agreed. and could you be specific why checking osdmap.is_out(id) in get_bucket_utilization() does not work, or is a bad idea?
we are tackling two cases here:
- we should not stop calculating a bucket's usage when running into an OSD that gets marked out: in this case, i'd suggest checking osdmap.is_out(id) explicitly. actually, marking an OSD out will reset its stat to 0, see PGMonitor::check_osd_map() and PGMap::apply_incremental().
- in other cases, i just want to be sure it's expected behavior; if it's not, i don't want to cover it up using this fix. if we destroy an OSD, its stat will be removed from PGMap::osd_stat, see PGMonitor::check_osd_map(); in that case, why is its parent bucket still holding it?
if (p->second & CEPH_OSD_EXISTS) {
// whether it was created *or* destroyed, we can safely drop
// it's osd_stat_t record.
dout(10) << __func__ << " osd." << p->first
<< " created or destroyed" << dendl;
pending_inc.rm_stat(p->first);
// and adjust full, nearfull set
pg_map.nearfull_osds.erase(p->first);
pg_map.full_osds.erase(p->first);
}
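The behavior the snippet above implements can be modeled minimally (hypothetical types; a plain map keyed by OSD id stands in for PGMap::osd_stat): when an OSD's CEPH_OSD_EXISTS bit changes (it was created or destroyed), its stat record is dropped, so it should no longer contribute to any bucket's totals.

```cpp
#include <map>

// Hypothetical sketch, not the real Ceph types.
struct OsdStat { long kb = 0; long kb_used = 0; };

static void on_exists_changed(std::map<int, OsdStat>& osd_stat, int id) {
  // analogous to pending_inc.rm_stat(p->first) in the snippet above:
  // a created-or-destroyed OSD loses its stat record entirely
  osd_stat.erase(id);
}
```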
> could you be specific why checking osdmap.is_out(id) in get_bucket_utilization() does not work, or is a bad idea?

@tchaikov That's not true; I think the is_out() method should work too (it was actually my first solution :-). But I have the following concerns too:
- I think https://github.com/ceph/ceph/pull/10308/files?diff=unified#diff-0a5db46a44ae9900e226289a810f10e8R803 is unreachable, as we have already done a sanity check in:
void dump_stray(F *f) {
for (int i = 0; i < osdmap->get_max_osd(); i++) {
if (osdmap->exists(i) && !this->is_touched(i))
dump_item(CrushTreeDumper::Item(i, 0, 0), f);
}
}
- I am afraid there are other possibilities where an OSD reports zeroed stats, please check:
void OSDService::update_osd_stat(vector<int>& hb_peers)
{
Mutex::Locker lock(stat_lock);
osd_stat.hb_in.swap(hb_peers);
osd_stat.hb_out.clear();
osd->op_tracker.get_age_ms_histogram(&osd_stat.op_queue_age_hist);
// fill in osd stats too
struct store_statfs_t stbuf;
int r = osd->store->statfs(&stbuf);
if (r < 0) {
derr << "statfs() failed: " << cpp_strerror(r) << dendl;
> I am afraid there is other possibilities that an OSD report zeroed stats, please check: [...]
Sorry, not zeroed but stale OSD stats...
So I'll try it your way.
I'll give an elaborate explanation when I am back at work :)
Force-pushed from c025a5f to 3d91ef4.
The change is locally tested and verified: the bucket statistics now correctly reflect the summarized statistics of all OSDs under the bucket, even if one or more OSDs are offline and marked as out.
Force-pushed from 49cf9f5 to fdadce8.
Updated. @tchaikov Can you take another look? Thanks!
@@ -796,6 +796,15 @@ class OSDUtilizationDumper : public CrushTreeDumper::Dumper<F> {

  bool get_osd_utilization(int id, int64_t* kb, int64_t* kb_used,
                           int64_t* kb_avail) const {
    if (osdmap->is_out(id)) {
can we do this in get_bucket_utilization()?
sure, done.
Force-pushed from fdadce8 to ab685f7.
    if (id >= 0) {
      if (osdmap->is_out(id)) {
        // return true if current OSD is marked as out,
        // so we can continue to process other OSDs under this bucket.
i don't think this comment is necessary, as the code is self-explanatory.
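The final shape of the fix can be sketched with a minimal standalone model (hypothetical surrounding types, not the real Ceph code; only the skip-if-out pattern mirrors the diff): get_bucket_utilization() walks a bucket's items and simply skips any item that is a marked-out OSD, so one out OSD no longer distorts the whole bucket's totals.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for OSDMap::is_out().
struct MiniMap {
  std::vector<bool> out;
  bool is_out(int id) const { return out[id]; }
};

struct MiniOsdStat { int64_t kb, kb_used; };

static void get_bucket_utilization(const MiniMap& osdmap,
                                   const std::vector<MiniOsdStat>& stats,
                                   const std::vector<int>& items,  // OSD ids
                                   int64_t* kb, int64_t* kb_used) {
  *kb = *kb_used = 0;
  for (int id : items) {
    if (id >= 0 && osdmap.is_out(id))
      continue;  // skip out OSDs, keep summing the remaining ones
    *kb += stats[id].kb;
    *kb_used += stats[id].kb_used;
  }
}
```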
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
and @xiexingguo, could you please file a ticket on the tracker and connect ab685f7 with it, so we can backport it to jewel?
If an osd is automatically marked as out, the output of "osd df" is not right, as follows:

    -5 10.00999        - 5586G 2989G 2596G     0    0 host ceph192-9-9-8
    11 0.90999   1.00000  931G  542G  388G 58.25 0.99     osd.11
    14 0.90999   1.00000  931G  530G  400G 56.97 0.97     osd.14
    20 0.90999   1.00000  931G  716G  214G 76.99 1.31     osd.20
    22 0.90999   1.00000  931G  477G  453G 51.29 0.87     osd.22
    26 0.90999         0     0     0     0     0    0     osd.26
    28 0.90999   1.00000  931G  587G  343G 63.09 1.07     osd.28
    30 0.90999   1.00000  931G  602G  328G 64.75 1.10     osd.30
    16 0.90999   1.00000  931G  589G  341G 63.34 1.08     osd.16
    18 0.90999   1.00000  931G  530G  400G 56.93 0.97     osd.18
    24 0.90999   1.00000  931G  202G  728G 21.77 0.37     osd.24
    32 0.90999   1.00000  931G  477G  454G 51.23 0.87     osd.32

Two problems are identified from the above output:
1. the total capacity (total, total used, total avail) only includes osd.32, osd.24, osd.18, osd.16, osd.30 and osd.28; other healthy osds such as osd.11, osd.14 etc. are excluded.
2. the average utilization/deviation are forcibly reset.

Fixes: http://tracker.ceph.com/issues/16706
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
Force-pushed from a3a1292 to 1e47354.
All addressed. Thanks @tchaikov
lgtm
retest this please
@tchaikov Thanks for the testing and merging, kefu. |