
mon/osdmonitor: fix incorrect output of "osd df" due to osd out #10308

Merged
merged 2 commits into ceph:master from xiexingguo:xxg-wip-fix-osd-df on Jul 25, 2016

Conversation


@xiexingguo xiexingguo commented Jul 15, 2016

If an osd is automatically marked as out, the output of "osd df"
is not right, as follows:

-5 10.00999        -  5586G  2989G  2596G     0    0     host ceph192-9-9-8
11  0.90999  1.00000   931G   542G   388G 58.25 0.99         osd.11
14  0.90999  1.00000   931G   530G   400G 56.97 0.97         osd.14
20  0.90999  1.00000   931G   716G   214G 76.99 1.31         osd.20
22  0.90999  1.00000   931G   477G   453G 51.29 0.87         osd.22
26  0.90999        0      0      0      0     0    0         osd.26
28  0.90999  1.00000   931G   587G   343G 63.09 1.07         osd.28
30  0.90999  1.00000   931G   602G   328G 64.75 1.10         osd.30
16  0.90999  1.00000   931G   589G   341G 63.34 1.08         osd.16
18  0.90999  1.00000   931G   530G   400G 56.93 0.97         osd.18
24  0.90999  1.00000   931G   202G   728G 21.77 0.37         osd.24
32  0.90999  1.00000   931G   477G   454G 51.23 0.87         osd.32

Two problems are identified from the above output:

  1. the total capacity (total, total used, total avail) only includes
     osd.32, osd.24, osd.18, osd.16, osd.30 and osd.28; other healthy OSDs
     such as osd.11, osd.14, etc. are excluded.
  2. the average utilization/deviation are forcibly reset.

Fixes: http://tracker.ceph.com/issues/16706
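
For illustration only, here is a minimal standalone sketch of the aggregation behaviour the fix is after, using a hypothetical OsdStat record and made-up numbers rather than the actual OSDMonitor/CrushTreeDumper code: an out OSD contributes zeroed stats to its host bucket, while the remaining OSDs are still summed and the utilization is derived from those totals.

// minimal standalone sketch: hypothetical types, not the real monitor code
#include <cstdint>
#include <cstdio>
#include <vector>

struct OsdStat {               // hypothetical per-OSD record
  int id;
  bool out;                    // automatically marked out?
  int64_t kb, kb_used, kb_avail;
};

int main() {
  // arbitrary example numbers (KB), loosely modelled on the output above
  std::vector<OsdStat> host = {
    {11, false, 976128, 568000, 408128},
    {26, true,       0,      0,      0},   // out OSD: stats already zeroed
    {28, false, 976128, 615000, 361128},
  };

  int64_t kb = 0, kb_used = 0, kb_avail = 0;
  for (const auto& o : host) {
    if (o.out)
      continue;                // skip the out OSD, but keep summing the rest
    kb += o.kb;
    kb_used += o.kb_used;
    kb_avail += o.kb_avail;
  }

  double util = kb ? 100.0 * kb_used / kb : 0.0;
  std::printf("host: %lld KB total, %lld KB used, %lld KB avail, %.2f%% util\n",
              (long long)kb, (long long)kb_used, (long long)kb_avail, util);
  return 0;
}

With the out OSD skipped this way, the host totals still reflect the healthy OSDs instead of being partially dropped, and the average utilization no longer collapses to zero.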

@@ -796,14 +796,20 @@ class OSDUtilizationDumper : public CrushTreeDumper::Dumper<F> {

  bool get_osd_utilization(int id, int64_t* kb, int64_t* kb_used,
                           int64_t* kb_avail) const {
    // we always return true for OSDs as the specific OSDs can be out,
@tchaikov (Contributor)

could you be more specific in the commit message, "chaos" is sort of vague, imho.

@xiexingguo (Member Author)

updated.

@tchaikov (Contributor)

if an OSD could be marked out, can we just check its state in get_bucket_utilization() explicitly using osdmap.is_out(id)?

@xiexingguo (Member Author)

This was my first thought.

But there is no reason to stop processing the whole bucket just because one of the OSDs under it is abnormal, e.g. out, broken, or something like that. And if it is, resetting its stats will be fair enough.
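
As a rough, compilable sketch of the "reset its stats and keep going" idea described here, with a hypothetical OsdMapView stand-in for OSDMap (the actual change appears in the diffs further down): an out OSD reports zeroed utilization but the call still returns true, so the caller continues with the rest of the bucket.

#include <cstdint>
#include <cstdio>
#include <set>

struct OsdMapView {                        // hypothetical stand-in for OSDMap
  std::set<int> out_osds;
  bool is_out(int id) const { return out_osds.count(id) > 0; }
};

// For an out OSD, report zeroed utilization but still return true so the
// caller keeps processing the rest of the bucket instead of giving up.
bool get_osd_utilization(const OsdMapView& osdmap, int id,
                         int64_t* kb, int64_t* kb_used, int64_t* kb_avail) {
  if (osdmap.is_out(id)) {
    *kb = *kb_used = *kb_avail = 0;
    return true;
  }
  // ... otherwise fill in the OSD's real statistics as before ...
  return true;
}

int main() {
  OsdMapView osdmap{{26}};                 // pretend osd.26 has been marked out
  int64_t kb = 0, kb_used = 0, kb_avail = 0;
  get_osd_utilization(osdmap, 26, &kb, &kb_used, &kb_avail);
  std::printf("osd.26 -> kb=%lld used=%lld avail=%lld\n",
              (long long)kb, (long long)kb_used, (long long)kb_avail);
  return 0;
}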

@tchaikov (Contributor)

But there is no reason to stop processing the whole bucket just because one of the OSDs under it is abnormal

agreed. and could you be more specific about why checking osdmap.is_out(id) in get_bucket_utilization() does not work, or is a bad idea?

we are tackling two cases here:

  1. we should not stop calculating a bucket's usage when we run into an OSD that got marked out: in this case, i'd suggest checking osdmap.is_out(id) explicitly. actually, marking an OSD out will reset its stat to 0, see PGMonitor::check_osd_map() and PGMap::apply_incremental()
  2. in other cases, i just want to be sure it's expected behavior; if it's not, i don't want to cover it up with this fix. if we destroy an OSD, its stat will be removed from PGMap::osd_stat (see PGMonitor::check_osd_map()); in that case, why is its parent bucket still holding it?
if (p->second & CEPH_OSD_EXISTS) {
  // whether it was created *or* destroyed, we can safely drop
  // it's osd_stat_t record.
  dout(10) << __func__ << "  osd." << p->first
           << " created or destroyed" << dendl;
  pending_inc.rm_stat(p->first);

  // and adjust full, nearfull set
  pg_map.nearfull_osds.erase(p->first);
  pg_map.full_osds.erase(p->first);
}

@xiexingguo (Member Author)

could you be more specific about why checking osdmap.is_out(id) in get_bucket_utilization() does not work, or is a bad idea?

@tchaikov That's not true. I think the is_out() method should work too (it was actually my very first solution :-). But I also have the following concerns:

  void dump_stray(F *f) {
    for (int i = 0; i < osdmap->get_max_osd(); i++) {
      if (osdmap->exists(i) && !this->is_touched(i))
        dump_item(CrushTreeDumper::Item(i, 0, 0), f);
    }
  }
  • I am afraid there are other possibilities where an OSD reports zeroed stats; please check:
void OSDService::update_osd_stat(vector<int>& hb_peers)
{
  Mutex::Locker lock(stat_lock);

  osd_stat.hb_in.swap(hb_peers);
  osd_stat.hb_out.clear();

  osd->op_tracker.get_age_ms_histogram(&osd_stat.op_queue_age_hist);

  // fill in osd stats too
  struct store_statfs_t stbuf;
  int r = osd->store->statfs(&stbuf);
  if (r < 0) {
    derr << "statfs() failed: " << cpp_strerror(r) << dendl;

@xiexingguo (Member Author)

I am afraid there are other possibilities where an OSD reports zeroed stats...


Sorry, not zeroed but stale OSD stats...
So I'll try it your way.

@xiexingguo xiexingguo commented Jul 15, 2016

I'll give a more elaborate explanation when I am back to work :)

@xiexingguo xiexingguo commented Jul 18, 2016

The change is locally tested and verified, as below:

-9 3.42000        -  2937G   135M  2937G 0.00  0.58     host ceph192-9-9-236     
 3 0.53999  1.00000   548G 35064k   548G 0.01  0.78         osd.3                
 5 0.53999  1.00000   548G 34560k   548G 0.01  0.77         osd.5                
 6 0.53999        0      0      0      0    0     0         osd.6                
 7 0.89999  1.00000   920G 34384k   920G 0.00  0.46         osd.7                
 4 0.89999  1.00000   920G 34408k   920G 0.00  0.46         osd.4                
              TOTAL  2996G   239M  2996G 0.01  

The bucket statistics now correctly reflect the summarized statistics of all OSDs underneath, even if one or more OSDs are offline and marked as out.

@xiexingguo xiexingguo force-pushed the xxg-wip-fix-osd-df branch 2 times, most recently from 49cf9f5 to fdadce8 Compare July 18, 2016 06:04
@xiexingguo (Member Author)

Updated. @tchaikov Can you look again? Thanks!

@@ -796,6 +796,15 @@ class OSDUtilizationDumper : public CrushTreeDumper::Dumper<F> {

  bool get_osd_utilization(int id, int64_t* kb, int64_t* kb_used,
                           int64_t* kb_avail) const {
    if (osdmap->is_out(id)) {
@tchaikov (Contributor)

can we do this in get_bucket_utilization()?

@xiexingguo (Member Author)

sure, done.

    if (id >= 0) {
      if (osdmap->is_out(id)) {
        // return true if current OSD is marked as out,
        // so we can continue to process other OSDs under this bucket.
@tchaikov (Contributor)

i don't think this comment is necessary, as the code is self-explanatory.

@tchaikov (Contributor)

and @xiexingguo, could you please file a ticket on the tracker and connect ab685f7 with it, so we can backport it to jewel?

@xiexingguo xiexingguo force-pushed the xxg-wip-fix-osd-df branch 2 times, most recently from a3a1292 to 1e47354 Compare July 18, 2016 10:51
@xiexingguo (Member Author)

All addressed. Thanks @tchaikov

@tchaikov tchaikov self-assigned this Jul 18, 2016
@tchaikov (Contributor)

lgtm

@xiexingguo (Member Author)

retest this please

@tchaikov tchaikov merged commit 88c604d into ceph:master Jul 25, 2016
@xiexingguo (Member Author)

@tchaikov Thanks for the testing and merging, kefu.

@xiexingguo xiexingguo deleted the xxg-wip-fix-osd-df branch July 25, 2016 05:43