
mon: don't set last_osd_report when the pg stats msg is ignored #12975

Merged
merged 1 commit into from Jan 22, 2017


@wonzhq (Contributor) commented Jan 18, 2017

In some cases, this may lead to the mon wrongly marking an osd down
because it has received no pg stats from it within a specified time period.

Signed-off-by: Zhiqiang Wang <zhiqiang@xsky.com>

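```cpp
  // drop stats whose sender is not an up OSD, or whose instance differs
  // from the one recorded for that id in the current osdmap
  if (!stats->get_orig_source().is_osd() ||
      !mon->osdmon()->osdmap.is_up(from) ||
      stats->get_orig_source_inst() != mon->osdmon()->osdmap.get_inst(from)) {
    dout(1) << " ignoring stats from non-active osd." << dendl;
    return false;
  }

  // only record the report time once the stats message has been accepted
  last_osd_report[from] = ceph_clock_now();
```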
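To make the ordering concrete, here is a minimal, self-contained C++ sketch under simplifying assumptions. `ToyOsdMap`, `PgStatsMsg`, `handle_pg_stats` and the integer clock are hypothetical stand-ins, not Ceph's actual `PGMonitor` types or API; they only model the three conditions in the check above. The sender is validated first, and `last_osd_report` is written only when the message is accepted, so an ignored message leaves the map untouched.

```cpp
#include <cstdint>
#include <map>
#include <set>

// Hypothetical, simplified view of the osdmap fields the check consults.
struct ToyOsdMap {
  std::set<int> up;                   // ids of OSDs currently marked up
  std::map<int, std::uint64_t> inst;  // id -> instance nonce of the up daemon
  bool is_up(int id) const { return up.count(id) > 0; }
};

// Hypothetical stand-in for an incoming pg stats message.
struct PgStatsMsg {
  bool from_osd;       // is the sender an OSD at all?
  int from;            // sender's OSD id
  std::uint64_t inst;  // sender's instance nonce
};

// Returns true if the message is accepted. Follows the patched ordering:
// validate the sender first, and only then record the report time.
bool handle_pg_stats(const ToyOsdMap& osdmap, const PgStatsMsg& m,
                     std::map<int, std::uint64_t>& last_osd_report,
                     std::uint64_t now) {
  auto it = osdmap.inst.find(m.from);
  if (!m.from_osd || !osdmap.is_up(m.from) ||
      it == osdmap.inst.end() || it->second != m.inst) {
    return false;  // ignored: last_osd_report is left untouched
  }
  last_osd_report[m.from] = now;
  return true;
}
```

For example, with only osd.0 up, a message claiming to come from osd.1 returns `false` and adds no entry for id 1, while a message from osd.0 with the matching instance nonce is accepted and stamped.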
Contributor:

> In some cases

@wonzhq what is the case exactly? For example:

  1. an osd is marked down by the monitor, and then
  2. we receive a stray pg stats message from it;
  3. last_osd_report is updated with the timestamp;
  4. after a while, a new osd joins, and the monitor assigns it the id of the previously marked-down osd. (But the leader calls check_osd_map() after the map is committed, and check_osd_map() clears last_osd_report for that osd.)
  5. in tick(), the newly joined osd could then be wrongly marked down? (I doubt it, though; see above.)
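The scenario above can be replayed with a small model, assuming the pre-fix behaviour in which the stray message from step 2 is stamped. `overdue_osds`, `forget_reports` and `replay_scenario` are hypothetical names, the 900-second timeout and the timestamps are arbitrary illustration values, and the cleanup step only approximates what the comment says `check_osd_map()` does when the new map is committed.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical model: osd id -> time (in seconds) of its last recorded pg stats.
typedef std::map<int, std::uint64_t> ReportTimes;

// Hypothetical tick() check: return every OSD whose last recorded report is
// older than `timeout`; these are the candidates for being marked down.
std::vector<int> overdue_osds(const ReportTimes& last_osd_report,
                              std::uint64_t now, std::uint64_t timeout) {
  std::vector<int> overdue;
  for (const auto& entry : last_osd_report) {
    if (now - entry.second > timeout) overdue.push_back(entry.first);
  }
  return overdue;
}

// Hypothetical stand-in for the cleanup attributed to check_osd_map():
// drop the recorded report times for the given ids.
void forget_reports(ReportTimes& last_osd_report, const std::set<int>& ids) {
  for (int id : ids) last_osd_report.erase(id);
}

// Replays steps 1-5 with the pre-fix ordering. `cleanup_runs` controls
// whether the cleanup happens when the map with the reused id is committed.
std::vector<int> replay_scenario(bool cleanup_runs) {
  const std::uint64_t timeout = 900;
  ReportTimes last_osd_report;
  // Steps 1-3: osd.3 is already down, but a stray message stamps it at t=100.
  last_osd_report[3] = 100;
  // Step 4: a new OSD reuses id 3 and has not sent pg stats yet.
  if (cleanup_runs) forget_reports(last_osd_report, {3});
  // Step 5: tick() at t=1200 compares report ages against the timeout.
  return overdue_osds(last_osd_report, 1200, timeout);
}
```

With the cleanup, `replay_scenario(true)` returns an empty list; without it, `replay_scenario(false)` returns `{3}`, so the freshly booted OSD is flagged because of a timestamp that belonged to its predecessor. With the ordering in this PR, the stray message in step 2 is never stamped, so the outcome no longer depends on the cleanup running.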

@wonzhq (Contributor, Author) commented Jan 19, 2017

check_osd_map() can return early without clearing last_osd_report if the osdmap is not readable or the pgmap is not writeable. This bug was fixed a long time ago, and the log file has since been removed, so I can't verify it.
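The early returns described above can be sketched the same way. `check_osd_map_sketch` and `stale_entry_survives` are hypothetical; the two booleans stand in for the osdmap-readable and pgmap-writeable checks, and the real `check_osd_map()` does more than this. The sketch shows why guarding the write at its source is more robust than relying on a later cleanup that may be skipped.

```cpp
#include <cstdint>
#include <map>
#include <set>

typedef std::map<int, std::uint64_t> ReportTimes;

// Hypothetical sketch: bail out early, as described above, before the step
// that clears report times for the given ids.
void check_osd_map_sketch(bool osdmap_readable, bool pgmap_writeable,
                          const std::set<int>& ids_to_clear,
                          ReportTimes& last_osd_report) {
  if (!osdmap_readable) return;  // early return: nothing is cleared
  if (!pgmap_writeable) return;  // early return: nothing is cleared
  for (int id : ids_to_clear) last_osd_report.erase(id);
}

// Returns true if id 3 still holds a timestamp from a stray stats message
// (sent while osd.3 was down) after one cleanup attempt.
bool stale_entry_survives(bool record_before_check, bool osdmap_readable,
                          bool pgmap_writeable) {
  ReportTimes last_osd_report;
  const bool sender_is_up = false;  // the stray message comes from a down OSD
  // Pre-fix ordering stamps the time before the sender is validated;
  // the fixed ordering only stamps accepted (up) senders.
  if (record_before_check || sender_is_up) last_osd_report[3] = 100;
  check_osd_map_sketch(osdmap_readable, pgmap_writeable, {3}, last_osd_report);
  return last_osd_report.count(3) > 0;
}
```

With the pre-fix ordering, either early return leaves the stale entry in place; with the fixed ordering there is nothing to clean up, even when both checks fail.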

Contributor:

@wonzhq okay, just wanted to understand if we need to backport this fix or not.

@wonzhq (Contributor, Author):

@tchaikov sure :)

@liewegas merged commit 5dccac8 into ceph:master on Jan 22, 2017
3 participants