Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mon/PGMonitor: fix wrongly report "pg stuck in inactive" #14391

Merged
merged 1 commit into from Apr 28, 2017

Conversation

Projects
None yet
3 participants
@LiumxNL
Copy link
Contributor

LiumxNL commented Apr 7, 2017

Signed-off-by: Mingxin Liu mingxin@xsky.com

@liewegas liewegas changed the title PGMonitor: fix wrongly report "pg stuck in inactive" mon/PGMonitor: fix wrongly report "pg stuck in inactive" Apr 8, 2017

// mater much while faulty last_active will make "pg stuck in" check unhappy.
if ((s.acting_primary != n.acting_primary) &&
!(n.state & (PG_STATE_ACTIVE | PG_STATE_PEERED)))
n.last_active = s.last_active;

This comment has been minimized.

Copy link
@liewegas

liewegas Apr 8, 2017

Member

I'm confused by this. Is the bug that the OSD is changing last_active even though the pg state isn't actually active?

This comment has been minimized.

Copy link
@tchaikov

tchaikov Apr 10, 2017

Contributor

LiumxNL neither do i understand this fix. if the bug is that the PG is still peering and not active, its last_active should not be updated. see PG::publish_stats_to_osd(), the last_active is set only if the new state is PG_STATE_ACTIVE.

This comment has been minimized.

Copy link
@LiumxNL

LiumxNL Apr 12, 2017

Author Contributor

@liewegas @tchaikov yep, changing last_active happens at publish_stats_to_osd() if state is PG_STATE_ACTIVE or at PG creating. but, suppose that, primary OSD was down for some reason,PG should peering -> active, continuously updating last_active by new primary, then the old primary OSD restored within few minutes,whose pglog catch up the tail, this OSD will be selected as primary again soon, before its statemachine goes to GotInfo, its pgstats remain stale, these stats may be sent to Monitor and misleading it, so we should handle this.

This comment has been minimized.

Copy link
@liewegas

liewegas Apr 25, 2017

Member

I see, this makes sense to me now, thanks! There are probably other fields that might go here too besides last_active, but we can start with this.

@LiumxNL LiumxNL force-pushed the LiumxNL:wip-170407 branch from 54994a5 to b69bb9d Apr 14, 2017

// mater much while faulty last_active will make "pg stuck in" check unhappy.
if ((s.acting_primary != n.acting_primary) &&
!(n.state & (PG_STATE_ACTIVE | PG_STATE_PEERED)))
n.last_active = s.last_active;

This comment has been minimized.

Copy link
@liewegas

liewegas Apr 25, 2017

Member

I see, this makes sense to me now, thanks! There are probably other fields that might go here too besides last_active, but we can start with this.

@liewegas liewegas added the needs-qa label Apr 25, 2017

PGMonitor: fix wrongly report "pg stuck in inactive"
Signed-off-by: Mingxin Liu <mingxin@xsky.com>

@LiumxNL LiumxNL force-pushed the LiumxNL:wip-170407 branch from b69bb9d to 77b0ff4 Apr 26, 2017

@LiumxNL

This comment has been minimized.

Copy link
Contributor Author

LiumxNL commented Apr 26, 2017

@liewegas thank you! and i updated the commit, previous one is deficient.

@LiumxNL

This comment has been minimized.

Copy link
Contributor Author

LiumxNL commented Apr 26, 2017

retest this please

@liewegas liewegas merged commit e9fb6a0 into ceph:master Apr 28, 2017

4 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details
arm build successfully built on arm
Details
default Build finished.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.