osd/PG.cc: Optimistic estimation on PG.last_active #14799

Merged
merged 1 commit into ceph:master on May 2, 2017

Conversation

xiaoxichen (Contributor) commented Apr 26, 2017

A PG may briefly pass through an inactive state (such as peering) during a
state transition; users usually want an alert only when a PG has been stuck
inactive for long enough.

Such monitoring depends on PG.last_active > cutoff, but we do not update
PG.last_active if neither a state change nor IO has happened on the PG. As a
result, PG.last_active may lag far behind on an idle cluster/pool.

This patch updates the last_active field to now() when we first find the PG
inactive, as a kind of optimistic estimate, to solve the problem.

Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>
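
A minimal sketch of the idea, for illustration only and not the actual diff: PGStats, PG_STATE_ACTIVE, and Clock below are stand-ins for Ceph's pg_stat_t, PG state flags, and utime_t.

```cpp
// Illustrative sketch of the idea only, not the actual PG.cc change.
// PGStats, PG_STATE_ACTIVE and Clock are stand-ins for Ceph's pg_stat_t,
// state flag bits and utime_t.
#include <chrono>
#include <cstdint>

using Clock = std::chrono::steady_clock;

constexpr uint64_t PG_STATE_ACTIVE = 1ull << 0;  // stand-in for the real flag

struct PGStats {
  uint64_t state = 0;                // current PG state bits
  Clock::time_point last_active{};   // last time the PG was seen active
};

// Called whenever the PG's stats are recalculated with a (possibly) new state.
void update_last_active(PGStats &stats, uint64_t new_state) {
  const auto now = Clock::now();
  if (new_state & PG_STATE_ACTIVE) {
    // Active: stamp last_active as usual.
    stats.last_active = now;
  } else if (stats.state & PG_STATE_ACTIVE) {
    // First refresh where the PG is found inactive: optimistically stamp
    // last_active with now(), so an otherwise idle PG is not immediately
    // reported as stuck inactive for its whole idle history.
    stats.last_active = now;
  }
  stats.state = new_state;
}
```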


liewegas added the core label on Apr 26, 2017

liewegas (Member) commented Apr 26, 2017

Is this addressing the same problem as #14391? If so, I think the other PR is a cleaner approach.

xiaoxichen (Contributor) commented Apr 27, 2017

@liewegas Not quite the same problem; my PR is broader in scope: it does not require acting_primary to have changed, only that the PG has passed through an inactive state, for whatever reason.

The problem we are facing in production is that we have some inactive (rarely accessed) pools, so the PGs belonging to those pools do not get last_active updated frequently. An OSD going down then usually triggers "PG stuck in inactive for more than X seconds", where X can be as large as tens of thousands. Since "stuck in inactive" is a critical alert, such false alarms are annoying for on-call staff.
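
For context, the monitoring check in question looks roughly like the sketch below; this is a simplified illustration rather than Ceph's actual monitor code, with stuck_threshold standing in for mon_pg_stuck_threshold and PGStats for pg_stat_t.

```cpp
// Simplified sketch of the "stuck inactive" check, not Ceph's actual code.
#include <chrono>
#include <cstdint>

using Clock = std::chrono::steady_clock;
constexpr uint64_t PG_STATE_ACTIVE = 1ull << 0;  // stand-in for the real flag

struct PGStats {
  uint64_t state = 0;
  Clock::time_point last_active{};
};

// A PG counts as stuck inactive when it is not active and last_active is
// older than now - stuck_threshold.  If last_active is never refreshed on
// an idle pool, it may already be far behind the cutoff the moment the PG
// goes inactive, producing the "stuck for tens of thousands of seconds"
// false alarms described above.
bool stuck_inactive(const PGStats &stats, Clock::time_point now,
                    std::chrono::seconds stuck_threshold) {
  const auto cutoff = now - stuck_threshold;
  return !(stats.state & PG_STATE_ACTIVE) && stats.last_active < cutoff;
}
```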

LiumxNL (Contributor) commented Apr 27, 2017

@xiaoxichen can we turn osd_pg_stat_report_interval_max down to a value less than mon_pg_stuck_threshold to resolve this problem?

xiaoxichen (Contributor) commented Apr 27, 2017

@LiumxNL No, we cannot. PG::publish_stats_to_osd() is only called from the read/write IO path or on a PG state transition; a silent PG (no IO) does not respect that configuration.

Look at http://tracker.ceph.com/issues/14028 for more details.
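
To illustrate why tuning the interval alone does not help, here is a simplified sketch of the control flow (not the actual Ceph code; stats_dirty and last_report are placeholder names): the interval check only runs inside the publishing path, and that path is only entered on IO or on a state transition.

```cpp
// Simplified sketch, not the actual Ceph code: the interval check lives
// inside publish_stats_to_osd(), which is only reached from the IO path
// or a PG state transition.  stats_dirty and last_report are placeholders.
#include <chrono>

using Clock = std::chrono::steady_clock;

struct PG {
  bool stats_dirty = false;          // set by IO completions / state changes
  Clock::time_point last_report{};

  // Only callers: the read/write IO path and PG state-machine transitions.
  void publish_stats_to_osd(Clock::time_point now,
                            std::chrono::seconds report_interval_max) {
    // Even with no changes, re-report once report_interval_max elapses;
    // but this check is only evaluated when the method is called at all,
    // so lowering osd_pg_stat_report_interval_max does not help a PG that
    // sees neither IO nor state changes.
    if (!stats_dirty && now - last_report < report_interval_max)
      return;
    stats_dirty = false;
    last_report = now;
    // ... publish the stats upward ...
  }
};
```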

liewegas (Member) commented Apr 27, 2017 (comment minimized)

xiaoxichen (Contributor) commented Apr 27, 2017

@liewegas Do you mean last_refresh (instead of last_update)? It will be updated at L2734 because the state changed.

Yeah, it would be good to have all PG stats on the mgr and to keep some historical data. We do that today by pulling "pg dump" periodically, which is ugly though.
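
Roughly, that periodic pull amounts to something like the sketch below. This is illustrative only: `ceph pg dump --format=json` is the standard CLI invocation, but the polling wrapper and the 60-second interval are just example choices.

```cpp
// Illustrative polling sketch only, not the actual monitoring tooling.
// Runs `ceph pg dump --format=json` on a fixed interval (POSIX popen).
#include <array>
#include <chrono>
#include <cstdio>
#include <iostream>
#include <string>
#include <thread>

static std::string run_pg_dump() {
  std::string out;
  std::array<char, 4096> buf;
  FILE *p = popen("ceph pg dump --format=json 2>/dev/null", "r");
  if (!p)
    return out;
  while (size_t n = fread(buf.data(), 1, buf.size(), p))
    out.append(buf.data(), n);
  pclose(p);
  return out;
}

int main() {
  const auto poll_interval = std::chrono::seconds(60);  // example value
  for (;;) {
    const std::string dump = run_pg_dump();
    // Placeholder: a real checker would parse the JSON and compare each
    // PG's last_active against its own cutoff instead of just printing.
    std::cout << "pg dump returned " << dump.size() << " bytes\n";
    std::this_thread::sleep_for(poll_interval);
  }
}
```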

xiaoxichen (Contributor) commented May 1, 2017

@liewegas ping?

liewegas merged commit 475daee into ceph:master on May 2, 2017

4 checks passed

Signed-off-by: all commits in this PR are signed
Unmodified Submodules: submodules for project are unmodified
arm: build successfully built on arm
default: Build finished.