osd/PG.cc: Optimistic estimation on PG.last_active #14799
PG may go through an inactive state (such as peering) briefly during
Such monitoring depends on PG.last_active > cutoff, but we will not
This patch updates the last_active field to now() when we first find it
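A minimal sketch of the idea in this patch (hypothetical, simplified model, not the actual osd/PG.cc code; `PGStats` and `update_stats` are illustrative names): when the PG is first observed active again after a brief inactive interval, bump last_active to now() optimistically instead of leaving it at the last write, so checks comparing last_active against a cutoff don't fire spuriously.

```cpp
#include <ctime>

// Hypothetical, simplified model of the patch's idea: track the last
// time a PG was known active, and refresh it optimistically on the
// first observation that the PG is active again.
struct PGStats {
  time_t last_active = 0;   // last time the PG was seen active
  bool active = false;      // current (simplified) state bit
};

void update_stats(PGStats &s, bool active_now, time_t now) {
  if (active_now && !s.active) {
    // First time we see the PG active again after being inactive:
    // update last_active to now() ("optimistic estimation").
    s.last_active = now;
  }
  s.active = active_now;
}
```

With this, a brief peering interval only resets last_active to the moment the PG became active again, rather than leaving a stale timestamp from the last client write.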
Signed-off-by: Xiaoxi Chen firstname.lastname@example.org
@liewegas not really the same problem; my PR is wider in scope. It doesn't require acting_primary to have changed, just that the PG goes through an inactive state for whatever reason.
The problem we are facing in production is that we have some inactive (rarely accessed) pools, so the PGs belonging to those pools don't have last_active updated frequently. Thus an OSD going down usually triggers "PG stuck in inactive for more than X seconds", and X can be as large as tens of thousands. Since "stuck in inactive" is a critical alert, such false alarms are annoying for on-call engineers.
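To make the false-alarm scenario concrete, here is a hedged sketch of the monitoring-side check described above (an assumption about its shape, not Ceph's actual code; `pg_stuck_inactive` is an illustrative name): a PG counts as stuck when it is not active and last_active is older than the cutoff. On a rarely written pool, last_active can be very stale, so a brief peering interval trips the check immediately.

```cpp
#include <ctime>

// Illustrative version of the "stuck inactive" condition: the PG is
// currently inactive AND its last_active timestamp predates the cutoff.
// With a stale last_active from a cold pool, even a momentary peering
// interval satisfies this immediately.
bool pg_stuck_inactive(bool active, time_t last_active, time_t cutoff) {
  return !active && last_active < cutoff;
}
```

This is why refreshing last_active on the inactive-to-active transition suppresses the spurious alerts: the timestamp then tracks state changes rather than client writes.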
@liewegas do you mean last_refresh (instead of last_update)? It will be updated at L2734 as the state changes.
Yeah, it would be good to have all PG stats on the mgr and to keep some historical data. We do it now by polling "pg dump" periodically, which is ugly though.