qa/tasks: assert on pg status with a timeout #14608
I think this is a band-aid. The underlying problem is the stats delay (mon stats lag a bit too): the tests carry weird implicit timing assumptions that once one condition is true (e.g., wait_for_healthy), the pg stats will reflect that. That used to hold because both conditions came out of the mon, but now the mon and mgr info is disconnected, and we get occasional failures due to that.
We also have the problem that an ill-timed mgr failover may kick the pg stats back in time a bit too.
I'm worried that we need to sort these cases out individually by fixing the tests. I'm also concerned that this function is called from lots of places, and making it sleep 10s every time is going to be a problem...
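One way to bound the wait without paying a fixed 10s on every call is to poll the condition and return as soon as it holds. The following is only a minimal sketch of that idea, not the code in this PR: `wait_until` and its parameters are hypothetical names, and the predicate stands in for whatever pg-stats check a given test actually needs.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Hypothetical helper for illustration. Returns True as soon as the
    predicate holds, or False if the deadline passes first. The predicate
    is always evaluated at least once, so a condition that is already
    true costs no sleep at all.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Stand-in for a stats source that lags: the condition only becomes
    # true on the third poll.
    polls = []

    def stats_caught_up():
        polls.append(1)
        return len(polls) >= 3

    assert wait_until(stats_caught_up, timeout=5.0, interval=0.01)
```

A caller would then assert on the return value, so a test fails only if the stats never catch up within the timeout. This does not address the mgr failover case, where stats can move backwards after the condition was already observed.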
@jcsp i think we failed to schedule a scrub because the "last_scrub_stamp" returned by the first and the second "pg dump" calls in