mimic: rgw: renew resharding locks to prevent expiration #24899
rgw: renew resharding locks to prevent expiration
This is a mimic backport of #24406.
During resharding, the lock would sometimes expire, preventing resharding from completing. With dynamic resharding, the process would then restart, hit the same problem, and fail repeatedly. Combined with another issue (http://tracker.ceph.com/issues/34307), this left behind many incomplete bucket index shards.
This change addresses the issue in a couple of ways. First, a new type of lock semantics was added to CLS locks. Previously, a lock call could renew an existing lock, but it would also create a new lock if the caller did not already hold one. That made it impossible to know whether the lock had been held continuously. The new semantics, MUST_RENEW, succeed only if the lock is already held when the call is made.
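The difference between the two semantics can be illustrated with a small in-memory sketch. This is not the Ceph CLS lock API: ToyLockTable, LockMode, and the explicit now parameter are hypothetical names chosen for illustration, and expiry is modelled with std::chrono::steady_clock.

```cpp
#include <chrono>
#include <map>
#include <string>

// Hypothetical stand-ins for illustration; not the Ceph CLS definitions.
enum class LockMode { MAY_RENEW, MUST_RENEW };

using Clock = std::chrono::steady_clock;

struct LockInfo {
  std::string owner;
  Clock::time_point expires;
};

class ToyLockTable {
  std::map<std::string, LockInfo> locks_;

public:
  // Returns true if `owner` holds the lock on `name` after the call.
  // `now` is passed in explicitly so the behaviour is easy to reason about.
  bool lock(const std::string& name, const std::string& owner,
            Clock::duration ttl, LockMode mode, Clock::time_point now) {
    auto it = locks_.find(name);
    const bool held = it != locks_.end() && it->second.expires > now;

    // An unexpired lock held by someone else always blocks the caller.
    if (held && it->second.owner != owner) {
      return false;
    }
    // MUST_RENEW refuses to create a lock: it only extends one that the
    // caller still holds, so success implies the lock was never lost.
    if (mode == LockMode::MUST_RENEW && !held) {
      return false;
    }
    locks_[name] = LockInfo{owner, now + ttl};
    return true;
  }
};
```

In this sketch, a MUST_RENEW call made after the lock has expired fails instead of silently re-acquiring it, which is what lets the caller detect that the lock was not held continuously.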
The RGWBucketReshard::do_reshard function now renews the lock once half of the lock's allotted time has been used. In addition, callers can pass in an optional callback so that locks they hold are renewed as well. For example, during dynamic resharding a lock is also held on the logshards object, so that lock can be renewed regularly too.
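The renew-at-half-time pattern with an optional caller callback might look roughly like the following. This is a sketch under assumptions, not the code in RGWBucketReshard::do_reshard: the RenewingLock class, its member names, and the std::function callbacks are hypothetical, and std::chrono::steady_clock stands in for ceph::coarse_mono_clock.

```cpp
#include <chrono>
#include <functional>
#include <utility>

// steady_clock is a stand-in here for ceph::coarse_mono_clock.
using Clock = std::chrono::steady_clock;

// Hypothetical helper: tracks when a lock should next be renewed.
class RenewingLock {
  Clock::duration ttl_;
  Clock::time_point renew_at_;
  std::function<bool()> renew_;        // renews the lock owned by this code
  std::function<bool()> outer_renew_;  // optional: renews a caller's lock

public:
  RenewingLock(Clock::duration ttl, Clock::time_point now,
               std::function<bool()> renew,
               std::function<bool()> outer_renew = nullptr)
      : ttl_(ttl),
        renew_at_(now + ttl / 2),
        renew_(std::move(renew)),
        outer_renew_(std::move(outer_renew)) {}

  // Intended to be called once per unit of work (for example, per batch
  // of entries). Returns false if any renewal failed, in which case the
  // caller should abort because the lock may no longer be held.
  bool maybe_renew(Clock::time_point now) {
    if (now < renew_at_) {
      return true;  // less than half the allotted time has been used
    }
    if (!renew_()) {
      return false;
    }
    if (outer_renew_ && !outer_renew_()) {
      return false;
    }
    renew_at_ = now + ttl_ / 2;
    return true;
  }
};
```

Renewing at the halfway point leaves the other half of the lock duration as slack in case a unit of work runs longer than expected before the next check.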
Because the clock's now function is called repeatedly, ceph::coarse_mono_clock is now used for efficiency.
Because the objects on which reshard locks are taken exist only to support the locks, an exclusive_ephemeral lock type was added that removes the object when the lock is released.
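The ephemeral behaviour can be sketched with a toy object store. Everything here is hypothetical and for illustration only (ToyStore, ToyObject, and the LockType enum are not Ceph types); the point is simply that unlocking an ephemeral lock also deletes the object that carried it.

```cpp
#include <map>
#include <string>

// Hypothetical lock types for illustration; not the Ceph CLS definitions.
enum class LockType { EXCLUSIVE, EXCLUSIVE_EPHEMERAL };

struct ToyObject {
  bool locked = false;
  LockType type = LockType::EXCLUSIVE;
};

class ToyStore {
  std::map<std::string, ToyObject> objects_;

public:
  // Takes an exclusive lock, creating the object if it does not exist.
  bool lock(const std::string& name, LockType type) {
    ToyObject& obj = objects_[name];
    if (obj.locked) {
      return false;  // already locked
    }
    obj.locked = true;
    obj.type = type;
    return true;
  }

  // Releases the lock. For an ephemeral lock the object is removed too,
  // so no lock-only objects are left behind.
  bool unlock(const std::string& name) {
    auto it = objects_.find(name);
    if (it == objects_.end() || !it->second.locked) {
      return false;
    }
    if (it->second.type == LockType::EXCLUSIVE_EPHEMERAL) {
      objects_.erase(it);
    } else {
      it->second.locked = false;
    }
    return true;
  }

  bool exists(const std::string& name) const {
    return objects_.count(name) > 0;
  }
};
```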
Other refactoring allows the code that waits for resharding to complete to detect a failed reshard and restore flags, so that waiting operations can complete.