mimic: rgw: renew resharding locks to prevent expiration #24899

Merged
merged 9 commits into from Nov 20, 2018


ivancich commented Nov 2, 2018

http://tracker.ceph.com/issues/36687


rgw: renew resharding locks to prevent expiration

This is a mimic backport of #24406.

During resharding, the lock would sometimes expire, preventing resharding from completing. In the case of dynamic resharding, the resharding process would restart, run into the same issue, and thus fail repeatedly. Combined with another issue (http://tracker.ceph.com/issues/34307), this would leave a lot of incomplete bucket index shards behind.

This PR addresses the issue in a couple of ways. First, a new type of lock semantics was added to CLS locks. Previously, a lock request could renew an existing lock, but it would also create a new lock if the caller did not already hold one. This made it impossible to know whether the lock had been held continuously. The new type of semantics -- MUST_RENEW -- will only succeed if the lock is already held when called.
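
The difference between the two renewal modes can be sketched with a small in-memory model. This is an illustration only, not the cls_lock code: `LockTable`, `RenewPolicy`, the `NONE` policy and the specific errno values returned are assumptions made for the example.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <map>
#include <string>

// Illustrative model only; these names are not the cls_lock API.
enum class RenewPolicy { NONE, MAY_RENEW, MUST_RENEW };

using Clock = std::chrono::steady_clock;

class LockTable {
public:
  // Returns 0 on success or a negative errno-style code (codes are assumed).
  int lock(const std::string& name, const std::string& owner,
           Clock::duration ttl, RenewPolicy policy, Clock::time_point now) {
    assert(ttl > Clock::duration::zero());
    auto it = locks_.find(name);
    const bool live = it != locks_.end() && it->second.expires > now;
    const bool mine = live && it->second.owner == owner;

    if (live && !mine)
      return -EBUSY;   // another owner holds an unexpired lock
    if (policy == RenewPolicy::MUST_RENEW && !mine)
      return -ENOENT;  // nothing to renew: the lock was never held or has expired
    if (policy == RenewPolicy::NONE && mine)
      return -EEXIST;  // already held and renewal was not requested
    locks_[name] = Entry{owner, now + ttl};  // create the lock or extend it
    return 0;
  }

private:
  struct Entry {
    std::string owner;
    Clock::time_point expires;
  };
  std::map<std::string, Entry> locks_;
};
```

In this model, a MAY_RENEW request issued after expiry silently succeeds by taking a fresh lock, while a MUST_RENEW request fails, which is what lets the caller detect that the lock was lost.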

The RGWBucketReshard::do_reshard function now renews the lock once half the time allotted for the lock has been used. Furthermore, an optional callback can be passed in so that locks held by callers can be renewed as well. For example, during dynamic resharding, a lock is also held on the logshards object, so that lock can also be renewed regularly.
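
A minimal sketch of the renew-at-half-time pattern with an optional caller callback follows. It is not the do_reshard implementation: `RenewingLock`, `process_entries` and `OuterRenewFn` are hypothetical names, `std::chrono::steady_clock` stands in for ceph::coarse_mono_clock, and time is simulated so the example stays deterministic.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>

// Stand-in for ceph::coarse_mono_clock so the sketch builds without Ceph.
using Clock = std::chrono::steady_clock;

// Hypothetical helper: tracks when a fixed-duration lock is due for renewal.
class RenewingLock {
public:
  using RenewFn = std::function<int()>;  // returns 0, or <0 if the lock was lost

  RenewingLock(Clock::duration duration, RenewFn renew, Clock::time_point now)
      : duration_(duration), renew_(std::move(renew)) {
    assert(renew_ != nullptr);
    renew_after_ = now + duration_ / 2;
  }

  // Cheap to call on every iteration; renews only once half the time is used.
  int maybe_renew(Clock::time_point now) {
    if (now < renew_after_) return 0;
    const int r = renew_();
    if (r < 0) return r;  // renewal failed: the caller must stop working
    renew_after_ = now + duration_ / 2;
    return 0;
  }

private:
  Clock::duration duration_;
  RenewFn renew_;
  Clock::time_point renew_after_;
};

// Optional callback through which a caller renews its own (outer) lock.
using OuterRenewFn = std::function<int(Clock::time_point)>;

int process_entries(int entries, Clock::duration cost_per_entry,
                    Clock::time_point start, RenewingLock& lock,
                    const OuterRenewFn& outer_renew = nullptr) {
  Clock::time_point now = start;
  for (int i = 0; i < entries; ++i) {
    now += cost_per_entry;  // simulated time; real code would read the clock
    int r = lock.maybe_renew(now);
    if (r < 0) return r;
    if (outer_renew) {
      r = outer_renew(now);
      if (r < 0) return r;
    }
  }
  return 0;
}
```

Renewing at the halfway point leaves the other half of the lock duration as slack for a slow iteration before the lock would actually expire.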

Since the clock's now() function is called repeatedly, ceph::coarse_mono_clock is now used for efficiency.

Because the objects on which reshard locks are taken only exist to support the locks, an exclusive_ephemeral type of lock is added that removes the object when the lock is unlocked.
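
The behavioral difference between a plain exclusive lock and an exclusive-ephemeral one can be sketched with a toy object store. `ObjectStore`, `LockType` and their methods are hypothetical names for this illustration and do not reflect the cls_lock interface.

```cpp
#include <cassert>
#include <map>
#include <string>

// Illustrative lock types; not the cls_lock enum.
enum class LockType { EXCLUSIVE, EXCLUSIVE_EPHEMERAL };

class ObjectStore {
public:
  // Taking a lock creates the backing object if it does not exist yet.
  bool lock(const std::string& oid, const std::string& owner, LockType type) {
    assert(!owner.empty());
    Object& obj = objects_[oid];
    if (!obj.owner.empty() && obj.owner != owner) return false;  // held by another
    obj.owner = owner;
    obj.type = type;
    return true;
  }

  bool unlock(const std::string& oid, const std::string& owner) {
    assert(!owner.empty());
    auto it = objects_.find(oid);
    if (it == objects_.end() || it->second.owner != owner) return false;
    if (it->second.type == LockType::EXCLUSIVE_EPHEMERAL) {
      objects_.erase(it);        // the object existed only for the lock
    } else {
      it->second.owner.clear();  // plain exclusive lock: the object stays behind
    }
    return true;
  }

  bool exists(const std::string& oid) const { return objects_.count(oid) != 0; }

private:
  struct Object {
    std::string owner;
    LockType type = LockType::EXCLUSIVE;
  };
  std::map<std::string, Object> objects_;
};
```

With the plain exclusive type, every lock object outlives its lock and accumulates; with the ephemeral type, unlocking doubles as cleanup.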

Other refactoring was done so that the code that waits for resharding to complete can detect a failed reshard and restore the flags, allowing waiting operations to complete.
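
The recovery idea (resharding flags are still set but nobody holds the reshard lock, so take the lock and clear the flags) can be sketched as below. `BucketModel`, `ReshardStatus` and the function names are assumptions for illustration; the real code works on RGWBucketInfo and the bucket index shard heads.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class ReshardStatus { NONE, IN_PROGRESS };

// Hypothetical stand-in for the bucket info object plus its index shard heads.
struct BucketModel {
  ReshardStatus info_status = ReshardStatus::NONE;
  std::vector<ReshardStatus> shard_status;
  bool reshard_lock_held = false;
};

bool shows_resharding(const BucketModel& b) {
  return b.info_status == ReshardStatus::IN_PROGRESS ||
         std::any_of(b.shard_status.begin(), b.shard_status.end(),
                     [](ReshardStatus s) { return s == ReshardStatus::IN_PROGRESS; });
}

// Clears the status on the bucket info and on every shard head.
void clear_resharding_status(BucketModel& b) {
  b.info_status = ReshardStatus::NONE;
  std::fill(b.shard_status.begin(), b.shard_status.end(), ReshardStatus::NONE);
}

// Returns true if a waiting operation may proceed.
bool recover_if_abandoned(BucketModel& b) {
  if (!shows_resharding(b)) return true;
  if (b.reshard_lock_held) return false;  // a reshard is genuinely running: keep waiting
  b.reshard_lock_held = true;             // take the lock so no reshard starts mid-cleanup
  clear_resharding_status(b);
  b.reshard_lock_held = false;
  assert(!shows_resharding(b));
  return true;
}
```

The lock check is what distinguishes an abandoned reshard from one that is merely slow, so flags are never cleared underneath a live reshard.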

Fixes: http://tracker.ceph.com/issues/27219

Orit Wasserman and others added some commits Sep 21, 2018

rgw: renew resharding lock during bucket resharding
Signed-off-by: Orit Wasserman <owasserm@owasserm.redhat.com>
(cherry picked from commit 32d8597)
rgw: use the same lock when resharding
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
(cherry picked from commit 173bfc8)
cls: add semantics for cls locks to require renewal without expiring
Add the ability to *require* renewal of an existing lock in addition
to the existing ability to *allow* renewal of an existing lock. The key
difference is that a MUST_RENEW will fail if the lock has expired
(where a MAY_RENEW will succeed). This provides calling code with the
ability to verify that a lock is held continually and that it was
never lost/expired.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
(cherry picked from commit 479c909)
rgw: renew resharding locks to prevent expiration
Fix the lock expiration problem with resharding. The resharding process
will renew its bucket lock (and logshard lock if necessary) when half
of the lock's time remains. If the lock has expired and cannot be
renewed, the process fails and errors out appropriately.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
(cherry picked from commit 8cebffa)
cls: add exclusive ephemeral locks that auto-clean
Add a new type of cls lock -- exclusive ephemeral for which the
object only exists to represent the lock and for which the object
should be deleted at unlock. This is to prevent the accumulation of
unneeded objects in the cluster by automatically cleaning them up.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
(cherry picked from commit a289f2d)
rgw: change the bucket reshard lock to exclusive-ephemeral
The bucket reshard lock was simply an exclusive lock that existed on
an object solely for the purpose of representing the lock. This is now
changed to an exclusive-ephemeral lock, so as not to leave these objects
behind.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
(cherry picked from commit bc0a5ff)
rgw: failed resharding clears resharding status from shard heads
Previously, when resharding failed, we restored the shard status on
the bucket info object. However the status on each of the shards was
left indicating a reshard was underway. This prevented some write
operations from taking place, as they would wait for resharding to
complete. This adds the missing functionality. It also makes the
functionality available to other classes via static functions in
RGWBucketReshard.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
(cherry picked from commit 4577801)
rgw: move RGWReshardBucket lock to its own separate class
There are other processes beyond resharding that would need to take a
bucket reshard lock (e.g., correcting bucket resharding flags in the event
of a crash, tools to remove bucket shard information from earlier
versions of ceph). Pulling this logic outside of RGWReshardBucket
allows this code to be re-used.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
(cherry picked from commit 18ab99c)
rgw: recover from incomplete reshard attempt
In case a reshard attempt is left in an incomplete state, i.e., flags
still show resharding even though the bucket reshard lock isn't being
held, try to recover by taking the bucket reshard lock and clearing
flags associated with resharding.

This change requires access to an RGWBucketInfo object, so the call stack
into this function should provide one to avoid unnecessary
work. Changes were made to provide this object.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
(cherry picked from commit 4891ae5)

ivancich commented Nov 2, 2018

@smithfarm Since I'd just done the downstream backport I wanted to just get the upstream backports out of the way. Please let me know if this PR needs any further attention from me. Thanks!

@cbodley cbodley added this to the mimic milestone Nov 2, 2018

@smithfarm smithfarm changed the title from "rgw: renew resharding locks to prevent expiration -- mimic backport" to "mimic: rgw: renew resharding locks to prevent expiration" Nov 3, 2018

@smithfarm smithfarm requested review from cbodley , oritwas and mattbenjamin Nov 3, 2018

@smithfarm smithfarm removed the backport label Nov 15, 2018


smithfarm commented Nov 15, 2018

Cherry-picks look good - no conflicts. Thanks @ivancich


yuriw commented Nov 19, 2018

@yuriw yuriw merged commit 5d6ee50 into ceph:mimic Nov 20, 2018

4 checks passed

Docs: build check OK - docs built
Signed-off-by all commits in this PR are signed
Unmodified Submodules submodules for project are unmodified
make check make check succeeded