New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mgr / volumes: background purge queue for subvolumes #28003
Merged
Merged
Changes from all commits
Commits
Show all changes
14 commits
Select commit
Hold shift + click to select a range
ae470a6
test: cleanup removing all subvolumes before removing subvolume group
vshankar dd569e2
mgr / volumes: use negative error codes everywhere
vshankar 04547c9
mgr / volumes: wrap rmtree() call within try..except block
vshankar 5c41e94
mgr / volumes: maintain connection pool for fs volumes
vshankar 483a214
mgr / volumes: purge queue for async subvolume delete
vshankar 06e44be
mgr / volumes: schedule purge job for volumes on init
vshankar aec1d90
test: add basic purge queue validation test
vshankar f7987f5
pybind/mgr/volumes: remove unused property
batrick 3d63cd9
pybind/mgr/subvolumes: use bytes for paths
batrick 7cc4b72
pybind/mgr/volumes: cleanup fs removal
batrick e9c314a
pybind/mgr/volumes: use existing client provided recursive mkdir
batrick e71e918
pybind/mgr/volumes: refactor trash readdir
batrick ecf8502
pybind/mgr/volumes: print exceptions in purge thread
batrick b049eb9
client: do not return EEXIST for mkdirs
batrick File filter
Filter by extension
Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,198 @@ | ||
import time | ||
import logging | ||
import threading | ||
import traceback | ||
from collections import deque | ||
|
||
log = logging.getLogger(__name__) | ||
|
||
class PurgeQueueBase(object): | ||
""" | ||
Base class for implementing purge queue strategies. | ||
""" | ||
class PurgeThread(threading.Thread): | ||
def __init__(self, name, purge_fn): | ||
self.purge_fn = purge_fn | ||
# event object to cancel ongoing purge | ||
self.cancel_event = threading.Event() | ||
threading.Thread.__init__(self, name=name) | ||
|
||
def run(self): | ||
try: | ||
self.purge_fn() | ||
except Exception as e: | ||
trace = "".join(traceback.format_exception(None, e, e.__traceback__)) | ||
log.error("purge queue thread encountered fatal error:\n"+trace) | ||
|
||
def cancel_job(self): | ||
self.cancel_event.set() | ||
|
||
def should_cancel(self): | ||
return self.cancel_event.isSet() | ||
|
||
def reset_cancel(self): | ||
self.cancel_event.clear() | ||
|
||
def __init__(self, volume_client): | ||
self.vc = volume_client | ||
# volumes whose subvolumes need to be purged | ||
self.q = deque() | ||
# job tracking | ||
self.jobs = {} | ||
# lock, cv for kickstarting purge | ||
self.lock = threading.Lock() | ||
self.cv = threading.Condition(self.lock) | ||
# lock, cv for purge cancellation | ||
self.waiting = False | ||
self.c_lock = threading.Lock() | ||
self.c_cv = threading.Condition(self.c_lock) | ||
|
||
def queue_purge_job(self, volname): | ||
with self.lock: | ||
if not self.q.count(volname): | ||
self.q.append(volname) | ||
self.jobs[volname] = [] | ||
self.cv.notifyAll() | ||
|
||
def cancel_purge_job(self, volname): | ||
log.info("cancelling purge jobs for volume '{0}'".format(volname)) | ||
self.lock.acquire() | ||
unlock = True | ||
try: | ||
if not self.q.count(volname): | ||
return | ||
self.q.remove(volname) | ||
if not self.jobs.get(volname, []): | ||
return | ||
# cancel in-progress purge operation and wait until complete | ||
for j in self.jobs[volname]: | ||
j[1].cancel_job() | ||
# wait for cancellation to complete | ||
with self.c_lock: | ||
unlock = False | ||
self.waiting = True | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @ajarr -- note that this essentially makes this interface unsafe for reentrancy. so, my question is (and I think Patrick mentioned about this somewhere): is there a need for the cancel interface to be reentrant safe? |
||
self.lock.release() | ||
while self.waiting: | ||
log.debug("waiting for {0} in-progress purge jobs for volume '{1}' to " \ | ||
"cancel".format(len(self.jobs[volname]), volname)) | ||
self.c_cv.wait() | ||
finally: | ||
if unlock: | ||
self.lock.release() | ||
|
||
def register_job(self, volname, purge_dir): | ||
log.debug("registering purge job: {0}.{1}".format(volname, purge_dir)) | ||
|
||
thread_id = threading.currentThread() | ||
self.jobs[volname].append((purge_dir, thread_id)) | ||
|
||
def unregister_job(self, volname, purge_dir): | ||
log.debug("unregistering purge job: {0}.{1}".format(volname, purge_dir)) | ||
|
||
thread_id = threading.currentThread() | ||
self.jobs[volname].remove((purge_dir, thread_id)) | ||
|
||
cancelled = thread_id.should_cancel() | ||
thread_id.reset_cancel() | ||
|
||
# wake up cancellation waiters if needed | ||
if not self.jobs[volname] and cancelled: | ||
logging.info("waking up cancellation waiters") | ||
self.jobs.pop(volname) | ||
with self.c_lock: | ||
self.waiting = False | ||
self.c_cv.notifyAll() | ||
|
||
def get_trash_entry_for_volume(self, volname): | ||
log.debug("fetching trash entry for volume '{0}'".format(volname)) | ||
|
||
exclude_entries = [v[0] for v in self.jobs[volname]] | ||
ret = self.vc.get_subvolume_trash_entry( | ||
None, vol_name=volname, exclude_entries=exclude_entries) | ||
if not ret[0] == 0: | ||
log.error("error fetching trash entry for volume '{0}': {1}".format(volname), ret[0]) | ||
return ret[0], None | ||
return 0, ret[1] | ||
|
||
def purge_trash_entry_for_volume(self, volname, purge_dir): | ||
ajarr marked this conversation as resolved.
Show resolved
Hide resolved
|
||
log.debug("purging trash entry '{0}' for volume '{1}'".format(purge_dir, volname)) | ||
|
||
thread_id = threading.currentThread() | ||
ret = self.vc.purge_subvolume_trash_entry( | ||
None, vol_name=volname, purge_dir=purge_dir, should_cancel=lambda: thread_id.should_cancel()) | ||
return ret[0] | ||
|
||
class ThreadPoolPurgeQueueMixin(PurgeQueueBase): | ||
""" | ||
Purge queue mixin class maintaining a pool of threads for purging trash entries. | ||
Subvolumes are chosen from volumes in a round robin fashion. If some of the purge | ||
entries (belonging to a set of volumes) have huge directory tree's (such as, lots | ||
of small files in a directory w/ deep directory trees), this model may lead to | ||
_all_ threads purging entries for one volume (starving other volumes). | ||
""" | ||
def __init__(self, volume_client, tp_size): | ||
super(ThreadPoolPurgeQueueMixin, self).__init__(volume_client) | ||
self.threads = [] | ||
for i in range(tp_size): | ||
self.threads.append( | ||
PurgeQueueBase.PurgeThread(name="purgejob.{}".format(i), purge_fn=self.run)) | ||
self.threads[-1].start() | ||
ajarr marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
def pick_purge_dir_from_volume(self): | ||
log.debug("processing {0} purge job entries".format(len(self.q))) | ||
nr_vols = len(self.q) | ||
to_remove = [] | ||
to_purge = None, None | ||
while nr_vols > 0: | ||
volname = self.q[0] | ||
# do this now so that the other thread picks up trash entry | ||
# for next volume. | ||
self.q.rotate(1) | ||
ret, purge_dir = self.get_trash_entry_for_volume(volname) | ||
if purge_dir: | ||
to_purge = volname, purge_dir | ||
break | ||
# this is an optimization when for a given volume there are no more | ||
# entries in trash and no purge operations are in progress. in such | ||
# a case we remove the volume from the tracking list so as to: | ||
# | ||
# a. not query the filesystem for trash entries over and over again | ||
# b. keep the filesystem connection idle so that it can be freed | ||
# from the connection pool | ||
# | ||
# if at all there are subvolume deletes, the volume gets added again | ||
# to the tracking list and the purge operations kickstarts. | ||
# note that, we do not iterate the volume list fully if there is a | ||
# purge entry to process (that will take place eventually). | ||
if ret == 0 and not purge_dir and not self.jobs[volname]: | ||
to_remove.append(volname) | ||
nr_vols -= 1 | ||
for vol in to_remove: | ||
log.debug("auto removing volume '{0}' from purge job".format(vol)) | ||
self.q.remove(vol) | ||
self.jobs.pop(vol) | ||
return to_purge | ||
|
||
def get_next_trash_entry(self): | ||
while True: | ||
# wait till there's a purge job | ||
while not self.q: | ||
log.debug("purge job list empty, waiting...") | ||
self.cv.wait() | ||
volname, purge_dir = self.pick_purge_dir_from_volume() | ||
if purge_dir: | ||
return volname, purge_dir | ||
log.debug("no purge jobs available, waiting...") | ||
self.cv.wait() | ||
|
||
def run(self): | ||
while True: | ||
with self.lock: | ||
volname, purge_dir = self.get_next_trash_entry() | ||
self.register_job(volname, purge_dir) | ||
ret = self.purge_trash_entry_for_volume(volname, purge_dir) | ||
if ret != 0: | ||
log.warn("failed to purge {0}.{1}".format(volname, purge_dir)) | ||
with self.lock: | ||
self.unregister_job(volname, purge_dir) | ||
time.sleep(1) |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I run the tests locally (with vstart_runner on vstart cluster), this test times out,
I checked the contents of the filesystem with a manual fuse-mount, the trash folder wasn't empty. It never got purged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I manually populated a subvolume with data, and removed the subvolume. The corresponding trash folder didn't get purged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what was the uid/gid of the populated data? did you perform a plain
cp
? did you observe any error messages in manager log?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the purge thread will fail to remove the directory (and it's entries) from trash if it does not have perms to remove those based on uid/gid. Were the files/dirs created by root?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajarr ping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajarr -- pinging back on this? If this is not an issue then I can post the updated PR for review (minus the fix for assertion).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The uid/gid of the populated data was same as that of the
ceph-mgr
process.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added more logging and the d_name.decode() fix as follows,
Even though a thread picks up the trashdir and registers it, the thread doesn't seem to purge it, i.e., execute
purge_trash_entry_for_volume
. See the mgr log snippet, after thefs subvolume rm
of a subvolume is called, https://pastebin.com/0MWqCJG7 . I didn't see the debug log message of being in purge_trash_entry_for_volume in the mgr log.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that may be due to buffered logging (I've seen that before in an mgr module). if an exception is raised in
purge_trash_entry_for_volume()
, the string is never logged.