qa/tasks/quiescer: dump ops in parallel #57302
Not yet tested.
Thanks for submitting this, Patrick!
Looks good! It should be easy to test this with any of the upcoming teuthology batches by using this branch as the suite.
jenkins test api
jenkins test make check arm64
This PR is under test in https://tracker.ceph.com/issues/65867.
qa/tasks/quiescer.py
@@ -186,14 +186,15 @@ def dump_ops_all_ranks(self, dump_tag):

self.logger.debug(f"Dumping ops on rank {rank} ({name}) to a remote file {remote_path}")
try:
_ = self.fs.rank_tell(['ops', '--flags=locks', f'--path={daemon_path}'], rank=rank)
remote_dumps.append((info, remote_path))
p = self.fs.rank_tell(['ops', '--flags=locks', f'--path={daemon_path}'], rank=rank, wait=False)
My mistake, I cannot do this:
2024-05-09T00:35:43.460 ERROR:tasks.quiescer.fs.[cephfs]:Couldn't pull ops dump at '/var/run/ceph/b96c13bc-0d98-11ef-bc97-c7b262605968/ops-7749c26b-1-mds.i.json' on rank 2 (i), error: 'dict' object has no attribute 'wait'
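The error text above is what Python raises when `.wait()` is called on a plain dict. A minimal sketch of that failure mode, assuming (this is not confirmed by the log) that the helper returns already-decoded JSON rather than a process handle; `tell_json` is a hypothetical stand-in, not the real `rank_tell`:

```python
import json


def tell_json(raw_output):
    """Hypothetical helper: decodes the command's JSON output and returns it."""
    return json.loads(raw_output)


# The caller receives a dict, not a handle to a running process.
result = tell_json('{"ops": []}')

try:
    result.wait()  # a dict has no wait() method
except AttributeError as exc:
    message = str(exc)
```

If the helper behaves this way, asking it not to wait cannot work, because there is nothing process-like left to wait on by the time it returns.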
:'( I was looking forward to the PR... How hard is it to add the async capability to rank_tell?
Instead of using the helper rank_tell, probably just submit the command manually.
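A rough sketch of the "submit manually" idea, using only the Python standard library rather than the teuthology remote API: start every command first, then collect the results, so that slow dumps overlap instead of running back to back. `run_all_in_parallel` is a hypothetical name and the commands in the usage note are placeholders, not the real `ops` tell invocation.

```python
import subprocess
import sys


def run_all_in_parallel(commands):
    """Start every command before waiting on any of them.

    Returns a list of (command, returncode, stdout) tuples in input order.
    """
    # Phase 1: launch everything without blocking.
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        for cmd in commands
    ]

    # Phase 2: collect results; the commands have been running concurrently.
    results = []
    for cmd, proc in zip(commands, procs):
        out, _err = proc.communicate()
        results.append((cmd, proc.returncode, out.decode()))
    return results


def placeholder_command(rank):
    """Placeholder for a per-rank dump command (not the real MDS command)."""
    script = f"import time; time.sleep(0.2); print('rank {rank} dumped')"
    return [sys.executable, "-c", script]
```

Calling `run_all_in_parallel([placeholder_command(r) for r in range(3)])` takes roughly as long as the slowest command rather than the sum of all three, which is the property the PR is after.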
Since this --flags=locks takes the mds_lock and dumps thousands of ops, this may take a long time to complete for each individual MDS. The entire quiesce set may timeout (and all q ops killed) before we finish dumping ops.

Fixes: https://tracker.ceph.com/issues/65823
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Seems to work as advertised now: /teuthology/pdonnell-2024-05-16_16:19:21-fs:workload-main-distro-default-smithi/7709343/teuthology.log