mds/quiesce: fix timeouts, a crash, and overdrive a tree export when possible #57579
3b748ea
to
7956b5e
Compare
jenkins test make check arm64 |
`mds/quiesce: don't force a remote authpin for the quiesce lock` was not part of the two runs I did for https://pulpito.ceph.com/?branch=wip-lusov-quiesce-overdrive-export
I would prefer to keep that in another PR/ticket. Let's get the export fix merged quickly.
I'm not sure. I think they work together: the absence of the remote authpin in most cases eliminates the possibility of a deadlock. It also reduces the inter-rank messaging, it's true to the current design of the quiesce, and it's required to fix the test that fails otherwise. Your run shows one quiesce timeout due to exporting. I have looked at that one and it appears to be the renaming issue, or rather a different kind of it. I'm re-running just the exporter with full replication x8 to stress-test this PR: https://pulpito.ceph.com/leonidus-2024-05-21_18:10:18-fs-wip-lusov-quiesce-overdrive-export-distro-default-smithi/
No quiesce errors. I do think that the absence of the authpin plays a role. To validate this, I'll rerun the same suite as above but with the previous version of the branch, without the authpin change.
Scheduling 20 jobs of the workload that has seen the rename issue.
Without the AP change: https://pulpito.ceph.com/leonidus-2024-05-22_06:28:15-fs-wip-lusov-quiesce-overdrive-export-distro-default-smithi/
With the AP change: https://pulpito.ceph.com/leonidus-2024-05-22_06:28:39-fs-wip-lusov-quiesce-overdrive-export-distro-default-smithi/
Yes, I found that issue with the rename, and it's indeed due to the remote authpin from the quiesce request. quiesce request on mds.1:
took authpin on mds 0:
then the rename request tried to authpin-freeze the same inode:
This inode is now freezing and is blocking the quiesce, but it can't freeze until the quiesce is over and that first authpin is lifted --> deadlock
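To make the cycle easier to see, here is a minimal toy model of it in Python. It is only a sketch: `ToyInode`, `is_deadlocked`, and the holder names are hypothetical and are not Ceph's actual classes or API. The model assumes two simplified rules: a freezing inode refuses new authpins, and a freeze completes only once all authpins are released.

```python
from dataclasses import dataclass, field


@dataclass
class ToyInode:
    """Hypothetical stand-in for an inode on its auth rank (not Ceph's CInode)."""
    auth_pins: set = field(default_factory=set)  # holders of an authpin
    freezing: bool = False                       # a freeze has been requested

    def auth_pin(self, holder: str) -> bool:
        # Assumption: new authpins are refused while the inode is freezing.
        if self.freezing:
            return False
        self.auth_pins.add(holder)
        return True

    def start_freeze(self) -> None:
        self.freezing = True

    def is_frozen(self) -> bool:
        # Assumption: the freeze completes only when no authpins remain.
        return self.freezing and not self.auth_pins


def is_deadlocked(inode: ToyInode, remote_pin: str) -> bool:
    """True if the freeze waits on a pin held by the same logical quiesce."""
    freeze_stuck = inode.freezing and not inode.is_frozen()
    return freeze_stuck and remote_pin in inode.auth_pins


# With the remote authpin: the replica-side quiesce pins the inode on the auth.
inode = ToyInode()
inode.auth_pin("quiesce@mds.1")             # remote authpin taken on mds.0
inode.start_freeze()                        # rename starts freezing the inode
got_pin = inode.auth_pin("quiesce@mds.0")   # local quiesce is now blocked
deadlocked = is_deadlocked(inode, "quiesce@mds.1")

# Without the remote authpin: the freeze completes and the rename proceeds.
clean = ToyInode()
clean.start_freeze()
rename_can_proceed = clean.is_frozen()
```

In this model the local quiesce cannot pin the freezing inode, and the freeze cannot finish while the remote pin from the same logical quiesce is held, which is the wait-for cycle described above.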
7956b5e
to
0cfaf3a
Compare
@batrick can we please have a single PR for those two tickets? They will have to be backported everywhere in a batch... 🙏🏻 🙏🏻 |
0cfaf3a
to
9dc8c01
Compare
jenkins test make check arm64 |
Don't forget to QA `-s fs:functional --filter quiesce`.
`mds/quiesce: allow quiescing a single file in dispatch_quiesce_path` will break `test_quiesce_path_regfile`
`mds: quiesce_path: don't block the asock thread and return an adequate rc` -> `mds: do not block the asok thread via quiesce_path and return an adequate rc`; otherwise the change is 👍
mds: add `lifetime` cmd param to `lock path` 👍
`qa/cephfs: quiesce: test that a quiesce op doesn't hold remote ap` 👍
Other fixes are good. You've convinced me to combine the changes.
240841e
to
6e074e1
Compare
8e0427a
to
11ffcf2
Compare
Latest: Stress exporting/renaming/fragmenting - pass (*half of the jobs failed because of the backtrace scrub, but there were no quiesce failures)
General with-quiesce: 2 timeouts
7724309 and 772410 are a problem with the QuiesceAgent ack reordering; I reported this in a new defect: https://tracker.ceph.com/issues/66219
… adequate rc Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
1. avoid taking a remote authpin for the quiesce lock
2. drop remote authpins that were taken because of other locks

We should not be forcing a mustpin when taking quiesce lock. This creates unnecessary overhead due to the distributed nature of the quiesce: all ranks will execute quiesce_inode, including the auth rank, which will authpin the inode.

Auth pinning on the auth rank is important to synchronize quiesce with operations that are managed by the auth, like fragmenting and exporting.

If we let a remote quiesce process take a foreign authpin then it may block freezing on the auth, which will stall quiesce locally. This wouldn't be a problem if the quiesce that is blocked on the auth and the quiesce that's holding a remote authpin from the replica side were unrelated, but in our case it may be the same logical quiesce that effectively steps on its own toes. This creates an opportunity for a deadlock.

Fixes: https://tracker.ceph.com/issues/66152
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Just like with the fragmenting, we should abort an ongoing export if a quiesce is attempted for the directory. To minimize the stress for the system, we only allow the abort if the export hasn't yet managed to freeze the tree. If that is the case, then quiesce will have to wait for the export to finish.

Fixes: https://tracker.ceph.com/issues/66123
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
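The rule in this commit message can be sketched as a small decision function. This is an illustration only: the `ExportStage` values, `QuiesceAction`, and `overdrive_export` are hypothetical names, and the real export state machine in the MDS has more stages than this simplified model assumes.

```python
from enum import Enum, IntEnum, auto


class ExportStage(IntEnum):
    """Hypothetical, simplified export stages in the order they occur."""
    DISCOVERING = 1
    FREEZING = 2
    FROZEN = 3      # the tree has been frozen
    EXPORTING = 4


class QuiesceAction(Enum):
    ABORT_EXPORT = auto()     # cancel the export so the quiesce can proceed
    WAIT_FOR_EXPORT = auto()  # let the export finish, then quiesce


def overdrive_export(stage: ExportStage) -> QuiesceAction:
    # Abort only while the export has not yet managed to freeze the tree;
    # past that point, aborting is considered too disruptive.
    if stage < ExportStage.FROZEN:
        return QuiesceAction.ABORT_EXPORT
    return QuiesceAction.WAIT_FOR_EXPORT


decisions = {stage.name: overdrive_export(stage).name for stage in ExportStage}
```

Keeping the stages in an ordered enum makes "hasn't yet managed to freeze the tree" a single comparison, which is the only condition the commit message describes.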
…rancy Fixes: https://tracker.ceph.com/issues/66208 Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
In this scenario, the agent thread is able to run and generate an ack before the db_update call returns to the caller. Fixes: https://tracker.ceph.com/issues/66219 Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Defer to the agent thread to perform all acking. This avoids race conditions between the updating thread and the acking thread. Fixes: https://tracker.ceph.com/issues/66219 Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
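The idea of routing every ack through one thread can be illustrated with a small Python sketch. It is not the QuiesceAgent implementation: `ToyAgent`, its `db_update` method, and the `send_ack` callback are hypothetical, and the model assumes that state changes are simply queued and acked in arrival order by a single worker.

```python
import queue
import threading


class ToyAgent:
    """Hypothetical agent: only its own thread ever sends acks."""

    def __init__(self, send_ack):
        self._pending = queue.Queue()   # FIFO of (root, state) updates
        self._send_ack = send_ack
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def db_update(self, root: str, state: str) -> None:
        # Called from the updating thread: record the change, never ack here.
        self._pending.put((root, state))

    def _run(self) -> None:
        # Single consumer, so acks leave in the order updates were recorded.
        while True:
            item = self._pending.get()
            if item is None:
                break
            self._send_ack(*item)

    def stop(self) -> None:
        self._pending.put(None)  # sentinel: drain the queue, then exit
        self._thread.join()


acks = []
agent = ToyAgent(lambda root, state: acks.append((root, state)))
agent.db_update("root1", "QUIESCING")
agent.db_update("root1", "QUIESCED")
agent.stop()
```

Because there is exactly one sender, a QUIESCED ack cannot overtake the QUIESCING ack for the same root in this model, which is the kind of reordering the commit is meant to rule out.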
11ffcf2
to
9a4c585
Compare
Fixes: https://tracker.ceph.com/issues/66225 Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
jenkins test make check |
jenkins test windows |
A rerun with the fix of https://tracker.ceph.com/issues/66219 found by the tests in #57579 (comment): Functional - pass Export/replication kernel_untar_build stress 8/20 pass, no quiesce errors
A generic
A second generic
The analysis suggests that this is a client issue, see https://tracker.ceph.com/issues/66229. In total, there were 980 successful quiesces over the three runs of this batch. The quiesce times are measured by the script, and as such are not reliable, but still, here's the distribution: [image: distribution of quiesce times]
A re-run with the change for [quiesce] disable debug parameters on quiesce roots Functional - pass A generic
Both issues are due to messaging errors preventing the delivery of an ack: 7727755 - during the quiesce
7727756 - during the release
Stress test with replication and exporting
Fantastic!
Fixes: https://tracker.ceph.com/issues/66152 - mds/quiesce: holding remote authpins for the duration of the quiesce op may cause deadlocks
Fixes: https://tracker.ceph.com/issues/66123 - Quiesce timeout due to exporting
Fixes: https://tracker.ceph.com/issues/66208 - Segfault when `quiesce_overdrive_fragmenting` synchronously calls `dispatch_fragment_dir` for an abort
Fixes: https://tracker.ceph.com/issues/66219 - [quiesce timeout] QuiesceAgent may send an async QUIESCED ack before the QuiesceManager does the sync QUIESCING ack, which causes the QUIESCED ack to be lost
Fixes: https://tracker.ceph.com/issues/66225 - [quiesce] disable debug parameters on quiesce roots