mds: multiple mds scrub support #35749
jenkins test dashboard backend
jenkins retest this please
* refs/pull/35749/head:
	qa/cephfs: Add more tests for multimds scrub
	qa/cephfs: add tests for multimds scrub
	qa/cephfs: update existing scrub test cases
	mds: don't skip validating disk state of symlink
	mds: abort/pause/resume scrubs in multiple mds
	mds: track scrub status in multiple mds
	mds: remove on_finish from {CInode,CDir}::scrub_info_t
	Continuation: don't delete self while there are in-processing stages
	mds: auth pin CInode when validating its disk state
	mds: rdlock file/nest lock when accumulating stats of subtree dirfrags
	mds: multiple mds scrub support
	include/frag: add encode/decode functions for fragset_t
	mds: remove object can't be scrubbed immediately from scrub stack
	mds: prevent dirfrag scrub/fragment from running at the same time
	mds: change scrub traverse from post-order to breadth-first search
	mds: make both CInode and CDir as entities of scrub
	mds: remove ScrubStack::scrubstack
A few scrub errors (expected?):
Looks normal otherwise, but I'll dig deeper on Monday.
Should cite this tracker: https://tracker.ceph.com/issues/12274
Just added "Fixes: https://tracker.ceph.com/issues/12274" to the commit "mds: multiple mds scrub support". No other changes.
@ukernel still need the log ignorelist change for: #35749 (review)
The comment says it's a hack for dout. I don't see any reason it's needed.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Making CDir an entity of scrub is preparation for scrubbing across multiple mds. When a subtree bound is encountered, the scrub should be forwarded to the subtree's auth mds, which adds the CDir to its scrub stack.
Signed-off-by: Simon Gao <simon29rock@gmail.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
After switching to breadth-first search, scrubbing a dir inode no longer needs to wait until all of its descendant dirfrags/inodes are scrubbed. This simplifies the scrub code a lot. The downside is that a scrubbed dir inode no longer implies that the corresponding subtree has been fully scrubbed, which makes later scrubs (without the force option) less efficient.
Signed-off-by: Simon Gao <simon29rock@gmail.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
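To make the traversal change concrete, here is a minimal sketch comparing breadth-first and post-order visiting over a toy directory tree. `ToyNode`, `bfs_scrub_order` and `post_order` are hypothetical names, not Ceph's `ScrubStack` API; the sketch only shows that in BFS a directory is handled as soon as it is dequeued, while in post-order it must wait for every descendant.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for a dir inode or file; not a Ceph type.
struct ToyNode {
  std::string name;
  std::vector<ToyNode> children;
};

// Breadth-first: a directory is "scrubbed" when dequeued, before any of
// its descendants, so it never waits on its subtree.
std::vector<std::string> bfs_scrub_order(const ToyNode& root) {
  std::vector<std::string> order;
  std::deque<const ToyNode*> queue{&root};
  while (!queue.empty()) {
    const ToyNode* n = queue.front();
    queue.pop_front();
    order.push_back(n->name);          // scrub this entry now
    for (const ToyNode& c : n->children)
      queue.push_back(&c);             // children are handled later
  }
  return order;
}

// Post-order, for comparison: a directory is scrubbed only after all of
// its descendants have been scrubbed.
void post_order(const ToyNode& n, std::vector<std::string>& out) {
  for (const ToyNode& c : n.children)
    post_order(c, out);
  out.push_back(n.name);
}
```

For the tree `/ -> {a -> {a1}, b}`, BFS visits `/, a, b, a1`, whereas post-order visits `a1, a, b, /`. That is also why, after this change, "a scrubbed" no longer implies "its subtree is scrubbed".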
When a CDir is in the scrub stack, the mds should not split/merge it.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
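A minimal sketch of the guard described above, assuming a hypothetical `ToyScrubStack` that only tracks which dirfrags are queued (dirfrags are identified here by plain strings, which is not how Ceph names them). A split/merge request is refused while the dirfrag is on the stack; in this sketch the caller would simply have to try again after the scrub pops it.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical bookkeeping of dirfrags currently on the scrub stack.
class ToyScrubStack {
 public:
  void push(const std::string& dirfrag) { queued_.insert(dirfrag); }
  void pop(const std::string& dirfrag) { queued_.erase(dirfrag); }
  bool contains(const std::string& dirfrag) const {
    return queued_.count(dirfrag) > 0;
  }

 private:
  std::set<std::string> queued_;
};

// Fragmenting (split/merge) is only allowed when the dirfrag is not
// queued for scrub.
bool can_fragment(const ToyScrubStack& stack, const std::string& dirfrag) {
  return !stack.contains(dirfrag);
}
```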
This avoids repeatedly checking objects that can't be scrubbed.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
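The idea can be sketched as follows, assuming a hypothetical `ToyScrubQueue` whose `can_scrub` predicate stands in for whatever makes an object temporarily unscrubbable (the real conditions in the MDS are not modeled). Instead of leaving a blocked entry in the queue and re-checking it on every pass, the entry is parked in a waiting set and only re-queued when `wake()` is called for it.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

class ToyScrubQueue {
 public:
  explicit ToyScrubQueue(std::function<bool(const std::string&)> can_scrub)
      : can_scrub_(std::move(can_scrub)) {}

  void push(const std::string& e) { queue_.push_back(e); }

  // Drain the queue once: scrub entries that are ready and move blocked
  // ones out of the queue so later passes do not re-check them.
  std::vector<std::string> kick() {
    std::vector<std::string> scrubbed;
    while (!queue_.empty()) {
      std::string e = queue_.front();
      queue_.pop_front();
      ++checks_;
      if (can_scrub_(e))
        scrubbed.push_back(e);
      else
        waiting_.insert(e);  // parked until wake(e)
    }
    return scrubbed;
  }

  // Re-queue a parked entry once it may have become scrubbable.
  void wake(const std::string& e) {
    if (waiting_.erase(e))
      queue_.push_back(e);
  }

  int checks() const { return checks_; }
  std::size_t waiting() const { return waiting_.size(); }

 private:
  std::function<bool(const std::string&)> can_scrub_;
  std::deque<std::string> queue_;
  std::set<std::string> waiting_;
  int checks_ = 0;
};
```

A second `kick()` with only blocked entries outstanding does no checks at all, which is the saving the commit message describes.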
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
If a non-auth object is encountered during scrubbing, forward the scrub to the object's auth mds.
Fixes: https://tracker.ceph.com/issues/12274
Signed-off-by: Simon Gao <simon29rock@gmail.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
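A minimal sketch of the forwarding decision, assuming a hypothetical `ToyObject` that already carries the rank of its auth mds (how the MDS actually determines authority is not modeled). Objects this rank is auth for are scrubbed locally; the rest are grouped per auth rank. Batching per rank is an illustrative choice, not necessarily what the patch does.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical object met during scrub, tagged with its auth rank.
struct ToyObject {
  std::string path;
  int auth_rank;
};

struct ToyScrubPlan {
  std::vector<std::string> local;                     // scrubbed here
  std::map<int, std::vector<std::string>> forwarded;  // rank -> paths
};

// Split the work into local scrubs and scrubs forwarded to other ranks.
ToyScrubPlan plan_scrub(int my_rank, const std::vector<ToyObject>& objs) {
  ToyScrubPlan plan;
  for (const ToyObject& o : objs) {
    if (o.auth_rank == my_rank)
      plan.local.push_back(o.path);
    else
      plan.forwarded[o.auth_rank].push_back(o.path);
  }
  return plan;
}
```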
Signed-off-by: Simon Gao <simon29rock@gmail.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
A scrubbed CInode/CDir no longer means the corresponding subtree is fully scrubbed, so the on_finish in {CInode,CDir}::scrub_info_t becomes useless. This patch also removes the code that flushes the journal if scrub has repaired anything. A later patch will add that code back in a different place.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Scrubs are always initiated from mds.0, so mds.0 can ensure that scrub tags are globally unique. mds.0 periodically gathers the scrubs running on itself and on the other mds. A scrub is finished only if it is not running on any mds.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
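The tracking rule above can be sketched like this. `ToyScrubTracker`, the `scrub-N` tag format, and the idea of one complete "gather round" containing a report from every rank are assumptions for illustration; the real messages and their periodicity are not modeled. Because every scrub starts on mds.0, a local counter is enough to keep tags unique, and a tag is retired only when no rank (mds.0 included) still reports it.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of mds.0's scrub bookkeeping; not Ceph's classes.
class ToyScrubTracker {
 public:
  // Only mds.0 starts scrubs, so a counter yields globally unique tags.
  std::string start_scrub() {
    std::string tag = "scrub-" + std::to_string(next_id_++);
    active_.insert(tag);
    return tag;
  }

  // One gather round: `running` maps each rank to the tags it reports as
  // still running. Active tags missing from every report are finished;
  // they are removed and returned.
  std::vector<std::string> gather(
      const std::map<int, std::set<std::string>>& running) {
    std::vector<std::string> done;
    for (auto it = active_.begin(); it != active_.end();) {
      bool still_running = false;
      for (const auto& entry : running) {
        if (entry.second.count(*it)) {
          still_running = true;
          break;
        }
      }
      if (still_running) {
        ++it;
      } else {
        done.push_back(*it);
        it = active_.erase(it);
      }
    }
    return done;
  }

  bool is_active(const std::string& tag) const {
    return active_.count(tag) > 0;
  }

 private:
  unsigned next_id_ = 0;
  std::set<std::string> active_;
};
```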
Limit scrub abort/pause/resume commands to mds.0. mds.0 sends messages to the other mds, asking them to abort/pause/resume their scrubs.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
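A minimal sketch of the command routing described above, assuming hypothetical `ToyScrubOp`/`ToyScrubMsg` types and active ranks numbered contiguously from 0 to `num_ranks - 1`. Any rank other than mds.0 rejects the command; mds.0 accepts it and emits one message per peer rank. Applying the operation to mds.0's own scrubs is left out of the sketch.

```cpp
#include <cassert>
#include <vector>

enum class ToyScrubOp { Abort, Pause, Resume };

// Hypothetical record of a control message mds.0 would send to a peer.
struct ToyScrubMsg {
  int to_rank;
  ToyScrubOp op;
};

// Returns false if this rank is not mds.0 (command rejected). Otherwise
// appends one message per other rank to `out` and returns true.
bool handle_scrub_control(int my_rank, int num_ranks, ToyScrubOp op,
                          std::vector<ToyScrubMsg>& out) {
  if (my_rank != 0)
    return false;
  // mds.0 would also apply `op` to its own scrubs here (not modeled).
  for (int r = 1; r < num_ranks; ++r)
    out.push_back({r, op});
  return true;
}
```

Funnelling the commands through mds.0 mirrors the tracking design in the previous commit: the rank that owns the scrub tags is the one that coordinates them.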
We can check whether the backtrace is valid and whether the inode number is in use.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Now scrub is always async.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Sidharth Anupkrishnan <sanupkri@redhat.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20201116.174825
The first round of errors was transient.
Fixes: https://tracker.ceph.com/issues/12274