
mds: multiple mds scrub support #35749

Merged
merged 19 commits into ceph:master from simon_work_scrub on Nov 17, 2020


ukernel (Contributor) commented Jun 24, 2020

Fixes: https://tracker.ceph.com/issues/12274

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Available Jenkins commands:
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard backend
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

ukernel added the bug-fix and cephfs (Ceph File System) labels Jun 24, 2020
ukernel requested a review from batrick June 24, 2020 14:20
batrick changed the title from "mds: multiple mds support" to "mds: multiple mds scrub support" Jun 24, 2020
batrick requested review from sidharthanup and a team June 24, 2020 18:37
ukernel force-pushed the simon_work_scrub branch 10 times, most recently from 6648144 to 73f24d7, July 2, 2020 10:01
ukernel force-pushed the simon_work_scrub branch 3 times, most recently from 9747945 to 02701f9, July 14, 2020 12:19
batrick (Member) commented Jul 16, 2020

jenkins test dashboard backend

ukernel (Contributor, author) commented Jul 20, 2020

jenkins retest this please

Review thread on qa/tasks/cephfs/test_multimds_misc.py (outdated, resolved)
Review thread on qa/tasks/cephfs/test_multimds_misc.py (resolved)
batrick added a commit to batrick/ceph that referenced this pull request Nov 7, 2020
* refs/pull/35749/head:
	qa/cephfs: Add more tests for multimds scrub
	qa/cephfs: add tests for multimds scrub
	qa/cephfs: update existing scrub test cases
	mds: don't skip validating disk state of symlink
	mds: abort/pause/resume scrubs in multiple mds
	mds: track scrub status in multiple mds
	mds: remove on_finish from {CInode,CDir}::scrub_info_t
	Continuation: don't delete self while there are in-processing stages
	mds: auth pin CInode when validating its disk state
	mds: rdlock file/nest lock when accumulating stats of subtree dirfrags
	mds: multiple mds scrub support
	include/frag: add encode/decode functions for fragset_t
	mds: remove object can't be scrubbed immediately from scrub stack
	mds: prevent dirfrag scrub/fragment from running at the same time
	mds: change scrub traverse from post-order to breadth-first search
	mds: make both CInode and CDir as entities of scrub
	mds: remove ScrubStack::scrubstack
batrick (Member) left a review comment:

A few scrub errors (expected?):

https://pulpito.ceph.com/pdonnell-2020-11-07_19:18:17-multimds-wip-pdonnell-testing-20201107.025143-distro-basic-smithi/

Looks normal otherwise, but I'll dig deeper on Monday.

batrick (Member) left a review comment:

Should cite this tracker: https://tracker.ceph.com/issues/12274

ukernel (Contributor, author) commented Nov 12, 2020

Just added "Fixes: https://tracker.ceph.com/issues/12274" to commit "mds: multiple mds scrub support". No other changes.

batrick (Member) commented Nov 13, 2020

@ukernel the log ignorelist change for #35749 (review) is still needed.

ukernel and others added 18 commits on November 16, 2020
The comment says it's a hack for dout. I don't see any reason it's needed.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Making CDir an entity of scrub is preparation for scrubbing across
multiple mds. When a subtree bound is encountered, the scrub should be
forwarded to the subtree's auth mds, which adds the CDir to its scrub stack.

Signed-off-by: Simon Gao <simon29rock@gmail.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
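To illustrate CInode and CDir both being scrub entities, here is a minimal C++17 sketch. `Inode`, `Dirfrag`, `ScrubEntity` and `scrub_one` are hypothetical, heavily simplified stand-ins, not Ceph's actual types or ScrubStack API. It shows one consequence of the change described above: scrubbing a directory inode queues its dirfrags as separate entries instead of descending into them, which leaves a natural point where a dirfrag could be handed to another mds.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in for a CDir: one fragment of a directory.
struct Dirfrag {
  std::string dir_path;  // directory this fragment belongs to
  unsigned frag_id;      // which fragment of that directory
};

// Hypothetical stand-in for a CInode.
struct Inode {
  std::string path;
  std::vector<Dirfrag> dirfrags;  // empty for non-directories
};

// A scrub entry is either an inode or a single dirfrag.
using ScrubEntity = std::variant<Inode, Dirfrag>;

// Pops the front entity and "scrubs" it, returning a description.
// A directory inode does not recurse: it only queues its dirfrags as
// independent entries, each of which could later go to its auth mds.
std::string scrub_one(std::deque<ScrubEntity>& stack) {
  ScrubEntity e = std::move(stack.front());
  stack.pop_front();
  if (auto* in = std::get_if<Inode>(&e)) {
    for (const auto& df : in->dirfrags)
      stack.push_back(df);
    return "inode " + in->path;
  }
  const auto& df = std::get<Dirfrag>(e);
  return "dirfrag " + df.dir_path + "#" + std::to_string(df.frag_id);
}
```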
After switching to breadth-first search, scrubbing a dir inode does not need
to wait until all of its descendant dirfrags/inodes are scrubbed. This
simplifies the scrub code a lot. The downside is that a scrubbed dir inode
no longer implies the corresponding subtree has been fully scrubbed, which
makes a later scrub (without the force option) less efficient.

Signed-off-by: Simon Gao <simon29rock@gmail.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
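A hypothetical sketch contrasting the two traversal orders on a toy directory tree (`Tree`, `post_order` and `breadth_first` are illustrative names, not code from this PR). In post-order a directory is emitted only after all of its descendants; in breadth-first order it is emitted before them, so nothing has to wait on the subtree.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Toy directory tree: directory path -> child paths (hypothetical model).
using Tree = std::map<std::string, std::vector<std::string>>;

// Post-order: a directory is emitted only after all of its descendants,
// i.e. its own scrub has to wait for the whole subtree.
std::vector<std::string> post_order(const Tree& t, const std::string& root) {
  std::vector<std::string> out;
  std::function<void(const std::string&)> visit = [&](const std::string& p) {
    auto it = t.find(p);
    if (it != t.end())
      for (const auto& child : it->second) visit(child);
    out.push_back(p);
  };
  visit(root);
  return out;
}

// Breadth-first: a directory is emitted as soon as it is dequeued and its
// children are queued behind it, so nothing waits on descendants.
std::vector<std::string> breadth_first(const Tree& t, const std::string& root) {
  std::vector<std::string> out;
  std::queue<std::string> q;
  q.push(root);
  while (!q.empty()) {
    std::string p = q.front();
    q.pop();
    out.push_back(p);
    auto it = t.find(p);
    if (it != t.end())
      for (const auto& child : it->second) q.push(child);
  }
  return out;
}
```

For a root "/" with children "/a" and "/b", where "/a" contains "/a/x", post-order yields /a/x, /a, /b, / while breadth-first yields /, /a, /b, /a/x. The second order makes the downside above concrete: "/" is done while "/a/x" is still pending.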
When a CDir is in the scrub stack, the mds should not split/merge it.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
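A minimal sketch of this mutual exclusion, assuming two per-dirfrag flags. `DirfragState`, `try_start_fragment` and `try_enqueue_scrub` are hypothetical; Ceph's real CDir state bits and checks differ. The commit message states only one direction (no split/merge while the dirfrag is in the scrub stack); the reverse check is an assumption based on the commit title "prevent dirfrag scrub/fragment from running at the same time".

```cpp
#include <cassert>

// Hypothetical per-dirfrag state, not Ceph's real CDir fields.
struct DirfragState {
  bool in_scrub_stack = false;  // queued for, or undergoing, scrub
  bool fragmenting = false;     // split/merge in progress
};

// Split/merge is refused while the dirfrag sits in the scrub stack.
bool try_start_fragment(DirfragState& d) {
  if (d.in_scrub_stack || d.fragmenting) return false;
  d.fragmenting = true;
  return true;
}

// Assumed reverse direction: do not queue a dirfrag being split/merged.
bool try_enqueue_scrub(DirfragState& d) {
  if (d.fragmenting || d.in_scrub_stack) return false;
  d.in_scrub_stack = true;
  return true;
}
```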
This avoids repeatedly checking objects that can't be scrubbed immediately.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
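One way to picture this change, as a hypothetical sketch: a blocked item is moved off the scrub stack into a waiting list and is put back only when something signals that it is ready, so later passes over the stack no longer re-examine it. `Item`, `ScrubQueue`, `kick` and `requeue` are illustrative names; how the real mds learns that an object has become scrubbable is not described here and is modelled by an explicit `requeue` call.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical scrub item; `ready` stands in for whatever would let it be
// scrubbed right now (locks available, not frozen, and so on).
struct Item {
  std::string name;
  bool ready;
};

struct ScrubQueue {
  std::deque<Item> stack;     // items eligible to be examined
  std::vector<Item> waiting;  // items parked until they become ready

  // One pass over the stack. A blocked item is parked in `waiting`
  // instead of staying on the stack, so it is not re-checked next pass.
  std::vector<std::string> kick() {
    std::vector<std::string> scrubbed;
    while (!stack.empty()) {
      Item it = stack.front();
      stack.pop_front();
      if (it.ready)
        scrubbed.push_back(it.name);
      else
        waiting.push_back(it);
    }
    return scrubbed;
  }

  // Called when a parked item becomes scrubbable: move it back to the stack.
  void requeue(const std::string& name) {
    for (auto i = waiting.begin(); i != waiting.end(); ++i) {
      if (i->name == name) {
        Item it = *i;
        it.ready = true;
        waiting.erase(i);
        stack.push_back(it);
        return;
      }
    }
  }
};
```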
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
If a non-auth object is encountered during scrubbing, forward the scrub
to the object's auth mds.

Fixes: https://tracker.ceph.com/issues/12274
Signed-off-by: Simon Gao <simon29rock@gmail.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
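A hypothetical sketch of the forwarding decision: objects whose auth rank is the local rank are scrubbed here, and the rest are grouped by their auth rank. `mds_rank_t` is used as a plain `int` alias, and `Obj` and `scrub_or_forward` are invented for illustration; batching all objects for one remote rank into a single forward is an assumption, not something the commit message states.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using mds_rank_t = int;  // hypothetical alias for an MDS rank number

// Hypothetical object record: path plus the rank that is auth for it.
struct Obj {
  std::string path;
  mds_rank_t auth;
};

// Returns the paths this rank scrubs locally; non-auth objects are
// collected in `forwards`, keyed by the auth rank they should go to.
std::vector<std::string> scrub_or_forward(
    mds_rank_t whoami, const std::vector<Obj>& objs,
    std::map<mds_rank_t, std::vector<std::string>>& forwards) {
  std::vector<std::string> local;
  for (const auto& o : objs) {
    if (o.auth == whoami)
      local.push_back(o.path);
    else
      forwards[o.auth].push_back(o.path);
  }
  return local;
}
```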
Signed-off-by: Simon Gao <simon29rock@gmail.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
That a CInode/CDir has been scrubbed no longer means the corresponding subtree
is fully scrubbed, so the on_finish in {CInode,CDir}::scrub_info_t becomes useless.
This patch also removes the code that flushes the journal if scrub has repaired
anything. A later patch will add that code back in a different place.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Scrubs are always started from mds.0, so mds.0 can ensure that scrub
tags are globally unique. mds.0 periodically gathers the scrubs running on
itself and on the other mds. A scrub is finished only when it is no longer
running on any mds.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
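The bookkeeping described above can be sketched as follows, assuming a single tracker object living on mds.0 (`ScrubTracker`, `start_scrub`, `report` and `is_finished` are hypothetical names). Tags come from a local counter, which is only safe because every scrub starts on mds.0, and a tag counts as finished once no rank's latest report lists it. The sketch ignores the window in which a forwarded scrub has not yet appeared in a remote rank's report, which real code would have to handle.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

using mds_rank_t = int;  // hypothetical alias for an MDS rank number

// Hypothetical scrub bookkeeping kept on mds.0 only.
class ScrubTracker {
 public:
  // Every scrub starts on mds.0, so a local counter yields globally
  // unique tags. The new scrub is initially recorded as running on mds.0.
  std::string start_scrub() {
    std::string tag = "scrub-" + std::to_string(next_id_++);
    running_by_rank_[0].insert(tag);
    return tag;
  }

  // Result of one periodic gather from `rank` (mds.0 included): the set of
  // tags still running there. Replaces that rank's previous report.
  void report(mds_rank_t rank, std::set<std::string> running) {
    running_by_rank_[rank] = std::move(running);
  }

  // A scrub is finished only if no rank lists it as running.
  bool is_finished(const std::string& tag) const {
    for (const auto& entry : running_by_rank_)
      if (entry.second.count(tag) != 0) return false;
    return true;
  }

 private:
  unsigned next_id_ = 0;
  std::map<mds_rank_t, std::set<std::string>> running_by_rank_;
};
```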
Limit the scrub abort/pause/resume commands to mds.0. mds.0 sends messages
to the other mds, asking them to abort/pause/resume their scrubs.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
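A minimal sketch of this command routing, with hypothetical names (`ScrubOp`, `ScrubOpMsg`, `handle_scrub_op`): any rank other than mds.0 rejects the command, and mds.0 queues one message per other active rank. Applying the operation locally on mds.0 and the actual message transport are left out.

```cpp
#include <cassert>
#include <vector>

using mds_rank_t = int;  // hypothetical alias for an MDS rank number

enum class ScrubOp { Abort, Pause, Resume };

// Hypothetical record standing in for an inter-MDS message.
struct ScrubOpMsg {
  mds_rank_t to;
  ScrubOp op;
};

// Returns false (command rejected) on any rank other than mds.0.
// On mds.0, queues the op for every other active rank in `outbox`;
// applying it to mds.0's own scrubs is not modelled here.
bool handle_scrub_op(mds_rank_t whoami, ScrubOp op,
                     const std::vector<mds_rank_t>& active_ranks,
                     std::vector<ScrubOpMsg>& outbox) {
  if (whoami != 0) return false;
  for (mds_rank_t r : active_ranks)
    if (r != whoami) outbox.push_back({r, op});
  return true;
}
```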
We can check whether the backtrace is valid and whether the inode number is in use.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Now scrub is always async.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Sidharth Anupkrishnan <sanupkri@redhat.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
batrick (Member) commented Nov 17, 2020

https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20201116.174825

The first round of errors was transient.

batrick merged commit f6639c8 into ceph:master Nov 17, 2020
ukernel deleted the simon_work_scrub branch November 19, 2020 00:31
4 participants