-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mds: distribute dirfrags for ephemeral distributed directory #36537
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR will also need to update the tests in test_exports.py
.
@ukernel I will try giving this a spin on on the io500 cluster setup. |
retest this please |
@ukernel there are some qa tox failures to fix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm mostly focusing on the tests to identify changes in behavior. I'll glance at the code once I grok that.
Still not done with the tests though. Submitting my comments so far...
self.assertIsNotNone(which) | ||
self._setup_tree(distributed=True) | ||
subtrees = self._wait_distributed_subtrees(3 * 2, status=self.status, rank="all") | ||
self.mount_a.setfattr("tree", "ceph.dir.pin", "0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've changed what the test was checking. We want to check that the immediate child directory (like /tree/3
) which is ephemerally pinned to rank 1 can be moved to rank 0 via an export pin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize the directory is fragmented now... can we load the MDS cache to find a sub-directory is pinned to rank 1 and then export pin that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_ephemeral_pin_dist_override_before does that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so what is the purpose of this test? Now the test is pinning /tree
instead of /tree/<N>
. We expect the fragments to stay pinned now?
Then the test below should just be:
self._wait_subtrees([('/tree', 0)], status=self.status)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self._wait_subtrees([('/tree', 0)], ...) does not work. we don't know how many frags are there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm not understanding the new behavior but we expect 3*2 fragments to be created and distributed after setting the distributed epin on /tree
. Then you're pinning /tree
to 0
. That should cause all of the fragments to lose their ephemeral pins and move back to rank 0, right? Then I would expect the subtrees to merge and /tree
should be the only subtree, pinned to rank 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merging dirfrags needs a trigger. this is no one
""" | ||
That turning off ephemeral distributed pin config merges subtrees. | ||
FIXME: who triggers the merge? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the test relies on bug in CInode::maybe_export_pin.
retest this please |
retest this please |
5cacb81
to
98846ae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM
if (dir->is_auth()) { | ||
/* do this now that we are auth for the CDir */ | ||
dir->inode->maybe_pin(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was part of 306003c. Why is this no longer necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MDBalancer::handle_export_pins() already checks auth subtrees. no reason do the check here
src/mds/CInode.cc
Outdated
} | ||
} | ||
|
||
void CInode::maybe_export_pin(bool update) | ||
{ | ||
if (!g_conf()->mds_bal_export_pin) | ||
return; | ||
if (!is_dir() || !is_normal()) | ||
if (!is_dir()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why remove this check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because '!is_dir() || !is_normal()' is always true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because '!is_dir() || !is_normal()' is always true
uh? No it's not? The main point of that check was to exclude root.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right
retest this please |
retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otherwise LGTM
src/mds/CInode.cc
Outdated
} | ||
} | ||
|
||
void CInode::maybe_export_pin(bool update) | ||
{ | ||
if (!g_conf()->mds_bal_export_pin) | ||
return; | ||
if (!is_dir() || !is_normal()) | ||
if (!is_dir()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because '!is_dir() || !is_normal()' is always true
uh? No it's not? The main point of that check was to exclude root.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jenkins test make check |
jenkins test make check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ukernel this needs rebased. Testing done here:
https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20201009.012350
I didn't see any failures that looked to be caused by this PR. Please have a glance yourself. Let's merge if you're satisfied.
Instead of distribute individual dir inodes inside the ephemeral distributed dir. Distributing dirfrags can limit number of subtrees created by the ephemeral dist pin. This patch also unifies codes that handle export pin and ephemeral pin. Fixes: https://tracker.ceph.com/issues/46696 Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Instead of distribute individual dir inodes inside the ephemeral
distributed dir. Distributing dirfrags can limit number of subtrees
created by the ephemeral dist pin.
This patch also unifies codes that handle export pin and ephemeral pin.
Fixes: https://tracker.ceph.com/issues/46696
Signed-off-by: "Yan, Zheng" zyan@redhat.com
Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox