Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mds: distribute dirfrags for ephemeral distributed directory #36537

Merged
merged 2 commits into from
Oct 13, 2020

Conversation

ukernel
Copy link
Contributor

@ukernel ukernel commented Aug 10, 2020

Instead of distribute individual dir inodes inside the ephemeral
distributed dir. Distributing dirfrags can limit number of subtrees
created by the ephemeral dist pin.

This patch also unifies codes that handle export pin and ephemeral pin.

Fixes: https://tracker.ceph.com/issues/46696
Signed-off-by: "Yan, Zheng" zyan@redhat.com

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

Copy link
Member

@batrick batrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR will also need to update the tests in test_exports.py.

@markhpc
Copy link
Member

markhpc commented Aug 10, 2020

@ukernel I will try giving this a spin on on the io500 cluster setup.

@ukernel
Copy link
Contributor Author

ukernel commented Aug 12, 2020

@batrick
Copy link
Member

batrick commented Aug 19, 2020

retest this please

@batrick
Copy link
Member

batrick commented Aug 20, 2020

@ukernel there are some qa tox failures to fix.

Copy link
Member

@batrick batrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mostly focusing on the tests to identify changes in behavior. I'll glance at the code once I grok that.

Still not done with the tests though. Submitting my comments so far...

self.assertIsNotNone(which)
self._setup_tree(distributed=True)
subtrees = self._wait_distributed_subtrees(3 * 2, status=self.status, rank="all")
self.mount_a.setfattr("tree", "ceph.dir.pin", "0")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've changed what the test was checking. We want to check that the immediate child directory (like /tree/3) which is ephemerally pinned to rank 1 can be moved to rank 0 via an export pin.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize the directory is fragmented now... can we load the MDS cache to find a sub-directory is pinned to rank 1 and then export pin that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_ephemeral_pin_dist_override_before does that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so what is the purpose of this test? Now the test is pinning /tree instead of /tree/<N>. We expect the fragments to stay pinned now?

Then the test below should just be:

self._wait_subtrees([('/tree', 0)], status=self.status)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self._wait_subtrees([('/tree', 0)], ...) does not work. we don't know how many frags are there

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm not understanding the new behavior but we expect 3*2 fragments to be created and distributed after setting the distributed epin on /tree. Then you're pinning /tree to 0. That should cause all of the fragments to lose their ephemeral pins and move back to rank 0, right? Then I would expect the subtrees to merge and /tree should be the only subtree, pinned to rank 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merging dirfrags needs a trigger. this is no one

"""
That turning off ephemeral distributed pin config merges subtrees.
FIXME: who triggers the merge?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the test relies on bug in CInode::maybe_export_pin.

@batrick
Copy link
Member

batrick commented Sep 3, 2020

retest this please

@batrick
Copy link
Member

batrick commented Sep 22, 2020

retest this please

@ukernel ukernel force-pushed the wip-46696 branch 2 times, most recently from 5cacb81 to 98846ae Compare September 23, 2020 08:05
Copy link
Member

@batrick batrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM

if (dir->is_auth()) {
/* do this now that we are auth for the CDir */
dir->inode->maybe_pin();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was part of 306003c. Why is this no longer necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MDBalancer::handle_export_pins() already checks auth subtrees. no reason do the check here

src/mds/MDSRank.cc Show resolved Hide resolved
}
}

void CInode::maybe_export_pin(bool update)
{
if (!g_conf()->mds_bal_export_pin)
return;
if (!is_dir() || !is_normal())
if (!is_dir())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove this check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because '!is_dir() || !is_normal()' is always true

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because '!is_dir() || !is_normal()' is always true

uh? No it's not? The main point of that check was to exclude root.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right

@ukernel
Copy link
Contributor Author

ukernel commented Sep 24, 2020

retest this please

@ukernel
Copy link
Contributor Author

ukernel commented Sep 24, 2020

retest this please

Copy link
Member

@batrick batrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise LGTM

src/mds/MDSRank.cc Show resolved Hide resolved
}
}

void CInode::maybe_export_pin(bool update)
{
if (!g_conf()->mds_bal_export_pin)
return;
if (!is_dir() || !is_normal())
if (!is_dir())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because '!is_dir() || !is_normal()' is always true

uh? No it's not? The main point of that check was to exclude root.

Copy link
Member

@batrick batrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ukernel
Copy link
Contributor Author

ukernel commented Sep 29, 2020

@batrick
Copy link
Member

batrick commented Oct 9, 2020

jenkins test make check

@batrick
Copy link
Member

batrick commented Oct 10, 2020

jenkins test make check

Copy link
Member

@batrick batrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ukernel this needs rebased. Testing done here:

https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20201009.012350

I didn't see any failures that looked to be caused by this PR. Please have a glance yourself. Let's merge if you're satisfied.

Instead of distribute individual dir inodes inside the ephemeral
distributed dir. Distributing dirfrags can limit number of subtrees
created by the ephemeral dist pin.

This patch also unifies codes that handle export pin and ephemeral pin.

Fixes: https://tracker.ceph.com/issues/46696
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants