
mds: add ceph.dir.bal.mask vxattr for MDS Balancer #52373

Open. @yongseokoh wants to merge 3 commits into base: main.
Conversation

@yongseokoh (Contributor):

tracker: https://tracker.ceph.com/issues/61777

Introduction
This PR introduces the ceph.dir.bal.mask vxattr, an option to rebalance a subtree within a specific set of active MDSs. Similar to a CPU mask, this feature enables load balancing of specific directories across multiple MDS ranks, which is especially useful for fine-tuning and improving performance in various scenarios. Previously, the bal_rank_mask introduced in #43284 supported isolating unpinned subtrees under the root directory ('/') to a specific set of MDS ranks. With this new vxattr, it becomes possible to isolate specific subdirectories to designated MDS ranks. The ceph.dir.bal.mask vxattr thus gives Ceph administrators additional control and flexibility for optimizing performance and fine-tuning their deployments.

Use Cases
The first use case is when it is difficult to pin a subtree to one MDS rank. Suppose /home/images contains directories /0 through /99, each storing 10 million image files. Pinning the entire images directory to one MDS rank is impractical, and pinning the 100 huge directories manually, or via ephemeral pinning, is not an easy task either. Setting ceph.dir.bal.mask on /home/images makes efficient resource management possible instead.

Second, when there are several large directories such as /home/images, performance can be optimized by distributing them to different groups of MDS ranks using ceph.dir.bal.mask. Since the existing mdsmap bal_rank_mask isolates the entire '/' tree to one set of ranks, the directories can affect each other's performance through migration overhead. For example, suppose the mdsmap bal_rank_mask is set to 0xf and the large directories /home/images and /home/backups both exist. If the load on /home/images spikes, its metadata is redistributed across ranks 0 to 3, so users of /home/backups may be affected by a noisy neighbor unnecessarily. If the two directories are instead confined to MDS ranks 0-1 (ceph.dir.bal.mask 0x3) and ranks 2-3 (ceph.dir.bal.mask 0xc) respectively, their effect on each other is minimized. The same approach can be applied efficiently to various directories.
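For reference, a mask is just the bitwise OR of 1 << rank for each desired rank. A minimal Python sketch (illustration only, not part of this PR):

# each rank r contributes bit (1 << r); the mask is their bitwise OR
def ranks_to_mask(ranks):
    mask = 0
    for r in ranks:
        mask |= 1 << r
    return hex(mask)

print(ranks_to_mask([0, 1]))    # 0x3, ranks 0-1
print(ranks_to_mask([2, 3]))    # 0xc, ranks 2-3
print(ranks_to_mask(range(4)))  # 0xf, ranks 0-3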

How to use

# / root is distributed within MDS ranks 0 to 3
# this overrides the mdsmap bal_rank_mask value
setfattr -n ceph.dir.bal.mask -v 0x0f /

# /home/images subdir is distributed within MDS ranks 1 and 2
setfattr -n ceph.dir.bal.mask -v 0x06 /home/images

# /home/backups subdir is distributed within MDS ranks 3 and 4
setfattr -n ceph.dir.bal.mask -v 0x18 /home/backups

# remove the values; the directories will fall back to the parent's value or the mdsmap bal_rank_mask
setfattr -n ceph.dir.bal.mask -v -1 /home/images
setfattr -n ceph.dir.bal.mask -v -1 /home/backups

# /home/images is placed only on MDS rank 1 (similar to ceph.dir.pin)
setfattr -n ceph.dir.bal.mask -v 0x02 /home/images
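
To verify a setting, the value can be read back with getfattr; the qa tests in this PR exercise get/set the same way:

# read the current mask back
getfattr -n ceph.dir.bal.mask /home/images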


@yongseokoh requested a review from a team as a code owner (July 10, 2023 06:18)
@github-actions bot added the cephfs (Ceph File System), core, and mon labels (Jul 10, 2023)
@vshankar self-assigned this (Aug 17, 2023)
@batrick (Member) left a comment:

A few things:

  • Please add the PR discussion to your commit message.
  • Because the bitset is 256 bits, it's not generally easy to compute the xattr value for the caller. I think it would be helpful to have the MDS compute the value by allowing something like setfattr -n ceph.dir.bal.mask -v 0,1,3,15 dir/ such that the MDS will do the bitwise OR of those bits (see the sketch after this list).
  • I'd like to see some tests in qa/tasks/cephfs/test_exports.py. You can use vstart_runner.py to test.
  • There should be some docs added to explain this and the MDSMap bal rank mask. Which should users prefer, and when? Is it valid to set the rank mask on the root directory? Are values inherited or overridable?
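
As an illustration of the comma-separated form suggested above, a minimal Python sketch of the parsing the MDS could do (a stand-in for the C++ std::bitset logic; the helper name is hypothetical):

MAX_MDS = 256

def parse_rank_list(value):
    # "0,1,3,15" -> mask with those bits OR'ed together
    mask = 0
    for tok in value.split(","):
        rank = int(tok)
        if not 0 <= rank < MAX_MDS:
            raise ValueError("rank %d out of range" % rank)
        mask |= 1 << rank
    return mask

assert parse_rank_list("0,1,3,15") == 0x800b  # bits 0, 1, 3 and 15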

@@ -432,6 +432,12 @@ class SetHandler : public FileSystemCommandHandler
if (r != 0) {
  return r;
}
std::bitset<MAX_MDS> rank_mask = std::bitset<MAX_MDS>(bin_string);
if (!rank_mask.test(0)) {
  ss << "the first bit must be set for rank0";
Member:

explain (to the user)?

Author (@yongseokoh):

There have been some slight changes in the implementation. My current implementation creates a subtree root for the target directory and exports that node to a different rank; rebalancing is then performed within the masked ranks. For example, if you run "setfattr -n ceph.dir.bal.mask -v 1,2 /home/backup", it converts /home/backup into a subtree and moves it to rank 1. Subsequently, the balancer queries the bal_rank_mask vxattr values of subtrees with authority on that rank and redistributes them within ranks 1 to 2. The '/' root inode is fixed at rank 0, making it impossible to move it to a different rank and apply this logic. Therefore, I've added the constraint that rank 0 must always be included in the mask for the root '/'.

The existing bal_rank_mask implementation in #43284 sends unpinned subtrees (excluding the root '/') to masked ranks for redistribution. The ceph.dir.bal.mask functionality was implemented alongside it, and during the process of integrating with the bal_rank_mask feature, these constraints were added out of necessity.
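
A condensed sketch of that constraint (a Python illustration of the bitset check quoted in the diff above; the function name is hypothetical):

def validate_bal_mask(path, mask):
    # '/' is permanently authoritative on rank 0, so its mask must keep bit 0 set
    if path == "/" and not (mask & 1):
        raise ValueError("the first bit must be set for rank0")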

src/include/cephfs/types.h (outdated review thread, resolved)
@@ -432,6 +432,12 @@ class SetHandler : public FileSystemCommandHandler
if (r != 0) {
  return r;
}
std::bitset<MAX_MDS> rank_mask = std::bitset<MAX_MDS>(bin_string);
Contributor:

Please carve out this into a separate commit since this is through the fs set interface (for mdsmap) rather than the vxattr interface proposal.

Author (@yongseokoh):

I will split the PR once the discussion about the restriction on rank 0 is resolved.
#52373 (comment)

src/mds/CInode.cc (outdated review thread, resolved)
@github-actions bot:

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

Author (@yongseokoh):

@batrick @vshankar Please review my changes.

A few things:

  • Please add the PR discussion to your commit message.

Done.

  • Because the bitset is 256 bits, it's not generally easy to compute the xattr value for the caller. I think it would be helpful to have the MDS compute the value by allowing something like setfattr -n ceph.dir.bal.mask -v 0,1,3,15 dir/ such that the MDS will do the bitwise OR of those bits.

I agree. It is not easy to calculate numerous bits and configure the bitfield.

  • I'd like to see some tests in qa/tasks/cephfs/test_exports.py. You can use vstart_runner.py to test.

Test cases for ceph.dir.bal.mask were implemented in test_exports.py.

  • There should be some docs added to explain this and the MDSMap bal rank mask. Which should users prefer and when? Is it valid to set the rank mask on the root directory? Are values inherited or override-able?

How to use ceph.dir.bal.mask is explained in the documentation.

@idryomov removed the request for review from a team (September 16, 2023 13:21)
@idryomov removed the rbd label (Sep 16, 2023)
Author (@yongseokoh):

jenkins retest this please

@yongseokoh force-pushed the wip-61777 branch 2 times, most recently from 4aca9f4 to 53a786c (September 18, 2023 04:55)
Author (@yongseokoh):

jenkins test make check

Author (@yongseokoh):

jenkins test make check arm64

@yongseokoh mentioned this pull request (Oct 31, 2023)
Author (@yongseokoh):

@vshankar ping?

@vshankar (Contributor) commented Nov 2, 2023:

@vshankar ping?

Was on vacation. Will recheck tomorrow or early next week.

@cbodley removed the request for review from a team (November 2, 2023 13:50)
github-actions bot commented Jan 1, 2024:

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions bot added the stale label (Jan 1, 2024)
@vshankar (Contributor) left a comment:

Otherwise the change looks reasonable.

@@ -520,6 +521,7 @@ struct inode_t {

double export_ephemeral_random_pin = 0;
bool export_ephemeral_distributed_pin = false;
std::string bal_rank_mask;
@vshankar (Contributor):

So, probably the main concern that I have with this change is the increase in the size of the inode, which would happen irrespective of the xattr being set.

What are your thoughts on persisting this in the xattr map @yongseokoh ?

Author (@yongseokoh):

@vshankar There are downsides in that it may affect inode size. On the other hand, CephFS users gain the ability to flexibly map different subtrees to multiple MDS ranks depending on their use case, which I think is a great benefit. Also, CephFS users are familiar with xattrs, so this method is convenient.

@vshankar (Contributor):

That's fine - the xattr interface can remain as it is - just that we could benefit from not increasing the inode size if we store the mask in the xattr_map rather than in the inode.

Author (@yongseokoh):

Do you mean the xattr_map used in #36603? I will try to see if it can be employed.

Author (@yongseokoh):

@vshankar

I have a rough understanding of xattrs through the ceph.mirror.dirty_snap_id xattr.

"xattrs": [
      {
          "key": "ceph.mirror.dirty_snap_id",
          "val": "7"
      }
  ]

Using xattrs seems like a good idea. However, it would be better for this proposed PR to be managed as part of the ceph.dir.* family, in a way similar to the existing ceph.dir.pin and ceph.dir.pin.random; that seems easier to manage in the long term. Please share your opinion.

Author (@yongseokoh):

@vshankar @batrick I will change it to use the xattrs map based on your feedback. (My reply is late because I was on vacation until yesterday.)

@vshankar (Contributor):

@yongseokoh Sure thing - thx!

Author (@yongseokoh):

@vshankar @batrick I changed this PR to use the xattrs map. I confirmed that the ceph.dir.bal.mask xattr can be changed through setfattr/getfattr even without changing Client.cc.
afa6c73#diff-277dc6e796ccecb6aa14c9357f7a86898d6ddcf4113c110a4e00e0a10f6fefa7R6427

Do I have to change Client.cc as mentioned by Patrick?

#52373 (comment)

@vshankar (Contributor):

Do I have to change Client.cc as mentioned by Patrick?

I don't think that's required, since the mirror_info xattr is handled a bit differently as it's a compound xattr.

Make sure relevant tests are added for validating get/set of the xattr.
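
For instance, a minimal get/set check in the style of the test snippets quoted later in this thread (self.mount_a.setfattr/getfattr are the existing qa helpers; the exact echoed value is an assumption):

def test_get_set_bal_mask(self):
    # set the vxattr, read it back, then clear it with -1
    self.mount_a.setfattr(".", "ceph.dir.bal.mask", "2")
    self.assertEqual(self.mount_a.getfattr(".", "ceph.dir.bal.mask"), "2")
    self.mount_a.setfattr(".", "ceph.dir.bal.mask", "-1")
    self.assertEqual(self.mount_a.getfattr(".", "ceph.dir.bal.mask"), None)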


@github-actions bot removed the stale label (Jan 2, 2024)
@vshankar (Contributor) commented Jan 11, 2024 via email.

yongseokoh referenced this pull request Feb 23, 2024
This introduces the ceph.dir.bal.mask vxattr, which
is an option to rebalance a subtree within
specific active MDSs. Similar to the CPU mask,
this feature enables load balancing of specific
directories across multiple MDS ranks. It is especially
useful for fine-tuning and improving performance
in various scenarios. Previously, the bal_rank_mask
in #43284 supports isolating unpinned subtrees under
the root directory ('/') to a specific MDS rank.
However, with this new option vxattr, it becomes
possible to isolate specific subdirectories to
designated MDS ranks. By introducing the
ceph.dir.bal.mask vxattr, this PR empowers
Ceph administrators with enhanced control and
flexibility for optimizing performance and
fine-tuning their deployments.

tracker: https://tracker.ceph.com/issues/61777
Signed-off-by: Yongseok Oh <yongseok.oh@linecorp.com>
@vshankar (Contributor):

jenkins test make check

@vshankar (Contributor) commented Mar 8, 2024:

https://pulpito.ceph.com/?branch=wip-vshankar-testing-20240307.022905

Will review the run next week.

@vshankar (Contributor) left a comment:

@yongseokoh PTAL at the failed tests.

As far as this run is concerned, we ran into a bug with the testing kernel causing some test failures unrelated to this change. See - https://tracker.ceph.com/issues/64679#note-4

Once those two issues are fixed, the failed tests will need a rerun. The other tests look good 👍

This is getting really close to getting merged.


self.mount_a.setfattr("1", "ceph.dir.bal.mask", "2")
value = self.mount_a.getfattr("1", "ceph.dir.bal.mask")
self.assertEqual(value, "2")
@vshankar (Contributor):

The assert check fails - https://pulpito.ceph.com/vshankar-2024-03-07_10:41:12-fs-wip-vshankar-testing-20240307.022905-testing-default-smithi/7584618/

2024-03-07T22:51:06.010 INFO:tasks.cephfs_test_runner:======================================================================
2024-03-07T22:51:06.010 INFO:tasks.cephfs_test_runner:FAIL: test_mds_failover (tasks.cephfs.test_exports.TestBalMask)
2024-03-07T22:51:06.010 INFO:tasks.cephfs_test_runner:That MDS failover does not affect the ceph.dir.bal.mask.
2024-03-07T22:51:06.010 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2024-03-07T22:51:06.011 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2024-03-07T22:51:06.011 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_b0ea271bb96f7f4cec2f2a40ff88a5def1e0530f/qa/tasks/cephfs/test_exports.py", line 934, in test_mds_failover
2024-03-07T22:51:06.011 INFO:tasks.cephfs_test_runner:    self.assertEqual(value, "2")
2024-03-07T22:51:06.011 INFO:tasks.cephfs_test_runner:AssertionError: None != '2'

Author (@yongseokoh):

@vshankar I will check and get back to you.

Author (@yongseokoh):

In the vstart-based local test, all 20 test cases passed. I will look at the log you provided and see if I can reproduce it.

2024-03-13 11:38:19,031.031 INFO:__main__:test_unset_rank_mask_under_nested_parent (tasks.cephfs.test_exports.TestBalMask) ... ok
2024-03-13 11:38:19,031.031 INFO:__main__:test_unset_rank_mask_under_nested_root (tasks.cephfs.test_exports.TestBalMask) ... ok
2024-03-13 11:38:19,031.031 INFO:__main__:
2024-03-13 11:38:19,031.031 INFO:__main__:----------------------------------------------------------------------
2024-03-13 11:38:19,031.031 INFO:__main__:Ran 20 tests in 1378.896s
2024-03-13 11:38:19,031.031 INFO:__main__:
2024-03-13 11:38:19,032.032 INFO:__main__:
2024-03-13 11:38:19,032.032 INFO:__main__:> ip netns list
2024-03-13 11:38:19,039.039 INFO:__main__:> sudo ip link delete ceph-brx
Cannot find device "ceph-brx"
2024-03-13 11:38:19,066.066 INFO:__main__:OK
2024-03-13 11:38:19,067.067 INFO:__main__:

Author (@yongseokoh):

@vshankar
I think it's a timing issue: this is a case where getfattr fails right after setfattr, regardless of the MDS failover. Can the MDS logs be shared?

Author (@yongseokoh):

The major difference is the use of the kernel client; I only tested ceph-fuse. In the vstart local test, both the kernel client and ceph-fuse tests passed without any problems.

@vshankar (Contributor):

@vshankar
I think it's a timing issue: this is a case where getfattr fails right after setfattr, regardless of the MDS failover. Can the MDS logs be shared?

I can have a look at the logs to see if I can spot something. The logs would be huge, but here they are: http://qa-proxy.ceph.com/teuthology/vshankar-2024-03-07_10:41:12-fs-wip-vshankar-testing-20240307.022905-testing-default-smithi/7584618/remote/

Check for "ceph-mds*.gz" in both the remote directories.

Author (@yongseokoh):

The setxattr client request was processed, and a getxattr request arrived between early_reply and reply_client_request. I think the client inode's xattrs were not updated at that point. Is it possible to guarantee that the client inode's xattrs are updated after early_reply? Otherwise, should I just add some logic in the test case to wait for a few seconds?

2024-03-07T22:50:52.520+0000 7f8607400640  4 mds.0.server handle_client_request client_request(client.5210:27 setxattr #0x10000000000 ceph.dir.bal.mask 2024-03-07T22:50:52.519761+0000 caller_uid=1000, caller_gid=1301{6,36,1000,1301,}) v6
2024-03-07T22:50:52.521+0000 7f8607400640 10 mds.0.server setxattr 'ceph.dir.bal.mask' len 1 on [inode 0x10000000000 [...2,head] /1/ auth v4 ap=2 DIRTYPARENT f(v0 m2024-03-07T22:50:51.975767+0000 1=0+1) n(v0 rc2024-03-07T22:50:51.975767+0000 2=0+2) (isnap sync r=1) (inest lock) (ifile excl) (ixattr xlock x=1 by 0x55c05e997a80) (iversion lock w=1 last_client=5210) caps={5210=pAsLsFsx/pAsLsXsFsxcral@2},l=5210 | request=1 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 authpin=1 0x55c05e90a100]
2024-03-07T22:50:52.521+0000 7f8607400640 10 mds.0.server early_reply 0 ((0) Success) client_request(client.5210:27 setxattr #0x10000000000 ceph.dir.bal.mask 2024-03-07T22:50:52.519761+0000 caller_uid=1000, caller_gid=1301{6,36,1000,1301,}) v6


2024-03-07T22:50:52.580+0000 7f8607400640  4 mds.0.server handle_client_request client_request(client.5210:28 ??? #0x10000000000 ceph.dir.bal.mask 2024-03-07T22:50:52.578761+0000 caller_uid=1000, caller_gid=1301{6,36,1000,1301,}) v6
2024-03-07T22:50:52.580+0000 7f8607400640  7 mds.0.server reply_client_request -61 ((61) No data available) client_request(client.5210:28 ??? #0x10000000000 ceph.dir.bal.mask 2024-03-07T22:50:52.578761+0000 caller_uid=1000, caller_gid=1301{6,36,1000,1301,}) v6

inode->xattrs may not be updated.
https://github.com/ceph/ceph/blob/main/src/client/Client.cc#L13757

2024-03-07T22:50:55.379+0000 7f85ff200640  7 mds.0.server reply_client_request 0 ((0) Success) client_request(client.5210:27 setxattr #0x10000000000 ceph.dir.bal.mask 2024-03-07T22:50:52.519761+0000 caller_uid=1000, caller_gid=1301{6,36,1000,1301,}) v6

@vshankar (Contributor):

The setxattr client request was processed, and a getxattr request arrived between early_reply and reply_client_request. I think the client inode's xattrs were not updated at that point. Is it possible to guarantee that the client inode's xattrs are updated after early_reply? Otherwise, should I just add some logic in the test case to wait for a few seconds?

If the client has the relevant caps, it should have the latest xattr version. I think you might have run into a race where the client thinks it has the caps and can trust the cached xattr; however, the xattr is stale.

I have seen something similar before and fixed it in a code path. See commit 526e4ee

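If the test ultimately needs to tolerate this race, one option is a small polling helper instead of a fixed sleep (a sketch only; the timeout is arbitrary and the helper name is hypothetical):

import time

def wait_for_xattr(mount, path, key, expected, timeout=30):
    # poll getfattr until the expected value is visible or the timeout expires
    deadline = time.time() + timeout
    while time.time() < deadline:
        if mount.getfattr(path, key) == expected:
            return
        time.sleep(1)
    raise AssertionError("%s on %s never became %r" % (key, path, expected))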

@vshankar (Contributor):

https://pulpito.ceph.com/?branch=wip-vshankar-testing-20240307.022905

Will review the run next week.

Just FYI: apart from the one failure reported here, there is an unrelated test case failure due to a bug in the testing kernel.

qa/tasks/cephfs/test_exports.py (outdated review threads, resolved)
qa/tasks/cephfs/cephfs_test_case.py (outdated review threads, resolved)
self.assertEqual(value, None)

self.mount_a.setfattr(".", "ceph.dir.bal.mask", "0")
self._wait_subtrees([('', 0)], status=self.status, path='', exclude_path='~mds')
Member:

I'm not sure what this test is doing that test_unset_rank_mask isn't already testing?

Author (@yongseokoh):

This unset test may be redundant, but the '.' directory is located at rank 0 and the '1' directory is exported to rank 2. The purpose is to test whether the parent directory's export affects the '1' directory.

Author (@yongseokoh):

It seems to be the same line: #52373 (comment)

qa/tasks/cephfs/test_exports.py (review thread, resolved)
Author (@yongseokoh):

@batrick @vshankar Thank you for the many helpful comments. My PR wasn't perfect, and it has improved a lot. There are two issues that we have to discuss further:

#52373 (comment)
#52373 (comment)

Signed-off-by: Yongseok Oh <yongseok.oh@linecorp.com>
Signed-off-by: Yongseok Oh <yongseok.oh@linecorp.com>
This introduces the ceph.dir.bal.mask vxattr, which
is an option to rebalance a subtree within
specific active MDSs. Similar to the CPU mask,
this feature enables load balancing of specific
directories across multiple MDS ranks. It is especially
useful for fine-tuning and improving performance
in various scenarios. Previously, the bal_rank_mask
in ceph#43284 supports isolating unpinned subtrees under
the root directory ('/') to a specific MDS rank.
However, with this new option vxattr, it becomes
possible to isolate specific subdirectories to
designated MDS ranks. By introducing the
ceph.dir.bal.mask vxattr, this PR empowers
Ceph administrators with enhanced control and
flexibility for optimizing performance and
fine-tuning their deployments.

tracker: https://tracker.ceph.com/issues/61777
Signed-off-by: Yongseok Oh <yongseok.oh@linecorp.com>
int r = mds->balancer->get_rank_mask_bitset(dir, rank_mask_bitset, false);
bool has_mask = false;
if (r == 0 && rank_mask_bitset.count()) {
  has_mask = false;
Author (@yongseokoh):

This should be has_mask = true, not false.

@@ -186,6 +240,7 @@ void MDBalancer::handle_export_pins(void)
 */
if (dir->get_num_head_items() > 0) {
  mds->mdcache->migrator->export_dir(dir, target);
} else {
Author (@yongseokoh):

This else branch is unnecessary.

Author (@yongseokoh):

@batrick ping? Could you please review this PR when you get a chance?
