mds: [TRACKER-58216] quota.max_files check when prepare_new_inode #49326
Signed-off-by: Jinmyeong Lee <jinmyeong.lee@linecorp.com>
09a9250 to c087e2e
jenkins test make check
The MDS should broadcast the configured quotas to the clients, where the quota is enforced. Enforcement can lag a bit, and that's expected. Does a
@vshankar Hello. When copying a large directory that contains more files than the quota into a mount point that has not yet hit the quota limit, the lag you mentioned allows more files to be created in the mount point than the quota permits. I want to know why only the client checks the quota, and not the MDS. Is there any background on this?
Which client is this and what version?
@vshankar We are using a ceph-fuse mount on Nautilus.
What are the auth caps for clients? Quota is not enforced when the client has restrictive access to a specific path (e.g. /home/foo) and quota is configured on an ancestor directory (/home).
Having the auth caps, and we are using the bind mount.
Could you share the output of:
I am using the bind mount, so the real mountpoint is
And when just creating files simply with
But
If you want, I can test this issue again with the naive ceph-fuse mount (not bind mount) with client.admin keyring (before patching and after, both).
@vshankar I would also like to know why the community decided that only the client enforces the quota check, even though that comes with some expected delay.
Ugh! Have you tried this with one of the recent releases (pacific/quincy)?
Since quotas are enforced when using
No, I haven't. I need to prepare a new test cluster with CentOS 8 servers to test with Pacific. Do you think this could be related to a patch in a recent version?
Basically, quotas were introduced in the mimic release, so I presume they have stabilized over the years. As for your question on why quotas are enforced on the client: the MDS ensures that clients have a view of the quota realm and can reliably enforce quotas (obviously, with some lag).
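To illustrate the lag described above, here is a toy model of a checker whose view of the file count is refreshed only periodically. It is a sketch under stated assumptions, not Ceph's client code: `QuotaView`, `files_exceeded`, `simulate_creates`, and the `refresh_every` parameter are all hypothetical names invented for this illustration, and the real refresh is driven by the MDS rather than by a fixed interval.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of a quota realm as seen by one checker.
struct QuotaView {
  int64_t max_files = 0;    // 0 means "no limit"
  int64_t known_files = 0;  // file count as of the last refresh (may be stale)
};

// The limit counts as hit once the known count reaches max_files.
bool files_exceeded(const QuotaView& v) {
  return v.max_files > 0 && v.known_files >= v.max_files;
}

// Attempt `attempts` creates. The checker's view is refreshed only after
// every `refresh_every` successful creates (an assumption of this model).
// Returns how many files were actually created.
int64_t simulate_creates(int64_t max_files, int64_t attempts,
                         int64_t refresh_every) {
  assert(refresh_every > 0);
  QuotaView view;
  view.max_files = max_files;
  int64_t actual = 0;
  for (int64_t i = 0; i < attempts; ++i) {
    if (files_exceeded(view)) {
      break;  // a real client would fail the create with -EDQUOT here
    }
    ++actual;
    if (actual % refresh_every == 0) {
      view.known_files = actual;  // the stale view catches up
    }
  }
  return actual;
}
```

In this model, with `max_files = 100`, a view refreshed after every create stops at exactly 100 files, while a view refreshed every 30 creates lets 120 files through before the limit is noticed.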
@jinmyeonglee Do you have any file system wide config set? Could you share
Nothing stands out as unusual (suspected inline data being used). I recommend trying quincy (or pacific) to see if you can reproduce it.
Thanks for checking, I will share the test result with pacific as soon as possible.
@vshankar Hello, I tested with Pacific (16.2.9) and confirmed the same issue in this release.
pacific-release branch
pacific-release + MDS Quota Limit
    if (cur->inode->get_projected_inode()->quota.is_enable()) {
      return cur;
    }
    cur = cur->get_parent_dir();
@vshankar Additionally, I confirmed that cur can be null at this line, so I am working on a fix for this issue.
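A minimal sketch of a null-safe ancestor walk, for illustration only. The `Quota` and `Node` types and `find_quota_root` are hypothetical stand-ins, not the MDS classes used in this patch; the point is only that the loop condition guards against running past the root before `cur` is dereferenced.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-ins; these are NOT Ceph's real MDS types.
struct Quota {
  int64_t max_files = 0;
  int64_t max_bytes = 0;
  bool is_enable() const { return max_files > 0 || max_bytes > 0; }
};

struct Node {
  Quota quota;
  Node* parent = nullptr;  // nullptr at the file system root
};

// Walk from `start` toward the root and return the nearest node with a
// quota enabled, or nullptr if no ancestor has one. Checking `cur` in the
// loop condition means it is never dereferenced when null.
const Node* find_quota_root(const Node* start) {
  for (const Node* cur = start; cur != nullptr; cur = cur->parent) {
    if (cur->quota.is_enable()) {
      return cur;
    }
  }
  return nullptr;
}

// Small self-check: /home has a quota, /home/user inherits it,
// /other has no quota-enabled ancestor.
bool demo_find_quota_root() {
  Node root;
  Node home;
  home.parent = &root;
  home.quota.max_files = 100;
  Node user;
  user.parent = &home;
  Node other;
  other.parent = &root;
  return find_quota_root(&user) == &home &&
         find_quota_root(&home) == &home &&
         find_quota_root(&other) == nullptr &&
         find_quota_root(nullptr) == nullptr;
}
```

Callers then have to treat a null result as "no quota applies" instead of dereferencing it.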
I was on year-end PTO; sorry for the delayed reply. If this is seen in pacific, then I guess you might be hitting a bug. As far as this change is concerned, the underlying issue is that the client is (for some reason) not enforcing the quota, which it really should, albeit after some delay. Do you have debug client logs to look at?
Well, as you said in the early comment, the delay is allowed, and creating more files than the quota is expected. But I want to suggest: "how about limiting the quota more strictly?" That is why I added the quota checking logic in the MDS. I think this is quite easy to reproduce, so I guess you could confirm the same behavior with an unpatched MDS. 😉 Anyway, I will share the raw client-side logs as a file.
result from
Thanks for sharing the log.
@vshankar Hello, I do not want to rush you, but is there any update or finding after checking the logs?
@jinmyeonglee @mchangir mentioned that the MDS broadcasts quota (tree) information to all clients except the client doing the quota xattr update (since the client would have the relevant caps anyway). There is probably a bug in the client that is not recording the updated quota setting (max_files in this case) causing the client to not enforce the limits.
There's a
With my tests, if the
So, there doesn't seem to be any bug on the client side as far as I can tell.
@jinmyeonglee provided client logs here - #49326 (comment) Would be interesting to see what's going on there...
@jinmyeonglee @mchangir from the client log @jinmyeonglee shared:
-EDQUOT is returned for file_193 under big_dir.
@jinmyeonglee Are you using Nautilus 14.2.19 for both the fuse client and the server? Because I can see
Just want to double confirm the versions.
We are using Nautilus in our service, but vshankar requested me to test in Pacific(or Quincy), so I gave the logs from Pacific Cluster & Client. I think there is a little misunderstanding.
I did not say the delay is a bug. (I mentioned that it is not a bug here: #49326 (comment)) As the official documentation (https://docs.ceph.com/en/latest/cephfs/quota/) says, some delay is expected and allowed.
So I wanted to suggest: "how about limiting the quota more strictly?" That is why I added the quota checking logic in the MDS. I will test the
Anyway, it would be good if the quota could be enforced more strictly.
@jinmyeonglee Quick question: you did see the EDQUOT errno when creating files, although after a delay?
Yes, after a delay, I got the EDQUOT. (#49326 (comment)) My question was why the MDS allows the delay and allows creating more files than the quota, and I suggested checking the quota on the MDS side.
The tracker you raised does not mention that - https://tracker.ceph.com/issues/58216 (no mention that EDQUOT is seen)
This is by design, since quota is enforced by the client. I don't fully recall why this path was chosen; it was related to quota trees, IIRC.
Well, first of all, if I confused you, I am really sorry. As in my very early comment, I did not say it is a bug or wrong behavior.
You did not answer my question above, and now you are saying you cannot recall the whole history.
It's a misunderstanding then, not anybody's fault. To me it seemed like no matter how many files you create, the quota is not enforced. But since that's not the case, it's not a bug as per the design.
That's because I was not involved with CephFS development when quota was designed and implemented. I could only check the commit history and mail archives, or ask other devs who were involved. But since my interpretation of this issue was that quotas were not working at all, my priority was to see whether any bugs were lurking that need immediate attention.
Check out this thread - https://www.spinics.net/lists/ceph-devel/msg39432.html Basically, server-enforced quota restriction was considered, but such a design would still benefit from the client doing the (quota tree) checks anyway.
Looking at the old discussion thread about quota design and implementation, it looks like client-based quota enforcement was chosen for its distributed nature, which helps reduce contention and load on the server (MDS) side. @jinmyeonglee Did
@mchangir Thanks for your suggestion.
I will benchmark and see the result. (I will use the fio and smallfile tools.)
@jinmyeonglee I guess this change can be closed? If you need to continue the discussion, then please use the tracker or the mailing list.
@jinmyeonglee ping?
@vshankar Oh, sorry for the late response.
https://tracker.ceph.com/issues/58216