Limits for CephFS snapshots #1133
@joscollin @batrick Requesting details on what the current CephFS snapshot limits are. Also, whether CSI should limit outstanding clone operations in any manner.
@ShyamsundarR
The
@ShyamsundarR there is no flattening available as a command for CephFS. Also, the maximum number of snapshots per directory can be configured via an MDS parameter; the default value is 100.
So the total cap for all subvolumes (at the fs layer) is 400; this seems to be clarified in the tracker as well.
If the limits are per filesystem, the best way for CSI to handle it is to catch the resulting error and process it as above. Thanks.
Yes, this is noted in the issue, and we should look at returning errors like RESOURCE_EXHAUSTED at the CSI layer.
I believe this is a further cap at the per-directory level, with the overall limit at 400 per filesystem.
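To make that concrete, here is a minimal sketch of how a snapshot-limit failure could be mapped to RESOURCE_EXHAUSTED at the CSI layer. This is not ceph-csi's actual code: the package name, the `isSnapshotLimitError` helper, and the assumption that the limit surfaces as an EMLINK-style "Too many links" message are all illustrative.

```go
package snaplimits

import (
	"strings"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// isSnapshotLimitError is a hypothetical helper: it assumes the backend reports
// the per-directory snapshot limit as an EMLINK-style "Too many links" failure.
func isSnapshotLimitError(err error) bool {
	return err != nil && strings.Contains(err.Error(), "Too many links")
}

// toCSIError converts a backend failure into a gRPC status, mapping the
// snapshot-limit case to RESOURCE_EXHAUSTED so the caller knows to delete
// older snapshots and retry.
func toCSIError(err error) error {
	if isSnapshotLimitError(err) {
		return status.Error(codes.ResourceExhausted,
			"snapshot limit reached for subvolume; delete older snapshots and retry")
	}
	return status.Error(codes.Internal, err.Error())
}
```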
I'm not sure if that 400 snapshot limit applies only to the mount and what it can see. If we have a mount like:
then it will only see the snapshots under that mount. @ukernel, can you tell me if that helps with avoiding the 400-snapshot per-filesystem limit?
I tested the following and found no errors from the
I did not find the setting in question modified. None of the settings are modified otherwise, but it was a Rook-deployed CephFS instance, so I am unsure what other settings are changed by default. The Ceph version used was 14.2.9.
The problem is that inodes with multiple links are placed in a dummy snaprealm, which contains all snapshots in the filesystem. For CephFS volumes, we only create snapshots at the volume root, so we can disable the special handling for inodes with multiple links. If the special handling is disabled, that can help avoid the 400-snapshot per-filesystem limit.
You mean the "volumes" mgr plugin? We're creating snapshots on each subvolume directory, e.g.
How do we disable this special handling and what are the side-effects for hardlinks?
Yes.
A small patch is needed to disable it. The side effect is: if there are hardlinks (to the same inode) across multiple subvolumes, snapshots have no effect for the remote links.
Added a tracker for the above patch: https://tracker.ceph.com/issues/46074
as we cannot have more than 400 active snapshots on a single subvolume due to the kernel limitation, we need to restrict users from creating more snapshots on a single subvolume during CreateSnapshot
fixes ceph#1133
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
@ShyamsundarR when I tried to create more than 100 snapshots on a subvolume, it started failing with Octopus; the maximum number of snapshots I can create is 100. @ShyamsundarR @kotreshhr is there anything I am missing here?
@Madhu-1 the 100-snapshot limit is coming from
The traceback occurs during the handling of the EMLINK error thrown by CephFS for exceeding the per-directory snapshot limit. Following is the actual error. The traceback should not have occurred; it should have just returned the error message. This issue [1] is fixed by @ajarr in master and backported to Nautilus. The Octopus backport is still pending.
That's great. So, to summarize, the limit is imposed by the above-mentioned configuration.
The above is the per-directory snapshot limit, which CSI doesn't need to care about. cephcsi needs to worry about the kernel limit (which can cause issues in mounting a subvolume that has 400+ snapshots).
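A hedged sketch of the kind of guard the linked PR describes: refusing a new snapshot during CreateSnapshot once a subvolume already carries the kernel-imposed maximum. The `snapshotLister` type, `checkSnapshotBudget`, and the constant name are placeholders, not ceph-csi's real identifiers.

```go
package snaplimits

import (
	"context"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// maxSnapshotsOnSubvolume mirrors the ~400 snapshot limit discussed above:
// kernel clients keep all snapshot IDs for a file in a single 4 KiB page.
const maxSnapshotsOnSubvolume = 400

// snapshotLister is a stand-in for whatever returns the existing snapshots of
// a subvolume; it is not a real ceph-csi interface.
type snapshotLister func(ctx context.Context, subvolume string) ([]string, error)

// checkSnapshotBudget rejects a new snapshot request with RESOURCE_EXHAUSTED
// once the subvolume already carries the maximum number of snapshots.
func checkSnapshotBudget(ctx context.Context, subvolume string, list snapshotLister) error {
	snaps, err := list(ctx, subvolume)
	if err != nil {
		return status.Error(codes.Internal, err.Error())
	}
	if len(snaps) >= maxSnapshotsOnSubvolume {
		return status.Errorf(codes.ResourceExhausted,
			"subvolume %s already has %d snapshots; delete older snapshots before creating new ones",
			subvolume, len(snaps))
	}
	return nil
}
```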
Retested this today: at 100 it errors out (or throws a traceback, depending on the version in use). This limit handling should hence not need any CSI changes, as the call to CreateSnapshot would error out at these limits.
Considering this has already been addressed and it looks like we don't need any other adjustments in the CSI code, I am closing this for now; please feel free to reopen if required. Thanks @ShyamsundarR
The limit is in krbd (and kernel CephFS) since they only allocate a single 4 KiB page to handle all the snapshot IDs for an image/file.
The snapshot limit only counts for the image where the snapshot actually exists -- it does not apply to the total number of snapshots in the entire grandparent-parent-child hierarchy.
Originally posted by @dillaman in #1098 (comment)
Teeing off from the above comment, we would need to handle this for CephFS as well.
As we do not have any flatten or related operations to reduce the snapshot count for a given subvolume, the determined maximum number of snapshots for CephFS would be a hard limit, past which we would need to return RESOURCE_EXHAUSTED errors for CreateSnapshot calls until some older snapshots are deleted.
Cloning volumes from snapshots should not be a concern, as a clone is a full copy of the source volume, so the snapshot limits will not apply to it.
There may, however, be a need to limit how many clone operations are in flight at a time, for resource consumption reasons.
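If such a cap is desired, one generic way to bound in-flight clone operations is a counting semaphore built on a buffered channel. This is only a sketch of the pattern, not ceph-csi's actual concurrency control, and the names are made up.

```go
package snaplimits

import "context"

// cloneLimiter caps the number of clone operations running at once using a
// buffered channel as a counting semaphore.
type cloneLimiter chan struct{}

// newCloneLimiter builds a limiter allowing at most maxInFlight clones.
func newCloneLimiter(maxInFlight int) cloneLimiter {
	return make(cloneLimiter, maxInFlight)
}

// acquire blocks until a clone slot is free or the context is cancelled.
func (l cloneLimiter) acquire(ctx context.Context) error {
	select {
	case l <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// release frees a clone slot; call it when the clone operation finishes.
func (l cloneLimiter) release() {
	<-l
}
```

A caller would acquire() before starting a clone and release() when it completes or fails, so resource consumption stays bounded regardless of how many CreateVolume-from-snapshot requests arrive.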