Allow volume ownership to be only set after fs formatting. #69699
Doing recursive group ownership changes for all files on every mount is excessive. Where are you proposing the flag should live?
I haven't thought about exactly where in the API we could add it. Maybe in the same place where fsGroup is specified. There are also other issues around volume ownership that may impact this, and it could benefit from someone taking full ownership of this area to investigate a complete solution.
To add even more confusion and problems, SELinux labels are applied by the container runtime, and CRI does not offer any option to skip it. All files on a volume are relabelled by the container runtime every time a container is started. It would be worth adding an option to skip this step somehow.
@jsafrane yes... I've tried to propose something similar in the past, but it was turned down since it was not a complete solution (SELinux...). I have only vague knowledge of CRI, but quickly scanning the API shows a SELinux boolean in the `Mount` message: https://github.com/kubernetes/kubernetes/blob/release-1.5/pkg/kubelet/api/v1alpha1/runtime/api.proto#L139. I will try to find out how this is being used. Perhaps there might be a way to also turn off the recursive relabelling using the existing APIs.
Having issues mounting a filesystem with a bunch of files in it: it is taking 14 minutes to start a container. Would love to see an option to disable the recursive chown.
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so with `/close`. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
Hi, we also hit the recursive chown problem when we set fsGroup. In our case I don't think we need to recursively change the file permissions, since the files are created by the pod itself (in ReadWriteOnce mode), so we only need to change the permissions on the root folder. Would that be a better fit than trying to find a place where the owner is changed only once?

Additionally, is there a way to diagnose this issue? Now that we have removed fsGroup, we get timeouts in a slightly different place (between container created and started). It seems to be a similar issue, but we cannot find any indication of the cause (AFAIK we don't use SELinux at all, so it shouldn't be SELinux relabeling). How can we confirm that a chmod/chown is causing the problem? If you know how to verify whether SELinux relabeling is happening, I would also be most thankful.
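For context, the recursive ownership change under discussion is triggered by the pod-level `fsGroup` field in the pod's security context. A minimal sketch (pod name, image, and claim name are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-demo          # illustrative name
spec:
  securityContext:
    fsGroup: 2000             # kubelet recursively chowns/chmods the volume to this GID on mount
  containers:
    - name: app
      image: busybox          # illustrative image
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-pvc   # illustrative claim name
```

With a volume containing millions of files, this per-mount recursive walk is what makes pod startup slow.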
There is a metric for the volume mount operation. This metric should tell you whether it is indeed the mount operation (which includes the chown/chmod) that is taking time, or something else. Whether SELinux relabelling is playing a role can be verified by checking if SELinux is enabled on the node (`getenforce`).
I know it was the mount operation, as the error message says 'Unable to mount'... However, I don't know if it's chmoding, chowning, or doing something different. The volume is an EBS volume, so yes, block storage. I was expecting something in the kube logs or maybe the API to give me the information, as I cannot ssh into the nodes (like most devs, I assume), and I'd love to find that information myself rather than finding someone who has time to get it for me the moment I have time... Also, I don't feel that 'SELinux is enabled, so it must be that' is the best troubleshooting technique. Is `getenforce` the best/easiest way to find out if it could be SELinux? Of course, thank you for the information; it will serve me if there's nothing else, but I was expecting I could get more info from kube (and I think it would be interesting to improve traceability in this regard otherwise).
Anyway, thank you very much for the super prompt response!
SELinux is disabled on all our nodes. It's not doing the chown anymore in the recent rook version, but it's clearly still doing something else, since pods take longer to start under a non-root user compared to root.
In my case, they are taking a long time even when running under the root user with no fsGroup... It's clearly related to the files inside the EBS volume, as the containers that mount it get stuck between "Created container" and "Started container" only if there are files (around 6 million in this case) in the volume.
Stale issues rot after 30d of inactivity. If this issue is safe to close now, please do so with `/close`. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Has anyone else experienced the same problem?
Do you have SELinux enabled?
/remove-lifecycle rotten
I'm having lots of issues with chown and SELinux relabelling on rook-ceph volumes, and am patiently waiting for a fix from k8s...
Users who are running jupyter (containers with non-root users) and have 100K+ files in their volumes are unable to start it, because jupyter times out before k8s is done chowning every single file.
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so with `/close`. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
This still doesn't work in k8s 1.20.5. I tried setting the new option, but it didn't help. Anything else I could try?
@dimm0 can you provide more details? Do you see the volume being recursively chown/chmod'ed on each mount even if you set the new option? Apart from our own testing, other folks have confirmed that it works.
Yes. I'm running jupyterhub with a non-root account, and users are having issues starting containers with volumes having many (>400K) files. Also, I verified that if I set the fsGroup in the pod's securityContext, files get chowned. I tried starting the pod several times with the same volume and manually chowning files inside the volume between restarts. Ok, I'll post logs. My pod:

vol-pod2 pod
@dimm0 so what is the bug? Is the recursive permission change not being skipped as expected? You appear to be using the flexvolume version of the rook plugin; not sure if that could be the reason (it should not be). But it is very hard to tell based on the attached logs. Next steps:
Exactly. How can I debug this? There's a CSI plugin, but some (older) volumes are still flexVolumes. The volume in question is CSI. I tried creating just a single file in the volume, chowning it to 22:22, then killing the pod and creating it again. The file inside is 22:100 after the pod starts. I can give you access to the namespace if you want... I have a minimal example above and can send configs. Is chowning shown in the logs anywhere?
Is it something to do with the CSI driver, or is it fully on the kubernetes side? Should I bug the rook folks about this?
If you increase the log level of KCM to 3, you should be able to see the following message whenever applicable: `klog.V(3).InfoS("Skipping permission and ownership change for volume", "path", mounter.GetPath())`
Is there a big performance impact from this? E.g., increased memory usage on deployments.
No, it should not result in any difference in memory usage.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale
kubernetes/enhancements#695 is beta in 1.20 and targeting GA in 1.23. I think we can close this issue and track progress through the enhancement issue.
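For readers finding this thread later: the enhancement referenced above surfaced as the `fsGroupChangePolicy` field in the pod's security context. A minimal sketch, assuming a cluster version where the feature is available (pod name, image, and claim name are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: skip-recursive-chown    # illustrative name
spec:
  securityContext:
    fsGroup: 2000
    # "OnRootMismatch" skips the recursive chown/chmod when the volume's
    # root directory already has the expected group and permissions;
    # the default, "Always", changes ownership on every mount.
    fsGroupChangePolicy: "OnRootMismatch"
  containers:
    - name: app
      image: busybox            # illustrative image
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-pvc     # illustrative claim name
```

Note that this only affects fsGroup-based ownership management; it does not change SELinux relabeling, which (as discussed below) is tracked separately.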
@msau42: Closing this issue. In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@msau42 thanks for your clarification. However, kubernetes/enhancements#695 only fixes the kubelet recursive walk; the recursive SELinux relabeling remains.
Yes, ideally the SELinux work will be tracked via kubernetes/enhancements#1710. The SELinux enhancement is being postponed because of a lack of time/contributors.
Tested and verified; this solves the problem:
I came across it in v1.18.1, and if I may, I suggest upgrading to v1.18.20 to fix the issue.
Is this a BUG REPORT or FEATURE REQUEST?:
@kubernetes/sig-storage-feature-requests
What happened:
Today, the fsGroup setting is applied recursively on every mount. This can make mounting very slow if the volume has many files. Most of the time, a pod using a volume will use the same fsGroup every time and will not need to change it across multiple pods.
I'm proposing that we add a flag that will apply the fsGroup only once, right after the initial fs formatting, and not on every mount.
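A sketch of what such a flag might look like in the pod API. The `fsGroupApplyOnFormatOnly` field name is hypothetical, invented here purely for illustration; no such field existed when this proposal was filed:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: format-once-demo      # illustrative name
spec:
  securityContext:
    fsGroup: 2000
    # Hypothetical flag: apply fsGroup once, right after the filesystem
    # is formatted, instead of recursively on every mount.
    fsGroupApplyOnFormatOnly: true   # hypothetical, not a real field
  containers:
    - name: app
      image: busybox          # illustrative image
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-pvc   # illustrative claim name
```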