
Allow volume ownership to be only set after fs formatting. #69699

Closed
msau42 opened this issue Oct 11, 2018 · 58 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. sig/storage Categorizes an issue or PR as relevant to SIG Storage.

@msau42
Member

msau42 commented Oct 11, 2018

Is this a BUG REPORT or FEATURE REQUEST?:
@kubernetes/sig-storage-feature-requests

What happened:
Today, the fsgroup setting is applied recursively on every mount. This can make mounting very slow if the volume has many files. Most of the time, a pod using a volume should use the same fsgroup every time and should not need to change it across multiple pods.

I'm proposing that we add a flag so that the fsgroup is applied only once, right after initial fs formatting, and not on every mount.
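
For reference, the setting discussed later in this thread, fsGroupChangePolicy in the pod's securityContext, is what eventually shipped for this; a minimal sketch, with placeholder names:

apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-example                     # placeholder name
spec:
  securityContext:
    fsGroup: 1000
    fsGroupChangePolicy: "OnRootMismatch"   # skip the recursive chown when the volume root already matches
  containers:
  - name: app
    image: busybox
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: example-pvc                # placeholder claim name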

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. kind/feature Categorizes issue or PR as related to a new feature. labels Oct 11, 2018
@krmayankk

Doing recursive group ownership changes for all files on every mount is excessive. Where are you proposing the flag should live?

@msau42
Member Author

msau42 commented Oct 15, 2018

I haven't thought about exactly where in the API we could add it. Maybe in the same place where fsgroup is specified. There are also other issues around volume ownership that may impact this, and it would benefit from someone taking full ownership of this area to investigate a complete solution.

Ref other ownership issues: #2630, #57923

@jsafrane
Member

cc @tsmetana @gnufied

To add even more confusion and problems, SELinux labels are applied by the container runtime, and CRI does not offer any option to skip this. All files on a volume are labelled by the container runtime every time a container is started. It would be worth adding an option to skip this step somehow.
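
For context, the label the runtime applies comes from the pod's securityContext.seLinuxOptions (or is assigned by the runtime if unset); setting it explicitly does not avoid the recursive relabel, it only controls which label is applied. A minimal sketch, with a placeholder name and level:

apiVersion: v1
kind: Pod
metadata:
  name: selinux-label-example   # placeholder name
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c123,c456"     # label the runtime applies to the container; volume files are relabelled to match
  containers:
  - name: app
    image: busybox
    command: ["sleep", "infinity"]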

@tsmetana
Member

@jsafrane yes... I've tried to propose something similar in the past but it was turned down since it was not a complete solution (SELinux...).

I have only vague knowledge of CRI, but quickly scanning the API shows some SELinux boolean in the Mount message: https://github.com/kubernetes/kubernetes/blob/release-1.5/pkg/kubelet/api/v1alpha1/runtime/api.proto#L139

I will try to find out how this is being used. Perhaps there is a way to also turn off the recursive re-labelling using the existing APIs.

@dimm0

dimm0 commented Nov 2, 2018

Having issues mounting a filesystem with a bunch of files in it; it's taking 14 minutes to start a container. Would love to see an option to disable the recursive chown.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 31, 2019
@gnufied
Member

gnufied commented Jan 31, 2019

/remove-lifecycle-stale

@dvlato

dvlato commented Feb 11, 2019

Hi,

We also hit the problem of the recursive chown when we set fsGroup. In our case, I don't think we need to recursively change the file permissions, as the files are created by the pod (in ReadWriteOnce mode), so we only need to change the permissions on the root folder. Would that be a better fit than trying to find a place to change the owner only once?

Additionally, is there a way to diagnose this issue? Now that we have removed the fsGroup, we get timeouts in a slightly different place (between container created and started). It seems to be a similar issue, but we cannot find any indication of the cause (AFAIK we don't use SELinux at all, so it shouldn't be SELinux relabeling). How can we confirm that it is a chmod/chown that is causing the problem? If you know how to verify whether SELinux relabeling is happening, I would also be most thankful.

@gnufied
Member

gnufied commented Feb 11, 2019

There is a metric for the volume mount operation. It should tell you whether it is indeed the mount operation (which includes the chown/chmod) that is taking time, or something else.

Whether SELinux relabelling is playing a role can be verified by checking if SELinux is enabled on the node (getenforce); beyond that it depends on the type of volume you are using. What kind of volume are you using? Is it a block storage volume type?

@dvlato

dvlato commented Feb 11, 2019

I know it was the mount operation, as the error message says 'Unable to mount'... However, I don't know if it's chmoding, chowning, or doing something different.

The volume is an EBS volume, so yes, block storage. I was expecting something in the kube logs or maybe the API to give me that information, as I cannot SSH into the nodes (like most devs, I assume, and I'd love to find that information myself rather than finding someone who has time to get it for me at the moment I have time...). Also, I don't feel that 'SELinux is enabled, so it must be that' is the best troubleshooting technique. Is getenforce the best/easiest way to find out whether it could be SELinux? Of course, thank you for the information; it will serve me if there's nothing else, but I was expecting to be able to get more info from Kubernetes (and I think it would be worth improving traceability in this regard otherwise).

@dvlato

dvlato commented Feb 11, 2019

Anyway, thank you very much for the super prompt response!

@dimm0

dimm0 commented Feb 11, 2019

SELinux is disabled on all our nodes. It's not doing the chown anymore in the recent Rook version, but it's clearly still doing something else, since pods take longer to start under a non-root user compared to root.

@dvlato

dvlato commented Feb 12, 2019

In my case, they are taking a long time even when running under the root user with no fsGroup... It's clearly related to the files inside the EBS volume, as the containers that mount it get stuck between "Created container" and "Started container" only if there are files in the volume (around 6 million in this case).

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 14, 2019
@dvlato

dvlato commented Mar 14, 2019

Has anyone else experienced the same problem?

@gnufied
Member

gnufied commented Mar 14, 2019

Do you have selinux enabled?

@msau42
Member Author

msau42 commented Apr 4, 2019

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Apr 4, 2019
@dimm0

dimm0 commented Apr 4, 2019

I'm having lots of issues with chown and selinux relabelling in rook ceph volumes, and patiently waiting for a fix from k8s...

@dimm0

dimm0 commented Apr 4, 2019

Users who are running Jupyter (containers with non-root users) and have 100K+ files in their volumes are unable to start, because Jupyter times out before k8s is done chowning every single file.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 3, 2019
@azalio

azalio commented Jul 18, 2019

/remove-lifecycle stale

@unixfox

unixfox commented Apr 4, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 4, 2021
@dimm0

dimm0 commented May 15, 2021

This still doesn't work in k8s 1.20.5. I tried setting fsGroupChangePolicy: "OnRootMismatch" and enabling the feature gates in the apiserver/controller/kubelet (even though those should already be enabled by default).

Anything else I could try?

@gnufied
Member

gnufied commented May 17, 2021

@dimm0 can you provide more details? Do you see the volume being recursively chown/chmod'ed on each mount even if you set fsGroupChangePolicy: "OnRootMismatch"? Can you post your pod spec and logs from the kubelet? The OnRootMismatch policy is not designed to avoid the first-time recursive chown of volumes; it only guarantees that no recursive chown is performed if the fsgroup already matches.

Apart from our own testing, other folks have confirmed that OnRootMismatch works: longhorn/longhorn#2131 (comment)

@dimm0

dimm0 commented May 17, 2021

Do you see the volume being recursively chown/chmod'ed on each mount even if you set fsGroupChangePolicy: "OnRootMismatch"?

Yes. I'm running JupyterHub with a non-root account, and users are having issues starting containers whose volumes have many (>400K) files. I also verified that if I set the fsGroup in the pod's securityContext, the files get chowned. I tried starting the pod several times with the same volume and manually chowning files inside the volume between restarts.

Ok, I'll post logs

My pod:

apiVersion: v1
kind: Pod
metadata:
  name: vol-pod
spec:
  containers:
  - name: vol-container
    image: busybox
    command: ["sleep", "infinity"]
    volumeMounts:
    - mountPath: "/data"
      name: vol
  securityContext:
    fsGroup: 100
    fsGroupChangePolicy: "OnRootMismatch"
  volumes:
  - name: vol
    persistentVolumeClaim:
      claimName: examplevol

@dimm0

dimm0 commented May 17, 2021

kubelet.log

vol-pod2 pod

@gnufied
Member

gnufied commented May 17, 2021

@dimm0 so what is the bug? Are recursive permission changes not being skipped as expected? You appear to be using the flexvolume version of the Rook plugin; not sure if that could be the reason (it should not be). But it is very hard to tell based on attached logs.

Next steps:

  1. Can you try to create a minimal working example, verify mount times by creating lots of files in the volume, and see if using fsGroupChangePolicy makes a difference (a sketch of such an example follows this list)? You can compare the timings using the mount metrics. There is also volume_fsgroup_recursive_apply, which should not be emitted if recursive permission changes are being skipped.
  2. See if flexvolume driver supports fsgroup.
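
A minimal reproduction sketch for step 1 (names and storage class are placeholders; populate the volume with many files, then compare mount times with and without the fsGroupChangePolicy line):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: many-files-pvc                 # placeholder name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
  storageClassName: rook-ceph-block    # placeholder; use the CSI storage class under test
---
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-timing-test            # placeholder name
spec:
  securityContext:
    fsGroup: 100
    fsGroupChangePolicy: "OnRootMismatch"   # remove this line for the baseline run
  containers:
  - name: test
    image: busybox
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: many-files-pvc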

@dimm0

dimm0 commented May 18, 2021

But it is very hard to tell based on attached logs.

Exactly. How can I debug this?

There's a CSI plugin, but some (older) volumes are still flexVolumes. The volume in question is CSI.

I tried creating just a single file in the volume, chowning it to 22:22, then killing the pod and creating it again. The file inside is 22:100 after the pod starts.

I can give you access to the namespace if you want... I have a minimal example above. I can send configs.

Is chowning shown in logs anywhere?

@dimm0

dimm0 commented May 18, 2021

Is it something to do with the CSI driver, or is it fully on the Kubernetes side? Should I bug the Rook folks about this?

@gnufied
Member

gnufied commented May 18, 2021

But it is very hard to tell based on attached logs.

Exactly. How can I debug this?

Is chowning shown in logs anywhere?

If you increase the kubelet's log level to 3, you should be able to see the following message whenever applicable:

klog.V(3).InfoS("Skipping permission and ownership change for volume", "path", mounter.GetPath())

@oxr463

oxr463 commented Jun 3, 2021

Is there a big performance impact from this, e.g., increased memory usage on deployments?

@gnufied
Member

gnufied commented Jun 3, 2021

No. It should not result in any difference in memory usage.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 1, 2021
@m-yosefpor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 1, 2021
@msau42
Member Author

msau42 commented Sep 1, 2021

kubernetes/enhancements#695 is beta in 1.20 and targeting GA in 1.23. I think we can close this issue and track progress through the enhancement issue.
/close

@k8s-ci-robot
Contributor

@msau42: Closing this issue.

In response to this:

kubernetes/enhancements#695 is beta in 1.20 and targeting GA in 1.23. I think we can close this issue and track progress through the enhancement issue.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@m-yosefpor

@msau42 thanks for the clarification. However, kubernetes/enhancements#695 only fixes the kubelet's recursive walk; the recursive chcon applied by the container runtime is still an issue (see #69699 (comment)). As @jsafrane also mentioned, we need some way to tell the runtime not to apply those chcon calls (via a change in CRI or some other method).

@gnufied
Member

gnufied commented Sep 2, 2021

Yes, ideally the SELinux work will be tracked via kubernetes/enhancements#1710. The SELinux enhancement is being postponed because of a lack of time/contributors.

@JJwangbilin

Tested and verified; this solves the problem:
Set pod.securityContext.fsGroupChangePolicy to "OnRootMismatch".
(1) The first mount may still trigger a chown -R, to ensure that the group of the mount point's root directory matches pod.securityContext.fsGroup.
(2) Subsequent mounts skip the chown, because the group of the mount point's root directory already matches pod.securityContext.fsGroup (only the root of the mount point is checked, not the contents of the volume).
(3) OnRootMismatch means: skip the chown if the root directory's group already matches. It is not the default and must be set explicitly.
(4) This parameter is supported starting with Kubernetes 1.20.

@aiici

aiici commented Mar 6, 2024

I came across this on v1.18.1; if I may, I suggest upgrading to v1.18.20 to fix the issue.
