
[RBD] parallel PVC creation on a newly created block pool will hang #2521

Closed

Rakshith-R opened this issue Sep 20, 2021 · 22 comments
Labels
bug Something isn't working · component/rbd Issues related to RBD · dependency/ceph depends on core Ceph functionality · keepalive This label can be used to disable stale bot activity in the repo · Priority-0 highest priority issue

Comments

@Rakshith-R
Contributor

Rakshith-R commented Sep 20, 2021

Describe the bug

The following bug in librbd causes parallel PVC creation requests on a newly created block
pool to hang.

Concurrent rbd_pool_init() or rbd_create() operations on an unvalidated
(uninitialized) pool trigger a lockup in the ValidatePoolRequest state
machine, caused by blocking selfmanaged_snap_{create,remove}() calls.

Ceph issue tracker: https://tracker.ceph.com/issues/52537

Ceph Pacific backport PR with the fix: ceph/ceph#43113

Environment details

  • Image/version of Ceph CSI driver: v3.4.0

Steps to reproduce

  • Create a new RBD block pool (with no images) and a StorageClass against the CSI provisioner.
  • Create multiple PVCs in parallel (see the sketch below).
  • The creation requests will stay in Pending state indefinitely.
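
For reference, a minimal kubectl-level sketch of the steps above. This is only an illustration: it assumes a StorageClass named rook-ceph-block backed by the freshly created (uninitialized) pool, and the PVC names and sizes are examples.

```bash
# Create two PVCs back to back against the new pool's StorageClass.
for i in 1 2; do
kubectl create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-block-pvc-$i
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block
  storageClassName: rook-ceph-block
  resources:
    requests:
      storage: 1Gi
EOF
done

# On an affected librbd both PVCs stay Pending indefinitely.
kubectl get pvc
```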

Actual results

  • Creation requests stay in Pending state indefinitely.

Expected behavior

  • Creation requests should succeed.

Updated Workaround (does not leave stale omap entries, thanks @Madhu-1)

  • Execute rbd pool init <pool_name> directly on the cluster or from the csi pods (see the command sketch below).
  • Restart the csi rbdplugin provisioner pod.
  • The PVCs will then go to Bound state without leaving any stale resources.

After the above steps, parallel PVC creation requests should work fine.
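
For anyone hitting this, the same workaround expressed as concrete commands. This is only a sketch: the namespace, toolbox deployment, provisioner deployment name, and pool name below are assumptions based on a typical Rook install, so adjust them for your environment.

```bash
# Example names based on a typical Rook install; adjust for your environment.
NAMESPACE=rook-ceph
POOL_NAME=replicapool

# 1. Initialize the pool, e.g. from the Rook toolbox pod (or from any host/pod
#    that has Ceph credentials). Re-running the init is safe; it is idempotent.
kubectl -n "$NAMESPACE" exec deploy/rook-ceph-tools -- rbd pool init "$POOL_NAME"

# 2. Restart the csi rbdplugin provisioner pods so the stuck CreateVolume calls go away.
kubectl -n "$NAMESPACE" rollout restart deployment/csi-rbdplugin-provisioner

# 3. The pending PVCs should then move to Bound without leaving stale omap entries.
kubectl get pvc --all-namespaces
```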

Workaround (not recommended, will leave stale omap entries)

  • Delete the ongoing PVC creation requests.

  • Restart the csi rbdplugin provisioner pod.
    Then either:

    • issue a single PVC create request, which will succeed, or
    • call rbd pool init <pool_name> on the Ceph cluster.

After the above steps, parallel PVC creation requests should work fine.

@Rakshith-R
Contributor Author

Hey @idryomov, a lot of users seem to be hitting this issue. Would you recommend adding rbd_pool_validate = false (which does prevent this problem) in ceph.conf to work around the issue in cephcsi?

Will there be any side effects or reasons not to do this?

Thanks !

@idryomov
Contributor

idryomov commented Sep 21, 2021

How many reports of this exact issue did you get so far? I see you linked #2520 but it doesn't look related at first sight.

I could be wrong but it feels like the combination of a fresh uninitialized pool and concurrent creation requests should make this rather unlikely to hit for a general user. I'm inclined to just visibly document this as a known issue and wait for the 16.2.7 release.

@Rakshith-R
Contributor Author

How many reports of this exact issue did you get so far? I see you linked #2520 but it doesn't look related at first sight.

Quite a few are reported here.

I could be wrong but it feels like the combination of a fresh uninitialized pool and concurrent creation requests should make this rather unlikely to hit for a general user.

@idryomov
For a user using rook (>=v1.7.1), hitting this can be very easy.
Just create a BlockPool and issue more than one RBD PVC request.

I'm inclined to just visibly document this as a known issue and wait for the 16.2.7 release.

I suspect we will get more such reports, and if I am not wrong, Ceph v16.2.7 will only be out in about two months?

Thanks!

@idryomov
Contributor

Quite a few are reported here.

I see two reports there.

For a user using rook (>=v1.7.1), hitting this can be very easy.
Just create a BlockPool and issue more than one RBD PVC request.

This is not the first Rook release based on Pacific, right? The bug is present in all Pacific releases (16.2.0 - 16.2.6); if we only got two reports in this time frame, I'm not sure it is worth rushing in a workaround instead of just waiting for 16.2.7.

We don't have a set date for 16.2.7 so yes it could take two months but we could also expedite it.

@Rakshith-R
Contributor Author

For a user using rook (>=v1.7.1), hitting this can be very easy.
Just create a BlockPool and issue more than one RBD PVC request.

This is not the first Rook release based on Pacific, right? The bug is present in all Pacific releases (16.2.0 - 16.2.6); if we only got two reports in this time frame, I'm not sure it is worth rushing in a workaround instead of just waiting for 16.2.7.

This is the first CephCSI release (v3.4.0) based on Pacific (8fafc42), and it was picked up by Rook in v1.7.1.

We don't have a set date for 16.2.7 so yes it could take two months but we could also expedite it.

Okay 👍.
Thanks, Ilya.

@ceph/ceph-csi-contributors I think we can go ahead with the cephcsi v3.4.1 release and pick up the fix in v3.4.2 when it's out.

@Madhu-1
Collaborator

Madhu-1 commented Sep 22, 2021

@idryomov, a couple of questions:

  • Will we hit this only on new pools, or on existing pools as well?

Concurrent rbd_pool_init() or rbd_create() operations on an unvalidated (uninitialized) pool trigger a lockup in ValidatePoolRequest

  • Is there any command to check whether a pool is uninitialized/unvalidated?

  • Does the issue exist on both the Ceph cluster (server) side and in the client library (librbd)?

@Rakshith-R below is the workaround you mentioned in the issue.

Delete ongoing PVC creation requests.
Restart csi rbdplugin provisioner pod
Either
Issue a single PVC create request which will succeed.
or call rbd pool init <pool_name> on the ceph cluster.

Can't we just scale down the rbd provisioner pod, call rbd pool init on the Ceph cluster, and scale the provisioner pod back up? Won't this workaround work?

@Rakshith-R
Contributor Author

  • Will we hit this only on new pools, or on existing pools as well?

Only new pools, as per my tests.

@Rakshith-R below is the workaround you mentioned in the issue.

Delete ongoing PVC creation requests.
Restart csi rbdplugin provisioner pod
Either
Issue a single PVC create request which will succeed.
or call rbd pool init <pool_name> on the ceph cluster.

Can't we just scale down the rbd provisioner pod, call rbd pool init on the Ceph cluster, and scale the provisioner pod back up? Won't this workaround work?

Yes, that works too, but it needs direct access to the Ceph cluster to execute the commands and will still leave stale omap entries.
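
For completeness, a sketch of the scale-down variant described above. The namespace, deployment names, pool name, and replica count are assumptions based on a typical Rook install, so adjust them to your setup.

```bash
NAMESPACE=rook-ceph          # example namespace
POOL_NAME=replicapool        # example pool name

# Scale the provisioner down so no CreateVolume calls are in flight.
kubectl -n "$NAMESPACE" scale deployment/csi-rbdplugin-provisioner --replicas=0

# Initialize the pool directly on the Ceph cluster (e.g. via the toolbox pod).
kubectl -n "$NAMESPACE" exec deploy/rook-ceph-tools -- rbd pool init "$POOL_NAME"

# Scale the provisioner back up; pending/retried PVC requests should now succeed.
kubectl -n "$NAMESPACE" scale deployment/csi-rbdplugin-provisioner --replicas=2
```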

@Madhu-1
Collaborator

Madhu-1 commented Sep 22, 2021

  • Will we hit this only on new pools, or on existing pools as well?

Only new pools, as per my tests.

@Rakshith-R below is the workaround you mentioned in the issue.

Delete ongoing PVC creation requests.
Restart csi rbdplugin provisioner pod
Either
Issue a single PVC create request which will succeed.
or call rbd pool init <pool_name> on the ceph cluster.

Can't we just scale down the rbd provisioner pod, call rbd pool init on the Ceph cluster, and scale the provisioner pod back up? Won't this workaround work?

Yes, that works too, but it needs direct access to the Ceph cluster to execute the commands and will still leave stale omap entries.

Executing a command on the Ceph cluster is not an issue, and it's better than leaving stale resources. If the pending PVCs are not deleted, how will this leave a stale omap?

@Rakshith-R
Contributor Author

Rakshith-R commented Sep 22, 2021

Executing a command on the Ceph cluster is not an issue, and it's better than leaving stale resources. If the pending PVCs are not deleted, how will this leave a stale omap?

Okay, I tested it. It will not leave any stale resources:

  • Execute rbd pool init <pool_name> directly on the cluster or from the csi pods.
  • Delete the provisioner pods.

The PVCs will then go to Bound state without leaving any stale resources.

@idryomov
Contributor

@idryomov couple of questions

* Will we hit this only on new pools, or on existing pools as well?

Only on the new (empty) and uninitialized pools. It can't occur once even a single image is created or the pool is initialized.

Concurrent rbd_pool_init() or rbd_create() operations on an unvalidated (uninitialized) pool trigger a lockup in ValidatePoolRequest

* Is there any command to check whether a pool is uninitialized/unvalidated?

I don't think so. One possibility is to check if the pool's application is "rbd" but that is a bit wonky -- as part of initialization we set it to "rbd" but it is possible to set the application separately.

OTOH initialization requests are more or less idempotent, so issuing them repeatedly should be safe.
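
In CLI terms, the application check and an idempotent re-init could look like the sketch below. The pool name is just a placeholder, and the application check is only the wonky heuristic described above, not an authoritative test.

```bash
# Example pool name; substitute your own.
POOL_NAME=replicapool

# Heuristic only: "rbd pool init" sets the pool application to "rbd", but the
# application can also be set independently of initialization.
ceph osd pool application get "$POOL_NAME"

# Initialization is more or less idempotent, so issuing it repeatedly is safe.
rbd pool init "$POOL_NAME"
rbd pool init "$POOL_NAME"   # effectively a no-op the second time
```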

* Does the issue exist on both the Ceph cluster (server) side and in the client library (librbd)?

This is a librbd issue.

@Madhu-1
Collaborator

Madhu-1 commented Sep 24, 2021

This is a librbd issue.

@idryomov Thanks for your reply. Until Pacific 16.2.7 is released, would it be good to switch back to the Octopus release, as the issue is in librbd and doesn't exist in Octopus? Or, as suggested earlier, would documenting it be good enough?

@idryomov
Contributor

Technically the issue exists in octopus librbd too but it is much harder to hit there and rolling back to octopus should fix it.
Up to you -- I still feel that it shouldn't affect that many users and just documenting would be good enough but I could be underestimating it.

@Madhu-1
Collaborator

Madhu-1 commented Sep 28, 2021

Technically the issue exists in octopus librbd too but it is much harder to hit there and rolling back to octopus should fix it.
Up to you -- I still feel that it shouldn't affect that many users and just documenting would be good enough but I could be underestimating it.

Thanks, @idryomov, for your opinion. Much appreciated! It looks like whoever tries cephcsi with new pools might hit this issue.

Most things are automated, like pool creation, PVC creation, etc. Since it's much harder to hit this in Octopus, instead of documenting it I am more inclined towards using Octopus as the base image for 3.4.1 and switching to the new 16.2.7 in cephcsi 3.4.2.

@humblec @nixpanic thoughts?

@Rakshith-R
Contributor Author

Most things are automated, like pool creation, PVC creation, etc. Since it's much harder to hit this in Octopus, instead of documenting it I am more inclined towards using Octopus as the base image for 3.4.1 and switching to the new 16.2.7 in cephcsi 3.4.2.

We cannot use Ceph Octopus as the base image, since deep_copy(), which is required for thick-provisioning, does not work as expected in Ceph Octopus.
Refer to #2202 and #2187 (comment).

@Madhu-1
Collaborator

Madhu-1 commented Sep 28, 2021

Most things are automated, like pool creation, PVC creation, etc. Since it's much harder to hit this in Octopus, instead of documenting it I am more inclined towards using Octopus as the base image for 3.4.1 and switching to the new 16.2.7 in cephcsi 3.4.2.

We cannot use Ceph Octopus as the base image, since deep_copy(), which is required for thick-provisioning, does not work as expected in Ceph Octopus.
Refer to #2202 and #2187 (comment).

:D Looks like we have another problem. Do we have any other option left, except documentation, to make life easier?

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the wontfix This will not be worked on label Oct 28, 2021
@github-actions

github-actions bot commented Nov 4, 2021

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

@github-actions github-actions bot closed this as completed Nov 4, 2021
@Rakshith-R Rakshith-R reopened this Nov 5, 2021
@Rakshith-R Rakshith-R added keepalive This label can be used to disable stale bot activity in the repo and removed wontfix This will not be worked on labels Nov 5, 2021
@Madhu-1
Collaborator

Madhu-1 commented Jan 19, 2022

@Rakshith-R this can be closed?

@Rakshith-R
Contributor Author

@Rakshith-R this can be closed?
Thanks, Madhu, for the reminder.

Yes, Ceph 16.2.7 was released in December (https://github.com/ceph/ceph/releases/tag/v16.2.7); it contains the fix, and in turn cephcsi v3.5.0 will have the fix for this issue.

Closing the issue for the above reason.

@Rakshith-R
Contributor Author

@Rakshith-R this can be closed?
Thanks, Madhu, for the reminder.

Yes, Ceph 16.2.7 was released in December (https://github.com/ceph/ceph/releases/tag/v16.2.7); it contains the fix, and in turn cephcsi v3.5.0 will have the fix for this issue.

Closing the issue for the above reason.

Docker Hub did not have the latest version of Ceph, so base images are now being pulled from quay.io.
Release v3.5.1 should therefore have the fix for this issue.
Refer to #2796.

@Madhu-1
Collaborator

Madhu-1 commented Jan 21, 2022

@Rakshith-R this can be closed?
Thanks, Madhu, for the reminder.

Yes, Ceph 16.2.7 was released in December (https://github.com/ceph/ceph/releases/tag/v16.2.7); it contains the fix, and in turn cephcsi v3.5.0 will have the fix for this issue.
Closing the issue for the above reason.

Docker Hub did not have the latest version of Ceph, so base images are now being pulled from quay.io. Release v3.5.1 should therefore have the fix for this issue. Refer to #2796.

Thanks for flagging this one.

@Madhu-1 Madhu-1 reopened this Jan 21, 2022
@Madhu-1 Madhu-1 mentioned this issue Jan 21, 2022
@Madhu-1
Collaborator

Madhu-1 commented Jan 21, 2022

Closed by #2798.

@Madhu-1 Madhu-1 closed this as completed Jan 21, 2022
akalenyu added a commit to akalenyu/kubevirtci that referenced this issue May 31, 2022
Seems like we're hitting the parallel PVC creation bug in kubevirt/kubevirt:
kubevirt/kubevirt#7783
ceph/ceph-csi#2521

Reproduced locally:
```bash
[root@hera04 kubevirt]# cat ~/manifests/2pvcs.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-block-pvc-1
spec:
  volumeMode: Block
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 3Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-block-pvc-2
spec:
  volumeMode: Block
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 3Gi
[root@hera04 kubevirt]# ./cluster-up/kubectl.sh create -f ~/manifests/2pvcs.yaml
selecting docker as container runtime
persistentvolumeclaim/ceph-block-pvc-1 created
persistentvolumeclaim/ceph-block-pvc-2 created
[root@hera04 kubevirt]# ./cluster-up/kubectl.sh get pvc
selecting docker as container runtime
NAME               STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS      AGE
ceph-block-pvc-1   Pending                                      rook-ceph-block   2s
ceph-block-pvc-2   Pending                                      rook-ceph-block   2s
[root@hera04 kubevirt]# ./cluster-up/kubectl.sh get pvc
selecting docker as container runtime
NAME               STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS      AGE
ceph-block-pvc-1   Pending                                      rook-ceph-block   3s
ceph-block-pvc-2   Pending                                      rook-ceph-block   3s
[root@hera04 kubevirt]# ./cluster-up/kubectl.sh create -f ~/manifests/pvc.yaml
selecting docker as container runtime
persistentvolumeclaim/ceph-block-pvc created
[root@hera04 kubevirt]# ./cluster-up/kubectl.sh get pvc
selecting docker as container runtime
NAME               STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS      AGE
ceph-block-pvc     Pending                                      rook-ceph-block   2s
ceph-block-pvc-1   Pending                                      rook-ceph-block   66s
ceph-block-pvc-2   Pending                                      rook-ceph-block   66s
```

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
akalenyu added a commit to akalenyu/kubevirtci that referenced this issue May 31, 2022
kubevirt-bot pushed a commit to kubevirt/kubevirtci that referenced this issue May 31, 2022