Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cephfs: handle cephfs clone limit error #4276

Merged
merged 1 commit into from Nov 24, 2023
Merged

Conversation

karthik-us
Copy link
Collaborator

This is to pre-emptively add check for EAGAIN error returned from ceph as part of ceph/ceph#52670 if all the clone threads are busy and return csi compatible error.

Fixes: #3996

@mergify mergify bot added the component/cephfs Issues related to CephFS label Nov 22, 2023
@karthik-us
Copy link
Collaborator Author

@Rakshith-R @yati1998 @nixpanic PTAL.

@riya-singhal31
Copy link
Contributor

riya-singhal31 commented Nov 22, 2023

Can we add e2e test for this?

@karthik-us
Copy link
Collaborator Author

Can we add e2e test for this?

The ceph PR is still not merged under main or any release branch. I don't think we can test it with an e2e at the moment.

riya-singhal31
riya-singhal31 previously approved these changes Nov 22, 2023
Copy link
Contributor

@riya-singhal31 riya-singhal31 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@yati1998 yati1998 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but leaving the approval for Niels or Rakshith .

@@ -134,6 +135,9 @@ func (cs *ControllerServer) createBackingVolumeFromSnapshotSource(
})
if err != nil {
log.ErrorLog(ctx, "failed to create clone from snapshot %s: %v", sID.FsSnapshotName, err)
if errors.Is(err, syscall.EAGAIN) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking for the syscall error is not very clean, and is potentially not correct. It would be much better to check for a valid go-ceph error.

Can you explain how you verified that checking against syscall.EAGAIN is correct?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently go-ceph doesn't have a compatible error code for EAGAIN to check for. Since we get EAGAIN from ceph with the proposed change and that will be formated by go-ceph and sent back to csi, checking it against syscall.EAGAIN should work for the time being.

I will add a TODO note for implementing EAGAIN error handle in go-ceph and use that once available.

@nixpanic
Copy link
Member

@Mergifyio rebase

Copy link
Contributor

mergify bot commented Nov 23, 2023

rebase

✅ Branch has been successfully rebased

@nixpanic
Copy link
Member

@Mergifyio queue

Copy link
Contributor

mergify bot commented Nov 23, 2023

queue

🛑 The pull request has been removed from the queue default

The pull request can't be updated.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

@Rakshith-R
Copy link
Contributor

@Mergifyio rebase

This is to pre-emptively add check for EAGAIN error returned from
ceph as part of ceph/ceph#52670 if all the
clone threads are busy and return csi compatible error.

Fixes: ceph#3996
Signed-off-by: karthik-us <ksubrahm@redhat.com>
Copy link
Contributor

mergify bot commented Nov 24, 2023

rebase

✅ Branch has been successfully rebased

@Rakshith-R
Copy link
Contributor

@Mergifyio queue

Copy link
Contributor

mergify bot commented Nov 24, 2023

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at f666529

@mergify mergify bot added the ok-to-test Label to trigger E2E tests label Nov 24, 2023
@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/upgrade-tests-cephfs

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.27

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.28

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/upgrade-tests-rbd

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.27

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.26

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.28

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.27

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.28

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.26

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.26

@ceph-csi-bot ceph-csi-bot removed the ok-to-test Label to trigger E2E tests label Nov 24, 2023
@mergify mergify bot merged commit f666529 into ceph:devel Nov 24, 2023
34 checks passed
@karthik-us karthik-us deleted the clone-error branch November 24, 2023 09:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/cephfs Issues related to CephFS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for cephfs clone limit error
6 participants