rbd: use DeepCopy() to create thick-provisioned volumes from a snapshot #2184
Conversation
c15a6de to d843ff1 (Compare)
internal/rbd/controllerserver.go
Outdated
if rbdVol.ThickProvision {
	err = parentVol.DeepCopy(rbdVol)
	if err != nil {
		return status.Errorf(codes.Internal, "failed to deep copy %q into %q: %w", parentVol, rbdVol, err)
Suggested change:
- return status.Errorf(codes.Internal, "failed to deep copy %q into %q: %w", parentVol, rbdVol, err)
+ return status.Errorf(codes.Internal, "failed to deep copy %q into %q: %v", parentVol, rbdVol, err)
Using %w inside status.Errorf triggers the govet linter error
internal/rbd/controllerserver.go:499:11: printf: Errorf call has error-wrapping directive %w (govet)
because status.Errorf internally calls fmt.Sprintf():
https://github.com/grpc/grpc-go/blob/0257c8657362b76f24e7a8cfb61df48d4cb735d3/status/status.go#L62
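For reference, a minimal sketch of the difference, assuming only the standard google.golang.org/grpc status and codes packages (the errDeepCopy value is a placeholder): status.Errorf formats with fmt.Sprintf, so %w neither works nor wraps, while fmt.Errorf with %w keeps the cause reachable for errors.Is().

```go
package main

import (
	"errors"
	"fmt"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

var errDeepCopy = errors.New("deep copy failed")

func main() {
	// status.Errorf builds its message with fmt.Sprintf, which has no %w
	// support; use %v and accept that the cause is not wrapped.
	stErr := status.Errorf(codes.Internal, "failed to deep copy: %v", errDeepCopy)
	fmt.Println(errors.Is(stErr, errDeepCopy)) // false: not wrapped

	// %w is only meaningful with fmt.Errorf, where wrapping is supported.
	wrapped := fmt.Errorf("failed to deep copy: %w", errDeepCopy)
	fmt.Println(errors.Is(wrapped, errDeepCopy)) // true
}
```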
It's also the reason for the go-test failure. @nixpanic ^^
Thanks! I'll test it out and verify that make containerized-test passes.
Yes, it's true, we don't need to wrap it.
d843ff1 to bbc67ec (Compare)
@nixpanic I see the DNM here, are you still testing it?
I think it still needs a way to restart in case the provisioner aborted the DeepCopy().
Manually tested with a container image. The error messages are quite nice in the describe PVC output:
Looks like something is wrong. I see the PVC and PVC-restore in the Bound state, but I don't see any backend volume associated with them?
This is also leaving stale images in the trash.
It does; this is documented, IIRC.
To fix this, @nixpanic: during the CreateSnapshot operation, if it's a thick PVC, don't create a final snapshot on the clone, and do a deep copy during the CreateVolume operation?
AFAIK this is not the way it works. We need to remove the snapshots.
Fine, then we have to find a different method here to remove the snap references. BTW, here is the documentation of
That is weird. I do have one? Starting a pod that consumes the PVC:
And in the toolbox pod:
@Madhu-1, can you tell me exactly what steps you did?
It happened only one time. I created a PVC, a snapshot, and a new PVC from the snapshot. I have attached the logs. If you want I can share more logs from my setup, but the Ceph backend is not in the same state, as I deleted the PVC, snapshot, and PVC-restore.
Right, a thick-provisioned VolumeSnapshot does not need an RBD-snapshot in the end. That should make it possible to delete the restored images.
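As an illustration of the flow being agreed on here, a rough sketch under the assumption that the restore strategy is chosen by the provisioning mode of the source; the volume type and its methods below are hypothetical stand-ins, not the ceph-csi API:

```go
package main

import "fmt"

// volume is a hypothetical stand-in for the ceph-csi rbdVolume type.
type volume struct {
	name           string
	thickProvision bool
}

func (v *volume) deepCopyTo(dst *volume) error {
	fmt.Printf("deep-copying %s into %s\n", v.name, dst.name)
	return nil
}

func (v *volume) cloneFromSnapshot(dst *volume) error {
	fmt.Printf("cloning %s from an RBD snapshot of %s\n", dst.name, v.name)
	return nil
}

// restoreFromSnapshot picks the restore strategy based on provisioning mode.
func restoreFromSnapshot(parent, restored *volume) error {
	if parent.thickProvision {
		// Thick source: copy all data; no RBD snapshot is needed on the clone.
		restored.thickProvision = true
		return parent.deepCopyTo(restored)
	}
	// Thin source: the usual RBD snapshot + clone path.
	return parent.cloneFromSnapshot(restored)
}

func main() {
	parent := &volume{name: "pvc-parent", thickProvision: true}
	restored := &volume{name: "pvc-restore"}
	fmt.Println(restoreFromSnapshot(parent, restored))
}
```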
Please update the documentation of the process we are following here for the thick and other cases at the time of the snapshot and clone operations. It helps to get it evaluated later by the Ceph team, or to refer to while we revisit these code paths.
2be8a37 to d7e910b (Compare)
Now addressed the following points:
internal/rbd/controllerserver.go
Outdated
}
err = rbdVol.setThickProvisioned()
if err != nil {
	return status.Errorf(codes.Internal, "failed mark %q thick-provisioned: %s", rbdVol, err)
"failed marking" or "failed to mark"?
It's on the snapshot creation, but applicable only when the source is a 'thick provisioned' image and
Yes, only if the source is thick-provisioned.
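A tiny sketch of that condition; setThickProvisioned is borrowed from the diff above but passed in here as a plain callback, and the source check is reduced to a boolean, so this is an illustration rather than the real code path:

```go
package main

import "fmt"

// markSnapshotIfThick propagates the thick-provisioned marker onto the image
// backing the snapshot, but only when the snapshot source itself is thick.
func markSnapshotIfThick(sourceIsThick bool, setThickProvisioned func() error) error {
	if !sourceIsThick {
		// Thin source: nothing to mark, the regular snapshot path applies.
		return nil
	}
	if err := setThickProvisioned(); err != nil {
		return fmt.Errorf("failed to mark snapshot image thick-provisioned: %w", err)
	}
	return nil
}

func main() {
	setter := func() error { return nil } // placeholder for the metadata call
	fmt.Println(markSnapshotIfThick(true, setter))
}
```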
d7e910b to f95a420 (Compare)
Keeping it open; this PR closes another one.
f95a420 to ff3e1b8 (Compare)
/retest ci/centos/mini-e2e/k8s-1.19
ff3e1b8 to 8511e26 (Compare)
/retest ci/centos/upgrade-tests-cephfs
Some infrastructure issue (logs).
/retest ci/centos/mini-e2e-helm/k8s-1.20
/retest ci/centos/mini-e2e/k8s-1.20
Failed due to #1969:
/retest ci/centos/mini-e2e-helm/k8s-1.20
Signed-off-by: Niels de Vos <ndevos@redhat.com>
In case restoring a snapshot of a thick-PVC failed during DeepCopy(), the image will exist, but have partial contents. Only when the image has the thick-provisioned metadata set has it completed DeepCopy(). When the metadata is missing, the image is deleted and an error is returned to the caller. Kubernetes will automatically retry provisioning on the ABORTED error, and the restore will be restarted from the beginning.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
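A hedged sketch of that recovery flow; the helper callbacks are hypothetical, and only the use of codes.Aborted and the "mark completion last" ordering come from the commit message above:

```go
package main

import (
	"fmt"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// restoreThickVolume re-runs a thick restore. If an image from an earlier,
// interrupted attempt exists but never received the thick-provisioned
// metadata, it is removed and the caller is told to retry from scratch.
func restoreThickVolume(imageExists, hasThickMetadata bool,
	deleteImage, deepCopy, setThickMetadata func() error) error {
	if imageExists {
		if hasThickMetadata {
			// A previous DeepCopy() finished completely; nothing to do.
			return nil
		}
		// Partial contents only: remove the image and let Kubernetes retry.
		if err := deleteImage(); err != nil {
			return status.Errorf(codes.Internal, "failed to delete partial image: %v", err)
		}
		return status.Error(codes.Aborted, "restore was interrupted, retry from the beginning")
	}

	if err := deepCopy(); err != nil {
		return status.Errorf(codes.Internal, "failed to deep copy: %v", err)
	}
	// Mark completion last, so a missing marker always means "incomplete".
	if err := setThickMetadata(); err != nil {
		return status.Errorf(codes.Internal, "failed to mark thick-provisioned: %v", err)
	}
	return nil
}

func main() {
	noop := func() error { return nil }
	fmt.Println(restoreThickVolume(true, false, noop, noop, noop))
}
```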
When cloning a volume from a (CSI) snapshot, we use DeepCopy() and do not need an RBD snapshot as source.
Suggested-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
8511e26 to f3a3368 (Compare)
Describe what this PR does
While restoring a volume from a snapshot, the new volume should be thick-provisioned in case the parent volume was thick-provisioned too. This was not the case, as cloning from snapshots always makes volumes thin-provisioned.
Is there anything that requires special attention
Writing an e2e test case for this is not trivial. The existing validation functions used for the snapshot test cases are not small/modular enough. Hopefully #2178 can bring improvements there.
Related issues
Fixes: #2181
Depends-on: #2134 ([DNM] until this is merged)
Show available bot commands
These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:
/retest ci/centos/<job-name>: retest the <job-name> after an unrelated failure (please report the failure too!)
/retest all: run this in case the CentOS CI failed to start/report any test progress or results