iscsi zvol leak when reclaimPolicy is Retain #289

Closed
d-uzlov opened this issue Apr 5, 2023 · 31 comments

@d-uzlov commented Apr 5, 2023

Context

I want to use the Retain policy to avoid accidentally deleting important data.

I also use a stable, human-friendly naming scheme for the iscsi iqn:
nameTemplate: "{{ parameters.[csi.storage.k8s.io/pvc/namespace] }}.{{ parameters.[csi.storage.k8s.io/pvc/name] }}"

What I did:

  1. Deploy PVC test-iscsi (a minimal manifest sketch follows this list)
  2. Delete PVC test-iscsi
  3. Delete associated PV
  4. Re-deploy PVC test-iscsi
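
A minimal sketch of the PVC from step 1 (size and access mode are illustrative; the storage class name comes from the values file below):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-iscsi
  namespace: default
spec:
  storageClassName: iscsi
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi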

What I expected

When you create a new PVC, democratic-csi creates the corresponding PV, iqn and zvol.
When you delete the PVC, the PV is released but not deleted.
When you manually delete the PV, the corresponding iqn and zvol are also deleted.
When you re-deploy the PVC, democratic-csi creates a new PV, a new zvol, and re-creates the same iqn (because of the template).

What actually happened

When you first deploy the PVC, everything works as expected.
When you delete the PVC, the PV is not deleted (also as expected).

But when you delete the PV, the iqn and zvol are not deleted.

And when you re-deploy a PVC which has the same name pattern for the iqn, the following happens:

  1. a new PV is created
  2. a new zvol is created that matches the PV name
  3. the iqn is reused
  4. but the old iqn still points to the old zvol
  5. when mounted, the new PVC still contains data from the old PVC

If I do this N times, I will have N empty zvols, only one of which was ever used.

Without the name template the leak would also happen, but there would also be N iqns, and the old volume would not be reused.

If I switch reclaimPolicy to Delete, new zvols are deleted correctly, but the old ones are still there.

More context

Here is my full helm values file (except connection details):

freenas-iscsi.yaml
csiDriver:
  # should be globally unique for a given cluster
  name: "org.democratic-csi.truenas-iscsi"

# add note here about volume expansion requirements
storageClasses:
- name: iscsi
  defaultClass: false
  # reclaimPolicy: Delete
  reclaimPolicy: Retain
  volumeBindingMode: Immediate
  allowVolumeExpansion: true
  parameters:
    fsType: ext4

  mountOptions: []

# if your cluster supports snapshots you may enable below
volumeSnapshotClasses: []

controller:
  driver:
    logLevel: debug
    image: docker.io/democraticcsi/democratic-csi:v1.8.1
    imagePullPolicy: IfNotPresent

node:
  driver:
    logLevel: debug
    image: docker.io/democraticcsi/democratic-csi:v1.8.1
    imagePullPolicy: IfNotPresent

driver:
  config:
    driver: freenas-iscsi
    instance_id:
    httpConnection:
      ...
    sshConnection:
      ...

    zfs:
      cli:
        sudoEnabled: true

      # can be used to set arbitrary values on the dataset/zvol
      # can use handlebars templates with the parameters from the storage class/CO
      datasetProperties:
        org.freenas:description: "k8s/{{ parameters.[csi.storage.k8s.io/pvc/namespace] }}/{{ parameters.[csi.storage.k8s.io/pvc/name] }}"
      datasetParentName: main-pool/k8s/block/main
      detachedSnapshotsDatasetParentName: main-pool/k8s/block/snapshots
      # "" (inherit), lz4, gzip-9, etc
      zvolCompression: ''
      # "" (inherit), on, off, verify
      zvolDedup: ''
      zvolEnableReservation: false
      # 512, 1K, 2K, 4K, 8K, 16K, 64K, 128K default is 16K
      zvolBlocksize:

    iscsi:
      targetPortal: "truenas.danil"
      # for multipath
      targetPortals: [] # [ "server[:port]", "server[:port]", ... ]
      # leave empty to omit usage of -I with iscsiadm
      interface:

      # MUST ensure uniqueness
      # full iqn limit is 223 bytes, plan accordingly
      # default is "{{ name }}"
      nameTemplate: "{{ parameters.[csi.storage.k8s.io/pvc/namespace] }}.{{ parameters.[csi.storage.k8s.io/pvc/name] }}"
      namePrefix: csi.
      # nameSuffix: "-clustera"
      nameSuffix: ''

      # add as many as needed
      targetGroups:
        # get the correct ID from the "portal" section in the UI
        - targetGroupPortalGroup: 1
          # get the correct ID from the "initiators" section in the UI
          targetGroupInitiatorGroup: 1
          # None, CHAP, or CHAP Mutual
          targetGroupAuthType: None
          # get the correct ID from the "Authorized Access" section of the UI
          # only required if using Chap
          targetGroupAuthGroup:

      #extentCommentTemplate: "{{ parameters.[csi.storage.k8s.io/pvc/namespace] }}/{{ parameters.[csi.storage.k8s.io/pvc/name] }}"
      extentInsecureTpc: true
      extentXenCompat: false
      extentDisablePhysicalBlocksize: true
      # 512, 1024, 2048, or 4096,
      extentBlocksize: 4096
      # "" (let FreeNAS decide, currently defaults to SSD), Unknown, SSD, 5400, 7200, 10000, 15000
      extentRpm: "SSD"
      # 0-100 (0 == ignore)
      extentAvailThreshold: 0

@d-uzlov commented Apr 5, 2023

My thoughts on this:

The only thing that actually bothers me here is the zvol list pollution.
They don't even consume any storage because of the zvolEnableReservation: false option (though it would be an issue if zvols weren't sparse).

I really like the ability to reuse volumes in nfs-subdir-external-provisioner, and I was a bit disappointed that csi plugins don't seem to allow this.
But this bug kinda solves it.

For me an ideal fix would look like this:

Add an option actionOnPvDeletion with possible values keep and delete.
It would be convenient if this option was in the Storage Class.
Behavior should be the same as in the old nfs provisioner:

  • reclaimPolicy: Delete + actionOnPvDeletion: keep
    • zvols and iqns are only cleared manually
    • re-deployment of the same PVC gives you the same zvol
    • requires stable nameTemplate for iqn (most likely namespace+pvc-name to avoid collisions)
    • doesn't make sense if nameTemplate contains PV name
    • when the requested iqn already exists, zvol creation is skipped, but the existing zvol should be updated to the new parameters if required (resize, properties update)
  • reclaimPolicy: Delete + actionOnPvDeletion: delete
    • same as current behavior of reclaimPolicy: Delete
    • resources are cleared automatically
    • new PVC means new zvol
    • works with any valid nameTemplate (either pv-name, or namespace+pvc-name, or both)
  • reclaimPolicy: Retain + actionOnPvDeletion: delete
    • nameTemplate must contain the PVC name; otherwise, when the PVC is re-deployed, there would be several PVs pointing to the same iqn, which is unsafe
    • PV and zvol could be reused by manually clearing claimRef field on PV
    • k8s resources require manual deletion, but truenas resources are synchronized automatically
  • reclaimPolicy: Retain + actionOnPvDeletion: keep
    • same as current behavior of reclaimPolicy: Retain
    • could cause zvol collisions and resource leaks
    • doesn't make sense in my opinion. I don't even know what the correct behavior could theoretically be like
    • should be declared unsupported, unless someone else could suggest a use-case for this

Also, since we already know that we can use the list of iqns in truenas as a database, we could also have a nameTemplate for the zvol name (separate from the iqn name template).
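
Just to illustrate the idea, the proposed option could look something like this in the storage class section of the values file (purely hypothetical; actionOnPvDeletion does not exist today):

storageClasses:
- name: iscsi
  reclaimPolicy: Delete
  parameters:
    fsType: ext4
    # hypothetical option from this proposal, not a real democratic-csi parameter
    actionOnPvDeletion: keep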


Would it be possible to implement?
I hope it's not too much to ask for.

@travisghansen (Member)

Welcome! Most of your desired outcome is not really the responsibility of the csi driver but rather k8s csi tooling itself. Without getting into the weeds too much I think your problem will be solved by simply setting the reclaim policy to delete before actually deleting the PV.
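
For example, something like this (an untested sketch; the PV name is a placeholder):

# flip the released PV from Retain to Delete so the provisioner issues a DeleteVolume
# call and the zvol/iqn get cleaned up, e.g.:
#   kubectl patch pv pvc-xxxxxxxx -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}'
# which simply sets this field on the PV:
spec:
  persistentVolumeReclaimPolicy: Delete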

Is that viable for your use-case and can you give it a try and let me know the end result?

@travisghansen (Member)

Regarding the templating of the zvol path you can read through this (and likely several other issues created over the years): #255

@d-uzlov commented Apr 6, 2023

your problem will be solved by simply setting the reclaim policy to delete before actually deleting the PV

I mean, it would technically work.
But I wouldn't call it a good solution. Seems like a hack to me.
Also it would require a lot of manual management.

Most of your desired outcome is not really the responsibility of the csi driver but rather k8s csi tooling itself

From what I understand, the k8s way of dealing with this is "never delete a PV". Maybe even "never delete a PVC".
K8s can't possibly manage a resource that no longer exists.

The provisioner, on the other hand, can manage them, because it isn't constrained by k8s and still has access to the "real" underlying resources (if it doesn't delete them from the remote storage server).

Regarding the templating of the zvol path you can read through this (and likely several other issues created over the years): #255

Yeah, I have seen this and several other discussions, all of which boil down to "the CSI API doesn't provide a convenient way to do this, it's too hard to implement, and it isn't worth the time investment".

I understand that adding a whole database to work around CSI issues would be hard and time-consuming, so please don't treat this as a feature request. It's a thing that would be nice to have but I could totally live without it.

The only reason I brought this up is that it seems to me that iscsi already has a database for it, and the provisioner could simply use it.
From what I can see, the provisioner always seems to have access to the iqn that a PV is using, at least for "create" and "delete" actions.

This is obviously iscsi-specific, and will not work for NFS.

@d-uzlov commented Apr 6, 2023

Let me make this clear.
This issue isn't about "add this option to make my use-case easier to use".
This issue is about resource (zvol and iqn) leak.

I added my thoughts in hope to provide some context and possible solutions.

I'll try to rephrase my thoughts:

  • I believe that resource leaks should never happen. This is a bug. Manually clearing resources is not a good solution
  • actionOnPvDeletion: keep will generally cause resource leaks, unless we define some stable name for the resource
  • reclaimPolicy: Retain is incompatible with actionOnPvDeletion: keep
    • If we use a stable name for iqn, there would be several volumes pointing to the same iqn
    • if we use a unique name for iqn, we would never be able to reuse the old volume after we delete PV (unless we re-create the old PV manually)
  • I see 2 options to fix the behavior:
    • Option 1: delete zvol when deleting PV manually. In my post above this is described as reclaimPolicy: Retain + actionOnPvDeletion: delete
      • I don't really like this option but I think it's better than resource leaks
    • Option 2: add a special option to allow using actionOnPvDeletion: keep with reclaimPolicy: Delete
      • I really like this option, but I don't insist on implementing it
      • This requires using the truenas iqn/target/extent list as a database, to avoid creating many resources for the same PVC name
  • I believe that resource leaks should never happen. This is a bug. Manually clearing resources is not a good solution

If you disagree with this statement, feel free to close this issue; I don't insist on solving it.
I'm kinda OK with the current, slightly leaky behavior; I just thought that some of my ideas could improve the experience for other users.

@d-uzlov commented Apr 6, 2023

Though, if you would like to just leave this as is, please add a note somewhere in the documentation about this behavior.
It should be made clear that reclaimPolicy: Retain will never clear resources on the truenas server, and that reclaimPolicy: Retain requires {{ name }} somewhere in the iscsi nameTemplate to avoid collisions not just with other existing resources, but also with past resources.
I think the config examples would be a good place for this info.

@travisghansen (Member)

Yes, the scenario you are hitting is not something I had considered (and is frankly why I hesitated to allow a nameTemplate for anything). It certainly needs to be documented what the ramifications are! It needs to be documented as 'do not do this in production environments' IMO as there are too many weird/bad things that could occur :(

I'm slightly confused what your expectations are regarding Retain and the 'leaky' behavior. If that value is set to Retain my expectation is that the volume is not deleted...'leaky'. If set to Delete then the volume would be removed...not 'leaky'.

I do want to digest this issue a bit more so I have no intention of closing this currently simply because we may not agree entirely atm :) I'm always open to improvements, and I think there are definitely some things to consider here!

@travisghansen (Member)

As an aside, this may be of interest to you: https://github.com/democratic-csi/democratic-csi/blob/master/bin/k8s-csi-cleaner

@d-uzlov commented Apr 6, 2023

I'm slightly confused what your expectations are regarding Retain and the 'leaky' behavior. If that value is set to Retain my expectation is that the volume is not deleted...'leaky'. If set to Delete then the volume would be removed...not 'leaky'.

From my understanding, the Retain option exists to provide a way to recover old PVs.
If I accidentally delete a PVC, I can still recover the data by clearing the claimRef and remounting the PV with a new PVC.
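
Roughly like this (an illustrative fragment; the PV name and the rest of the spec are placeholders):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-xxxxxxxx               # generated PV name
spec:
  persistentVolumeReclaimPolicy: Retain
  # removing this claimRef block (kubectl edit, or a JSON patch removing /spec/claimRef)
  # returns the Released PV to Available, so a new PVC can bind to it and reuse the data
  claimRef:
    kind: PersistentVolumeClaim
    namespace: default
    name: test-iscsi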

If the cluster admin decides that the PV is no longer needed, they delete it, and the data is gone.

I look at it from the perspective of a public cloud (like GKE or AKS), where I need to pay for every volume I have, regardless of whether it is listed in k8s or not.
I would find it unexpected if I created a resource using k8s, then deleted it using k8s, but still had to pay for it.
I would expect that only manually created volumes survive the deletion of a PV.


That being said, truenas is not part of some public cloud (at least my instance isn't).
I already paid for all the storage, so I have the luxury of just keeping all my volumes indefinitely.
Therefore, I can sacrifice some storage for the convenience of not having to care about managing dynamically-created volumes.

I think this is not an intended use for k8s. It isn't designed for home installations, so it doesn't really have a native way to communicate the "create the resource automatically, but keep it indefinitely" approach.
This can only be an extension of a csi driver, like my proposed actionOnPvDeletion or the similar option in the old nfs provisioner.

P.S. Disclaimer: while I'm relatively experienced with k8s overall, since I develop apps for k8s at work, I have almost no experience with public clouds, so this is mostly my own reasoning. It's possible I understand something incorrectly.

@travisghansen (Member)

Public clouds do not delete the volume either if the PV has Retain set when you delete it from k8s. The csi drivers from/for the clouds are under the same constraints as any other csi driver (there is quite a bit of confusion around this, to be honest). Many people don't understand that csi drivers only do what they are told and do not trigger any actions on their own.

Can you send a link to the doc for the behavior of the old provisioner you are referring to?

@d-uzlov commented Apr 6, 2023

As an aside, this may be of interest to you: https://github.com/democratic-csi/democratic-csi/blob/master/bin/k8s-csi-cleaner

Thank you for pointing it out :-)
Currently I don't have that many unused zvols but it could be useful later.

Public clouds do not delete the volume either if the PV has Retain set when you delete it from k8s

Oh.
My bad. I never really checked this.
Actually, I never found any documentation or usage tips for persistent storage, except for "the cloud provider has already created storage classes for you, use them", which isn't helpful for a home deployment.

Can you send a link to the doc for the behavior of the old provisioner you are referring to?

https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner

There isn't much documentation.
The option I'm talking about is onDelete.
You can search for the text "If it exists and has a delete value" in the main readme.

@travisghansen (Member)

Yeah, all the public clouds now use csi drivers. If Retain is set the csi driver never has any idea the volume is deleted (again, they only do what they are told, and if retain is set they are never told anything) whether it's in the cloud, on-prem, or anything else.

@d-uzlov commented Apr 6, 2023

Well, I see that my initial assumption was wrong.
The "leak" is apparently the expected behavior for a csi plugin.
Though, I still can't understand what the intended use-case for this is, from the point of view of the k8s and csi developers (I mean, the csi specification developers, not plugin developers). It doesn't make any sense to me. Why would they ever do this?

I guess, the only thing left here is a feature request to add an option to keep the actual data on automatic PV deletion.
Though, it doesn't match the original issue. And it would require a stable iqn name pattern. And it would be iscsi-specific (though maybe smb is also possible).
So, it's probably not feasible.
You can probably just close this issue.

Thank you for the explanation.

@d-uzlov commented Apr 6, 2023

By the way, apparently csi-driver-nfs also added an onDelete option a few days ago.

https://github.com/kubernetes-csi/csi-driver-nfs/blob/fafa015708046086ede931381437692db7b3880b/docs/driver-parameters.md

Just for the sake of completeness :-)
And maybe this could help someone who finds this issue in the future.

@travisghansen (Member) commented Apr 6, 2023

Interesting, I'll take a peek.

The reason there may be a disconnect between the Retain logic and csi logic is that csi is used beyond the bounds of k8s. There are no k8s-isms in the csi spec and therefore the notion of Retain does not exist...only Create and Delete exist and it's up to the CO to invoke those when it feels appropriate. In this sense csi drivers are completely 'dumb'.

I don't think we should close this yet, I think it warrants further thought.

@d-uzlov commented Apr 8, 2023

I spent some time thinking about this.

To reiterate on the previous discussion in this issue:

  • My original idea was to use the Retain option to make sure that I don't delete my data accidentally.
  • It turned out that Retain is a bit broken by itself. It's technically usable but very inconvenient.
  • It is not the fault of the CSI driver; it's an issue with k8s itself. However, that doesn't make the situation any better for me as a user.
  • When I created this issue, I was hoping that solving it would help me with my use-case.
  • While it's impossible to fix the Retain option itself, it should be possible to work around it.
  • And at this point this is just a feature request (+1 for reusable volumes).

Now that I know a bit more about how CSI API works, I think I can make a better feature request.
My assumptions:

  • When creating a volume, the CSI driver has access to the namespace and PVC name
    • It seems so, because we can already use them for the iscsi name template
  • When resizing, creating snapshots, deleting volumes, or doing anything else, the CSI driver has access only to the volume ID (or volume name, or PVC UID)
  • The CSI driver cannot query k8s for metadata about a PVC or PV based on that ID

Based on these assumptions, I feel like it should be possible to implement reusable volumes for iscsi (and maybe SMB).

Setup:

  1. Add an onDelete option to the driver settings, similar to the NFS provisioners I linked previously (a config sketch follows this list)
  2. User sets onDelete: retain with reclaimPolicy: Delete
  3. User sets nameTemplate to namespace+pvcName
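
A hypothetical sketch of this setup (onDelete is not an existing democratic-csi option, and its placement here is just a guess):

storageClasses:
- name: iscsi
  reclaimPolicy: Delete                  # step 2
driver:
  config:
    driver: freenas-iscsi
    onDelete: retain                     # step 2, hypothetical option
    iscsi:
      # step 3: stable share name derived from namespace + PVC name
      nameTemplate: "{{ parameters.[csi.storage.k8s.io/pvc/namespace] }}.{{ parameters.[csi.storage.k8s.io/pvc/name] }}"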

Use case:

  1. The user creates a new PVC
  2. The CSI driver creates a volume with a generated name and a stable share name based on nameTemplate (the iqn / iscsi target name)
  3. The user deletes the PVC
  4. The PV is deleted, but the CSI driver keeps the data
  5. The user re-creates a PVC with the same name
  6. The CSI driver checks the list of existing shared volumes
  7. The CSI driver finds that a share with the requested name already exists
  8. Instead of creating a new volume, the CSI driver renames the volume associated with that share name to the new name
  9. Maybe the volume also needs to be resized after renaming, and its zfs properties updated

Everything will remain the same as it is now, except for the create operation.

It looks to me that this should be possible to implement and should not require any changes in API or configs (except for adding a new option, which doesn't break compatibility).

The only concern I see is snapshots. I have never used them (neither in k8s nor in zfs), so I'm not sure about their requirements. I think detachedSnapshots: true should work fine, but the "non-detached" snapshots may be broken after volume renaming.

@travisghansen (Member)

Does the scenario above not implicitly leave you with leaked volumes? A noop delete is leaky.

I'm considering allowing an option to use a template (in an unsupported, "giving you the gun" fashion). In such a case the template would be used to generate a volume_id during the CreateVolume sequence. If an invalid template is used, you will end up with scenarios where volumes fail to provision.

If I am to add such functionality, you will end up with deterministic names for volume_ids, and therefore many of the headaches above simply go away. I have no idea what the behavior would be in scenarios where you:

  1. set Retain on the PV
  2. delete the PVC
  3. create a new PVC (with same name in same ns)

I'm not sure if k8s would find the existing PV with the same volume_id and just rebind it or not. Again, I'm not really planning on supporting it, but if you want to run through some tests I would welcome any feedback.

@travisghansen (Member)

As an FYI, with the idTemplate approach described it would work across all drivers, snapshots really shouldn't be an issue, etc.

@d-uzlov commented Apr 8, 2023

Does the scenario above not implicitly leave you leaked volumes? a noop delete is leaky.

In my suggestion in the previous message I assumed that the share name would be the thing with the stable name, so the leak would be temporary, until the same PVC is re-created.
Kinda the same as with a stable volume id.

If the name is generated, then yeah, it would work the same as the current Retain behavior.

If I am to add such functionality, you will end up with deterministic names for volume_ids and therefore much of the headaches above simply go away.

Oh, this sounds promising.
I kinda assumed this would be hard to implement and didn't consider it, but it would probably solve all the issues.
Also "works with all drivers" is fantastic. Much better than my proposal.

I have no idea what the behavior would be in scenarios where you:

  1. set Retain on the PV

Currently Retain with a stable share name technically works, but there are leaks, and I suspect that resize and snapshots would operate on the wrong dataset.

With a deterministic volume id it would likely just fail to provision, in one way or another.
I would see this as an improvement.

I don't see how this could be properly supported. I think Retain is just incompatible with name templates of any kind.

@travisghansen (Member)

Try a deployment with the next image tag (be sure the pull policy is Always).

https://github.com/democratic-csi/democratic-csi/blob/next/examples/private.yaml#L22

Add something like this to your config:

_private:
  csi:
    volume:
      # THIS IS UNSUPPORTED, BAD THINGS WILL HAPPEN IF NOT CONFIGURED PROPERLY
      #
      # note the volume length must *always* be the same for every call for the same volume by the CO
      # the length must NOT exceed 128 characters
      # must start with an alphanumeric character
      # must only contain alphanumeric characters or `-` or `_`
      idTemplate: "{{ parameters.[csi.storage.k8s.io/pvc/namespace] }}-{{ parameters.[csi.storage.k8s.io/pvc/name] }}"

Run it through crazy tests/scenarios and please document the outcomes and share here.

@travisghansen (Member)

Note, I haven't implemented a noop on delete yet, but you can mimic the same behavior by setting the reclaim policy to Retain and then deleting the PV...the outcome is the same.

@d-uzlov commented Apr 8, 2023

I mean.
Maybe we misunderstood each other.

Retain leaks PVs and physical volumes:

  1. Reclaiming a leaked PV is hard. It requires you to either patch the PV or kubectl edit it manually. Neither works with the simple kubectl apply that I usually use.
  2. Reclaiming a leaked physical volume is easy: you just create a new PV that points to the same volume. Well, it is as easy as knowing the name of the leaked volume.

So my suggestion was to avoid Retain altogether.
You can set the reclaimPolicy to Delete, enable noop delete, and automatically clear the PVs. It is the same as Retain, except PVs are cleared automatically.
If PVs are deleted automatically, we solve any name collisions in k8s that are caused by leaked PVs.

Just to make sure we are talking about the same thing here.

@d-uzlov commented Apr 8, 2023

Run it through crazy tests/scenarios and please document the outcomes and share here.

Sure

@travisghansen (Member)

So my suggestion was to avoid Retain altogether. You can set the reclaimPolicy to Delete, enable noop delete, and automatically clear the PVs. It is the same as Retain, except PVs are cleared automatically. If PVs are deleted automatically, we solve any name collisions in k8s that are caused by leaked PVs.

Doing this leaks actual volumes/storage/shares. I'm unclear why you're more concerned about some leaked k8s metadata than about actual storage.

I can likely add an onDelete policy, but again setting the PV to Retain and then deleting is good enough for running through some tests as the state is exactly the same.

It also gives us an opportunity to discover the behavior for the 3-step scenario I described previously (don't delete the PV, recreate the PVC) and see/document what happens.

@d-uzlov commented Apr 8, 2023

Doing this leaks actual volumes/storage/shares. I'm unclear why you're more concerned about some leaked k8s metadata more than actual storage.

My concern with leaked PVs is name collisions. I'm uncomfortable with the thought that I can delete claimRef and have several completely different PVCs pointing to the same volume via different PVs.

I guess it doesn't really matter if I never reuse old PVs. It would just be safer if it were technically impossible to do, rather than "I promise that I won't reuse them".

Leaking real storage is less of a concern because usually I just re-create the same PVC. It's not often that I want to delete app data completely.

setting the PV to Retain and then deleting is good enough for running through some tests as the state is exactly the same

Yep.

I ran a few tests:

  1. Initial deployment.

Result: It works!
At first I found it unexpected that the PV name was still generated; I thought the volume_id would affect it, but whatever, it doesn't really matter.

  2. Re-deploying properly, with deletion of old PV

Result: It still works!
PVC was successfully rebound to the same zvol.

  3. Re-deploying without deleting old PV

Result: It still works!
PVC was successfully rebound to the same zvol.
There were no issues from k8s side. Apparently, it doesn't check for volumeHandle collisions. But maybe this was because the old volume was in "Released" state?

  4. Deploying several PVCs which point to the same volume

I created 2 PVCs: test-iscsi in namespace default and iscsi in namespace default-test, which both generate default-test-iscsi, while still keeping the old "Released" PV around for good measure.
Result: It still works!
Both pods that used these PVCs were on the same node, so there were no issues with running them.
Apparently, k8s just doesn't care about this.
I guess we need a more robust idTemplate than namespace-pvcName.

  5. Try to use a different idTemplate

idTemplate: namespace_name
Result: error received error creating iscsi target - code: 422 body: {"iscsi_target_create.name":[{"message":"Lowercase alphanumeric characters plus dot (.), dash (-), and colon (:) are allowed.","errno":22}]}

idTemplate: namespace.name
Result: error generated volume_id 'default.test-iscsi' contains invalid characters: '.'.

idTemplate: namespace_name + iscsi nameTemplate: namespace.name
Result: Works fine!
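
Spelled out, the working combination from the last test looks something like this (assuming both settings sit in the driver config, like the _private snippet above):

_private:
  csi:
    volume:
      idTemplate: "{{ parameters.[csi.storage.k8s.io/pvc/namespace] }}_{{ parameters.[csi.storage.k8s.io/pvc/name] }}"
iscsi:
  nameTemplate: "{{ parameters.[csi.storage.k8s.io/pvc/namespace] }}.{{ parameters.[csi.storage.k8s.io/pvc/name] }}"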


I checked the restrictions on the volumeHandle values, and it seems like there are none. Other plugins happily use ., /, # and other special symbols in it.
I also checked zfs limitations and don't see any.

Is the error about "invalid characters" produced by democratic-csi?
If so, what is the reason for it? Can you lift the restriction?

@d-uzlov commented Apr 8, 2023

One more note about onDelete behavior.

I think that onDelete: retain should leave only the volume behind.
The iscsi share (target, extent, etc.) should still be deleted.
The same probably goes for SMB.

There are several reasons:

  1. It will work. Share parameters aren't required to reuse the volume
  2. The share nameTemplate can be changed, in which case the old share name would be leaked
  3. Does iscsi even allow for several shares pointing to the same zvol?
  4. It's just annoying to delete shares when I finally decide that I no longer need the volume

@travisghansen (Member)

My concern with leaked PVs is name collisions. I'm uncomfortable with the thought that I can delete claimRef and have several completely different PVCs pointing to the same volume via different PVs.

If you turn off the iscsi name template this issue goes away. That of course assumes you don't want to rebind to the same volume in some future scenario.

At first I found it unexpected that PV name was still generated, I thought that volume_id would affect it, but whatever, doesn't really matter.

This is expected yes.

  2. Re-deploying properly, with deletion of old PV

I'm hoping the zvol was deleted and recreated (same name, but different zvol).

  3. Re-deploying without deleting old PV

Did you end up with 2 PVs with the same volume handle?

  4. Deploying several PVCs which point to the same volume

Yeah, that's a nasty situation just waiting to blow up in your face :( An idTemplate like this I think would do the trick: "{{ parameters.[csi.storage.k8s.io/pvc/namespace] }}--{{ parameters.[csi.storage.k8s.io/pvc/name] }}" (ie: I don't think k8s allows resources to have names ending with -). Of note, the same problem exists for the iscsi assets.

  5. Try to use a different idTemplate

The 422 is an error from the TrueNAS api so I guess that's a constraint over there.

Restrictions on zfs here: https://docs.oracle.com/cd/E26505_01/html/E37384/gbcpt.html
So while zfs technically allows : and ., I don't want to introduce them for any future drivers. Part of what makes the project so easy to manage (with all the various drivers) is the shared code across drivers, so I want to keep to the 'lowest common denominator' for now.

I think that onDelete: retain should leave only the volume behind.

Unfortunately that would render the logic/project extremely unmaintainable :( I can probably work something out for a noop, but trying to handle all the possible nuances beyond that is highly unlikely.

@d-uzlov commented Apr 10, 2023

If you turn off the iscsi name template this issue goes away. That of course assumes you don't want to rebind to the same volume in some future scenario.

Well, yeah. This is the whole issue. Retain is incompatible with name templates of any kind.
I think nameTemplate and/or idTemplate options require something like onDelete because of it.

2. Re-deploying properly, with deletion of old PV

I'm hoping the zvol was deleted and recreated (same name, but different zvol).

I used Retain, and manually deleted the PV for this test.
But later I did do a test with the Delete policy, and, as expected, the volume was deleted and re-created with the same name.

3. Re-deploying without deleting old PV

Did you end up with 2 PVs with the same volume handle?

Yes.

An idTemplate like this I think would do the trick: "{{ parameters.[csi.storage.k8s.io/pvc/namespace] }}--{{ parameters.[csi.storage.k8s.io/pvc/name] }}" (ie: I don't think k8s allows resources to have names ending with -)
Of note, the same problem exists for the iscsi assets.

K8s does allow for several consecutive hyphens in the name, though. qwe--rty is a valid namespace name.
I think _ as a divider gets the job done, so it's technically already solved. I didn't check the PVC naming rules, but at least namespaces don't allow _ in the name, which is enough to prevent collisions.
The issue with iscsi is solved by using nameTemplate with . as a separator.

Ideally I would like to use . for the volume name. It would allow unifying the volume name with the iscsi share name (which doesn't support _). And it looks nicer in my opinion.
Alternatively it would be nice to use / and place volumes from different namespaces in different datasets.
But if there are incompatibilities with other drivers then I can live without it.

The 422 is an error from the TrueNAS api so I guess that's a constraint over there.

Error 422 was caused by iscsi naming restrictions, because of _.
The . error was different. I don't remember if there were any codes for it.

Unfortunately that would render the logic/project extremely unmaintainable :( I can probably work something for a noop but trying to handle all the possible nuances beyond that is highly unlikely.

That is sad to hear.
The volume_id, a "noop delete", and the whole "reusable volumes" thing are already a huge UX improvement for me, though, so I'm really grateful for the changes anyway.
I could probably use the script for deleting unused volumes that you linked above to simplify cleanup instead.

@travisghansen (Member)

While k8s allows multiple hyphens, that part isn't actually important. The important bit is that resources cannot end with a hyphen. You should be able to use the example I sent for both volume and iscsi templates without worrying about collisions.

Preventing collisions doesn't solve the issue of length limits, which will eventually get hit unless you are very careful :)
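
For example, something along these lines (untested; assuming both templates sit in the driver config as above):

_private:
  csi:
    volume:
      idTemplate: "{{ parameters.[csi.storage.k8s.io/pvc/namespace] }}--{{ parameters.[csi.storage.k8s.io/pvc/name] }}"
iscsi:
  nameTemplate: "{{ parameters.[csi.storage.k8s.io/pvc/namespace] }}--{{ parameters.[csi.storage.k8s.io/pvc/name] }}"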

@d-uzlov commented Apr 18, 2023

Do you have plans to make a release with idTemplate and "noop delete" features in the foreseeable future?

@travisghansen (Member) commented Mar 3, 2024

6ada268

I added a param:

6ada268#diff-e9988367425a07eac915dac8f1a6578376ede76ca884fbb6c4dad09662f30e84R33

Setting the param will be entirely unsupported, but you have been given the metaphorical gun spoken of previously. Setting the value will prevent both the shares (nfs, iscsi, etc.) and the backing storage (zvols, datasets, etc.) from being deleted.
