Automatically use hostpath if pod is on same host as zfs pool #85

Open

morganchristiansson opened this issue Sep 29, 2022 · 7 comments

Labels: question (Further information is requested)

@morganchristiansson
Contributor

morganchristiansson commented Sep 29, 2022

This has been bugging me.

I'm currently using hostpath-provisioner with /nfs/hostpath present on all nodes. It's an NFS mount on every node except the ZFS node, and it works elegantly. But I would rather the provisioner ran zfs create for every PV/PVC.

I'm not sure whether performance is worse when using NFS on localhost, but it would nonetheless be nice if the provisioner automatically switched to hostpath.

Maybe using type: auto (default?):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: zfs-auto                      # example name
provisioner: pv.kubernetes.io/zfs     # provisioner name as configured for kubernetes-zfs-provisioner
parameters:
  type: auto
ccremer added the question label on Sep 29, 2022
@ccremer
Owner

ccremer commented Sep 29, 2022

Hi.
I'm not sure I understand your problem fully.
Am I reading correctly you have a single ZFS host that is also part of the cluster as a worker node? And all other worker nodes don't have ZFS, but instead mount the NFS export that the single ZFS host exposes?

In general, the provisioner has no idea of any pods that might or might not use a PVC. It doesn't know which node a pod gets scheduled on; rather, it creates PVs (when PVCs get created) that may add restrictions on where pods can be scheduled in the first place (the case with hostpath).

I don't know if there's a performance impact when you mount a volume over NFS instead of bind-mounting it on the same machine. You'd have to benchmark it yourself with your application.

Why is it not good enough to just create a storageclass of type nfs? Is it just for the performance concern?

@morganchristiansson
Contributor Author

Yes, you understand correctly. The ZFS host is the Kubernetes master and also runs pods. The workers are diskless Raspberry Pis using NFS.

I guess it would be specific to pods mounting the PV/PVC; creation shouldn't differ depending on type.

Yes, just performance, and maybe it's cleaner to mount directly without NFS.

@ccremer
Owner

ccremer commented Sep 29, 2022

Thanks for the explanation.
Unfortunately I don't think that's possible. A PVC doesn't provide information about specific nodes, so the provisioner can't really determine whether a volume should be hostpath, NFS, or something else, and will have a hard time differentiating. Sure, PVCs can have annotations about the node they're supposed to be provisioned on, but then you might as well use a different storage class.
If the mount approach is a concern for you, you're left with a specific PVC and Deployment that are only schedulable on the master, while the other pods use the NFS type.

Besides, I'm a bit hesitant to implement a feature out of a performance concern when I don't have any actual numbers.
Please try it out with NFS on the same machine. It's possible that we are talking about a non-issue from a practical PoV.

@morganchristiansson
Contributor Author

morganchristiansson commented Sep 30, 2022

Fair enough.

Maybe the info is available at the point where the pod mounts the PVC? Agreed that it wouldn't be available at PVC creation time.

Some quick googling suggests there have been problems, but it may be working fine now. I'll need to test it and come back to be sure...

Quote:

Traditionally, the practice of nfs loopback mounting has not been recommended or supported in any Linux environment. There are known problems with nfs loopback mounts. The problems deal with deadlocks which can occur due to conflicts that arise between memory allocation, memory freeing, and memory write out. Because of the potential for deadlocks, loopback mounting has been generally considered unsupported by all of the Linux community.

On SLES 12 and 15, improvements to NFSv3 allow loopback mounts to be supported. Note that this support does not apply to NFSv4.
https://www.suse.com/support/kb/doc/?id=000018709

@ccremer
Owner

ccremer commented Oct 3, 2022

I was quickly looking at the code again to see the available options.
It seems the provision controller library may actually pass the scheduled node:

https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/blob/a2f2cebc05acc2a003772096c46965cf7ad2ee4e/controller/volume.go#L115-L133

I'm not sure how that's going to help in practice, but it may be worth testing out.
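To illustrate what "passing the scheduled node" means in practice, here is a minimal sketch using stand-in types, not the real sig-storage-lib-external-provisioner structs; the Node and ProvisionOptions shapes below are assumptions modeled on the linked code:

```go
package main

import "fmt"

// Stand-in for corev1.Node; only the name matters for this sketch.
type Node struct{ Name string }

// Stand-in for the library's ProvisionOptions. In the real library,
// SelectedNode is only populated when the storage class uses
// volumeBindingMode: WaitForFirstConsumer.
type ProvisionOptions struct {
	SelectedNode *Node
}

// scheduledNode reports the node the scheduler picked, if any.
func scheduledNode(opts ProvisionOptions) (string, bool) {
	if opts.SelectedNode == nil {
		return "", false
	}
	return opts.SelectedNode.Name, true
}

func main() {
	if name, ok := scheduledNode(ProvisionOptions{SelectedNode: &Node{Name: "zfs-host"}}); ok {
		fmt.Println(name) // zfs-host
	}
	_, ok := scheduledNode(ProvisionOptions{}) // Immediate binding: nothing selected
	fmt.Println(ok)                            // false
}
```

The key point is the nil check: with Immediate binding the provisioner runs before scheduling, so there is no node to inspect.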

@morganchristiansson
Contributor Author

morganchristiansson commented Oct 4, 2022

I did not realise that no code runs when creating new pods. Interesting. I found createVolumeSource() and can see that there is no code execution at the point where a pod mounts the PVC: https://github.com/ccremer/kubernetes-zfs-provisioner/blob/master/pkg/provisioner/provision.go#L84-L105

I'm using a simple solution where I have /nfs/hostpath mounted on all nodes in my cluster, using https://github.com/rimusz/hostpath-provisioner/ (based on sig-storage-lib-external-provisioner/examples/hostpath-provisioner), which creates a directory per PV under a root path and uses hostPath. But I want zfs create to make a dataset per PV for stats, snapshotting, and management. A similar solution could work here: mount parentDataset on every node and then use hostPath for all PVs? If it fits with this project...

Wish I could suggest more... 😄

@ccremer
Owner

ccremer commented Oct 4, 2022

I did not realise that no code runs when creating new pods.

Yes, that's what I tried to explain in earlier comments, but apparently I didn't do a good job :)

A similar solution could work here by mounting parentDataset on every node and then use hostPath for all PVs

From what I know, this one doesn't fit. It means the provisioner would have to connect to the node and mount the parentDataset via NFS first, and then provide a VolumeSource of type hostpath. Kubernetes would be "unaware" that the path is actually mounted via NFS. If, for any reason, the NFS mount cannot be re-established after a reboot, Kubernetes just creates the hostpath anew and you start with an empty directory...

This is a mechanism that I honestly don't want to maintain in this project with my already limited spare time. There are just too many moving parts.

The only thing I could consider is my earlier suggestion: Check if we have the node information in the options, and if it matches the node in the storage class, return a hostpath instead of NFS volume source.
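That check could look roughly like the following, a minimal sketch with stand-in types; chooseVolumeType, the zfsHost parameter, and the type shapes are all hypothetical, not the project's actual API:

```go
package main

import "fmt"

// Stand-ins for the scheduler/provisioner types; not the real API.
type Node struct{ Name string }

type ProvisionOptions struct {
	SelectedNode *Node // nil unless volumeBindingMode is WaitForFirstConsumer
}

// chooseVolumeType returns "hostpath" when the pod was scheduled onto the
// ZFS host named in the storage class, and falls back to "nfs" otherwise
// (including when no node has been selected yet).
func chooseVolumeType(opts ProvisionOptions, zfsHost string) string {
	if opts.SelectedNode != nil && opts.SelectedNode.Name == zfsHost {
		return "hostpath"
	}
	return "nfs"
}

func main() {
	fmt.Println(chooseVolumeType(ProvisionOptions{SelectedNode: &Node{Name: "zfs-host"}}, "zfs-host")) // hostpath
	fmt.Println(chooseVolumeType(ProvisionOptions{SelectedNode: &Node{Name: "rpi-1"}}, "zfs-host"))    // nfs
	fmt.Println(chooseVolumeType(ProvisionOptions{}, "zfs-host"))                                      // nfs
}
```

Defaulting to NFS when SelectedNode is unset keeps the behavior safe under Immediate binding, where the scheduler hasn't picked a node yet.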

jp39 pushed a commit to jp39/zfs-provisioner that referenced this issue Aug 2, 2024
Automatically use hostpath if pod is on same host as zfs pool.

This addresses ccremer#85. When the storage class type is set to auto,
automatically create a hostpath volume when the scheduler selects the
specified node to run the pod; otherwise fall back to using NFS.

Note this only works when volumeBindingMode is set to
WaitForFirstConsumer in the storage class. Otherwise, when set to
Immediate, volumes will be pre-provisioned before the scheduler selects a
node for the pod consuming the volume, and options.SelectedNode will be
unset.

Note that there could be unintended side effects if multiple pods using
the volume claim are scheduled on different nodes, depending on which
pod gets scheduled first. If the first pod gets scheduled on the ZFS
host, it will automatically use a hostpath volume, and node affinity will
be set so that subsequent pods are prevented from running on other nodes.
On the other hand, if the first pod gets scheduled on a node which is
not the ZFS host, it will use an NFS volume, and subsequent pods will
also use NFS, even if scheduled to run on the ZFS host.