
Spike: topolvm CSI driver (supporting resize, limits) as replacement of kubevirt-hostpath-provisioner #854

Closed
anjannath opened this issue Feb 21, 2024 · 17 comments

Comments

@anjannath
Member Author

I was able to deploy the LVMS operator following the instructions from https://docs.openshift.com/container-platform/4.15/storage/persistent_storage/persistent_storage_local/persistent-storage-using-lvms.html

For testing this:

  • we first need to create a separate partition to be used by the LVMS operator
  • the partition should not already contain a physical volume; the LVMS operator will manage it

Currently, in the OCP preset, the root partition has 12G of free space:

/dev/vda4        31G   20G   12G  64% /sysroot

We can shrink the root partition by a few GBs and create another partition out of the freed space, e.g. /dev/vda4 of 26G and /dev/vda5 of 6G, with /dev/vda5 set as the device/partition to be used by the LVMS operator.

With the partition created, we can apply the following manifests to deploy the LVMS operator:

apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  name: my-lvmcluster
  namespace: openshift-storage
spec:
  storage:
    deviceClasses:
    - name: vg1
      fstype: xfs
      default: true
      deviceSelector:
        paths:
        - /dev/vda5
        forceWipeDevicesAndDestroyAllData: true
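      # the thin pool takes 90% of the volume group; the overprovision ratio
      # allows PVC requests of up to 10x the thin pool size (thin provisioning)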
      thinPoolConfig:
        name: thin-pool-1
        sizePercent: 90
        overprovisionRatio: 10
---
apiVersion: v1
kind: Namespace
metadata:
  labels:
    openshift.io/cluster-monitoring: "true"
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: privileged
  name: openshift-storage
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-storage-operatorgroup
  namespace: openshift-storage
spec:
  targetNamespaces:
  - openshift-storage
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: lvms
  namespace: openshift-storage
spec:
  installPlanApproval: Automatic
  name: lvms-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace

To verify that it works, we can apply the following:

apiVersion: v1
kind: Pod
metadata:
  name: testpod
spec:
  containers:
  - image: httpd
    name: testpod
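    # container security context compatible with the restricted pod security
    # profile: all capabilities dropped, non-root UID, no privilege escalation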
    securityContext:
      capabilities:
        drop:
        - ALL
      runAsUser: 1001
      allowPrivilegeEscalation: false
    volumeMounts:
    - name: testtopo
      mountPath: /data
  volumes:
  - name: testtopo
    persistentVolumeClaim:
      claimName: lvm-file-1
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: "RuntimeDefault"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lvm-file-1
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 1Gi
  storageClassName: lvms-vg1

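Since resize is one of the motivations for the switch, volume expansion can also be exercised. A minimal sketch, assuming the lvms-vg1 storage class is created with allowVolumeExpansion enabled (which I believe is the LVMS default): bump the PVC's requested size and re-apply, and topolvm should grow the logical volume and the filesystem online.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lvm-file-1
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 2Gi          # grown from 1Gi to trigger expansion
  storageClassName: lvms-vg1

oc get pvc lvm-file-1 should then report the new capacity once the expansion completes.
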
Resource consumption wise, deploying the LVMS operator takes ~450 MB more RAM (some of it will be recovered after removing the hostpath-provisioner).

Without the LVMS operator:

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                2351m (40%)   0 (0%)
  memory             7857Mi (51%)  0 (0%)

With the LVMS operator deployed:

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                2390m (41%)   0 (0%)
  memory             8337Mi (55%)  0 (0%)
  ephemeral-storage  0 (0%)        0 (0%)

@praveenkumar
Member

Another thing we need to keep in mind: as of now everything (images, PVs, etc.) is part of the root partition for the OCP bundle, so when users choose a bigger disk size they don't have to care whether that extra space is used for images or for PVs. But this is not the case with the MicroShift bundle, where we are using topolvm and users have to think before expanding the disk about whether that extra space is going to be used by PVs or by images. The current hostpath-provisioner is not going to be deprecated, since it is extensively used on the KubeVirt side. What do we want to achieve by changing it?

@anjannath
Member Author

Another thing we need to keep in mind: as of now everything (images, PVs, etc.) is part of the root partition for the OCP bundle, so when users choose a bigger disk size they don't have to care whether that extra space is used for images or for PVs. But this is not the case with the MicroShift bundle, where we are using topolvm and users have to think before expanding the disk about whether that extra space is going to be used by PVs or by images.

If we add topolvm for the OpenShift preset as well, it will be the same for both presets, which should be good, I think.

The current hostpath-provisioner is not going to be deprecated, since it is extensively used on the KubeVirt side. What do we want to achieve by changing it?

The hostpath-provisioner doesn't support resize and limits; this is the main reason for the switch. I think it would be good to give users the ability to experiment with these features.

@praveenkumar
Member

The hostpath-provisioner doesn't support resize and limits; this is the main reason for the switch. I think it would be good to give users the ability to experiment with these features.

@anjannath do we have an issue where users asked for those features, or do we think they will ask in the future?

@anjannath
Member Author

The hostpath-provisioner doesn't support resize and limits; this is the main reason for the switch. I think it would be good to give users the ability to experiment with these features.

@anjannath do we have an issue where users asked for those features, or do we think they will ask in the future?

There were questions on our Slack channel about the size of the PVs and how to have smaller PVs, but I haven't seen GitHub issues for it, no.

@praveenkumar
Member

there were questions about the size of the PV and how to have smaller PVs

With the hostpath-provisioner, do we fix the size of the PVs? I thought the user can define the required size and it would be created automatically?

@anjannath
Member Author

With the hostpath-provisioner, do we fix the size of the PVs? I thought the user can define the required size and it would be created automatically?

No, it's a limitation of the hostpath-provisioner: since it is just creating directories on the host, it has no mechanism to enforce the size; a PV simply gets as much free space as is available. See: kubevirt/hostpath-provisioner#164 (comment)

@praveenkumar
Member

With the hostpath-provisioner, do we fix the size of the PVs? I thought the user can define the required size and it would be created automatically?

No, it's a limitation of the hostpath-provisioner: since it is just creating directories on the host, it has no mechanism to enforce the size; a PV simply gets as much free space as is available. See: kubevirt/hostpath-provisioner#164 (comment)

Thanks for sharing. So now we have to make a decision around resources, since you mentioned it takes around ~450 MB. For the hostpath-provisioner we don't set memory/CPU requests, so even if we remove it that will not offset the increase. Since for 4.15 we are already increasing the resource requirements by around 1.5 GB, I am not sure whether we should increase them by ~0.5 GB more.

  Namespace                                         Name                                                       CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                                         ----                                                       ------------  ----------  ---------------  -------------  ---
  hostpath-provisioner                              csi-hostpathplugin-fs4s6                                   0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d3h

@anjannath
Member Author

I've been trying to modify the partition table of the OCP bundle through Ignition to create a separate partition for use by topolvm, but it seems the disk gets re-partitioned on the second boot. Another idea that came up while talking to @gbraad was to use a second disk and not change the partitions on the existing disk.

Using the following Butane config, the disk gets partitioned during the first boot (during install), but after reboot it gets overwritten:

disks:
    - device: /dev/vda
      wipe_table: true
      partitions:
      - number: 1
        label: BIOS-BOOT
        size_mib: 1
        start_mib: 0
        type_guid: 21686148-6449-6E6F-744E-656564454649
      - number: 2
        size_mib: 127
        start_mib: 0
        label: EFI-SYSTEM
        type_guid: C12A7328-F81F-11D2-BA4B-00A0C93EC93B
      - number: 3
        label: boot
        size_mib: 384
        start_mib: 0
      - number: 4
        label: root
        size_mib: 24000
        start_mib: 0
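      # partition 5 omits size_mib, so (as I understand the Butane spec) it
      # takes up the remaining space on the disk; this is the partition
      # intended for topolvm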
      - number: 5
        label: pv-storage
        start_mib: 0

I also tried adding a filesystems block, but then the VM doesn't even boot:

filesystems:
    - device: /dev/disk/by-partlabel/BIOS-BOOT
      wipe_filesystem: true
      format: none
    - device: /dev/disk/by-partlabel/EFI-SYSTEM
      wipe_filesystem: true
      format: vfat
    - path: /boot
      device: /dev/disk/by-partlabel/boot
      format: ext4
      wipe_filesystem: true
      with_mount_unit: true
    - path: /root
      device: /dev/disk/by-partlabel/root
      format: xfs
      wipe_filesystem: true
      with_mount_unit: true
    - device: /dev/disk/by-partlabel/pv-storage
      format: ext4
      wipe_filesystem: true

@anjannath
Member Author

I'm working on doing this in crc itself and have made some progress that can be tested from this branch: https://github.com/anjannath/crc/tree/extradisk (only for macOS currently)

It currently does the following:

  1. during VM creation (crc start), a second disk image is created in the machine instance dir (named: crc-second-disk.img)
  2. during crc start, once the kube-apiserver is up, the topolvm OperatorGroup, Subscription and the openshift-storage namespace are created
  3. once the installation of the operator succeeds, the LVMCluster resource is created, which creates the LVM-based storage class that can be used in PVC definitions

After step 2 we have to wait ~2 minutes for the installation of the operator to succeed, and only after that does the LVMCluster custom resource become available for use. Therefore I think if we install the operator during the snc phase and only create the LVMCluster resource during crc start, that would not increase the start time.

@cfergeau
Contributor

cfergeau commented Mar 20, 2024

2 minutes is very long; I agree that moving this logic to snc should help. It's not clear why all of it can't be done in snc? If it doesn't work from Ignition, this could still be done after the install is done, as part of all the tweaks we are doing to the cluster?

@anjannath
Member Author

Yes, we could do all of it in snc. What we need is a separate partition on the disk for the LVMS/topolvm operator to use; when the re-partitioning attempt with Ignition failed and the idea of using a second disk came up, I focused on doing it in crc, because using a second disk would need changes to the libmachine drivers code.

But now that you mention it, if we don't use a second disk, and since we can modify the single disk image using guestfs tools, we could do this entirely in snc:

  1. extend the disk image by some amount (5 GB)
  2. shrink the root partition by some amount (from 31 GB to 26 GB, i.e. decrease by 5 GB)
  3. create a new partition after the root partition, which will be ~10 GB
  4. apply all the topolvm-related manifests

@cfergeau
Contributor

cfergeau commented Mar 25, 2024

For what it's worth, there is this enhancement open against microshift: openshift/enhancements#1601 « MicroShift: Replacing upstream TopoLVM with a minified version of LVMS »

@anjannath
Member Author

From what I understand going through that enhancement doc, MicroShift is moving to use the LVMS operator instead of the modified topolvm deployment that is there now.

What is not clear to me is whether the minified version of LVMS (called microLVMS in the doc) is going to be a separate thing, or a new feature in the LVMS operator itself, with microLVMS then being used in both OpenShift and MicroShift.

@anjannath
Member Author

shrink the root partition by some amount (from 31 GB to 26 GB, i.e. decrease by 5 GB)

The root partition filesystem is xfs, and guestfish needs the filesystem to be resized before the partition can be resized. I think doing everything in snc will not be possible, as I couldn't find an equivalent of resize2fs for the xfs filesystem.

@cfergeau
Contributor

shrink the root partition by some amount (from 31 GB to 26 GB, i.e. decrease by 5 GB)

The root partition filesystem is xfs, and guestfish needs the filesystem to be resized before the partition can be resized. I think doing everything in snc will not be possible, as I couldn't find an equivalent of resize2fs for the xfs filesystem.

You can grow xfs filesystems, but you cannot shrink them.

@anjannath
Member Author

anjannath commented Apr 2, 2024

To summarize, we are going to use the LVMS operator for dynamic PV provisioning in CRC. For this we need to:

  1. enhance the drivers code to create a second disk during VM creation (create LVMCluster resource on crc start crc#4097)
  2. create the OperatorGroup and Subscription resources in the cluster during snc (install LVMS operator for openshift preset #867)
  3. create the LVMCluster resource during crc start (create LVMCluster resource on crc start crc#4097)
